It's not unfair that Stockfish and all other traditional chess engines are designed in a way that makes it very hard to take effective advantage of massive parallelism, so they have to stick with CPUs. MCTS instead of alpha-beta, plus (what must be) an incredibly expensive evaluation function, is the reason AlphaZero is able to use the TPUs effectively. It'd be ridiculous to give Stockfish thousands of cores and say that the systems are equally powerful - at that point, the Stockfish cluster is far more expensive and uses far more power than the 4 TPUs.
Also, the CPU performance numbers in #14 are off. My dual-core laptop can do 198.4 GFLOPS at peak. Skylake-SP can do twice the flops per clock cycle, and Intel's biggest chip, the Xeon Platinum 8180M, has 28 cores at just over 200 GFLOPS each, for a total of 5.7344 TFLOPS (assuming it can keep running at the maximum all-core turbo speed, which is admittedly unlikely; it's 4.48 TFLOPS if running at stock speed).
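For anyone who wants to check the arithmetic, here is a rough sketch of how those peak figures fall out of cores * clock * flops per cycle. The clock speeds and per-cycle throughputs below are my assumptions, back-derived from the totals quoted above (3.1 GHz and 32 flops/cycle for the laptop; 3.2 GHz turbo, 2.5 GHz stock and 64 flops/cycle for the Xeon); the function name is just for illustration.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: cores * clock (GHz) * flops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle

# Assumed: 8 32-bit lanes * 2 (an FMA counts as two flops) * 2 FMA ports = 32
laptop = peak_gflops(cores=2, clock_ghz=3.1, flops_per_cycle=32)

# Assumed: Skylake-SP doubles the vector width to 16 lanes, so 64 flops/cycle
xeon_turbo = peak_gflops(cores=28, clock_ghz=3.2, flops_per_cycle=64)
xeon_stock = peak_gflops(cores=28, clock_ghz=2.5, flops_per_cycle=64)

print(f"laptop:       {laptop:.1f} GFLOPS")
print(f"8180M turbo:  {xeon_turbo / 1000:.4f} TFLOPS")
print(f"8180M stock:  {xeon_stock / 1000:.2f} TFLOPS")
```

These are theoretical ceilings only; real code rarely gets close to them.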
But even these numbers are far off from what Stockfish is actually capable of using. It doesn't use floating point, and there's no FMA instruction for integers, so you can halve that FLOPS number right away. It operates on 64-bit values rather than 32-bit, which is another halving (though not an actual reduction in the amount of computation done; also note that the TPU's quoted FLOPS appear to be 16-bit operations, so you might as well halve those before comparing to 32-bit flops in the first place). It doesn't use the vector registers, so divide by at least 4 (512 bits' worth of ops * 2 execution ports -> 64-bit ops * at most 4 ports).
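Chaining those three reductions together looks roughly like this. It's only a sketch: the starting point of 64 flops per cycle per core and the 2.5 GHz stock clock are assumptions inferred from the Xeon figures above, the names are made up for illustration, and since the last step is "at least 4" the result is an upper bound. The 2000-core line just multiplies the per-core figure out; it isn't checked against any TPU number here.

```python
def scalar_int_ops_per_cycle(peak_flops_per_cycle=64):
    """Apply the three reductions to an assumed per-core peak of 64 flops/cycle."""
    ops = peak_flops_per_cycle / 2  # no integer FMA: one op per instruction, not two
    ops /= 2                        # 64-bit values: half as many lanes as 32-bit
    ops /= 4                        # no vector registers: scalar ops on at most 4 ports
    return ops

per_cycle = scalar_int_ops_per_cycle()

# Assumed stock clock of 2.5 GHz, giving billions of 64-bit ops per second per core
per_core_gops = per_cycle * 2.5

# Aggregate for a hypothetical 2000-core cluster, in trillions of ops per second
cluster_tops = 2000 * per_core_gops / 1000

print(f"{per_cycle:.0f} ops/cycle, {per_core_gops:.0f} Gops/s per core, "
      f"{cluster_tops:.0f} Tops/s across 2000 cores")
```

So under these assumptions each core tops out at a handful of 64-bit integer ops per cycle, a factor of 16 below the headline flops-per-cycle figure.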
My best guess is that you could have both programs doing about the same amount of pure number-crunching per second if you gave Stockfish about 2000 cores. But this system would be far more expensive and draw far more power than the one used by AlphaZero, so this time Stockfish would clearly be the gorilla to AlphaZero's mouse.