OMFG! STOCKFISH NOOO! ( AlphaZero vs Stockfish ) Thoughts

All the talk all over the web about equal or level playing fields is missing the point.

The point is that there is this approach to chess and other games, neural nets for policy and eval, that hasn't worked very well for chess in the past (Matthew Lai's Giraffe was the best, but it was close to a thousand points behind the best engines).

Many people were skeptical that the approach could ever work for chess, just in principle, because of how different the game is from Go and the other games for which the approach had had some moderate success.

What this shows is that the approach can do quite well indeed in chess.

Sure, achieving this level of strength required Google's TPUs, special purpose hardware for neural network inference, but the point is that the approach can actually work.

What's more, using Google's TPUs (for playing, where it only used 4) is not quite the same as using some monstrous cluster of ridiculous hardware.

Google's TPUs actually don't use a lot of electricity, so by one metric, power utilization, SF and AlphaZero were actually on a roughly even playing field anyway.

It's not AlphaZero's fault that SF was written to run on general-purpose CPUs that are less efficient for what SF does than TPUs are for what AlphaZero does :)

Again, quibbling about things like the time control, opening books, hardware, hash settings, etc. just misses the point.

Even if AlphaZero had just gotten within a couple hundred points of SF it would have been an amazing result for the reasons described earlier; the whole neural network paradigm was suspected by many to simply be inapt for chess, and it turns out that it can actually work quite well.

On the one hand, sure, it's not like AlphaZero played like God (it performed somewhere in the neighborhood of 65-100 points over SF; I don't think that in two years, when SF has improved by 100 points, anyone is going to proclaim the death of chess) and we shouldn't get too worshipful of it; on the other hand, we shouldn't get overly defensive about current engines like SF and look for excuses.
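For a sense of what a 65-100 point gap means in practice, here is a rough sketch of the standard logistic Elo relation between rating difference and expected score. The function names are my own, and the 28 wins / 72 draws / 0 losses line is the 100-game match score as reported in the preprint; treat this as an illustration, not a rating-list calculation.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for the side rated elo_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse relation: rating gap implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def match_score(wins: int, draws: int, losses: int) -> float:
    """Score fraction of a match, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


if __name__ == "__main__":
    # The 65-100 point range mentioned in the post.
    for gap in (65, 100):
        print(f"{gap} points -> expected score {expected_score(gap):.3f}")

    # Assumption: match score as reported in the preprint (28 wins, 72 draws, 0 losses).
    score = match_score(28, 72, 0)
    print(f"score {score:.2f} -> about {elo_diff_from_score(score):.0f} points")
```

On these numbers a 100-point edge means scoring roughly 64%, which is a clear gap but a long way from winning every game.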

It's an impressive result that indicates neural nets not only can work well for chess (again, this was in doubt previously), but that they can do so self-trained with relatively little electricity and time invested (the amount of electricity used by the 5,000 TPUs for training is still a LOT less than that used by all the computers in the SF testing framework over the last few years).

Having said that, the paper shows that AlphaZero's improvement in chess flattened out really quickly, so some of the intuitions about the approach being less apt for chess than for a game like Go might hold some water. There certainly wasn't any indication that it was going to get measurably stronger if it trained for another couple days; most of its improvement came in what was probably the first 75 minutes or so of self-play and training.

Of course, this was just a first try with a very general-purpose approach; it's quite possible that with improvements to the approach it could do even better, but I'm not sure we'll get to see such an attempt.

Let's just take it for what it is, and await the release of the full paper :)

notfast102

#12

If they were both set up equally, why not put that in the statement? Can anyone reading this tell me they don't find this a little suspicious?

SlowSlug

#13

a_pleasant_illusion +1

To me, the news is pretty exciting, as there is some indication (in the games) that the proof of concept worked out fine. And some lessons can be learnt (see www.twitch.tv/videos/207499168, video by Jon Ludvig Hammer).

Apart from that, I notice strong emotional reactions among the enthusiasts (for different reasons, though), which makes it difficult to engage in serious consideration. And the fact that DeepMind (a.k.a. Google) did not release the full information adds to the problem considerably. Google is very much in the business of making money from hiding information (i.e. showing only what it intends to let us know), which might raise suspicions about this being an intentional marketing campaign to raise stock prices. And we (the population of this planet) are involving ourselves by trusting our own dreams about AI more than clearly understanding the reported facts. It is very difficult to stay clear on the issue!

...just saying... Try to keep some awareness of the emotions stirred up in us by that insufficient piece of information, including the interpretations, wishes, fears, and so on.

petri999

#14

#12 In the paper they tell exactly what was used. No problem there. And beating Stockfish was not the news; the news was the fact that they could generalize the AlphaGo Zero training to other games. Google is NOT interested in chess. This is just another way to measure the actual innovation before moving on to something that matters.

JoseSilva edited

#15

+1 @a_pleasant_illusion, @petri999

JMTromp

#16

I like how it plays, though. I've seen some games of Alpha, and I think the idea that chess players need to control the 4 central squares is a little outdated.

petri999

#17

It is outdated if you have the positional understanding of AlphaZero together with its calculation precision. For the rest of us, mimicking that style may result in disaster.

no_bullet_thanks

#18

Yep @a_pleasant_illusion, nice post. However, regarding the hardware, I don't think that's fair: it is also not Stockfish's fault that it was designed by humans to run on hardware that is not that great.

A fair comparison of chess knowledge is only possible if such factors are subtracted out as far as possible.

Which is, btw, why I also think that the human-computer matches played so far were unfair. The human had no access to opening books or endgame tablebases, could not analyze on a board, and got tired.

a_pleasant_illusion edited

#19

@no_bullet_thanks

Well, that's its own thorny kettle of fish, to mix all sorts of metaphors :)

Humans, especially stronger ones, have a lot of stuff memorized, including openings and theoretical endings.

Goodness knows I don't figure out Philidor or Lucena positions from scratch every time; I just remember the solution :)

The thing that's different is not the mere fact that engines remember these things so much as it is that they do it a lot more reliably and at a scale we can't.

That's thorny, though, because there can be substantial differences in ability between humans on these things as well, but we don't have any problems just rolling that into our general measure of playing ability.

You simply can't control for some memory-independent, hardware-independent chess skill. That's not how anything, humans included, works.

Some things you can control, like using the same number of cores, same hash, same opening book, etc. on engine rating lists, but once you start comparing radically different approaches that can't even use the same hardware (AlphaZero vs SF, engines vs humans), controlling for all of that is just impossible.

For me it's much simpler and more interesting to just compare the strength of the moves generated and ignore all the other stuff.

After all, if engines were sentient they could complain that our brains allow us to unfairly do some massive and sophisticated pattern matching that CPUs are incredibly ill-equipped to do :)

Once radically different approaches and hardware are in play, it's a lot easier to just compare the whole package.

AlphaZero on 4 TPUs plays stronger chess at 1 min/move than SF on 64 cores with 1 GB hash at 1 min/move.

Questions about whether those conditions are "fair" don't even really have objectively correct answers; there aren't any unproblematic metrics to compare them.

The other interesting point on the specific AlphaZero vs SF hardware question is that chess engines simply do not scale well at all past the 32-64 core range. Even if you let SF use however many cores someone supposes is fair, it's highly unlikely the result changes, just because SF is going to get negligible strength increases past 64 cores.

Just my $0.02, of course. There are nearly as many opinions on these things as there are people, I'd wager :)

no_bullet_thanks

#20

This smells a bit like throwing sand in people's eyes to me. Yes, it is not trivial to do, but people should at least try to get equal conditions; otherwise you can say Porsches are better than humans at the 100 meter sprint because, well, let's just look at the whole package...
