
Opinion on the Markovian model, which attempts to answer: who is the best player in history?

According to the study (http://content.iospress.com/articles/icga-journal/icg0012) (warning: math-intensive), the best players in history are

1. Magnus Carlsen
2. Vladimir Kramnik
3. Bobby Fischer
4. Garry Kasparov
5. Viswanathan Anand

(Carlsen, Kramnik and Fischer were extremely close, with a slightly larger margin between Fischer and Kasparov)

and the 5 "worst" world champions were

1. Wilhelm Steinitz (By far)
2. Alexander Alekhine
3. Max Euwe
4. Mikhail Tal
5. Jose Raul Capablanca

As a Fischer fan, I was disappointed to see Fischer at 3rd (although his best years were still ahead of him; all of these players other than Fischer and Carlsen peaked in their mid-30s or later. He also had to play against 5 world champions regularly and 3-4 other world-champion-caliber players (Keres, Korchnoi, Larsen, Geller), had no coaching, did it all by himself, and had no access to engines, so in my mind he's still the best), but objectively speaking this does make sense. Another somewhat surprising assertion was that Kasparov played his best world championship match against Kramnik, but Kramnik was simply better. The model was also accurate in predicting how matches would turn out between players who did play each other.

Thoughts?

I'm not sure I like this study.

I did not read it very closely, but it seems to propose a more complicated version of using centipawn loss to measure playing strength. I'm not sure this complexity actually adds anything. For example, it doesn't seem to be any better than ELO at predicting World Championship results, and I couldn't find a convincing validation of the measure. There's no inherent reason a Markov chain should give more accurate results. (I'm happy to be corrected on this, because as I said, I only skimmed the paper.)
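
For reference, the plain centipawn-loss approach it builds on is basically this (a quick sketch with made-up evaluation numbers, not the paper's actual pipeline):

def average_centipawn_loss(move_evals):
    # move_evals: list of (best_eval_cp, played_eval_cp) pairs,
    # both in centipawns from the mover's point of view.
    losses = [max(0, best - played) for best, played in move_evals]
    return sum(losses) / len(losses) if losses else 0.0

# Example: three moves, one of them a 90-centipawn mistake (invented numbers).
print(average_centipawn_loss([(30, 30), (10, -80), (0, -5)]))  # ~31.7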

The standard criticisms of using comparison with "perfect play" to judge players also apply.

Ah, this old thing. It was mentioned on talkchess shortly after being published.

There are a couple things about it that are unfortunate, though in general it is quite an interesting work.

First, this (on the strength of their SF, estimated from a bunch of guesswork around processor speeds and SSDF and CCRL ratings):

"The question of whether this 3150 rating, which has only be computed through games with other computer programs, is comparable to the ratings of human players is not easy to answer."

Actually it is. They are not comparable, full stop.

Then this (I fully admit, I'm being silly and pointing out a typo, but I find it funny):

"In 2014, Hiraku Nakamura (2800 ELO) played two games against a “crippled” STOCKFISH  (no opening database and no endgame tablebase) with white and pawn odds, lost one game and drew the other."

I don't know who Hiraku Nakamura is :)

Aside from some other typos I found this study quite interesting and a solid piece of work when I read it a while back, and I maintain that opinion now.

I do think it substantially improves on the similar work that had been done in the past, even beyond the obvious improvement of a much better engine on better hardware.

Even though some of their metrics don't necessarily outperform ELO for some matches, those metrics also don't underperform ELO, and that's actually not insignificant.

Being able to achieve similar predictive results as ELO just from analysis of positions and moves is actually quite nice; it means there's promise for improved versions of such an approach to assess strength more quickly than traditional rating systems, because of the much increased data set (instead of just one data point per game, you have many).
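
As a toy illustration of the sample-size effect (all the numbers here are invented, just to show the square-root scaling):

import math

# Toy numbers: 20 games, ~40 evaluated moves per game. Suppose per-game
# scores (1/0.5/0) scatter with std dev ~0.45 and per-move centipawn losses
# scatter with std dev ~50 cp; both figures are assumptions for illustration.
games, moves_per_game = 20, 40

se_game_score = 0.45 / math.sqrt(games)                # ~0.10
se_move_loss = 50 / math.sqrt(games * moves_per_game)  # ~1.8 cp

print(f"std error of mean game score:     {se_game_score:.2f}")
print(f"std error of mean centipawn loss: {se_move_loss:.1f} cp")

The units differ, of course, but the point is that the per-move estimate draws on 40 times as many samples per game, so it settles down after far fewer games.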

Something similar was shown a long time ago for games in which computers more quickly became stronger than humans, like Scrabble, where evaluations of player strength converged much more quickly with this kind of "deviation from best play" analysis because you get so much more than 1 data point per game.

On that note, the Markovian predictor actually did do noticeably better than ELO on the world championship matches (off by 2.8 percentage points on average, instead of 5, on the 11 matches with ELO predictions).
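
(For reference, the ELO predictions being compared against come from the standard expected-score formula; the ratings below are just example numbers:)

def elo_expected_score(rating_a, rating_b):
    # Standard Elo expected score for player A against player B.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(elo_expected_score(2850, 2750))  # ~0.64, i.e. roughly a 64% expected score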

I did think they went a bit overboard on testing some of the tactical complexity indicators, as some of those results could have been inferred on general grounds, without all the computational effort (most notably, time-to-depth in a particular position is much more an effect of whether an engine guesses correctly on its initial bounds, and thus how much it fails low and high, than on how tactically complex the position is).
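
(For anyone unfamiliar with the fail-low/fail-high point, a generic aspiration-window loop looks roughly like this; it's a sketch of the general technique, not any particular engine's code:)

# Sketch of a generic aspiration-window loop around an alpha-beta search.
# `search(pos, depth, alpha, beta)` stands in for the engine's real search
# and is not implemented here.
def search_with_aspiration(pos, depth, prev_score, search, delta=25):
    alpha, beta = prev_score - delta, prev_score + delta
    while True:
        score = search(pos, depth, alpha, beta)
        if score <= alpha:        # fail low: true score is below the guessed window
            alpha -= delta        # widen downwards and pay for a re-search
        elif score >= beta:       # fail high: true score is above the guessed window
            beta += delta         # widen upwards and pay for a re-search
        else:
            return score          # initial guess was good: no re-search needed
        delta *= 2                # widen more aggressively each time

A bad initial guess means extra re-searches and a longer time-to-depth, regardless of how "complex" the position looks to a human.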

Still, empirical confirmation is nice, so this is a very minor complaint.

In general, though, I do think that whole piece is a bit misguided.

Engines and humans take very different approaches to positions, so what is complicated for a human may be incredibly simple by any metric for an engine.

Composed-very-hard-mate-in-2 puzzles are a good example of this. Any decent engine will find the mates immediately with no indication that the position is especially complex, but some of those are very devilish for even extremely strong humans.

Engines and human brains just approach chess rather differently :)

So, for complexity-for-humans, I think we'd have to take a completely different approach to measuring; using an existing chess engine is incredibly unlikely to ever result in a decent correlation.

@BigGreenShrek

As to the added complexity, I think the general idea is actually quite nice.

The idea is that you're losing a lot of information if you just look at every move and say "average centipawn loss is this much". It makes some sense that some players might make bigger mistakes in better positions (overconfidence, overaggression, or something) and other players might play especially well in slightly worse positions (tenacious defenders and the like), or other interesting combinations.
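
(My rough mental model of the Markov part, with made-up evaluation buckets; this is my reading of the general idea, not the paper's exact construction:)

import numpy as np

# Evaluation buckets in centipawns (boundaries invented for illustration).
BUCKETS = [-float("inf"), -150, -50, 50, 150, float("inf")]

def bucket(eval_cp):
    return next(i for i in range(len(BUCKETS) - 1)
                if BUCKETS[i] <= eval_cp < BUCKETS[i + 1])

def transition_matrix(eval_pairs):
    # eval_pairs: (eval before the player's move, eval after it), in centipawns.
    n = len(BUCKETS) - 1
    counts = np.zeros((n, n))
    for before, after in eval_pairs:
        counts[bucket(before), bucket(after)] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# A player who tends to let slightly better positions slip (made-up data):
pairs = [(80, 80), (90, 20), (100, -60), (0, 0), (10, 5)]
print(transition_matrix(pairs).round(2))

Row i, column j of the result is the estimated probability that this player turns a bucket-i position into a bucket-j position with their move, which is exactly the kind of per-player structure a single average would wash out.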

This extra information might prove useful, and at least in this case, given that the Markovian predictor generally outperformed the other conformance indicators, it seems that intuition is correct.

Now, there may be other unexplored approaches that are simpler and better, but that's the nice thing about such research.

It might inspire you to come up with an even better idea :)

@a_pleasant_illusion I have nothing against complexity if it serves a useful purpose. It's a nice idea. I'd just like to see more rigorous validation.

Further, I'm not sure the "margin of error" here is really small enough to be confident in the precise ranking given by the paper. I'm happy to agree that modern players are generally stronger than older ones, but this is fairly obvious as chess theory has advanced enormously in the last hundred years.

#5 "In general, though, I do think that whole piece is a bit misguided."

That's quite generous! That paper makes audacious claims such as "The database is probably the weakest point of this study" when their choice of data model is dubious (see Table 2).

@BigGreenShrek

Sure, some of the conclusions with which their model agrees are fairly uncontroversial.

That can hardly be a knock against the system, though :)

Testing a new predictive model against generally accepted truths is not a bad start.

Again, it's not so much the sorts of conclusions about chess players they reach with the model (Carlsen, Kasparov, Kramnik, and Capablanca at their peaks were really, really, really strong, and players got better over time) that are interesting, as it is the fact that they have an automated analysis system that gives these fairly reasonable results.

With enough effort and improvement, such an automated analysis system could prove very useful.

It's also true that, as you say, we simply don't have enough data to confidently answer the question "Who really was the best ever?"; inferences about players who never played each other at their peaks from other evidence will almost always fall short of conclusive (an exception might be something like Carlsen vs Steinitz...I think I know how that goes :) )

Of course, with as few games as human players play, even when players play each other at their peaks, results are usually within any reasonable error bars.

Humans just don't play enough games to reliably discriminate between players whose "true" ratings over an infinite number of games might converge at something like 20 points apart, especially given the fact that the strength of a human player changes so often.
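
(A quick back-of-the-envelope for that 20-point figure, under the crude assumption that game scores behave like independent draws with a per-game standard deviation of about 0.4:)

edge = 1 / (1 + 10 ** (-20 / 400)) - 0.5   # expected score above 50% for +20 Elo (~0.029)
sigma = 0.4                                # rough per-game score std dev (assumption)
z = 1.96                                   # ~95% confidence

games_needed = (z * sigma / edge) ** 2
print(round(edge, 3), round(games_needed))  # ~0.029, and on the order of 750 games

Hundreds of games just to separate a 20-point gap from nothing, and no human stays at a fixed strength for that long anyway.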

This ties back into the other point I've repeatedly mentioned.

The point of this particular paper, despite the clickbait-style title, is less about figuring out who "really" was the best chess player than it is about showing that there are ways to do automated analysis of games that have at least as much predictive power as rating.

That, and not the more particular claims with which their findings are consistent (chess players now are better than 60 years ago, Kasparov was better than Petrosian, etc.), is the key finding.

Viewed in that light I think the article is quite useful and interesting.

Viewed merely as an attempt to answer the question they chose as a title, I agree that it's far from convincing :)

@Toadofsky

I definitely agree there's likely a lot of room for improvement.

I'm not sure I see anything glaring about Table 2 that suggests a big weakness; what's the issue you detect there?

Each player has distinct strengths and weaknesses that cannot be captured using a bunch of matrix-vector operations.
