ELO rating

@corvusmellori said in #10:

For example, if A scores 60% against B, and B scores 60% against C, then A had better score 84% against C.

I get what you mean, but your numbers aren't even close to correct. If A is 70 points stronger than B (expected score: 59.9%) and B is 70 points stronger than C, A is expected to score against C as if he were 140 points stronger: 69%. To score 84% against somebody, you'd have to be about 288 points stronger than them.
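
For anyone who wants to check figures like these, here is a minimal Python sketch of the standard logistic Elo expectation, E = 1 / (1 + 10^(-d/400)), together with its inverse. The function names are just illustrative, and FIDE's published lookup tables are constructed a little differently, so they can be off from this by a few points.

```python
import math


def expected_score(diff: float) -> float:
    """Expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def rating_diff(score: float) -> float:
    """Inverse of expected_score: rating gap implied by a score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Rating gap -> expected score
    for d in (70, 140):
        print(f"{d} points stronger -> {expected_score(d):.1%}")
    # Expected score -> rating gap
    print(f"84% score -> {rating_diff(0.84):.0f} points stronger")
    # Chaining two 60% matchups: rating gaps add, so the odds multiply
    print(f"60% twice -> {expected_score(2 * rating_diff(0.60)):.1%}")
```

Because rating gaps add, the odds multiply in this model: 60% is odds of 1.5 to 1, so two such steps give 2.25 to 1, which is roughly 69%.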

@PurpleInferno said in #11:

... If A is 70 points stronger (expected score: 59.9%) than B and ...
Where does one obtain these sorts of numbers?

@tpr said in #6:

#3
"I'd be surprised if they invented a 4000 Elo bot that Stockfish / a 3500 bot was not able to draw against at least 10%"

  • Such a bot will emerge in the next 40 years.

Nah, I'm confident that if you give me a small team of GM researchers and a 2025 Stockfish engine, I can draw with white against God himself at least one in every four games, over a long enough match.

But speculation is cheap, admittedly

@kindaspongey said in #12:

Where does one obtain these sorts of numbers?

https://chess.stackexchange.com/q/19653/672 has some explanations, and the Wikipedia page on Elo also has the formula and a table: https://en.wikipedia.org/wiki/Elo_rating_system

#11
"transitive"

  • No, the rating system does not assume transitivity. It only assumes a Gaussian spread of playing strength.

Elo is not a fixed measure but a relative one.

It can compare two data points for relative strength difference, but not fixed strength as the rating pool is floating (can shift between generations).

This is why you can't compare current Elo ratings to Elo ratings 50 years ago.

There is no maximum as the rating pool can change over time through rating inflation.

@tpr said in #15:

#11
"transitive"

  • No, the rating system does not assume transitivity. It only assumes a Gaussian spread of playing strength.
No, it works just fine regardless of the distribution. It assumes that outcomes follow a logistic curve in the rating difference, i.e. the expected score depends on the ratio of the players' 10^(rating/400) values.

This is true enough for differences between players who are likely to meet each other in tournaments. It is sort of implicitly proven by the fact that ratings do predict who wins more often, but rigorous testing has never been done. Some pro gamblers use Elo-style ratings, but the details of what they actually do are not public information. How well it works between, say, 2800 and 1400 is hard to say, as it is quite hard to measure. And with FIDE's addition of the 400-point difference rule, it would not work for a difference that big anyway.
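
To make the last point concrete, here is a rough Python sketch of what capping the rating difference at 400 points does to the expected score in a 2800 vs 1400 pairing. It uses the plain logistic formula rather than FIDE's lookup table, and the function name and the `cap` parameter are my own, so treat the exact figures as approximate.

```python
from typing import Optional


def expected_score(r_a: float, r_b: float, cap: Optional[float] = None) -> float:
    """Logistic Elo expectation for player A against player B.

    If `cap` is given, the rating difference is clamped to [-cap, +cap]
    before the formula is applied (a sketch of a 400-point style rule).
    """
    diff = r_a - r_b
    if cap is not None:
        diff = max(-cap, min(cap, diff))
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


if __name__ == "__main__":
    uncapped = expected_score(2800, 1400)
    capped = expected_score(2800, 1400, cap=400)
    print(f"uncapped: {uncapped:.4%}")
    print(f"capped at 400: {capped:.4%}")
```

Uncapped, the formula expects the 2800 player to drop only a few points per ten thousand; with the cap it expects about 91%, so the stronger player is treated as if they should lose far more often than the model itself predicts.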

@kindaspongey said in #12:

Where does one obtain these sorts of numbers?
The Wikipedia page linked in comment #14 contains the formula for the relation between rating difference and expected mean result. For practical purposes and quick reference, you can use the tables from the FIDE Rating Regulations (https://handbook.fide.com/chapter/B022024), articles 8.1.1 and 8.1.2, which are used for calculating rating updates and rating performance.

(For the sake of completeness, Glicko-2 used by lichess is much more complex than ELO but it is based on the same model for relation between rating difference and expected mean result.)

"Glicko-2 used by lichess is much more complex than ELO"
Elo assumes the rating deviation is the same for all players. That was done to simplify the calculations, as in 1970 there was nothing like today's computing power and many calculations were done by hand.
Elo, Glicko-1 and Glicko-2 are all simplifications of the Kalman filter.
https://en.wikipedia.org/wiki/Kalman_filter
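
As a small illustration of that difference, this Python sketch puts the Elo expectation next to the Glicko-1 expectation, which shrinks the rating difference by a factor g(RD) that depends on the opponent's rating deviation. The formulas are the published Glicko-1 ones as I understand them; the function names and the example ratings are only illustrative, and this is not lichess's actual Glicko-2 code.

```python
import math

Q = math.log(10) / 400.0  # scaling constant used in Glicko-1


def g(rd: float) -> float:
    """Glicko-1 attenuation factor for an opponent with rating deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def elo_expected(r: float, r_opp: float) -> float:
    """Plain Elo expectation: every player is treated as equally certain."""
    return 1.0 / (1.0 + 10.0 ** (-(r - r_opp) / 400.0))


def glicko_expected(r: float, r_opp: float, rd_opp: float) -> float:
    """Glicko-1 expectation: the gap is shrunk by g(rd_opp) before use."""
    return 1.0 / (1.0 + 10.0 ** (-g(rd_opp) * (r - r_opp) / 400.0))


if __name__ == "__main__":
    # Hypothetical example: a 1500 player facing a 1400 opponent
    for rd in (0, 30, 100, 300):
        print(f"opponent RD {rd:>3}: {glicko_expected(1500, 1400, rd):.3f}")
    print(f"Elo:             {elo_expected(1500, 1400):.3f}")
```

With an opponent RD of 0 the factor g is exactly 1 and the Glicko-1 expectation coincides with the Elo one; the more uncertain the opponent's rating, the closer the expectation is pulled towards 50%.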

@corvusmellori and @petri999 already touched on the core issue with answering the question: the curve connecting the expected score to the Elo difference, which is the basis for all the algorithms (Elo, Glicko, etc.), starts breaking down somewhere above a 70% expected draw rate in equal matchups, and loses any resemblance to the smooth logistic function at around a 90% draw rate. This is also the reason why TCEC and other competitions for modern engines have switched to using unbalanced openings that give 50% W/D chances.

Basically: since the regular starting position is balanced, with a minor white advantage of +0.2, it takes a lot of accumulated inaccuracies to reach or cross the +1.0 or -1.0 boundaries, and a modern engine on good hardware simply doesn't make enough mistakes for that to happen regularly, even if the opponent doesn't make any mistakes at all.

Now, to the original question: if we consider "Elo" to be measured at classical time control on decent hardware from the regular starting position, the Elo ceiling of perfect play is somewhere around 3400, and there aren't even 20 Elo to gain over current Stockfish (which hasn't gained more than 5 Elo since SF14 under these conditions either).

However, if we consider "Elo" to be measured in game pairs from 50% W/D positions, as is done in TCEC, there is no theoretical limit; it just becomes increasingly hard to find suitable openings where the two engines don't simply agree on a win or a draw.

This topic has been archived and can no longer be replied to.