
All about the Elo Rating System

@Vlad_G92 said in #10:

I attended the FIDE QC meeting last Friday and can tell you that Moiseenko and Sutovsky are particularly opposed to the idea of overhauling the rating system and switching from Elo to Glicko. As I had voting rights in the open meeting, and per my earlier article, quoted here, I was fully in favor of modernizing the system. The commission members were mostly split on the issue. There will be no massive change before 2028, and the results of the rating decay questionnaire haven't been made public yet, as far as I know.

Thank you for the information. I really admire your in-depth blog about how the FIDE rating changes are not working and how the rating system should be modernized.


In fact, it is necessary to modernize the rating method, especially for countries that have the classic characteristic of "deflated" Elo.


I think I agree with everything in the article, minus one dumb pedantic point:

  1. Elo assumes that everyone has the same variability in their rating

No, Elo is just a simpler model than Glicko, because it was first. Glicko etc. expand that model to account for what we now know. I haven't seen anything in the Elo model which requires players to have the same variability in playing strength, it just isn't an attribute accounted for by the simpler model.


@Toadofsky said in #13:

I think I agree with everything in the article, minus one dumb pedantic point:

  1. Elo assumes that everyone has the same variability in their rating

No, Elo is just a simpler model than Glicko, because it was first. Glicko etc. expand that model to account for what we now know. I haven't seen anything in the Elo model which requires players to have the same variability in playing strength, it just isn't an attribute accounted for by the simpler model.

## But Mr. Elo says that player performances are modeled by a normal distribution with a standard deviation of 200 points.

Quotes below:

"From general experience in sports we know that the stronger player does not invariably outperform the weaker. A player has good days and bad, good tournaments and bad. By and large at any point in his career, a player will perform around some average level. Deviations from this level occur, large deviations less fre­quently than small ones. These facts suggest the basic assumption of the Elo system. It is best stated in the formal terms of statistics: The many performances of an individual will be normally distributed, when evaluated on an appropriate scale. Extensive investigation (Elo 1965, McClintock 1977) bore out the validity of the assumption. Alternative assumptions are dis­cussed in 8.72."

[Arpad Elo, The Rating of Chessplayers, Past and Present, Second Edition (1986)](https://gwern.net/doc/statistics/order/comparison/1978-elo-theratingofchessplayerspastandpresent.pdf) (p. 7)

"Statistical and probability theory provides a widely used measure of these performance spreads, a measure which has worked quite well for many other natural phenomena which vary on a measur­able basis. This well known concept is standard deviation, a measurement of spread which encompasses the central bulk- about two-thirds-of an individual's performances. It is shown graphically at 1.35 and its derivation is explained at 9.3. It provides almost the ideal major interval for the rating scale, to define the class described and desired. In the Elo system, the class interval C is quantitatively defined at C = 1 σ σ is the Greek letter sigma, the usual symbol for the unit of a standard deviation." (Class Interval is 200 points)

[Arpad Elo, The Rating of Chessplayers, Past and Present, Second Edition (1986)](https://gwern.net/doc/statistics/order/comparison/1978-elo-theratingofchessplayerspastandpresent.pdf) (p. 5-6)
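To make the quoted assumption concrete, here is a small sketch (Python, standard library only) of what it means for every player's single-game performance to be normally distributed with the same σ = 200: the expected score between two players then follows from the distribution of the difference of their performances, which has σ = 200·√2. The function name is just for illustration.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 200.0  # one class interval, per Prof. Elo's definition C = 1σ

def expected_score(rating_a: float, rating_b: float) -> float:
    """P(player A outperforms player B) when both performances are
    normal with the same sigma; their difference has sigma * sqrt(2)."""
    diff_sigma = SIGMA * sqrt(2)
    return NormalDist(0.0, diff_sigma).cdf(rating_a - rating_b)

# A one-class (200-point) advantage gives roughly a 76% expected score,
# matching the percentage-expectancy table in the book:
print(round(expected_score(1600, 1400), 2))  # → 0.76
```

Note that the shared σ is doing real work here: if the two players had different spreads, the σ of the difference (and hence the expected score) would change.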

# So to sum up, Mr. Elo states that players' performances are modeled by a normal distribution with a standard deviation of 200 points. So is he not assuming that everyone has the same variability in their performances?

What do you think, @Toadofsky?


@RuyLopez1000 said in #14:

So to sum up, Mr. Elo states that players' performances are modeled by a normal distribution with a standard deviation of 200 points. So is he not assuming that everyone has the same variability in their performances?

What do you think, @Toadofsky?

What do I think about what? Prof. Elo states that players' performances are modeled by a normal distribution, but that doesn't explicitly say anything about variability. He also says, "Deviations from this level occur, large deviations less frequently than small ones," but that's not equivalent to:

Elo assumes that everyone has the same variability in their rating

Reading Glickman's papers reinforces my point.


@Toadofsky

I agree with you now. I've edited the blog to say that Elo simply doesn't account for variability, rather than that it assumes equal variability for everyone.

Thanks for the feedback.


weakness of the Elo system: game order bias

A weakness of the Elo system is that the rating depends on the order of games. Suppose I play 8 games in a day against an equally rated opponent. In one case, I lose the first four, then win the next four. My rating drops, my opponent's rating increases, and the system concludes I am the weaker player. As a result, with each subsequent loss I lose less rating than the game before. When I start winning, I am an underdog, so I gain more than half the available points. By the end of the day I have gained rating, despite having won half my games against an equally rated opponent, which is exactly the expected result.

In contrast, if I win four and then lose four, I'll lose rating, because my rating was higher on average during the day, and the system concludes that the expected result over the day is >50% wins. Again, we were equally rated at the start of the day, and on that basis 4 wins is the expected result. It should make no difference in what order those wins occur.

A better system would estimate strength not just forward in time but also in reverse. If I lose to a player who then goes on a 10-game winning streak, that player was likely under-rated, and the strength estimate for that initial game should be revised if the time difference is sufficiently small (what counts as sufficiently small? That's a key consideration). But for games played within a tight interval, it's reasonable to assume a player's strength hasn't changed much, and order shouldn't matter.

I have tried coding this sort of system, and it's difficult to get it to converge on a self-consistent solution. In contrast, the Elo formula is simple: it can be calculated in one's head with even a rudimentary understanding of exponentials. But it does have the weakness of leaving players over-rated or under-rated, as evidenced by future performance.
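The order effect described above is easy to verify with a quick simulation. Here is a sketch (Python) of sequential per-game Elo updates between two players who both start at 1500; the K-factor of 32 is an illustrative choice, not anything prescribed by the thread.

```python
def expected(ra: float, rb: float) -> float:
    """Standard logistic Elo expected score for the player rated ra."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def run_match(results: str, k: float = 32.0) -> float:
    """Update two players sequentially, both starting at 1500.

    results is a string of 'W'/'L' from player A's perspective.
    Returns A's final rating.
    """
    ra = rb = 1500.0
    for r in results:
        score = 1.0 if r == "W" else 0.0
        delta = k * (score - expected(ra, rb))
        ra += delta
        rb -= delta  # zero-sum update
    return ra

# Same 4-4 score, different order, different final rating:
print(run_match("LLLLWWWW"))  # ends above 1500
print(run_match("WWWWLLLL"))  # ends below 1500
```

With per-game updates, losing first and winning later leaves player A with a net rating gain from an even score, and the reverse order leaves a net loss, exactly as argued above.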


@djconnel

When calculating rating changes, FIDE uses the official rating a player has at the start of the month. So in an 8-game match between equal players, the rating change would be the same for each game, and the changes would cancel out.

With a delta of 5 points per game, we have -20 points for the first half and +20 points for the second half (or vice versa). The players are treated as equally rated throughout, instead of their ratings changing after each game.
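A minimal sketch of this point (Python, assuming a K-factor of 10, so each game between equal players is worth ±5): with both ratings frozen at their start-of-period values, the per-game delta is a constant and the order of results cannot matter.

```python
def fixed_rating_change(results: str, k: float = 10.0) -> float:
    """FIDE-style period rating: every game is rated against the ratings
    frozen at the start of the month, so each game's delta is identical
    and the total is order-independent."""
    expected = 0.5  # both players frozen at the same rating
    return sum(k * ((1.0 if r == "W" else 0.0) - expected) for r in results)

print(fixed_rating_change("LLLLWWWW"))  # → 0.0
print(fixed_rating_change("WWWWLLLL"))  # → 0.0
```

Any permutation of the same win/loss multiset gives the same total, which is exactly why the frozen-rating convention removes the order bias within a rating period.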


While Glicko does a good job of moving one's rating closer to their true performance, it has too much potential for overshoot in the process. This can be strategically mined (abused) to hit the rating targets required for titles (2200, 2300, etc.). There is also potential for abuse around rating-based qualification spots, historical records, etc., at least much more so than with the existing Elo system.
