The mechanics and maths of the glicko2 system

Hi, I am not a mathematically minded individual, but I figured that some people actually enjoy answering maths questions, so this topic is intended to be a place where anyone can ask a glicko2 related question.

I have a couple to begin with. If people want to add more, numbering any subsequent questions might make them easier to answer:

1) Why are the top atomic players' R/Ds so volatile? Is it because they tend to play lots of lower rated players and very few of a comparable grade?
2) What happens in glicko2 terms to the LiChess playerbase when a player joins LiChess, cheats, takes a lot of points off law-abiding citizens and then has their account suspended? Does this deflate active players' ratings and make the law-abiding player pool weaker on average?
3) What conditions have to be met to make a particular variant's ratings inflate? I'm not entirely sure, it might just be player improvement in a 'frontier' variant, but it seems to me that atomic players' ratings have been inflating recently.
4) What effect, if any, does a sudden wave of new, green (i.e. noob, but a nicer word) players have on a variant, including bullet, blitz or classical? Is it easy to spot these trends?

If someone mathsy finds answering these fun, I'd be much obliged if you could explain and answer these questions.

jimj12

1. Pretty much. That is probably the reason they have such a high rating in the first place; atomic is full of opening traps that newer players don't know.
2. It won't deflate anyone's ratings, but rather change the percentile that a rating corresponds to. For example, if a 2000-rated player was in the top 5%, but then hundreds of cheaters gained a rating above 2000, that 2000 player might now only be in the top 10%. If all the cheaters are banned, the effect should be reversed.
3. An example of rating inflation: in one time period a rating of 1500 corresponds to the top 50%, but in the next period 1600 corresponds to the top 50%. Lichess has the data to determine whether there is rating inflation or not, but we can't tell just by seeing ratings increase. I don't know what would cause ratings to actually inflate, apart from the obvious 'everyone gets better'.
4. If a wave of noobs enters the variant pool, and they are on average worse than the average current variant player, then they will push current players up the bell curve, to a higher percentile and rating. This would not be rating inflation.
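To make point 2 concrete, here is a minimal sketch of the percentile argument. Everything in it is an assumption for illustration: the pool is 2000 made-up ratings drawn from a normal distribution, the 200 cheaters are all placed at 2400, and `percentile` is a hypothetical helper, not anything Lichess provides.

```python
import random


def percentile(rating, pool):
    """Percentage of the pool rated strictly below `rating`."""
    below = sum(1 for r in pool if r < rating)
    return 100.0 * below / len(pool)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed pool of legitimate players (made-up distribution).
pool = [rng.gauss(1500, 300) for _ in range(2000)]
before = percentile(2000, pool)

# Assumed: 200 cheaters who have all climbed to 2400.
with_cheaters = pool + [2400.0] * 200
during = percentile(2000, with_cheaters)

# Banning the cheaters gives back the original pool.
after_ban = percentile(2000, pool)

print(f"before: {before:.1f}  with cheaters: {during:.1f}  after ban: {after_ban:.1f}")
```

The 2000 rating itself never changes in this sketch; only its position in the pool moves down while the cheaters are present, and it returns once they are removed.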

Illion

1. Some of the top players don't play for long periods, which increases their RD due to inactivity. This also partially explains why the usual matchup is a much higher-rated player versus a lower-rated one. Also, there is a distinct skill gap between the very topmost players and the rest. I feel like I can barely nick a game or two out of 10 off them, if at all.
2. Because the Glicko system is not zero-sum, it's a little more complicated. However, I imagine the effect of cheaters on the rating pool is almost negligible for a different reason: they are so outnumbered by legitimate humans that their impact is next to zero. Maybe in variants the effect could be more pronounced, but engines for Horde, atomic, 3C etc are not that readily available to the wider public either. (It actually takes some effort to find and compile them.)
3. Related to 1, I think. When RD is high and strong players return to play weaker players, their wins put more points into the rating pool than the weaker players lose. In the long run this causes some inflation. Another effect, I think, is the sudden influx of decent new players into the atomic pool (due to more tournaments, or perhaps the other way around). They are better than the average fresh player (their rating is ~2000), but from what I see they still struggle with basic endgame technique and miss opening kills. Not sure this is "everyone getting better", or maybe I'm biased.
4. I sort of covered this under 3, but I think another effect happens. In a new batch of players, after some games there will be rating winners (rating >1500) and rating losers (rating <1500). Of those who lost rating, more will probably be driven to quit, especially in a variant as frustrating as atomic, where you can lose every game in under 10 moves for 20 or 30 games. (When I play anonymously now I tend to open with 1. d4 as white to avoid dishing it out too badly.) This has the effect of pushing the rating distribution upwards.

Overall I think rating trends are less correlated with individual skill levels than with large-scale player pool behaviours. If everyone gets equally better in a closed group, ratings won't budge. It's effects like new player influx, or the splitting of the player pool into noticeably distinct skill groups, that drive these trends.
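A rough sketch of the "not zero-sum" point in 3. As an assumption, it uses the simpler Glicko-1 single-game update rather than the full Glicko-2 procedure, since the dependence on RD is the same idea and the code stays short. The ratings and RDs are made up, and `rating_change` is a hypothetical helper name.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Glicko-1 factor that discounts a result by the opponent's RD."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def rating_change(r, rd, opp_r, opp_rd, score):
    """Glicko-1 rating change for one game (score: 1 win, 0.5 draw, 0 loss)."""
    g_opp = g(opp_rd)
    expected = 1 / (1 + 10 ** (-g_opp * (r - opp_r) / 400))
    inv_d2 = Q**2 * g_opp**2 * expected * (1 - expected)
    return Q / (1 / rd**2 + inv_d2) * g_opp * (score - expected)


# Assumed example: a returning top player (high RD) beats an active
# lower-rated player (low RD).
gain = rating_change(2200, 200, 1800, 50, 1.0)
loss = rating_change(1800, 50, 2200, 200, 0.0)

print(f"winner: {gain:+.1f}  loser: {loss:+.1f}  net into pool: {gain + loss:+.1f}")
```

With these made-up numbers the returning player gains roughly 19 points while the opponent loses only about 1.5, so the game adds points to the pool overall. Whether that adds up to noticeable inflation on a real site is a separate question this sketch does not answer.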

RealKool

1. The volatility (V) measure is not applied to RD but to the rating.
http://www.glicko.net/glicko/glicko2.pdf
We do know, though, that a Glicko2 rating has lower and upper bounds given by its RD.

RD is mainly affected by the number of games.

I checked the top 3 atomic players and all have 0.05 volatility.

This V variable is not properly described in the above link, IMO. It only says that if it is low, the player performs at a consistent level, and if it is high, the player is having erratic performances. The question is: what are typical low values that one would consider a consistent level of performance? How about high?

The paper's example says that an unrated player can start at 0.06, with a condition attached ("this value depends on the particular application"). I don't know what that means.

Now look at those values, 0.05 to 0.06. Those numbers make no sense :)

It should have been:
0.0 to 0.02 = Very consistent
0.03 to 0.05 = Consistent
... and so on.

This system also has a system constant, tau, "which constrains the change in volatility over time, needs to be set prior to application of the system."
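One way to get a feel for what 0.05 or 0.06 means: in the Glicko-2 paper, a player who does not play during a rating period keeps the same rating and volatility, but the RD on the internal scale grows to sqrt(phi^2 + sigma^2). The sketch below applies only that one formula. The starting RD of 50 and the number of idle periods are made-up inputs, no RD cap is applied, and `rd_after_idle_periods` is a hypothetical helper name. How long a rating period lasts depends on the site and is not assumed here.

```python
import math

GLICKO2_SCALE = 173.7178  # converts between rating points and the internal scale


def rd_after_idle_periods(rd, volatility, periods):
    """RD after `periods` rating periods without games (Glicko-2 inactivity rule)."""
    phi = rd / GLICKO2_SCALE
    for _ in range(periods):
        phi = math.sqrt(phi**2 + volatility**2)
    return phi * GLICKO2_SCALE


for sigma in (0.03, 0.06, 0.09):
    grown = rd_after_idle_periods(50, sigma, 10)
    print(f"volatility {sigma:.2f}: RD 50 -> {grown:.1f} after 10 idle periods")
```

The volatility is expressed on the internal scale, where one unit is about 173.7 rating points, which is why useful values look so small.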

RealKool

@ #1

2. I think this is a good time to create a simulation:

1. Create 2000 or so good players with different Rating, RD and V.
2. Create 2% of 2000, or 40, cheaters that all start at 1500/350/0.06 (R, RD, V), so the total number of players is now 2040.
3. Create a first tour where each player plays 50 games.
4. Calculate the Glicko2 rating for that first period using the 50 games per player.
5. Create a 2nd tour where each player plays 50 games.
6. Calculate the Glicko2 rating for this 2nd period.
7. Calculate the average rating with cheaters.
8. Now create a system where there are no cheaters; create good players as in (1).
9. Follow (3).
10. Follow (4, 5, 6). The simulation is done here.
11. Calculate the average rating without cheaters.
12. Compare (7) and (11).
13. Compare the max in (6) and the max in (10, after the 2nd period).
14. Remove the cheaters in (6).
15. Get the average rating of the remaining good players.
16. Compare (11) and (15).

Simulation assumptions:

1. A cheater gets a 90% chance to win a game vs a good player (it does not matter whether the good player is strong or not).
2. Cheater vs cheater have equal chances of winning, that is 50%, even if one cheater has a higher rating after the first rating period.
3. For a good player vs a good player, the winning percentage is based on the scoring probability in FIDE table 8.1b.
https://www.fide.com/fide/handbook.html?id=172&view=article
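The three assumptions could be sketched like this. Two things here are stand-ins and not part of the plan above: the standard logistic Elo formula is used in place of FIDE table 8.1b (it only approximates the table), and draws are ignored. The function names and the `cheater` and `strength` keys are hypothetical.

```python
import random


def expected_score(strength_a, strength_b):
    """Logistic Elo expected score; a stand-in for FIDE table 8.1b."""
    return 1.0 / (1.0 + 10 ** ((strength_b - strength_a) / 400.0))


def win_probability(a, b):
    """Chance that player a beats player b under the three assumptions."""
    if a["cheater"] and b["cheater"]:
        return 0.5  # assumption 2: cheaters are equal against each other
    if a["cheater"]:
        return 0.9  # assumption 1: cheater wins 90% against any good player
    if b["cheater"]:
        return 0.1
    return expected_score(a["strength"], b["strength"])  # assumption 3


def play(a, b, rng):
    """Return 1.0 if a wins, 0.0 if b wins (draws are ignored in this sketch)."""
    return 1.0 if rng.random() < win_probability(a, b) else 0.0


rng = random.Random(42)
good = {"cheater": False, "strength": 1800}
weaker = {"cheater": False, "strength": 1600}
cheat = {"cheater": True, "strength": None}

print(win_probability(good, weaker), win_probability(cheat, good), play(cheat, good, rng))
```

The results from `play` would then be fed into whichever Glicko2 implementation is used for steps (4) and (6).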

Caustic

Thanks for your answers everyone.

#5, this is a very practical answer. How easy is this kind of thing to simulate? Are there programs that perform this type of simulation, or would it require a lot of maths?

RealKool

Mark has provided a library on how to calculate 2 new R/RD/V given 2 players with old R/RD/V and the result. Or player 'A' vs more opponents, and calculate the new R/RD/V of player 'A'.
http://www.glicko.net/glicko.html
I will use the one written in Python.

I actually plan to do it like Lichess, that is, after each game calculate the new R/RD/V immediately, so that for the next game the players already have an updated R/RD/V to use.

I don't know whether this has been done before.
This does not require too much math.

Simulation is just generating a random number with some conditions.

For example only: player A vs player B, and player A (a cheater) has a 90% scoring probability. Generate a random number from 1 to 100.
If the generated number is from 1 to 8, player B wins.
If the generated number is from 9 to 10, the result is a draw.
If the generated number is from 11 to 100, player A wins.

Win/Loss/Draw stats
B wins = 8/100 = 8%
Draw = 2/100 = 2%
A wins = 90/100 = 90% [11 to 100 = 90 numbers]

Do the same for other players but with different scoring probabilities.
Then it is just a matter of saving this info to get a summary of ratings, RD, V and others.
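The 1 to 100 roll described above can be sketched as follows. `sample_result` and its parameters are hypothetical names, and the win and draw percentages are passed in so the same function can be reused for other pairings.

```python
import random


def sample_result(p_a_wins, p_draw, rng):
    """Roll 1..100 and return A's score: 1.0 win, 0.5 draw, 0.0 loss."""
    roll = rng.randint(1, 100)  # inclusive on both ends
    b_wins_upto = round((1 - p_a_wins - p_draw) * 100)  # e.g. 1..8
    draw_upto = b_wins_upto + round(p_draw * 100)       # e.g. 9..10
    if roll <= b_wins_upto:
        return 0.0
    if roll <= draw_upto:
        return 0.5
    return 1.0


rng = random.Random(1)  # fixed seed so the run is repeatable
results = [sample_result(0.90, 0.02, rng) for _ in range(10_000)]
for label, value in (("A wins", 1.0), ("Draw", 0.5), ("B wins", 0.0)):
    print(f"{label}: {results.count(value) / len(results):.1%}")
```

One detail worth noting: with a 90% win band plus a 2% draw band, A's scoring expectation is 0.91 rather than 0.90, because each draw counts as half a point.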

jimj12

#7 How will you account for game selection? People will play those of a similar rank to themselves. If the cheaters quickly populate the top ranks, then the majority of their games will be against each other. At this point, you cannot simply assume a 50% win rate between cheaters. The one with the better engine will win 100% of the time.

And, because of this, only the first few games, the ones it takes for the cheaters to populate the top ranks and start playing each other, will affect the ratings of the majority of non-cheaters (in your sample, very few non-cheaters will be able to compete with these cheaters).

So, the average rating of the non-cheaters shouldn't change much at all; a new cheater will get to a high rating in very few games if they win 90% of their provisional games. What will change, however, is "real" rankings, given by a percentile. A player who was in the top 10% before will be lower than that afterwards.
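One simple way a simulation could model this kind of game selection is to pair each player with the neighbour closest in rating. This is only an illustrative assumption about matchmaking, not how Lichess pairs games, and `pair_by_rating` is a hypothetical helper.

```python
def pair_by_rating(ratings):
    """Pair neighbours in rating order; `ratings` maps player name -> rating.

    With an odd number of players, the highest-rated one is left unpaired.
    """
    ordered = sorted(ratings, key=ratings.get)
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered) - 1, 2)]


# Made-up pool: two cheaters who have already climbed, two ordinary players.
pool = {"good_1": 1500, "cheater_1": 2400, "good_2": 1520, "cheater_2": 2450}
print(pair_by_rating(pool))
```

Under this pairing rule the two high-rated cheaters meet each other instead of the ordinary players, which is the situation described above.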

Caustic

#8 People are different though, right? I'm sure a huge percentage do what you say. However, some players almost exclusively pick on players with a much lower rating, while there are probably a select and hardy few who mainly pick on higher-rated players. I wonder how you would account for this variance in fixture selection in the simulation of ratings.

jimj12

#10

Well, I'd assume a cheater probably isn't going to waste the limited games they have before getting banned on playing against noobs, but you're right; it's possible.

There is no way to simulate it anywhere near accurately; everything you set will be arbitrary and wrong. The only way I can see that would get you what you want is to observe Lichess and collect data, though you'd need to be able to code and perform statistical analyses.