The mechanics and maths of the glicko2 system

If you can't even prove your assumptions have a basis in reality, then your simulation is utterly worthless

@21

""" If you can't even prove your assumptions have a basis in reality, then your simulation is utterly worthless """

He he he, simulation is a fun thing; don't get too serious about it.

Yes, don't get heated. Whether or not it bears any relevance to what happens on Lichess, this simulation is awesome and surely has some merit.

Some other good points have been raised too, and they may have a bearing on future experiments. I too have seen games where cheaters play cheaters, or at least where both players have 0 0 0 and an average r/l of 8, and both are flagged.

Still, this is great; I'm intrigued to see what happens.

So, someone correct me if I'm wrong, but theoretically: a cheater with a perfect atomic engine (doesn't exist, I know) comes in and takes 20-something points off a 2000+ player (gaining about 600 points themselves). They then take a few tens of points off the next set of 1800-2300 atomers (fewer every time, as their rating rises towards 2500 or so and their RD comes down), get caught after about 15 games, and have their account suspended and flagged. Unless their account is closed, they're still hanging around, but they are no longer part of the rated player pool, which means the remaining players have had about 100-150 points taken from them combined. Say there are 1000 atomic players: does this mean that, on average, the player pool's rating has gone down by 0.1 per head?

If my maths or logic is godawful, please don't flame me; just point out the wrong bits. Cheers.
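The per-head figure in the question is just the total the pool lost divided by the pool size. A minimal Python sketch; the function name `per_head_drop` and the per-game losses are made up for illustration, not measured from Lichess:

```python
def per_head_drop(points_lost, pool_size):
    """Average rating drop per player in the pool.

    points_lost: rating points each victim lost, one entry per game.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return sum(points_lost) / pool_size


# Hypothetical losses over ~15 games: 20-something first, then smaller ones
# as the cheater's rating rises and RD drops. These numbers are invented.
losses = [20, 15, 12, 10, 9, 8, 7, 6, 5, 5, 4, 4, 3, 3, 3]
drop = per_head_drop(losses, 1000)
```

With 100 to 150 points spread over 1000 players this gives 0.1 to 0.15 per head. Only the points the pool lost count; the cheater's own gains don't offset them, because in Glicko-2 the two sides of a game change by different amounts when their RDs differ (as the 20-something vs 600 example shows).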

It is good that you mention a cheater who is only suspended: a player whose rating is still in the system but who cannot play rated games, as opposed to a cheater whose account is closed and whose rating no longer counts in the calculation of the average.

Currently my player structure is:

  1. Idn, ID number
  2. Name, name of the player
  3. Ra, rating of the player
  4. Rd, rating deviation
  5. Vol, volatility
  6. Ch, cheater mark, 0 for not a cheater and 1 for a cheater
  7. Oph, opponent history. This tracks the player's opponents: every game is recorded, including the opponent's info, color and cheat mark, and the result of the encounter.
  8. Urh, updated rating history. This is the rating update after every game or after every rating period; the latest rating data of a player can be found here.

Regarding (6), or Ch, I will revise it to have the following values:
Ch=0 not a cheater
Ch=1 a cheater
Ch=2 a cheater who is suspended
Ch=3 a cheater whose account is closed

So during pairing I will skip a player when Ch >= 2.
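A minimal Python sketch of the player record with the revised Ch values and the pairing rule (skip Ch >= 2). The field names follow the list above; the class names, enum member names, helper functions and the default Ra/Rd/Vol values (1500, 350, 0.06, the starting values suggested in Glickman's Glicko-2 description) are my assumptions, not the simulator as written:

```python
from dataclasses import dataclass, field
from enum import IntEnum


class CheatMark(IntEnum):
    """Revised values of the Ch field."""
    CLEAN = 0       # not a cheater
    CHEATER = 1     # a cheater who can still play
    SUSPENDED = 2   # rating stays in the pool, but no rated games
    CLOSED = 3      # account closed, rating no longer counted


@dataclass
class Player:
    idn: int                  # ID number
    name: str
    ra: float = 1500.0        # rating (assumed default)
    rd: float = 350.0         # rating deviation (assumed default)
    vol: float = 0.06         # volatility (assumed default)
    ch: CheatMark = CheatMark.CLEAN
    oph: list = field(default_factory=list)  # one entry per game: opponent info, color, cheat mark, result
    urh: list = field(default_factory=list)  # rating after each game or rating period


def pairable(players):
    """Players that may be paired: skip anyone with Ch >= 2."""
    return [p for p in players if p.ch < CheatMark.SUSPENDED]


def pool_average(players):
    """Average rating of the pool; suspended players still count, closed ones don't."""
    counted = [p.ra for p in players if p.ch != CheatMark.CLOSED]
    return sum(counted) / len(counted)
```

Using an `IntEnum` keeps the plain `ch >= 2` comparison working while giving the four states readable names.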

I am now working on how to generate a result given a scoring probability. The scoring probability could be 0.9 vs 0.1, or 0.60 vs 0.40, depending on the rating difference. There are 3 possible results: 1, 0.5 and 0.
The scoring probability can be broken down into a
Win percentage
Draw percentage
Loss percentage
for a given player.
For example, in 0.6 vs 0.4 there must be a draw percentage so that we can also generate a result of 0.5 reliably.
Assume 25 percent, or 0.25.
So we can say
Win rate = 0.6 - 0.25 x 0.5 = 0.475
So for the player with a scoring rate of 0.6 we have:
Win rate = 0.475
Draw rate = 0.25
Loss rate = 1 - 0.475 - 0.25 = 0.275
That data can then be used to generate a result against the 0.4 opponent.
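The win/draw/loss split and the random draw of a result can be sketched in Python like this. `split_score` and `generate_result` are hypothetical names; drawing one uniform random number and splitting [0, 1) into win, draw and loss intervals is one common approach, not necessarily what the simulator will use:

```python
import random


def split_score(score_rate, draw_rate):
    """Break an expected score into (win, draw, loss) probabilities.

    win = score_rate - draw_rate * 0.5, loss = 1 - win - draw_rate.
    """
    win = score_rate - draw_rate * 0.5
    loss = 1.0 - win - draw_rate
    if win < -1e-12 or loss < -1e-12:
        raise ValueError("draw rate too high for this score rate")
    return win, draw_rate, loss


def generate_result(score_rate, draw_rate, rng=random):
    """Return one game result (1, 0.5 or 0) for the player with this score rate."""
    win, draw, _ = split_score(score_rate, draw_rate)
    u = rng.random()          # uniform in [0, 1)
    if u < win:
        return 1.0
    if u < win + draw:
        return 0.5
    return 0.0
```

`split_score(0.6, 0.25)` reproduces the 0.475 / 0.25 / 0.275 split above (up to floating-point rounding), and averaging many `generate_result(0.6, 0.25)` calls should come out close to 0.6. Passing a seeded `random.Random` as `rng` makes a simulation run repeatable.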

But wait, we assumed a fixed draw rate (0.25 above). We can estimate that the draw rate is higher when players are close in strength, so we can scale the draw rate based on their rating difference.

But this is not what happens in the atomic variant, where the draw rate can be smaller.
This simulator is intended for the standard variant but can be used for other variants like atomic, though the draw rate should be considered carefully.
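One possible way to scale the draw rate by closeness in strength, sketched in Python. Since win = s - d x 0.5 and loss = 1 - s - d x 0.5 must both be non-negative, the draw rate d can never exceed 2 x min(s, 1 - s). The hypothetical `scaled_draw_rate` multiplies a base draw rate by that factor, so equal players (s = 0.5) get the full base rate and it falls linearly to 0 as s approaches 0 or 1. The linear shape and the per-variant `base_draw` (e.g. a smaller value for atomic) are my assumptions:

```python
def scaled_draw_rate(score_rate, base_draw):
    """Hypothetical draw-rate scaling by closeness in strength.

    closeness is 1 for equal players (score_rate = 0.5) and 0 at 0 or 1.
    Because base_draw <= 1, the result never exceeds the maximum draw
    rate 2 * min(score_rate, 1 - score_rate).
    """
    if not 0.0 <= base_draw <= 1.0:
        raise ValueError("base_draw must be in [0, 1]")
    closeness = 2.0 * min(score_rate, 1.0 - score_rate)
    return base_draw * closeness
```

With `base_draw = 0.25`, a 0.5 vs 0.5 game keeps a 0.25 draw rate while a 0.6 vs 0.4 game drops to 0.2.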

Score generation for cheater vs cheater.
A. Without considering their ratings, we can say both have a scoring rate of 0.5, or 50%.
Player1 vs player2

Player1
Score rate, 0.5
Draw rate, 0.7
Win rate, 0.5 - 0.7 x 0.5 = 0.15
Loss rate, 1 - 0.7 - 0.15 = 0.15

B. Considering the existing rating of each player
Say player1 has a higher rating than player2, and the resulting score rate based on the rating difference is 0.7 vs 0.3.

Player1
Score rate, 0.7
Draw rate, 0.5 (we assume a high draw rate because engines are playing)
Win rate, 0.7 - 0.5 x 0.5 = 0.45
Loss rate, 1 - 0.5 - 0.45 = 0.05

But a new cheater rated 1500 is well capable of beating, or at least equalizing against, the highest-rated cheater. So as a correction factor we may cap the score rate difference at, for example, 0.2, so it would be at most 0.6 vs 0.4, with a draw rate in the mid to high range.
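A Python sketch of case B with the proposed correction factor: the rating-based score rate is clamped so the two cheaters' score rates differ by at most `max_diff` (0.2 means at most 0.6 vs 0.4). The function name, the default draw rate of 0.5 standing in for "mid to high range", and capping the draw rate at its maximum are my assumptions:

```python
def expected_score(r1, r2):
    """Player 1's scoring probability from the rating difference."""
    return 1.0 / (1.0 + 10.0 ** ((r2 - r1) / 400.0))


def cheater_vs_cheater(r1, r2, max_diff=0.2, draw_rate=0.5):
    """(win, draw, loss) for cheater 1 against cheater 2.

    The score rate is clamped to [0.5 - max_diff/2, 0.5 + max_diff/2], so
    the two score rates s and 1 - s differ by at most max_diff.
    """
    s = expected_score(r1, r2)
    lo, hi = 0.5 - max_diff / 2, 0.5 + max_diff / 2
    s = min(max(s, lo), hi)
    draw = min(draw_rate, 2.0 * min(s, 1.0 - s))   # never exceed the maximum draw rate
    win = s - draw * 0.5
    return win, draw, 1.0 - win - draw
```

With equal ratings and `draw_rate=0.7` this gives case A's 0.15 / 0.7 / 0.15. For a 2500 cheater against a 1500 cheater the uncapped score rate would be about 0.997; with the cap it becomes 0.6, giving roughly 0.35 / 0.5 / 0.15.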

Actually, back to #7, that's an interesting post. But how would you estimate the expected win/draw/loss ratio based on rating differences?

@26

Given the rating difference, you can get the scoring probability from the FIDE table.
But I use the formula:
P1 vs P2,
Sp1 = 1 / (1 + 10 ^ ((r2 - r1)/400))
Where:
Sp1 = scoring probability of player 1, which I sometimes also call the score rate of player 1
r1 = rating of player 1
r2 = rating of player 2

So basically a player has a scoring rate against his opponent based on the rating difference.
Then break the scoring rate down into a win/draw/loss ratio. Have a look at posts 24 and 25; those are my initial thoughts on how to get this going. I may also look at the actual draw ratios of FIDE players.

Winratio = scorerate - drawratio x 0.5
Lossratio = 1 - winratio - drawratio

And the draw ratio has a maximum:
Drawratiomax = (1 - scorerate) x 2
This holds for the player with a score rate of at least 0.5; in general it is 2 x min(scorerate, 1 - scorerate), since neither the win ratio nor the loss ratio can be negative.

The draw ratio of player1 = the draw ratio of player2.
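The formula and the ratios above, sketched in Python. The function names are hypothetical; `max_draw_ratio` uses the general 2 x min(s, 1 - s) bound, which equals (1 - scorerate) x 2 for the stronger player:

```python
def expected_score(r1, r2):
    """Sp1 = 1 / (1 + 10 ^ ((r2 - r1) / 400)): player 1's scoring probability."""
    return 1.0 / (1.0 + 10.0 ** ((r2 - r1) / 400.0))


def max_draw_ratio(score_rate):
    """Largest draw ratio that keeps both win and loss ratios non-negative."""
    return 2.0 * min(score_rate, 1.0 - score_rate)


def win_draw_loss(r1, r2, draw_ratio):
    """Player 1's (win, draw, loss) ratios; player 2 gets (loss, draw, win)."""
    s = expected_score(r1, r2)
    if draw_ratio > max_draw_ratio(s) + 1e-12:
        raise ValueError("draw ratio exceeds the maximum for this rating gap")
    win = s - draw_ratio * 0.5
    loss = 1.0 - win - draw_ratio
    return win, draw_ratio, loss
```

For example, a 400-point gap gives Sp1 = 10/11, about 0.909, so the draw ratio can be at most 2/11, about 0.18. Because the same draw ratio is used for both players, player 2's win ratio is automatically player 1's loss ratio.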

Game generation simulation approach approximating the arena format.

  1. From a pool of 10k players, randomly select the players that will participate, for example 200 players.
  2. When selecting a player, assign the minimum number of games that player is allowed to play by randomly generating a number between 1 and 20, for example.
    Addplayer (KJKDW, 18)
    This means player KJKDW is allowed to play a minimum of 18 games.
  3. Each player gets a tour number from 1 to 200.
    Starting at player1, find this player's opponent by going through the 199 other candidates.
  4. Pairing point system, from 0 to 100 points:
  • when the candidate opponent is close in rating, the pairing points are 60. Scale this based on the rating difference; the maximum for this criterion is 60.
  • when the candidate opponent has not played player1 before, add 5 pairing points.
  • when players have already accumulated a score, add up to 80 pairing points to candidates with a close score in the tour.
    Do this for all candidates, then find the one with the highest pairing points. That player becomes player1's opponent.
  • do the same for player2 and the others.
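A minimal Python sketch of the pairing steps above. The weights 60, 5 and 80 come from the post; the linear fall-off, the 400-point rating window, the 10-point score window, reading "already accumulated score" as "both players have a non-zero tour score", and the names `Entrant`, `pairing_points`, `pick_entrants` and `pair_round` are my assumptions. The three terms together can exceed the stated 0 to 100 range, so the sketch simply adds them:

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)            # compare entrants by identity, not by field values
class Entrant:
    name: str
    rating: float
    min_games: int              # stored from step 2; not used by the pairing itself
    score: float = 0.0          # accumulated tour score
    opponents: set = field(default_factory=set)


RATING_MAX, NEW_OPP_BONUS, SCORE_MAX = 60.0, 5.0, 80.0
RATING_WINDOW = 400.0           # assumed: rating gap where the rating term reaches 0
SCORE_WINDOW = 10.0             # assumed: score gap where the score term reaches 0


def pairing_points(a, b):
    """Pairing points of candidate b for player a."""
    pts = RATING_MAX * max(0.0, 1.0 - abs(a.rating - b.rating) / RATING_WINDOW)
    if b.name not in a.opponents:
        pts += NEW_OPP_BONUS
    if a.score > 0 and b.score > 0:   # assumed reading of "already accumulated score"
        pts += SCORE_MAX * max(0.0, 1.0 - abs(a.score - b.score) / SCORE_WINDOW)
    return pts


def pick_entrants(pool, n=200, max_games=20, rng=random):
    """Pick n (name, rating) pairs at random; give each a minimum game count in 1..max_games."""
    return [Entrant(name, rating, rng.randint(1, max_games))
            for name, rating in rng.sample(pool, n)]


def pair_round(entrants):
    """Greedy pairing in tour order: each unpaired player takes the
    unpaired candidate with the highest pairing points."""
    unpaired = list(entrants)
    pairs = []
    while len(unpaired) >= 2:
        p = unpaired.pop(0)
        best = max(unpaired, key=lambda q: pairing_points(p, q))
        unpaired.remove(best)
        pairs.append((p, best))
    return pairs
```

Being greedy, this pairs player1 first with the best candidate, so later players choose from fewer options; with an odd number of entrants the last one waits for the next round.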
Well, all these maths treatises are fine, but we shouldn't lose sight of the fact that this is an approximation to the true distribution of chess ratings. It is never accurate.

Nobody applies Bayesian probability to any of this stuff, so when a player "outperforms" we label them a cheat, because we know nothing about the probability of probabilities of a Gaussian distribution.

michuk,

Would you care to elaborate on what this true distribution of chess ratings is?
