Rating variance over a large number of games

jarro_2783, 27 Sept 2024

How I learned to stop caring about rating and love the game: a demonstration of probability and rating changes.

I often think about how much my rating should change, and how many games I expect to win or lose in a row. We hear a lot about "tilt" or its opposite, being on a good streak. But how much of that is expected variance rather than having a bad day, or being super focused and in the right mindset?

We are constantly being told by some of the "experts" that we should avoid playing under less than optimal conditions. We shouldn't play when we are tired, or hungry, or distracted by kids. We shouldn't play more than some number of games in a row, so that we don't lose rating from tilt. We should never resign, because you never know when you might still win. We should only play when we can be laser focused and are in the exact right frame of mind. All of this so that we might be rated 100 or 200 points higher, depending on who you ask. Those things are probably not wrong, and playing when tired is probably not a great way to ensure you play your best, but if you just want to have fun, is it really the most important thing to focus on?

If you're playing to learn and just enjoy the game, does it really matter what your exact rating is? I say just have fun and learn from your games.

As a computer programmer I thought that I could approach an answer to these questions by just doing some simulations. So I wrote the most basic Python code possible that somewhat simulated what the rating change of a new account on Lichess could look like. I made three assumptions:

1. The true rating of this player is constant.
2. You play enough games quickly that time is not relevant, and your rating becomes stable quickly.
3. Rating can be modeled as a purely probabilistic process.

A note on the third point: this is the assumption of the mathematical model of rating used by everyone. I don't know how correct it is, but it is nevertheless the basis of the systems actually used, and it serves to illustrate the points made below quite well.
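For reference, the model usually meant here is the Elo expected-score formula, where a rating gap maps directly to a win probability. The snippet below is only a small sketch of that formula, not the code from this post; the function name `win_probability` is made up for illustration and draws are ignored.

```python
def win_probability(player_rating, opponent_rating):
    """Elo expected score: the chance the player wins, ignoring draws."""
    gap = opponent_rating - player_rating
    return 1.0 / (1.0 + 10.0 ** (gap / 400.0))


# How often the stronger side is expected to win at a few rating gaps.
for gap in (0, 100, 200, 300):
    print(gap, round(win_probability(1000 + gap, 1000), 2))
# prints:
# 0 0.5
# 100 0.64
# 200 0.76
# 300 0.85
```

So even a 100 point gap only moves the odds from 50% to about 64%, which leaves plenty of room for luck in any single game.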

The code I wrote simulated a series of wins and losses for a hypothetical player with an arbitrary "true" rating (I picked 967). They started their online rating at 1500 with a high rating uncertainty, and were paired against players with a random rating between 100 below and 100 above their online rating, to roughly simulate the pairing process that I observe on Lichess. The actual outcome was then determined using a random number and the standard definition of rating (detailed here if you are interested https://www.chessprogramming.org/Match_Statistics#Elo-Rating_.26_Win-Probability). I assumed that your rating change stops getting smaller at 5 points, since that is approximately what I have observed on Lichess once your rating is stable. I continued the simulation for 500 games, and plotted the results. For one such run we have the following graph:

The line through the middle is the "true" rating. This is basically a random walk that is weighted towards the true rating line. We can see that there are some wild rating changes initially as the deviation stabilises, but there are some interesting patterns that come out once it has stabilised. There are runs of both wins and losses, and also periods of less variance. But we can also see that the lowest and highest rating have a difference of over 100 points.
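A simulation along these lines can be sketched as follows. This is not the original code: it assumes a plain Elo-style update whose step size shrinks geometrically until it reaches 5 points, as a rough stand-in for the rating uncertainty. The names `simulate`, `first_step` and `shrink`, and their default values, are made up for illustration.

```python
import random


def win_probability(player_rating, opponent_rating):
    """Elo expected score: the chance the player wins, ignoring draws."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - player_rating) / 400.0))


def simulate(true_rating=967, start_rating=1500, games=500,
             first_step=100.0, min_step=5.0, shrink=0.85, seed=None):
    """Return the online rating before the first game and after every game.

    Assumption: `step` is the rating change for a game between equally
    rated players. It starts large and decays to `min_step`, which is a
    crude substitute for a real rating deviation.
    """
    rng = random.Random(seed)
    rating = float(start_rating)
    step = first_step
    history = [rating]
    for _ in range(games):
        # Pair against someone within 100 points of the online rating.
        # Assumption: the opponent's online rating is their true strength.
        opponent = rating + rng.uniform(-100, 100)
        # The result depends on the player's true strength...
        won = rng.random() < win_probability(true_rating, opponent)
        # ...but the update is computed from the online rating.
        expected = win_probability(rating, opponent)
        rating += 2 * step * ((1.0 if won else 0.0) - expected)
        step = max(min_step, step * shrink)
        history.append(rating)
    return history


history = simulate(seed=1)
settled = history[100:]  # skip the early games while the step is still large
print(f"lowest {min(settled):.0f}, highest {max(settled):.0f}")
```

Running it with different seeds gives a different walk each time, even though the player's strength never changes.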

Here is another random run:

Again you can see periods of trends in both directions. In this case, even at the end point, the rating is much higher than the player's true rating. If this were a real player, they might see their peak rating at close to 1050, then lose 100 points and get discouraged, thinking that they were playing badly.
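To get a feel for how long a streak pure chance can produce, here is a small sketch that flips a fair coin for 500 games and records the longest run of identical results. It assumes every game is an independent 50/50, which is only roughly true once the rating has settled; the function names are made up for illustration.

```python
import random


def longest_streak(results):
    """Length of the longest run of identical consecutive results."""
    best = current = 0
    previous = None
    for result in results:
        current = current + 1 if result == previous else 1
        best = max(best, current)
        previous = result
    return best


def average_longest_streak(games=500, win_chance=0.5, trials=200, seed=0):
    """Average the longest win-or-loss streak over many simulated runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        results = [rng.random() < win_chance for _ in range(games)]
        total += longest_streak(results)
    return total / trials


print(average_longest_streak())
```

For a fair coin the longest run grows roughly like log2 of the number of games, so over 500 games a streak of around nine wins or losses in a row is entirely ordinary.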

The effect of this is probably even more pronounced when you take your opponent into account. You might be at the peak of your range, playing someone rated 100 points higher than you who is at the bottom of theirs. So rather than complaining that you lost or blaming your environment, consider that maybe you just got crushed by someone with a true strength 300 points higher than yours. Without too much more effort, many players could be simulated at the same time and paired against each other, to see if this graph changes when taking more players into account.
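One way that multi-player experiment might look is sketched below. It is a guess at a design, not something that was run for this post: every player starts with an online rating equal to their true strength, pairings are random each round, and games are only played when the two online ratings are within 100 points. The name `simulate_pool` and all of its parameters are made up for illustration.

```python
import random


def win_probability(player_rating, opponent_rating):
    """Elo expected score: the chance the player wins, ignoring draws."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - player_rating) / 400.0))


def simulate_pool(true_ratings, rounds=500, step=5.0, window=100, seed=None):
    """Return each player's online rating after `rounds` rounds of pairings."""
    rng = random.Random(seed)
    online = [float(r) for r in true_ratings]
    players = list(range(len(online)))
    for _ in range(rounds):
        rng.shuffle(players)
        # Pair neighbours in the shuffled order; an odd player out sits idle.
        for a, b in zip(players[::2], players[1::2]):
            if abs(online[a] - online[b]) > window:
                continue  # too far apart to be paired
            # True strengths decide the game, online ratings decide the update.
            a_won = rng.random() < win_probability(true_ratings[a], true_ratings[b])
            expected = win_probability(online[a], online[b])
            change = 2 * step * ((1.0 if a_won else 0.0) - expected)
            online[a] += change
            online[b] -= change
    return online


rng = random.Random(42)
true_ratings = [rng.gauss(1000, 100) for _ in range(50)]
final = simulate_pool(true_ratings, seed=7)
gaps = [abs(o - t) for o, t in zip(final, true_ratings)]
print(f"largest gap between online and true rating: {max(gaps):.0f}")
```

Here both sides of every pairing carry their own noise, so the gap between two players' true strengths can differ noticeably from the gap between the numbers shown next to their names.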

So rather than stressing about rating and about playing in ideal conditions, just play the game to have fun and learn. Enjoy finding the tactics and the beautiful checkmates after sacrificing material. Play as many games in a row as you like, as late at night as you like, whatever floats your boat. Don't get stressed about a losing streak, and enjoy the fact that you might play weaker players for a while.

Your rating is going to fluctuate a lot, and unless you become the next Magnus Carlsen overnight you will lose a lot of games too.
