lichess.org

Converting Chess.com to Lichess Ratings: A New Data-Driven Method

  1. Using the "precision" (see my comment on your previous blog post) cannot address some of the problems you mention, e.g. people often taking one of their accounts more seriously.

  2. To achieve actual consistency, you would have to filter not only for games with a specific time control but also for players who play that time control in the majority of their games. Otherwise their rating may not reflect their relative strength in that particular time control but rather their strength in completely different time controls.

  3. As already pointed out in my comment on your previous post, you did not present data showing how reliable the relation between rating and "precision" actually is, yet you use it as if you knew the variation were negligible. But is it?

  4. You repeat the same mistake here: again, you present only the mean values, with no information about the distribution of the actual values and no indication of how reliable those results are.

Wow, you took the time to read both of my articles? That really means a lot — thank you, @mkubecek!

  1. It's true that individual motivation or mindset can differ between accounts, and that's something no dataset can fully capture. But the goal of the analysis isn’t to perfectly match individual ratings between Chess.com and Lichess. Instead, it's about showing that on average, players rated X on one platform perform similarly — in terms of precision — to players rated Y on the other. That gives us a useful benchmark: if you're X on Lichess, you can expect to face opponents on Chess.com with comparable playing strength around Y, even if individual cases vary.

  2. On time controls — you're right that player strength can depend on format. That’s why both platforms group time controls into broad categories like bullet, blitz, and rapid. It’s not perfect, but it’s practical. Within those ranges, most players stick to similar rhythms, and the rating pools stay relatively consistent. Filtering only for players who specialize exclusively in one time control would be interesting — but also very limiting in terms of sample size. So again, the focus here is on large-scale trends, not edge cases.

  3. Regarding precision — I totally hear you. It's not meant to be a flawless measure. Precision reflects how close a move is to engine-recommended play, and that's not always the same as practical strength, especially in sharp positions. But when we look at hundreds of thousands of games, precision becomes a meaningful proxy: it's not about saying how any one player plays, but how whole rating groups tend to behave. And the patterns are surprisingly consistent.

  4. On distribution: I get the point. But when you’re comparing average strength between platforms, it’s really the mean that matters — since ratings are calibrated to reflect expected performance over time. Knowing the spread could definitely add nuance, though, and it’s something I might explore further. That said, the main takeaway — that certain rating bands align across sites — still stands.

Thanks again for all the thought you’ve put into this — I’m always happy to continue the discussion!

Wow! Great stuff, @lucb3!

Particularly grateful for the easy-to-use conversion code. I will integrate it somewhere in LiChess Tools for sure.

However, from what I understand from your posts, what you did was:

  • calculate the average precision of moves for a bunch of accounts
  • create a statistical function that estimates the rating for both Chess.com and Lichess based on that precision
  • use precision as an intermediate value for conversion between Lichess and Chess.com.

I find it a LOT more useful to convert from rating to precision and vice versa. It means you can show a player "You played with 86% precision, equivalent to a 2300 rating" or "you have an average rating of 1600, meaning your average move precision is around 73%".
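The rating-to-precision mapping described above could be sketched like this. This is a minimal illustration only, assuming a logistic relationship between rating and average precision; the coefficients `A`, `B`, and `C` below are invented placeholders, not the values actually fitted in the article:

```python
import math

# Hypothetical logistic-fit coefficients, for illustration only.
# A real deployment would use the coefficients fitted from game data.
A = 100.0   # asymptotic maximum precision (%)
B = 0.0015  # slope of the curve
C = 1100.0  # rating at the curve's midpoint

def rating_to_precision(rating: float) -> float:
    """Estimate average move precision (%) from a rating via a logistic curve."""
    return A / (1.0 + math.exp(-B * (rating - C)))

def precision_to_rating(precision: float) -> float:
    """Invert the same curve to estimate a rating from a precision (0 < p < A)."""
    return C - math.log(A / precision - 1.0) / B
```

Because the two functions are exact inverses, chaining them with platform-specific coefficients would also reproduce the article's Lichess-to-Chess.com conversion, with precision as the intermediate value.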

In my honest opinion, converting anything Lichess into anything Chess.com is hardly useful at all.

But again, good job!

Thanks a lot for taking the time to explore the potential, @TotalNoob69!

Personally, almost everyone around me only ever talks about their Chess.com rating, while I only play on Lichess — that was the #1 reason I wanted to work on this converter in the first place.

If you're interested, I can ping you when I’ve done the same thing for Blitz and Bullet!

Regarding your idea of showing players their estimated rating or precision — why not, it’s definitely “better than nothing.” But as @mkubecek pointed out, there's a lot of variance, so it doesn’t mean much on an individual level. That said, Chess.com does something similar in their game review with the “you played like a [rating]” label, and people clearly find value in it even if it's not accurate at all.

One important caveat though: these stats only have real value when comparing two players of roughly the same rating. If there’s more than a ~50 Elo difference, the precision-based comparison becomes basically meaningless.

If you're really interested, I could share an equation to convert your precision into an estimated Elo.

Maybe we could define two separate functions: one for "similar Elo" matchups and another for "large Elo gap" cases.

We could also distinguish between the winner and the loser in each game to get slightly more accurate precision estimates (though the overall variance would still remain high, of course).

Would be fun to dig into that further!
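The two ideas above could be combined in one piecewise model. A rough sketch, where the gap threshold, the two linear fits, and the winner/loser offset are all placeholder numbers rather than fitted values:

```python
GAP_THRESHOLD = 200  # hypothetical cutoff between "similar Elo" and "large gap"

def expected_precision(rating: float, opponent_rating: float, won: bool) -> float:
    """Placeholder model: pick a fit based on the rating gap, then adjust for result."""
    gap = abs(rating - opponent_rating)
    if gap <= GAP_THRESHOLD:
        base = 50.0 + rating / 60.0   # placeholder "similar Elo" fit
    else:
        base = 55.0 + rating / 70.0   # placeholder "large Elo gap" fit
    # The winner of a game tends to show somewhat higher precision in it.
    return base + (1.5 if won else -1.5)  # placeholder winner/loser offset
```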

@lucb3 I assume you mean players "play with their food" when facing significantly lower rated opponents. So those games should not count.

@lucb3 said in #5:

> If you're really interested, I could share an equation to convert your precision into an estimated Elo.

I am very interested.

@TotalNoob69 said in #7:

> @lucb3 I assume you mean players "play with their food" when facing significantly lower rated opponents. So those games should not count.

And in that case, when someone plays against a lower or higher rated opponent, regardless of why, they play as if they were at the calculated rating. I see no conflict.

@lucb3 said in #6:

> Maybe we could define two separate functions: one for "similar Elo" matchups and another for "large Elo gap" cases.

Or rather investigate how the accuracy depends on the combination of both ratings (i.e. the output would be a function of two variables rather than one).

IMHO the biggest problem with larger rating differences is that in those games the lower rated player often makes an early mistake, so the evaluation jumps away from zero and, from that moment on, the opponent's accuracy stays very high even if they play nowhere near as precisely as they would need to in an equal game. This would make the conversion from accuracy to (estimated) rating quite unstable and unreliable.
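A two-variable version of the model could start as simply as a lookup table over rating bins, one axis per player. Every number below is a placeholder to be replaced by means estimated from actual game data:

```python
BINS = [1000, 1400, 1800, 2200]  # lower edges of the rating bins (placeholders)

def bin_index(rating: float) -> int:
    """Index of the bin a rating falls into, clamped at both ends."""
    for i in reversed(range(len(BINS))):
        if rating >= BINS[i]:
            return i
    return 0

# accuracy_table[i][j]: mean accuracy of a player in bin i against an opponent
# in bin j. Placeholder values; they rise against weaker opponents, mirroring
# the inflation effect that early mistakes by the weaker side produce.
accuracy_table = [
    [62.0, 58.0, 54.0, 50.0],
    [70.0, 66.0, 62.0, 58.0],
    [78.0, 74.0, 70.0, 66.0],
    [86.0, 82.0, 78.0, 74.0],
]

def expected_accuracy(rating: float, opponent_rating: float) -> float:
    """Expected accuracy as a function of both players' ratings."""
    return accuracy_table[bin_index(rating)][bin_index(opponent_rating)]
```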
