How to estimate your FIDE rating (conversion formula inside)

@dudeski_robinson #19 thank you for your answer.

I am not so sure any more that different things are measured. Isn't it like this:

Let's say elometer.net had made the people from the Dutch tournament ride bicycles instead of solving chess puzzles (I have quoted what they did here: https://lichess.org/forum/general-chess-discussion/testing-a-players-strength#2). The Elo ratings would then be aligned to how fast they ride, and later the people doing the online test would also ride bicycles, and voilà, we can calculate their ratings correctly... You may say that's not valid, and I agree.

But isn't the same true, at least a bit, for solving chess puzzles? Solving a chess puzzle is a different thing from playing a game against a human opponent. There are factors like adrenaline, etc. We all know the case where we make an error and, after the game or (worse) immediately after the move, we see the better move and ask: why didn't I see that obvious move?

These factors, like the impression the opponent makes on us, are not tested when solving chess puzzles. So your method is more precise in that respect, as you compare the same thing, while the elometer.net test is less precise.

Regarding the second point: yes, indeed, with the huge sample size, the filtering steps you did, and the fact that ratings can be too high as well as too low, it may have no big effect.

Actually, your result is the lower bound of the confidence interval of the elometer test, at least for me.


My rating comes out around 1701, but my real Elo is 1301: a 400-point gap. Either I am underrated because I don't play many FIDE tourneys (5 so far), or I am overrated here.


@impruuve

Not quite. Assuming that their maximum likelihood estimator is unbiased (mine is), the two models should target the same output, since they are both estimated using the same dependent variable. Changing the input variable should only affect the confidence we have in our point estimate.

If my formula undershoots true FIDE ratings, it means that Lichess user profiles systematically understate their true FIDE rating.

If elometer overshoots true FIDE ratings, it means that elometer users systematically overstate their true FIDE rating.

I don't think there's any sense in which one corresponds to the "lower bound" of the other.
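
A minimal numpy sketch of that point (toy numbers, not dudeski_robinson's actual model): two regressions that share the same dependent variable target the same average, and a noisier input mainly costs precision.

import numpy as np

rng = np.random.default_rng(0)
n = 5000
skill = rng.normal(1700, 200, n)       # latent chess skill (invented scale)
fide = skill + rng.normal(0, 50, n)    # same dependent variable for both models
x_lo = skill + rng.normal(0, 30, n)    # low-noise input variable
x_hi = skill + rng.normal(0, 150, n)   # high-noise input variable

for label, x in (("low-noise", x_lo), ("high-noise", x_hi)):
    b, a = np.polyfit(x, fide, 1)      # OLS fit: fide ~ a + b*x
    pred = a + b * x
    rmse = np.sqrt(np.mean((pred - fide) ** 2))
    print(label, "mean prediction:", round(pred.mean()), "RMSE:", round(rmse))

Both mean predictions come out equal to the mean FIDE rating (an OLS property), but the RMSE of the high-noise model is several times larger.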


@dudeski_robinson Please comment on my example from #21 in plain English and explain why you think it is misleading, if you do.


Think about it this way:

Puzzle skill = chess skill + randomness
Lichess rating = chess skill + randomness

The confidence/precision of your forecast is going to depend on the amount of noise/randomness in your measure. But, on average, you should get similar results with either measure. (Assuming noise is actually random.)
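
A tiny simulation of those two equations (the noise levels are invented, just to illustrate): each measure recovers the same skill on average, but the noisier one scatters more per observation.

import numpy as np

rng = np.random.default_rng(1)
skill = 1600.0                              # one player's true chess skill
puzzle = skill + rng.normal(0, 120, 10000)  # puzzle skill = chess skill + randomness
lichess = skill + rng.normal(0, 60, 10000)  # lichess rating = chess skill + randomness

print(round(puzzle.mean()), round(lichess.mean()))  # both ~1600: same answer on average
print(round(puzzle.std()), round(lichess.std()))    # but individual readings spread more for the noisier measure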

I don't understand your bicycle example.


@Dawny

Bingo. You need dozens and dozens of games for a rating, especially in the Elo system.

Some people who post their ratings have few games under FIDE or national rating systems.

But I also believe these formulas work better for higher-rated players (though I can't prove it).
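
For intuition about why a handful of games is not enough, here is a toy Elo simulation (a flat K=20 throughout, which simplifies FIDE's actual K-factor rules): a 1600-strength player starts with a provisional 1400 rating and plays 1600-rated opponents.

import numpy as np

rng = np.random.default_rng(2)

def expected(ra, rb):
    # standard Elo expected score
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

true_strength, opp, K = 1600.0, 1600.0, 20
for n_games in (5, 30, 100):
    finals = []
    for _ in range(2000):
        r = 1400.0                                             # provisional starting rating
        for _ in range(n_games):
            win = rng.random() < expected(true_strength, opp)  # results driven by true strength
            r += K * (win - expected(r, opp))                  # Elo update uses current rating
        finals.append(r)
    finals = np.asarray(finals)
    print(n_games, "games ->", round(finals.mean()), "+/-", round(finals.std()))

After 5 games the rating has barely moved off 1400; it takes dozens of games to get near the true 1600, and even then the spread across simulated players stays noticeable.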


@dudeski_robinson #25 I would guess that, compared to Elo, the rating gained by playing games on Lichess has the least randomness, followed by the puzzle-solving method (effects from practical games are missing), followed by the bicycle-riding method, which has nothing at all to do with chess and is therefore pure randomness in this context. I wonder why you think that solving puzzles gives results comparable to playing practical games (and why you don't understand the bicycle example). I guess we misunderstand each other. That's life.


@impruuve

Obviously, the bicycle method will be (almost) pure randomness in this context, so the model built on that basis will be useless.

My point was that you can have two models with different inputs that produce the same results ON AVERAGE. The model with the noisier predictor will make more mistakes in individual cases, but if you average over all predictions, it will produce the same average as the other (more precise) model.

You need to distinguish between two features of the competing models: correctness (being right on average) and precision (making small/large mistakes in individual cases).

Statisticians call the first property "unbiasedness". If the two models are unbiased (which they probably are), then they should produce the same results ON AVERAGE over all possible predictions. Because of that, it makes no sense to think of one set of predictions as a lower bound for the other set of predictions.
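
The bicycle case in code form (pure invention, just to show what "unbiased but useless" looks like): regress FIDE ratings on pure noise, and the fitted model simply predicts the grand mean for everyone.

import numpy as np

rng = np.random.default_rng(3)
n = 20000
fide = rng.normal(1700, 200, n)   # outcome variable
bike = rng.normal(0, 1, n)        # "bicycle speed": pure noise, unrelated to chess

b, a = np.polyfit(bike, fide, 1)  # OLS slope comes out ~0
pred = a + b * bike               # so every prediction is ~ the grand mean

print(round(pred.mean()), round(fide.mean()))  # unbiased: identical averages
print(round(np.abs(pred - fide).mean()))       # but individual errors average ~160 points

So the noise-based model is still right on average; it is just maximally imprecise for individual players.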

As I explained above, there are some features of the data that could make the results diverge (e.g., if Lichess or Elometer systematically over/under-report their FIDE ratings). But the simple fact that the two models use different inputs does not imply that the models will produce different results (on average), since both models use the same dependent/outcome variable.

(Of course, if Elometer users systematically over-report their FIDE ratings -- or if the Elometer people don't adjust for fake ratings and outliers as I do -- then the dependent/outcome variable is no longer the same. Again, this might explain the disparity.)


So time trouble and other aspects of the game are noise that becomes irrelevant on average, given enough participants, but can still have an effect in the individual case. OK.


Thanks!

Though I still think it's a bit overrated: I got 1812 points from the formula, but I don't think I can play at more than 1700 in real FIDE ratings.

