@impruuve Thanks for your thoughtful post. I just want to clarify a few things.
My formula and Elometer estimate the exact same thing: FIDE ratings. In principle, the final number produced by Elometer and the one produced by my formula should be interpreted in exactly the same way.
Think about the three quantities involved:
1. Observed measure of performance
2. Benchmark FIDE rating
3. Predicted FIDE rating
Both Elometer and my formula compare some measure of #1 to a self-reported measure of #2, in order to make predictions about #3. The quantities #2 and #3 are the same in both models. What differs is only the type of input data in #1: I used observed game results whereas Elometer uses puzzle results. So, again, the estimates of #3 that the two models produce should be interpreted in the same way.
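To make the setup concrete, here is a minimal sketch of the three quantities, assuming the formula amounts to a simple linear fit of self-reported FIDE ratings (#2) on an observed performance measure (#1), which is then used to predict FIDE ratings (#3). The functional form and the sample numbers here are illustrative assumptions, not the exact formula:

```python
import numpy as np

# Hypothetical sample: (#1 observed Lichess rating, #2 self-reported FIDE rating)
lichess = np.array([1500, 1700, 1900, 2100, 2300], dtype=float)
fide    = np.array([1350, 1560, 1740, 1950, 2140], dtype=float)

# Fit: predicted_FIDE = slope * lichess + intercept (least squares)
slope, intercept = np.polyfit(lichess, fide, 1)

def predict_fide(lichess_rating):
    """#3: predicted FIDE rating for a player with a given Lichess rating."""
    return slope * lichess_rating + intercept

print(predict_fide(2000))
```

Elometer does the same thing with puzzle scores in place of the Lichess rating on the x-axis; the y-axis (self-reported FIDE) and the output (predicted FIDE) are identical.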
You argue that Elometer's benchmark FIDE ratings are of higher quality. In particular, you claim that:
(a) Lichess self-reported FIDE ratings may be incorrect
(b) Lichess self-reported FIDE ratings may be outdated
(a) is obviously true from the graph that I linked to above. Some users report a 3000 rating; that can’t be correct. But it’s important to realize that Elometer ALSO uses self-reported FIDE ratings as its benchmark (see element #2 above). So there is nothing that tells us that their data is more reliable than mine. Why would people be more truthful in their self-reports over there? In fact, one reason Elometer might be overshooting could be that many people’s self-reports are inflated. Unfortunately, Elometer is not transparent about how it deals with outliers and fake self-reports. In this thread, I have discussed several ways to exclude outliers, and I can show that the results are not sensitive to “bad input data”.
(b) is a trickier issue. Imagine that user X improves by 50 FIDE points and 50 Lichess points, but forgets to update the self-reported FIDE rating in her profile. In that case, my formula will tend to *underestimate* the FIDE ratings of other players. In contrast, if user Y loses 50 FIDE points and 50 Lichess points but forgets to update her profile, my formula will *overestimate* the FIDE ratings of other players. If similar numbers of players move in both directions, the two biases cancel out, and the formula remains accurate.
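The cancellation argument is easy to check with a toy simulation (all numbers below are hypothetical, including the assumed FIDE-Lichess relation): half the players drift up 50 points in both systems and half drift down 50 points, with stale self-reports either way. A line fit on the contaminated data still predicts accurately on average:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
lichess_old = rng.uniform(1200, 2400, n)
fide_old = 0.9 * lichess_old + 100 + rng.normal(0, 30, n)  # assumed true relation

drift = rng.choice([-50.0, 50.0], n)   # half improve, half decline
lichess_now = lichess_old + drift      # current Lichess rating we observe
fide_reported = fide_old               # stale profile: self-report not updated

# Fit the formula on the contaminated (stale) data
slope, intercept = np.polyfit(lichess_now, fide_reported, 1)

# Evaluate predictions on players whose ratings are accurate and up to date
x = rng.uniform(1200, 2400, n)
true_fide = 0.9 * x + 100
mean_error = np.mean(slope * x + intercept - true_fide)
print(f"mean prediction error: {mean_error:.1f}")  # close to zero
```

The symmetric drift does cancel in the average prediction; it only adds a little noise around the fitted line.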
Moreover, it’s important to realize that baby Karjakin was a huge outlier. Some young people’s FIDE ratings change really quickly, but that’s definitely not the most common situation. Most people’s chess skills move extremely slowly over time.
Finally, as I explained above, I took several steps to remove outliers. If some kid improves by 400 FIDE and Lichess points but forgets to update her profile, that data point is automatically flagged as an outlier and excluded from the analysis. Again, not a big deal.
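For concreteness, automatic flagging along these lines can be sketched with a standard residual-based rule (the 3-sigma threshold and the sample data here are illustrative assumptions, not necessarily the exact procedure used):

```python
import numpy as np

def fit_excluding_outliers(lichess, fide, k=3.0):
    """Fit FIDE ~ Lichess, then refit after dropping points whose residual
    from the initial fit exceeds k standard deviations."""
    slope, intercept = np.polyfit(lichess, fide, 1)
    resid = fide - (slope * lichess + intercept)
    keep = np.abs(resid) < k * resid.std()
    return np.polyfit(lichess[keep], fide[keep], 1), keep

rng = np.random.default_rng(1)
lichess = rng.uniform(1200, 2400, 500)
fide = 0.9 * lichess + 100 + rng.normal(0, 30, 500)
fide[0] = 3000.0  # a fake or badly stale self-report, like the 3000s in the graph

(slope, intercept), keep = fit_excluding_outliers(lichess, fide)
print(f"flagged {np.count_nonzero(~keep)} outlier(s); slope = {slope:.2f}")
```

A self-report that is 400+ points off the fitted line gets dropped before the final fit, so it has no influence on the estimates.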
In sum, Elometer and my formula measure the exact same concept, and I’m not 100% convinced that my input data is worse than theirs (especially since we don’t know how much data they have), or that it should matter in the estimation.