How to estimate your FIDE rating (conversion formula inside)

What is a "strong correlation" ?
+/- 100 points ?
+/- 200 points ?
The OP downloaded every game played here. Claims to have used a weeks comp time to search all 300,000 members profiles to have found 3000 members who volunteered a FIDE rating. Used this data to propose a formula to predict a new players FIDE OTB rating based on their online rating.

What is the point of the exercise if not to predict a reasonably close approximation? He then goes on to qualify this by stating that the result given by the formula has a wide margin of error.

This is all quite entertaining. Hope you're all enjoying the discussion.

Oh... the formula 0.38 × (n) + 0.48 × (n+1) + 187 (a constant) = A, as a mathematical principle depicting a correlation between n and A, would be laughed out of any "statistics" class, except the one given by the OP of course.
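For what it's worth, here is the formula as it's applied later in the thread, as a minimal sketch. It assumes "n" is the Lichess blitz rating and "n+1" the classical rating; those variable meanings aren't spelled out above, but this is how later posts use the coefficients:

    # Minimal sketch of the proposed estimate. Assumes "n" is the Lichess
    # blitz rating and "n+1" the Lichess classical rating, which is how
    # the formula is applied later in this thread.
    def estimate_fide(blitz: float, classical: float, constant: float = 187.0) -> float:
        return 0.38 * blitz + 0.48 * classical + constant

    # Example: blitz 1800, classical 1900
    print(round(estimate_fide(1800, 1900)))  # 0.38*1800 + 0.48*1900 + 187 = 1783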

What is a "strong correlation" ? +/- 100 points ? +/- 200 points ? The OP downloaded every game played here. Claims to have used a weeks comp time to search all 300,000 members profiles to have found 3000 members who volunteered a FIDE rating. Used this data to propose a formula to predict a new players FIDE OTB rating based on their online rating. What is the point of the exercise if not to predict an reasonably close approximation? He then goes on to qualify, by stating the result given by the formula has a wide range for error. This is all quite entertaining. Hope you're all enjoying the discussion. Oh... the formula .38 x (n) + .48 x (n+1) + 187 (a constant) = A as a mathematical principle depicting a correlation between n and A would be laughed out of any "statistics" class, except the one given by the OP of course.

An example of a statistically flawed formula:

This was found today at CC. It shows how a "premise" based on given facts can give a false result.

"On a side note, one member wrote to Chess.com and called FIDE's system of using 12 rating lists to calculate an average rating "statistically flawed" if it is intended to create a measure of each player’s performance over the whole of 2017. The reason is that it gives results at the start of the year a much greater weight than results at the end of the year.

"[This] analysis is mostly correct, and one reason why I always calculate the average of 12 lists, with the not yet published lists using the live ratings from 2700chess as the best estimate," Bennedik commented to Chess.com by email. "For example, let's say player A has 2700 in January, and 2800 in February, and player B has 2800 in January and 2700 in February. If you only take the average of the first two lists they are equal. Where as if you take 12 lists, player A is a clear favorite."

So, using the average of 12 rating lists is not measuring the "best performance of the year," but measuring that is not easy. Bennedik: "Personally I think FIDE just intended to create an interesting criterion based on ratings, and believe that they succeeded with that."

This is about who qualifies for the Candidates with the highest rating.
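To see the weighting effect Bennedik describes, here is a quick sketch of the player A / player B example from the quote, assuming (as the quote implies) that each player's rating simply holds steady from February onward:

    # Player A: 2700 in January, 2800 from February onward.
    # Player B: 2800 in January, 2700 from February onward.
    player_a = [2700] + [2800] * 11
    player_b = [2800] + [2700] * 11

    # Average of the first two lists: identical.
    print(sum(player_a[:2]) / 2, sum(player_b[:2]) / 2)  # 2750.0 2750.0

    # Average of all 12 lists: player A is the clear favorite,
    # because a rating change early in the year persists across many lists.
    print(sum(player_a) / 12, sum(player_b) / 12)  # ~2791.7 vs ~2708.3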


I do appreciate the approximation formula, and I will use and spread it, but not discuss it for the time being.

PS: There's a famous old joke in Germany. It goes something like this; I hope I won't spoil it too much:
A guy is driving along the highway when a warning comes over the radio: "Be careful! There's one ghost driver (wrong-way driver) on the road!" The guy, disgusted: "One? There are hundreds!"


So all this time you weren't arguing with the math; you were arguing with the reliability of self-reported data points. Why are you wasting our time moving the goalposts and/or failing to communicate your concerns clearly? Also, the constant value was adjusted to 163.29, not 187, as seen several posts into this thread. Just to be clear on that.
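With that revision, the earlier sketch of the estimate would read, for example:

    # The estimate with the revised constant (163.29 instead of 187):
    def estimate_fide_revised(blitz: float, classical: float) -> float:
        return 0.38 * blitz + 0.48 * classical + 163.29

    print(round(estimate_fide_revised(1800, 1900)))  # 1759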

So we all agree the math is sound; it's the actual data that you reject. That's fine. This exercise wasn't meant for mathematical journals or anything like that; it was a hobby project geared toward a community. If we really wanted to get into accuracy, we could aim for controls like reliability, precision, and verification: requesting full names of players, ensuring all data points used had hundreds of games (and dozens of recent games) of blitz/classical on Lichess and FIDE, etc., to get a more accurate data set. Do you want to pay for that time-consuming project? I certainly don't.

The formula as now given does produce a close approximation for the known samples I have access to, which isn't surprising, as they were most likely used in the OP's data. It's actually within reason given all the facts in this case, and the math part is legitimate. It certainly seems more reliable than some other online indicators people use.

You can reject this by questioning the data set, but I'd challenge you to prove the data used wildly wrong based on actual data, not outcomes and personal hypotheses, as you are doing now. There is no proof given yet that under-1500-rated Lichess users should have lower FIDE ratings than their Lichess ratings; that's a hypothesis you hold, but you'd have to gather data to prove it first. There have been other projects like this on various sites, however, and the results suggested FIDE ratings were higher at sub-1500 online levels, because those players tended to be underrated, improving faster than their OTB ratings could keep up with. It's not until higher levels that the active player tends to reverse that.

But I've not spent significant time on this beyond reading others' findings and anecdotal information from the USCF/online circuit. I'd be interested to see a more in-depth analysis, but I understand that nobody qualified probably has the time to take on such a thankless project, which would ultimately be dismissed by others as inaccurate/unreliable anyway. All said, you're welcome to attempt a more thorough data subset to prove the OP wrong, but the data used supports his formula as given. Use it or not as you see fit.

-Jordan


@Sarg0n

I really like this one, actually. Had a good chuckle.


Math is always "sound".
Adding, subtracting, dividing, and multiplying variables and constants always gives the same, repeatable result. It is sound in that respect.
What is not sound is using insufficient or unreliable data as inputs. What is not sound is assuming a premise (that there exists a strong correlation between an online blitz rating and an OTB 90-minute rating, something that has yet to be established) and "creating" a formula that satisfies the criteria.
The math is sound, just the wrong math.

Math is always "sound". Adding, subtracting, dividing, multiplying variables and constants always gives the same, repeatable result. It is sound in that respect. What is not sound, is when (insufficient or unreliable data) is used as entry points. What is not sound is making an assumed premise, (that there exists a strong correlation between an online blitz rating and an OTB 90 minute rating that has yet to be established) and "creating" a formula that satisfies the criteria. The math is sound, just the wrong math.

@mdinnerspace

Great! It sounds like you agree that the mathematical/statistical method itself is fine, including the use of the constant/intercept. That's progress! Our positions are getting closer to each other.

Now, your only argument against my approach is that the input data is unreliable. Fine! I kind of agree with that too, as I've stated clearly in the original post and in subsequent ones. Not every data point is accurate, and some people clearly are making up stuff! (I've taken many steps to deal with the most extreme cases of this, but let's move on.)

So the only remaining disagreement is whether there is in fact a "strong empirical correlation between blitz rating and an OTB 90-minute rating".

You're right! That's a very important question. Let's look at this graph again:

https://imgur.com/a/nWy4x

Reminder: This is actual data from Lichess profiles. On the x-axis, you have observed Lichess ratings, and on the y-axis you have self-reported FIDE ratings.

Obviously, there is a super strong, positive, and linear (!) relationship between online ratings and FIDE self-reports. Again, some people make things up, but the bulk of the relationship is clearly strong, positive, and linear.
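For anyone who wants to quantify "strong" themselves, here is a minimal sketch of how one could measure the correlation and fit the line on such (Lichess rating, self-reported FIDE rating) pairs. The arrays below are placeholder values, since the raw dataset isn't posted in this thread:

    import numpy as np

    # Placeholder (Lichess rating, self-reported FIDE rating) pairs;
    # the actual dataset isn't published in this thread.
    lichess = np.array([1600, 1800, 2000, 2200, 2400], dtype=float)
    fide = np.array([1750, 1900, 2050, 2150, 2300], dtype=float)

    # Pearson correlation: "strong" conventionally means r close to 1.
    r = np.corrcoef(lichess, fide)[0, 1]

    # Least-squares fit: fide ~ slope * lichess + intercept
    slope, intercept = np.polyfit(lichess, fide, 1)
    print(round(r, 3), round(slope, 3), round(intercept, 1))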

My question to you: If there were no relationship at all between OTB and online skills, why would the cloud look like that? Are you accusing those 2700+ people of lying (in coordinated fashion) about their FIDE ratings?

Please don't just express undirected skepticism. I'd like you to offer a specific causal theory that explains the shape of the observed data cloud.

In my mind, the most straightforward explanation for the shape of that scatter plot is this: "OTB and online skills are strongly (if imperfectly) correlated."

What is your alternative theory? Be specific!

I feel like we're getting close to agreement here!


This is obviously stupid. Look at @Sarg0n: his FIDE rating is 2154 and his blitz is 1500(!). So his estimated rating would be 0.38 × 1500 + 0.48 × 2182 + 187, which equals 1804 (rounded): 350 points off.
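(Checking the arithmetic in that post, which does come out as stated:)

    # Verifying the arithmetic above:
    est = 0.38 * 1500 + 0.48 * 2182 + 187  # 1804.36
    print(round(est), 2154 - round(est))   # 1804 350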


@howchessYT

As I clearly stated in the original post, the formula was calculated using data from people with at least 50 blitz and at least 50 classical games played. Obviously, it doesn't work with provisional ratings.

This is not "obviously stupid".
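A sketch of that stated inclusion rule (the field names here are hypothetical; the actual pipeline isn't shown in the thread):

    # Hypothetical filter matching the stated criteria: at least 50 blitz
    # and at least 50 classical games, and no provisional ratings.
    def eligible(profile: dict) -> bool:
        return (profile.get("blitz_games", 0) >= 50
                and profile.get("classical_games", 0) >= 50
                and not profile.get("provisional", False))

    print(eligible({"blitz_games": 120, "classical_games": 64, "provisional": False}))  # True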


Some guys even overtake a ghost-driver...

Actually I fudged my Elo and now I've been caught out! ;-)

