How to estimate your FIDE rating (conversion formula inside) • page 35/36 • General Chess Discussion • lichess.org

@dudeski_robinson They're exactly the same. I'm a National Master so I know these things. Nice try at attempting to belittle my achievements in chess though.

Karpovnik

#342

@dudeski_robinson Another thing, if you are implying a FIDE rating that is equal to a USCF rating is more impressive, then in this case, the fact that this formula put me three FIDE points above my current USCF rating would be impressive.

dudeski_robinson

#343

@Karpovnik

Relax.

What I meant is that if FIDE and USCF are not the same, the fact that a FIDE-targeted formula correctly predicts your USCF rating might not be super impressive. Thus, I was belittling my own achievement (since I came up with the formula).

In any case, I don't know much about the relationship between FIDE and USCF ratings, but there are several conversations on the web which suggest that the two systems might not correspond exactly. For example:

www.quora.com/Why-is-the-USCF-system-inflated-when-compared-to-the-FIDE-system

Karpovnik

#344

Oh, you came up with that formula? You're smart.

MyPunIsTwoWeek edited

#345

You are really passionate about stats haha. Thanks for the effort, this is really cool!

MyPunIsTwoWeek

#346

Does this formula still apply after the changes that introduced "Rapid" category?

Karpovnik

#347

Just wait, one day I'll go to college and come up with my own formula. Yeah, my own formula. It will calculate how unlikely you and other girls are to enter my tree house.

JoshuaR

#348

My formula just came up with "Very Unlikely" based on your responses in this forum.

Karpovnik

#349

@JoshuaR Nice rating.

krasnaya

#350

To be honest, I have been chewing on this post for a long time. On one hand, I wanted to explain to the mathematically not-so-savvy why the formula is the way it is (something dudeski_robinson seems to have assumed was self-evident when he mentioned "linear regression" in the first post detailing his laudable effort). On the other hand, I didn't want to attract any troll calling the use of a constant a "mathematical problem". But anyway, the gain perhaps outweighs the trouble, so here goes:

Linear Regression: What it does

Suppose some set of data: a table with persons and their height, for instance. We can compute the average height by simply summing up and dividing by the number of persons (measurements) but this only gives us one reference point. We would like to get an idea of how the attribute "height" is distributed within the population. That is: instead of saying "the average height is 1.78 m" we would like to tell someone with, say, 1.95m which percentage of the population is bigger/smaller than s/he is. For this we would need a curve showing the connection between height and the number of people reaching this height.

Linear regression now determines such a curve, but - hence the "linear" - a special sort of curve: a line.

How Linear Regression works. What is a "line"?

As we may all remember from our school days, a line (in a Cartesian plane - think "sheet of paper") can be described by the formula (also called the "slope-intercept form")

y = kx + d

d will be a constant (the "y-intercept")
k will be a constant (the "slope")
x and y are any pair of numbers solving the above equation

As we create pairs of numbers x and y solving the equation, we will notice that if we plot the points (x,y) on a grid they will all lie on one line. Conversely, when we draw this line we generate an infinite number of points, each representing a coordinate pair x/y which in turn solves the equation. In fact the line we drew and the equation are, in some way, the same.

It is interesting now that - because the pairs of x's and corresponding y's are infinitely many - what really defines the line is the "k" (the "slope") and the "d" (the "y-intercept"). It is clear that without the d (or with d set to 0) all lines would go through the point (0,0), because whatever the value of k is, the equation:

0 = k*0

for x=y=0 would be correct. This is why the d is necessary to create lines going anywhere through the y-axis, not just at the point (0,0).

The factor "k" determines the "steepness" of the line: k is the amount by which y changes when x increases by 1 (set d=0 and you see this immediately, since then y = kx). The larger the absolute value of k, the steeper the line; a negative k gives a falling line.
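To make the roles of k and d concrete, here is a minimal Python sketch. The function name `line` and the values k = 2, d = 1 are made up purely for illustration; they have nothing to do with the rating formula.

```python
def line(k, d, x):
    """Return y for the line y = k*x + d."""
    return k * x + d

# Hypothetical slope and y-intercept, chosen only for illustration.
k, d = 2.0, 1.0

# A few points (x, y) that solve the equation.
points = [(x, line(k, d, x)) for x in range(4)]
print(points)  # [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]

# At x = 0 the line crosses the y-axis at d.
# Each step of 1 in x changes y by exactly k.
steps = [points[i + 1][1] - points[i][1] for i in range(len(points) - 1)]
print(steps)  # [2.0, 2.0, 2.0]
```

With d = 0 the first point would be (0, 0.0), which is the "all lines go through the origin" case described above.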

Putting everything together.

What linear regression does now is to come up with a line - in fact a formula for a line of the form y = kx + d - which best fits the initial set of data. (I will restrict this to the simplest form, a two-dimensional problem like the population/height relationship. There are more complicated multi-dimensional extensions, but the essence of the solution remains the same.)

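What does "best fits" mean? It means that the sum of the squared vertical distances between each measured point and the line (the difference between the measured y and the y the line predicts for the same x) becomes minimal. This is why the method is also called "least squares". The line is found in the following way: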

We have all our data points in a simple table. Let us call the columns "x" and "y" and number the rows. The row number will be represented by the index "i", so we have:

x(i) | y(i)
===========
x(1) | y(1)
x(2) | y(2)
x(3) | y(3)
... etc.

First we need to calculate the averages of all the x's and y's. We will call them X and Y (capitalised).

The slope k of our line is then determined by the following formula:

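k = sum( (x(i)-X) * (y(i)-Y) ) / sum( (x(i)-X)² )

For every pair x(i), y(i) in our table we subtract the average X from x(i) and multiply the result by y(i) minus the average Y. The results of all these multiplications are summed up. This sum is then divided by the sum of the squares of all the x(i) minus X. Note that each difference is squared first and the squares are then summed: squaring the sum of the differences would not work, because the differences from the average always add up to 0.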

The formula for d is far easier but we need the computed k to solve for it:

d = Y - kX

d equals the average Y minus the k times the average X.

Now that we have computed k and d we have everything we need to construct the line y = kx + d
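The recipe above can be sketched in a few lines of Python. This is only an illustration: the function name `fit_line` and the tiny data table are invented, and the denominator is read as the sum of the squared differences x(i)-X.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = k*x + d; returns (k, d)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    X = sum(xs) / n  # average of the x's
    Y = sum(ys) / n  # average of the y's
    # Numerator: sum of (x(i)-X) * (y(i)-Y)
    num = sum((x - X) * (y - Y) for x, y in zip(xs, ys))
    # Denominator: sum of the squared differences (x(i)-X)^2
    den = sum((x - X) ** 2 for x in xs)
    if den == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    k = num / den
    d = Y - k * X
    return k, d

# Made-up table whose points lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
k, d = fit_line(xs, ys)
print(k, d)  # 2.0 1.0
```

Because the example points sit exactly on a line, the fit recovers that line. With real, noisy data the points scatter around the fitted line instead.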

Notice: if you don't like the rather chaotic formula for k there is a much easier formula for k:

k = r(x,y) * s(y)/s(x)

r(x,y) is the so-called "Pearson correlation coefficient" (PCC) between x and y, and s(x), s(y) are the standard deviations (not the variances) of x and y. Since these have to be calculated beforehand, applying this seemingly easier formula may in practice take more computation than the first one.
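As a sanity check, the following Python sketch computes k both ways on a small invented data set and compares the results. All names here (`mean`, `std_dev`, `pearson_r`) are written out by hand for illustration, and s(x), s(y) are taken to be standard deviations.

```python
import math

def mean(vs):
    return sum(vs) / len(vs)

def std_dev(vs):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(vs)
    return math.sqrt(sum((v - m) ** 2 for v in vs) / (len(vs) - 1))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    X, Y = mean(xs), mean(ys)
    num = sum((x - X) * (y - Y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - X) ** 2 for x in xs) * sum((y - Y) ** 2 for y in ys))
    return num / den

# Made-up, slightly scattered data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

# Slope from the first formula.
X, Y = mean(xs), mean(ys)
k_direct = sum((x - X) * (y - Y) for x, y in zip(xs, ys)) / sum((x - X) ** 2 for x in xs)

# Slope from k = r(x,y) * s(y)/s(x).
k_via_r = pearson_r(xs, ys) * std_dev(ys) / std_dev(xs)

print(math.isclose(k_direct, k_via_r))  # True
```

The two agree because the factors of n - 1 cancel in the ratio s(y)/s(x); using the population standard deviation (dividing by n) for both would give the same slope.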

krasnaya