lichess.org

Why do we now get fewer rating points for winning?

Today: +8 rating increase for winning against an opponent with a rating 60 more than me.

5 days ago: +11 rating increase for winning against an opponent with a rating 29 more than me.

(Both opponents have played several thousand games; these are just two examples from at least 40 games over the last five days, all of which followed an apparently new rating system).

For as long as I've been here (4+ years) you received +10 for beating an approximately equally rated opponent; suddenly the system's changed. Fine by me, since it applies to everyone; but what's the reason? Was rating inflation starting to creep in?
Today: +12 rating increase for winning against an opponent rated more than 250 points above me: lichess.org/nffUl2cy. I have other examples from this week.

It looks like it's related to rating deviation (RD): mine went down to 50 for no apparent reason, in both bullet and blitz.
The RD floor was reduced from 60 to 50, to stabilize ratings of frequent players.
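To see why a lower RD floor means smaller gains, here is a rough sketch using the published Glicko-1 single-game update. This is an approximation for illustration only: lichess uses a Glicko-2 variant, the function names are made up here, and the example assumes both players sit exactly at the RD floor.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Glicko-1 attenuation factor for an opponent's rating deviation."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_opp, rd_opp):
    """Expected score of the player against the opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def rating_gain(r, rd, r_opp, rd_opp, score):
    """Rating change after one game (score: 1 win, 0.5 draw, 0 loss)."""
    e = expected_score(r, r_opp, rd_opp)
    d2_inv = Q**2 * g(rd_opp) ** 2 * e * (1 - e)  # this is 1 / d^2
    return Q / (1 / rd**2 + d2_inv) * g(rd_opp) * (score - e)


# Assumption: both players are at the RD floor (hypothetical ratings).
for floor in (60, 50):
    gain = rating_gain(1800, floor, 1800, floor, 1.0)
    print(f"RD {floor}: win vs equal opponent = +{gain:.1f}")
```

Under these assumptions, beating an equally rated opponent is worth about +10 at RD 60 and about +7 at RD 50, which is in line with the smaller gains reported above.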

We're using a rating system which has many similarities to Glicko-2: http://www.glicko.net/glicko/glicko2.pdf

If you read Glickman's paper, it doesn't have an RD floor; theoretically, ratings provide the greatest predictive accuracy if the RD floor is zero. That immediately raises the question: what if I improve at the game?

The key difference between Glicko and Glicko-2 is a volatility term that tracks (within a rating period) whether a player's level of performance is consistent; if it is not, RD decreases more slowly. So if you're an improving player, your rating volatility should be high, causing RD to decrease slowly; conversely, a player who is neither improving nor worsening will have a low volatility and a low RD. All that said, RD slowly increases over time to account for uncertainty due to player inactivity.
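As a small illustration of how volatility feeds into RD, the Glicko-2 paper inflates the deviation before each rating period as phi* = sqrt(phi^2 + sigma^2), where phi is RD on the Glicko-2 scale and sigma is the volatility. The sketch below applies that step for a player who stays idle; the function name is hypothetical, and 0.06 is only the example starting volatility from the paper, not necessarily what lichess uses.

```python
import math

GLICKO2_SCALE = 173.7178  # conversion factor between Glicko and Glicko-2 scales


def rd_after_idle(rd, volatility, periods=1):
    """RD after `periods` rating periods without games (Glicko-2 pre-period step)."""
    phi = rd / GLICKO2_SCALE
    # Each idle period adds volatility^2 to the variance phi^2.
    phi_star = math.sqrt(phi**2 + periods * volatility**2)
    return phi_star * GLICKO2_SCALE


for sigma in (0.03, 0.06, 0.09):
    print(f"volatility {sigma}: RD 50 -> {rd_after_idle(50, sigma):.2f}")
```

The higher the volatility, the faster RD grows back, which is why a player with inconsistent results keeps a higher RD than a steady one.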

Ratings should have predictive accuracy. Having random ratings doesn't make sense. (As I reread http://www.glicko.net/glicko/glicko2.pdf again, I think the RD >= 30 floor from Glicko-1 is unnecessary in Glicko-2 because the volatility term already accounts for improving or declining players. However, this raises a question about Glicko-based systems with undefined rating periods and whether their volatility terms are stable.)

Thanks to gcp for this analysis/simulation: github.com/ornicar/lila/pull/4034#issuecomment-433472996

Because the RD(phi) increase is applied after every game instead of every rating period, Glicko-2 has much higher RDs by default, causing the ratings to flobber about much more. Empirically, on lichess the majority of players never go below RD=60, there's maybe a dozen or so in my entire dataset (mostly because they managed to get non-default lower volatility, which is also very rare).

After removing the RD >=60 limitation, or putting it to >= 30 (as recommended in the Glicko-1 paper), the prediction performance of this pull (Glicko-2 + sigma scaling over time, aka Lickgo-2) beats Glicko-1 and it's a strong improvement over the current Glicko-2 ratings.

Limiting the RD to 30 or not limiting at all seems to make very little difference [in terms of predictive accuracy]. In general it will grow quickly unless the player is playing a ton of games, in which case their rating is constantly adjusted anyway. So the limit probably just isn't necessary at all.

Glicko 1 prediction rate 56.591%, MSE=0.2250
Glicko 1 (no RD cap) prediction rate 56.618%, MSE=0.2250
Lickgo-2 prediction rate 55.483%, MSE=0.2269
Lickgo-2 (no RD cap) prediction rate 56.729%, MSE=0.2248
lichess prediction rate 55.122%
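For readers wondering how figures like these could be produced, here is a minimal sketch of the two metrics. It is an assumption about the general approach, not gcp's actual code: a prediction counts as correct when the favored side wins, draws never count as correct, and MSE is the mean squared difference between expected and actual score.

```python
def evaluate(predictions, outcomes):
    """Return (prediction_rate, mse).

    predictions: expected scores in [0, 1] for the first player.
    outcomes: actual scores (1.0 win, 0.5 draw, 0.0 loss).
    """
    n = len(predictions)
    mse = sum((p - s) ** 2 for p, s in zip(predictions, outcomes)) / n
    correct = sum(
        1
        for p, s in zip(predictions, outcomes)
        if (p > 0.5 and s == 1.0) or (p < 0.5 and s == 0.0)
    )
    return correct / n, mse


# Hypothetical toy data, not taken from the lichess dataset.
rate, mse = evaluate([0.7, 0.4, 0.6], [1.0, 0.0, 0.0])
print(f"prediction rate {rate:.3%}, MSE={mse:.4f}")
```

For scale, a predictor that always outputs 0.5 scores an MSE of exactly 0.25 on decisive games, so the differences between systems in the list above are small but measurable.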

So, if the minimum RD gets lowered to 30, or removed entirely, it would clearly improve the accuracy of lichess ratings.
