lichess.org

I think it would be nice to increase rating deviation

@Sybotes Thank you for the syntax feedback. You are right (I should watch out for closing them), and about the punctuation as well. I will keep an editing eye on that. Part of the problem is that in editing I might make things worse (for other reasons). But yes. I had noticed a punctuation problem too but was unsure of its effect (commas might be better placed, given that I venture into compound sentences). I will work on that.

I think the RD is indeed a term, or is the volatility, in the Glicko model. But it is not the increment/decrement.
The Wikipedia article about it actually does a better job of separating the concepts, because it keeps the parameters symbolic; computational implementations in source code, and implementation-oriented explanations like the often-linked Glicko-2 PDF, do NOT.

One can see there that, yes, the gain (negative or positive) depends on the RD as a variable, but also that there are parameters that could be adjusted independently of the RD's own formula parameters (which, you told me, are set separately for each time control).
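To make the "gain depends on RD" point concrete, here is a minimal sketch of a single-game Glicko-2 update, written from the equations in Glickman's public Glicko-2 description. Assumptions: the volatility σ is held fixed at 0.06 (the value used in Glickman's worked example, not necessarily what Lichess uses), so the volatility-update step is skipped; 1500 and 173.7178 are the standard Glicko-2 scaling constants; `single_game_gain` is a hypothetical helper name. σ (and, in the full model, the system constant τ) are the kind of parameters that sit outside the RD formula itself.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between the displayed and internal scales


def g(phi: float) -> float:
    """Weight that shrinks the impact of an opponent whose own RD is large."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected(mu: float, mu_j: float, phi_j: float) -> float:
    """Expected score against opponent j, on the internal scale."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def single_game_gain(rating: float, rd: float, opp_rating: float, opp_rd: float,
                     score: float, sigma: float = 0.06) -> float:
    """Rating change after one game, holding the volatility sigma fixed
    (i.e. skipping the Glicko-2 volatility-update step)."""
    mu, phi = (rating - 1500) / SCALE, rd / SCALE
    mu_j, phi_j = (opp_rating - 1500) / SCALE, opp_rd / SCALE
    e = expected(mu, mu_j, phi_j)
    v = 1.0 / (g(phi_j) ** 2 * e * (1 - e))            # estimated variance from this game
    phi_star = math.sqrt(phi ** 2 + sigma ** 2)         # pre-period RD inflated by volatility
    phi_new = 1.0 / math.sqrt(1 / phi_star ** 2 + 1 / v)
    mu_new = mu + phi_new ** 2 * g(phi_j) * (score - e)  # gain scales with phi_new ** 2
    return SCALE * (mu_new - mu)


for rd in (45, 60, 100, 200):
    gain = single_game_gain(1500, rd, 1500, 60, score=1.0)
    print(f"RD {rd:>3}: win vs equally rated opponent -> {gain:+.1f}")
```

Against the same opponent, the same win is worth much more when the player's own RD is large, while a draw against an equally rated opponent moves nothing, whatever the RD.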

The above statement was a test of my improved attention to punctuation. I am not a native English speaker either. My writing style has a hard time producing simple sentences, which would be more effective for explaining complex things. This means I should leave my ugly posts out of sight for a while and come back as a fresh reader. And do a lot of chopping. I might try that too.

Thanks for answering about volatility being taken care of, and again for the writing feedback. Both are appreciated and well understood in their intent.
@Sybotes said in #10:
> The rating deviation is a measure for volatility, as I understand it. Let OP wait a couple of weeks without playing, then he will probably indeed get 10 rating points for a win against an equally rated opponent. And the deviation is a separate value for blitz, bullet, classical, etc.
Are you saying that, to solve my problem, I should only play with this account once every couple of weeks? Isn't that implicitly advocating for multiple accounts, which is against Lichess policy?
I wasn't specifically thinking of adjusting the rating volatility (I hope that is the right term) to the time control, but why not.

The basic postulate is that over a large number of games, fluctuations would average out through the game results themselves, and hence the rating would be an accurate representation of the "true strength" of the player.

However this postulate only works if the rating deviation is not too small.

Indeed, the "true strength" of a player varies with time, and not only due to fluctuations: players tend to improve their game quality the more they play.

So, in order for the rating to accurately reflect the "true strength", the rating increment needs to keep pace with the player's improvement in strength.

Consider, for instance, a player whose "real strength" in classical time control steadily increases by 10 points a month. Assume that this player plays only one rated classical game per month (this situation is not unrealistic: maybe they do a lot of puzzles and studies, maybe they play more rapid games and the improvement of their rapid strength also positively impacts their classical strength...). If they gain only 7 points per game they win, then, as long as they keep improving, their rating will NEVER be able to keep up with their real strength. Worse than that: the disparity between their rating and their "true strength" will INCREASE the more games they play.
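The thought experiment above can be checked with a few lines of arithmetic. This is a minimal sketch under the post's own simplifications, not Glicko-2: the rating gains a fixed amount per win (7 by default), every game is a win (the most favourable case for catching up), and true strength grows linearly. The function and parameter names are hypothetical.

```python
def rating_gap(months: int, true_gain_per_month: float = 10.0,
               rating_gain_per_win: float = 7.0, games_per_month: int = 1,
               start: float = 1500.0) -> list[float]:
    """Month-by-month gap between 'true strength' and rating, assuming every
    game is a win worth a fixed number of rating points."""
    true_strength = rating = start
    gaps = []
    for _ in range(months):
        true_strength += true_gain_per_month                 # steady real improvement
        rating += games_per_month * rating_gain_per_win      # best case: all wins
        gaps.append(true_strength - rating)
    return gaps


print(rating_gap(12))
```

Each month the gap widens by 10 − 7 = 3 points; with two games a month instead of one, the sign flips and the rating overtakes. In actual Glicko-2 the per-win gain is not fixed: it shrinks as the RD falls and grows after idle periods, which is what the replies below discuss.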
Indeed, my earlier words simplified the measure by assuming a constant true strength**.

The Glicko model does not assume that at all, and does not need to. I think it is more about being well adapted to the pool dynamics, and about carefully adapting the confidence (the uncertainty) to one's own schedule of game events sampling the pool. So the less often one samples the pool by entering a pairing event, the more confidence one's past rating measure should lose (the expectation here is more about uncertainty or belief than about a frequency average, by the way).
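In Glickman's Glicko-2 description, a player who does not compete during a rating period keeps their rating and volatility, and only the deviation grows, by φ' = √(φ² + σ²) per idle period. The sketch below applies that rule over several idle periods. Assumptions: how an implementation maps wall-clock time to rating periods (possibly fractionally) is not specified here, σ = 0.06 is an illustrative value, and `rd_max = 350` is a placeholder cap, not necessarily the one Lichess uses; `rd_after_idle` is a hypothetical name.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between the displayed and internal scales


def rd_after_idle(rd: float, sigma: float, idle_periods: float,
                  rd_max: float = 350.0) -> float:
    """RD after `idle_periods` rating periods without games.

    Each idle period applies phi <- sqrt(phi**2 + sigma**2), so t periods
    add t * sigma**2 to phi**2. rd_max is an assumed upper cap.
    """
    phi = rd / SCALE
    phi_idle = math.sqrt(phi ** 2 + idle_periods * sigma ** 2)
    return min(SCALE * phi_idle, rd_max)


for periods in (0, 1, 4, 12, 52):
    print(periods, round(rd_after_idle(50.0, 0.06, periods), 1))
```

The longer the gap between pool samples, the larger the RD going into the next game, and hence the larger the next rating swing.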

You raise a good question, though, about the time-scale resolution of this measurement system when it comes to detecting progression.

It is far from clear to me what answer might be given. And I would like to know how you can be so sure about the following:
> However this postulate only works if the rating deviation is not too small.

> in order for the rating to accurately reflect the "true strength",
> the rating increment needs to keep pace with the player's improvement in strength.

You may be right, but I don't follow yet. The problem is not about true progression, which your last paragraph is about; I accept that. The problem is how to infer anything about whether the rating estimate by Lichess (Glicko-2) is an "accurate" and "precise" measure of true progression.

Is that a hypothesis or a certitude? I would assume an intuition. Can we expand on that? Perhaps we could consider the progression as an external (non-autonomous forcing) time-dependent input to the measurement system.

There are the pool dynamics, and there are the individual player's progression kinetics. I would assume that the pools are big enough that an individual player's evolution of true strength over time (fluctuations and average progression alike) does not affect the evolution of the rest of the pool's distribution. Some kind of steady state of the population...

So now we are left with the sampling trajectory of one individual in the pool, and the unrelated time dependency of that individual's true-strength evolution. I don't yet see a dynamic link between the timing of sampling and the timing of true-strength evolution.

I have just tried to narrow down my question about your claims to that last paragraph, making my current fog a claim. But you can certainly convince me and inform me otherwise. I wonder if using the Wikipedia equations (we can't share the visuals here, though, so each on our own) might help us not diverge, compared with words and their inherent wobble in transmission from one head to the other.

Can we agree on the following objects? (I am abusing "ground truth" to mean inputs whose internal dynamics we don't need to understand or explain; they are given entries to our discussion.)

true strength; rating measure; accuracy versus precision of a measure estimating a ground-truth true strength; a ground-truth time evolution of that strength; an explicit dynamic model of how the measure depends on the pool population's own true-strength time evolutions and on the player sampling that pool with their own trajectory of pairing events.

** True Strength (TM) = what we would like to measure. Pretty much all of us have some common-sense notion of an average move-quality ability, whatever the preferred word, but saying "rating" does not cut it, now that we are more aware of its dependency on the pool and on game-outcome data. So we accept that there is an estimator of it, a measure, and that is the rating in some context, like Glicko here.
Another (possibly obvious) point in favor of not increasing the rating increment is that it's actually already higher than for real OTB ratings when you compare the number of games you play.

I consider myself an average OTB player, with somewhere around 150 games played in FIDE-rated tournaments over a period of 10+ years. In the same period I played over 75,000 games on Lichess. Comparing those is not a matter of which is bigger or by how much; it's a matter of orders of magnitude!
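Taking the two round figures above at face value, the ratio works out as follows (a back-of-the-envelope check for this one player, not a claim about typical players):

```latex
\frac{75\,000}{150} = 500 = 10^{\log_{10} 500} \approx 10^{2.7}
```

That is roughly 500 Lichess games for every FIDE-rated game, between two and three orders of magnitude.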

So if you complain that online ratings don't increase/decrease fast enough... well, I'd say you should try OTB! I think I'm not alone in feeling how slowly OTB ratings change (compared to online ratings) when, just to play a tournament of 7-9 games, you have to book a week of the year away from family and work.
@dboing the last paragraph of my post is a proof that
> However this postulate only works if the rating deviation is not too small.
Namely, the postulate
> over a large number of games, fluctuations would average out through the game results themselves, and hence the rating would be an accurate representation of the "true strength" of the player.
should hold in every possible situation. To falsify it, it suffices to provide a counterexample, which is exactly what I did.

@pepellou your argument is akin to "you can find worse elsewhere so we have no reason to improve here".
> I think I'm not alone in the feeling on how slow OTB ratings change
So you're explicitly saying that you understand my frustration, and implicitly that I shouldn't complain because you are in an even more frustrating situation?
Now, rethinking what I just wrote and putting myself in your shoes, I would have some counterpoints or complementary points to what I just said, even a possible error. I will just share that doubt and leave you the fun of the rest, for me (and others) to ponder. An invitation for your abilities to go look into the model too.

I said that the Glicko-2 measure confidence was being controlled by the volatility model (a built-in hypothesis about how confidence evolves as the volatility function's dependent variables evolve, time being what it is).

But this is a two-variable model. At event completion, the updated rating estimate and the volatility are both functions of the event outcome, the previous rating estimate, and the time since the last event.

The wiki and the computer implementations spell out the computation steps: the volatility is computed first, and then it enters the gain, which will be added (I think) to the previous estimate. At each step, I remember some extra symbolic parameters and algebraic ratios.

If the time since your last game pairing is large, the volatility increases with its prescribed growth for that duration, and it actually enters the gain as a dependent variable, as a growth factor (i.e., the gain is an increasing function of the volatility).

(Please check for yourself; I am error-prone with orientations. I flip things when they are not in front of me, like while typing here; my labile memory needs constant external visual support for continuity of thought.)
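One way to check the order of operations is to code up one full Glicko-2 rating period. The sketch below follows the numbered steps in Glickman's public Glicko-2 write-up, including its Illinois-style root finding for the new volatility; τ = 0.5 and ε = 1e-6 are the values from that write-up, not necessarily Lichess's settings, and `glicko2_period` is a hypothetical name. In these equations the new volatility σ' is indeed computed before the rating change, but it acts by inflating the deviation (φ* = √(φ² + σ'²)), and the rating gain is proportional to φ'², which grows with φ*.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between the displayed and internal scales


def _g(phi: float) -> float:
    # Down-weights opponents whose own rating is uncertain
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def _expected(mu: float, mu_j: float, phi_j: float) -> float:
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_period(rating, rd, sigma, games, tau=0.5, eps=1e-6):
    """One Glicko-2 rating period. games: list of (opp_rating, opp_rd, score).
    Returns (new_rating, new_rd, new_sigma)."""
    mu, phi = (rating - 1500) / SCALE, rd / SCALE

    # Steps 3-4: estimated variance v and estimated improvement delta
    v_inv, score_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in games:
        mu_j, phi_j = (opp_rating - 1500) / SCALE, opp_rd / SCALE
        e = _expected(mu, mu_j, phi_j)
        v_inv += _g(phi_j) ** 2 * e * (1 - e)
        score_sum += _g(phi_j) * (score - e)
    v = 1.0 / v_inv
    delta = v * score_sum

    # Step 5: new volatility sigma' as the root of f (Illinois iteration)
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex) / (2 * (phi ** 2 + v + ex) ** 2)
                - (x - a) / tau ** 2)

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2
        B, fB = C, fC
    new_sigma = math.exp(A / 2)

    # Step 6: the new volatility inflates the pre-period deviation
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)

    # Step 7: new deviation, then the rating change, which scales with phi_new ** 2
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * score_sum

    # Step 8: convert back to the displayed scale
    return SCALE * mu_new + 1500, SCALE * phi_new, new_sigma


print(glicko2_period(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

For Glickman's worked example used in the last line (a 1500/200/0.06 player beating a 1400/30 opponent and losing to 1550/100 and 1700/300), the write-up reports approximately 1464.06, 151.52 and 0.05999. Rerunning a single win with `rd=60` and then `rd=150` shows the larger pre-period deviation producing the larger gain.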

The point of that bigger swing after a long hiatus might actually be about uncertainty in your own strength, since it may have evolved. It may not be about being out of touch with the pool, i.e., not about uncertainty in the pool, since we assume the pool is at steady state.

Voilà, my bad above about the ground truth put into the Glicko-2 model. The big swings are a sort of binary sorting or search, allowing bigger oscillations so that some calibration of the estimate can happen. I think it then actually has an active mechanism to adapt to what you want to have happen. You assume growth as time passes, but it could also be weakening (the measure does not know that you went into puzzle frenzies in your unrecorded time). I use analogies because we are outside the math here, in the modeling aspect of the model. Being whistled. So this is a WIP post... or you get what I mean.
@PxJ said in #16:
> @pepellou your argument is akin to "you can find worse elsewhere so we have no reason to improve here".

Quite the contrary. What I'm saying is that it's already bad enough (compared to "real" OTB), so let's not make it even worse.
@PxJ said in #16:
> @dboing the last paragraph of my post is a proof that
>
> Namely, the postulate
>
> should hold in every possible situation. In order to falsify it, it suffices to provide a counter-example. Which is exactly what I did.
>
> @pepellou your argument is akin to "you can find worse elsewhere so we have no reason to improve here".

You have strong certitudes, or machine-scale logic. You don't need to make your logic explicit in smaller steps; you see far. But I don't.
Could you break it down, using what I offered as building blocks? It might be work; I am sorry.
Otherwise I give up discussing. I can read your claims; they are clearer than my paragraphs trying to get to you. I would expect reciprocity over being right: explaining to each other. This is not chess. Discussion requires cooperation; it is not a place to get a rating.
We are talking about ratings for competitions; we don't need to be uncooperative in our debates. But mind frames might often spill over.

I did enough weeding out of the issue. Your turn to sweat a bit on the explaining side. I hope you will.
