
Accuracy is NOT an improvement over centipawn loss

Why?

Because it's possible to have a (much) worse accuracy than your opponent and still win. I recently had a game with 60% accuracy (due to 1 blunder in an endgame time scramble), whereas my opponent had 75% accuracy, since their blunders were more numerous but smaller.

I had ACPL of ~60, while my opponent had an ACPL of ~80 in that game.

How is that possible?

Because lichess uses a harmonic mean instead of an arithmetic mean when calculating Accuracy %. This is bad since it means accuracy isn't additive the way centipawn loss is.
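To see how the choice of mean changes the ranking, here is a minimal sketch. It assumes the harmonic mean is applied directly to per-move accuracies, as described above; the actual Lichess calculation may include more than this. The accuracy values are made up for illustration and are not from a real game.

```python
from statistics import fmean, harmonic_mean

# Hypothetical per-move accuracies in percent (illustrative only).
one_blunder = [95.0] * 29 + [5.0]   # 29 clean moves and one huge blunder
many_slips = [80.0] * 30            # 30 moderately inaccurate moves

for name, moves in [("one blunder", one_blunder), ("many slips", many_slips)]:
    print(f"{name}: arithmetic={fmean(moves):.1f} harmonic={harmonic_mean(moves):.1f}")
```

With these numbers the arithmetic mean ranks the one-blunder player higher, while the harmonic mean ranks them lower, because a single very low value dominates the sum of inverses.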

What is the goal of accuracy % that is supposed to make it an improvement over ACPL?

Not much really, just that CPL seems like an unfamiliar scale (it shouldn't be for chess players) and apparently a 0-100% scale is better. It also uses W% rather than centipawns, since the gap between 0 and 5 is much larger than the gap between 5 and 10.

How do we fix this?

1. Converting to W% is good, keep that
2. Using an arbitrary exponential curve to convert the Win Probability Lost to 0 to 1 scale is weird, but not inherently flawed since it's mostly linear anyways
3. The harmonic mean is the problem. Instead of averaging the move accuracies (0% to 100% scale) it averages the inverses (1 to infinity scale) and takes the inverse of that. A geometric (since the curve is exponential) or arithmetic mean would be better.
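The three steps above can be sketched roughly as follows. The numeric constants are an assumption: they are what I recall from Lichess's public description of accuracy and should be checked against the current source. The function names and the sample evals are hypothetical.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

def win_percent(centipawns: float) -> float:
    """Map an engine eval (centipawns, mover's point of view) to a win percentage.
    The coefficient is an assumed value, not verified against Lichess's code."""
    return 50.0 + 50.0 * (2.0 / (1.0 + math.exp(-0.00368208 * centipawns)) - 1.0)

def move_accuracy(win_before: float, win_after: float) -> float:
    """Map win percentage lost on one move to a 0-100 accuracy.
    The exponential curve's constants are likewise assumptions."""
    lost = max(0.0, win_before - win_after)
    raw = 103.1668 * math.exp(-0.04354 * lost) - 3.1669
    return min(100.0, max(0.0, raw))

# Hypothetical (eval before, eval after) pairs in centipawns for one player's moves.
evals = [(30, 25), (25, -40), (10, 10), (60, -300), (-20, -35)]
accs = [move_accuracy(win_percent(b), win_percent(a)) for b, a in evals]

arith = fmean(accs)
geo = geometric_mean(accs)
harm = harmonic_mean(accs)
print(f"arithmetic={arith:.1f} geometric={geo:.1f} harmonic={harm:.1f}")
```

For positive values the harmonic mean is never above the geometric mean, which is never above the arithmetic mean, so the harmonic mean always punishes a single bad move the hardest.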

What makes a good score for a two player game?

In sports, the team that gives up fewer points wins the game.

Similarly, in chess (or any game for that matter) the player who gives up fewer centipawns (or less win probability, on that scale) wins the game. Specifically, starting from an even position, the player who wins the game will have given up precisely 50 percentage points less in W% than their opponent over the course of the game.
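The 50-point claim is simple bookkeeping, which a small sketch can check. It assumes the game starts at exactly 50% and that every change in win probability is charged, with its sign, to the player who just moved; real engine output is noisier than that. The function name and the win-probability series are made up.

```python
def win_prob_lost(white_win):
    """Total win probability given up by each side.

    white_win[i] is White's win % after ply i (index 0 is the start position).
    Signed changes are charged to whoever just moved.
    """
    white_lost = black_lost = 0.0
    for ply, (before, after) in enumerate(zip(white_win, white_win[1:])):
        if ply % 2 == 0:                 # White just moved
            white_lost += before - after
        else:                            # Black just moved
            black_lost += after - before
    return white_lost, black_lost

# Hypothetical game that White wins: 50% at the start, 100% at the end.
white_win = [50, 48, 55, 52, 70, 65, 90, 88, 100]
white_lost, black_lost = win_prob_lost(white_win)
print(white_lost, black_lost, black_lost - white_lost)
```

The difference is always the final win probability minus the starting one, so for a decisive game from an even start it comes out to 50 points regardless of how the individual moves went.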

Having a score is useful since it allows for the calculation of "Pythagorean W%" (see Wikipedia for more details).

In chess, one way to use this method to evaluate how much better a player played than their opponent would be to look at the ratio between their CPLs (or Win Probability Lost).

When that ratio is close to 1, the game could have gone either way, in the same way a 25-24 match could have gone either way.

When that ratio is high (say 3 to 1), it's safe to say one player was clearly better, just as a 25-8 match wasn't close.
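As a rough sketch of the Pythagorean idea applied to losses: since lower is better here, the opponent's loss plays the role of "points scored". The exponent of 2 is an assumption borrowed from the original baseball formula; a suitable exponent for chess would have to be fitted to real games. The function name is hypothetical.

```python
def pythagorean_share(own_loss: float, opp_loss: float, exponent: float = 2.0) -> float:
    """Pythagorean-style estimate of how much of the game 'belonged' to a player,
    based on the centipawns (or win probability) each side gave up."""
    own = own_loss ** exponent
    opp = opp_loss ** exponent
    if own + opp == 0:
        return 0.5                       # neither side gave anything up
    return opp / (own + opp)

print(pythagorean_share(60, 80))   # the ACPL values from the game described above
print(pythagorean_share(8, 24))    # a 3-to-1 ratio
print(pythagorean_share(24, 25))   # a ratio close to 1
```

With an exponent of 2, the 60 vs 80 game comes out at about 0.64 for the player with the lower ACPL, a 3-to-1 ratio at 0.9, and a near-even ratio just above 0.5.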

Conclusion:

Thanks for listening to my lecture on the proper way to score a 2 player game. I hope the feedback was insightful and that it can be implemented easily.

Looking forward:
Note that this method does not resolve the issue that CPL uses Stockfish as a baseline which is unrealistic in dynamic positions where mistakes are expected.

To resolve that, we would need something that can model how much CPL the typical human would have. This should be doable considering how well transformers (like ChatGPT) can model human text and chess notation is a form of text (see this paper).

Once we have the expected CPL (or equivalent in W%), we can normalize by that factor to get the best 1 number estimate of a player's skill given the moves they played.
If you are interested in charting the Accuracy, the LiChess Tools browser extension has an option to show it on the server analysis chart. You have to enable it from Preferences. It's mostly an inverse of the delta eval value, but I think it looks pretty informative. A consistent accuracy is a flat line, while the dips show you where you went wrong.
"Looking forward:
Note that this method does not resolve the issue that CPL uses Stockfish as a baseline which is unrealistic in dynamic positions where mistakes are expected."

A great idea if it's used in elite games, using SF as input to calculate CPL and a human-like engine to calculate accuracy.
