lichess.org

A Chess Metric: Ease (for Humans)

Chess, Analysis, Software Development, Chess engine
“I know the engine evaluation but how easy to play is this position, really?”

For any given position, chess engines give us an evaluation, and we talk about it as the objective evaluation. That’s not entirely true, since chess is not a solved game (outside of tablebases!), but the evaluation is nevertheless superhuman.

That is to say, the engine evaluation tells us how good a position is if a superhuman player were to play it. But engines rely on assessing millions of positions, and humans cannot come close to that.

That’s not ideal! I would love an evaluation that actually reflects how easy or hard a position is to play, for human beings.

Let’s walk through a few scenarios where this discrepancy is not ideal.


The Problem With A Superhuman Metric

Scenario 1: Game Analysis


I am in the midst of analyzing a game I played and the engine tells me I had the advantage in a given position. Yet I used up more time than my opponent and ended up blundering. Trusting the superhuman evaluation, I compare the suggested moves with my own and try to figure out why my intuition did not guide me towards the best moves. After a lot of comparative analysis, I vaguely understand why the best moves are indeed the best moves and I hope that my intuition has been prodded enough that I can do better next time.

The issue here is that I’m prodding my own intuition with superhuman suggestions—moves that even master players might not find. The two levels of analysis are clashing. It is likely that the moves went over my head and that I learned absolutely nothing for the next time a similar position comes up.

Scenario 2: Opening Repertoire


I am building an opening repertoire and choosing lines that have a slightly better objective evaluation. When I take that repertoire for a spin, I realize that I get outplayed in these slightly better positions.

The issue here is that the objective evaluation reflects how easy the position is to play for a superhuman player. Of all the positions evaluated as good or equal, some will be fine to play (maybe most). But some will turn out to be really difficult in practical play, and I now have to go back to the drawing board to figure out which lines to reconsider (some will have to be discarded, and my analysis of them with it).

Scenario 3: Live Broadcast


I am following a live broadcast of a top-level chess game. The commentators and the engine both consider the position equal. One player makes a move and suddenly the evaluation drops sharply. The chat explodes with confusion. How could a GM blunder in an equal position?

Okay, GMs do blunder sometimes, but more to the point: the engine saw a balanced position that might have been very difficult to play. In practice, it can be near impossible to hold on to equality. In this scenario, perhaps the commentators were swayed by the engine evaluation and did not appreciate the minefield on the board.

How the Ease Metric Works

It is not straightforward to quantify how easy certain moves are to find, but as a proxy, I am using the policy value that Leela Chess Zero outputs for any given position before it starts calculating.

This policy value is the output of a neural network and can be interpreted as reflecting the intuition of a very strong player. For many players this might be too strong, but I think it is a good start. It’s also beneficial to be exposed to intuition stronger than our own in order to learn from it. As long as the policy value does not rely on millions of calculations, I am happy to use it as a proxy for how natural a move looks to humans.

(I have not done this, but it would be interesting to use a chess engine that actively tries to be more human-like—for example Maia.)
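As an aside on extracting these values: Lc0 can annotate candidate moves with their policy and evaluation in its UCI output when its verbose move stats option is enabled. A small parser for such lines might look like the sketch below — the exact field layout varies between Lc0 versions, so the sample line is illustrative, not verbatim engine output:

```python
import re

# Matches the move, P and Q fields in an Lc0 verbose-move-stats info string.
# The field layout varies across versions, so the pattern relies only on the
# "(P: ...%)" and "(Q: ...)" annotations being present somewhere on the line.
STATS_RE = re.compile(
    r"info string (?P<move>\S+).*"
    r"\(P:\s*(?P<p>[\d.]+)%\).*"
    r"\(Q:\s*(?P<q>-?[\d.]+)\)"
)

def parse_move_stats(line):
    """Extract (move, policy percent, Q) from one verbose-move-stats line,
    or return None if the line is not a move-stats line."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    return m.group("move"), float(m.group("p")), float(m.group("q"))

# Illustrative line (the field layout is an assumption):
print(parse_move_stats("info string e2e4 N: 1234 (P: 24.00%) (Q: 0.12000)"))
```

A real pipeline would feed every `info string` line from the engine through this parser and keep the tuples that come back non-None.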

So for all the possible moves, Leela Chess Zero gives us a policy value P (how natural the move is) and an evaluation Q (how good the move actually is). I am using these values in a weighted sum to describe how close the natural moves are to the best move:

metric-equation.png
Where:

  • Pi is the probability (policy value percentage from 0 to 100) for candidate move i
  • Qi is the final evaluation from −1 (certain loss) to +1 (certain win) for candidate move i
  • Qmax is the highest Q value across all candidate moves
  • α controls the bias to give more resolution at the top end (here set to 1/3)
  • β controls the emphasis on high-probability moves (here set to 1.5)

A position close to 1 is easy as pie and a position close to 0 is treacherous.
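Concretely, a sketch of such a weighted sum in Python might look like the following. The exact combination of P, Q, α and β is given in the equation above and in the notebook; the penalty term here is one plausible reading of the parameter descriptions, so its numbers will not match the walkthrough below exactly:

```python
def ease(policies, evals, alpha=1/3, beta=1.5):
    """Sketch of an ease-style metric: a policy-weighted measure of how
    close the natural moves are to the best move.

    policies: policy values P_i in percent (0 to 100), one per candidate move
    evals:    evaluations Q_i in [-1, +1], one per candidate move
    alpha:    gives more resolution at the top end (1/3 here)
    beta:     emphasizes high-probability moves (1.5 here)

    This is one plausible reading of the description, not the exact
    equation from the notebook.
    """
    q_max = max(evals)
    # Policy weights, sharpened by beta and normalized to sum to 1.
    weights = [(p / 100) ** beta for p in policies]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Each move contributes how close its outcome is to the best move's,
    # rescaled from a [0, 2] gap to [0, 1]; alpha < 1 stretches the top end.
    return sum(w * (1 - (q_max - q) / 2) ** alpha
               for w, q in zip(weights, evals))

# A position where the natural move is also the best is easy...
easy = ease([90, 10], [1.0, 0.9])
# ...while a position where the natural move loses is treacherous.
hard = ease([10, 90], [1.0, -1.0])
assert hard < easy
```

Either way, the output lands in [0, 1], with high values for positions where intuition and objective quality agree.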

A Quick Walkthrough


Let’s walk through a simple example. Assume there are two natural-looking moves with P values of 90 and 10, and that none of the other moves is natural (P = 0 for all of them). Assume further that the most natural move is winning and the second best is a draw (Q1 = 1 and Q2 = 0). Then we get:

metric-workout.png

So in this case, we get an ease value of 0.75. This is close to the maximum of 1, so the position should be fairly easy to play. Now, this is a unitless number, so it only makes sense when compared with other positions. Let’s look at a few examples.

Example Positions and Their Ease Metric

Here are somewhat arbitrary examples of ease values. White to play in all of them.

Position                   Ease
Initial position           0.83
Easy puzzle                0.74
Intermediate puzzle        0.48
Hard puzzle                0.53
Italian Game               0.81
Queen’s Gambit Declined    0.79
Trivial rook endgame       0.92
The ease values above mostly match my own intuition, though you might have noticed that the ease for the hard puzzle is higher (easier) than for the intermediate puzzle. This could be because the first move of the hard puzzle is fairly easy to find. For a more accurate metric, we would need to look at the ease over multiple moves. I won’t get into that in this post, but I did try to define an ease metric over multiple moves in the notebook below.
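For a flavour of what an over-multiple-moves version could look like (this aggregation is hypothetical, not the notebook’s definition), the per-position values along a line can be combined, for instance by taking the minimum so that a single treacherous moment dominates:

```python
def line_ease(position_eases, how="min"):
    """Combine per-position ease values along a line into one number.

    Hypothetical aggregation, not the notebook's definition:
    "min"  says a line is only as easy as its hardest moment,
    "mean" averages the difficulty out over the whole line.
    """
    if not position_eases:
        raise ValueError("need at least one position")
    if how == "min":
        return min(position_eases)
    if how == "mean":
        return sum(position_eases) / len(position_eases)
    raise ValueError(f"unknown aggregation: {how}")

# A line that is comfortable except for one treacherous position:
print(line_ease([0.83, 0.79, 0.21, 0.80]))               # 0.21
print(line_ease([0.83, 0.79, 0.21, 0.80], how="mean"))   # 0.6575
```

The "min" view matches the intuition from the hard puzzle above: one easy first move should not make a whole line easy.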

Try It Yourself

You can try this metric on any given position in this Colab notebook.

Feel free to leave a comment on the notebook or in the Lichess forum if you would also appreciate this kind of metric being widely available. And obviously leave a comment if you know of a better way to achieve a similar metric.