A Chess Metric: Ease (for Humans)
“I know the engine evaluation, but how easy is this position to play, really?” For any given position, chess engines give us an evaluation, and we talk about it as the objective evaluation. That is not completely true, since chess is not a solved game (outside of tablebases!), but the evaluation is nonetheless superhuman.
That is to say, the engine evaluation tells us how good a position is if a superhuman player were to play it. But engines rely on assessing millions of positions, and humans cannot come close to that.
That’s not ideal! I would love an evaluation that reflects how easy or hard a position actually is to play for human beings.
Let’s walk through a few scenarios where this discrepancy is not ideal.

The Problem With A Superhuman Metric
Scenario 1: Game Analysis
I am in the midst of analyzing a game I played and the engine tells me I had the advantage in a given position. Yet I used up more time than my opponent and ended up blundering. Trusting the superhuman evaluation, I compare the suggested moves with my own and try to figure out why my intuition did not guide me towards the best moves. After a lot of comparative analysis, I vaguely understand why the best moves are indeed the best moves and I hope that my intuition has been prodded enough that I can do better next time.
The issue here is that I’m prodding my own intuition with superhuman suggestions—moves that even master players might not find. The two levels of analysis are clashing. It is likely that the moves went over my head and that I learned absolutely nothing for the next time a similar position comes up.
Scenario 2: Opening Repertoire
I am building an opening repertoire and choosing lines that have a slightly better objective evaluation. When I take that repertoire for a spin, I realize that I get outplayed in these slightly better positions.
The issue here is that the objective evaluation is telling me how easy the position is to play for a superhuman player. Of all the positions evaluated as good or equal, some will be fine to play (maybe most). But some of them will turn out to be really difficult in practical play, and I now have to go back to the drawing board and figure out which lines to reconsider (some will have to be discarded, and my analysis of them with it).
Scenario 3: Live Broadcast
I am following a live broadcast of a top-level chess game. The commentators and the engine both consider the position equal. One player makes a move and suddenly the evaluation drops sharply. The chat explodes with confusion. How could a GM blunder in an equal position?
Okay, GMs blunder too sometimes, but more to the point: the engine saw a balanced position that might have been very difficult to play. In practice it can be near impossible to hold on to equality. In this scenario, perhaps the commentators were swayed by the engine evaluation and did not appreciate the minefield on the board.
How the Ease Metric Works
It is not straightforward to quantify how easy certain moves are to find, but as a proxy I am using the policy value that Leela Chess Zero outputs for any given position, before it moves on to calculation.
This policy value is the output of a neural network that could be interpreted as reflecting the intuition of a very strong player. For many players, this might be too strong, but I think this is a good start. It’s also beneficial to be exposed to stronger intuition than ours in order to learn from it. As long as the policy value does not rely on millions of calculations, I am happy to use it as a proxy for how natural a move can look to humans.
(I have not done this, but it would be interesting to use a chess engine that actively tries to be more human-like—for example Maia.)
So for all the possible moves, Leela Chess Zero gives us a policy value P (how natural the move is) and an evaluation Q (how good the move actually is). I am using these values in a weighted sum to describe how close the natural moves are to the best move:
$$\text{Ease} = 1 - \left( \sum_i \left( \frac{P_i}{100} \right)^{\beta} \cdot \frac{Q_{\max} - Q_i}{2} \right)^{\alpha}$$
Where:
- $P_i$ is the probability (policy value percentage from 0 to 100) for candidate move $i$
- $Q_i$ is the final evaluation, from −1 (certain loss) to +1 (certain win), for candidate move $i$
- $Q_{\max}$ is the highest $Q$ value across all candidate moves
- $\alpha$ controls the bias to give more resolution at the top end (here set to 1/3)
- $\beta$ controls the emphasis on high-probability moves (here set to 1.5)
A position close to 1 is easy as pie and a position close to 0 is treacherous.
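In code, the metric fits in a few lines. The sketch below is my own plain-Python rendition of this weighted sum (the function name and input format are mine, and the Colab notebook linked at the end remains the reference implementation): weight each move's evaluation deficit by how natural the move looks, then stretch small totals with the α exponent.

```python
def ease(policies, evals, alpha=1 / 3, beta=1.5):
    """Ease of a position given per-move policy values and evaluations.

    policies: policy values P_i as percentages (0-100)
    evals:    evaluations Q_i in [-1, +1], from the mover's perspective
    Returns a value in [0, 1]: close to 1 is easy, close to 0 is treacherous.
    """
    q_max = max(evals)
    # Expected evaluation deficit: each move's shortfall versus the best move,
    # weighted by how natural the move looks (beta > 1 emphasizes likely moves).
    deficit = sum(
        (p / 100) ** beta * (q_max - q) / 2
        for p, q in zip(policies, evals)
    )
    # alpha < 1 stretches small deficits, giving more resolution near ease = 1.
    return 1 - deficit ** alpha


# Two natural moves (P = 90 and 10), best move wins, second draws:
print(round(ease([90, 10], [1.0, 0.0]), 2))  # 0.75
```

Note that only moves with a nonzero policy contribute, and a move that matches the best evaluation adds no deficit at all.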
A Quick Walkthrough
Let’s walk through a simple example. Assume there are two natural-looking moves, with P values of 90 and 10, and that none of the other moves looks natural (P = 0 for all of them). Assume also that the most natural move is winning and the second best is a draw (Q1 = 1 and Q2 = 0). Then we get:
$$\text{Ease} = 1 - \left( 0.9^{1.5} \cdot \frac{1 - 1}{2} + 0.1^{1.5} \cdot \frac{1 - 0}{2} \right)^{1/3} = 1 - (0.0158)^{1/3} \approx 0.75$$
So in this case, we get an ease value of 0.75. This is close to the maximum of 1 so it will be fairly easy to play. Now this is a unitless number so it only makes sense when compared with other positions. Let’s look at a few examples.
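You can check this arithmetic directly in plain Python (variable names are mine). Only the two natural moves contribute, and only the 10%-policy draw adds any deficit:

```python
alpha, beta = 1 / 3, 1.5

# Two natural moves: P = 90 and 10 (percent), Q = 1 (win) and 0 (draw).
p = [90, 10]
q = [1.0, 0.0]
q_max = max(q)

# Policy-weighted evaluation deficit of each candidate move.
deficit = sum((pi / 100) ** beta * (q_max - qi) / 2 for pi, qi in zip(p, q))
print(round(deficit, 4))  # 0.0158: only the 10%-policy draw contributes

ease = 1 - deficit ** alpha
print(round(ease, 2))     # 0.75
```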
Example Positions and Their Ease Metric
Here are somewhat arbitrary examples of ease values. White to play in all of them.
| | Position | Ease |
|---|---|---|
| ![]() | Initial position | 0.83 |
| ![]() | Easy puzzle | 0.74 |
| ![]() | Intermediate puzzle | 0.48 |
| ![]() | Hard puzzle | 0.53 |
| ![]() | Italian game | 0.81 |
| ![]() | Queen’s Gambit Declined | 0.79 |
| ![]() | Trivial rook endgame | 0.92 |
The ease values above reflect my own intuition though you might have noticed that the ease for the hard puzzle is higher (easier) than the one for the intermediate puzzle. This could be because the first move of the hard puzzle is fairly easy to find. For a more accurate metric, we need to look at the ease over multiple moves. I won’t get into this in this blog post but I did try to define an ease metric over multiple moves in the notebook below.
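As a taste of what a multi-move version could look like: one naive aggregation (purely my own sketch, not necessarily what the notebook does) scores a whole line by its most treacherous position, so an easy first move cannot mask harder follow-ups.

```python
def line_ease(position_eases):
    """Aggregate per-position ease values over a line of play.

    Sketch: a line is only as easy as its hardest position, so take the
    minimum; averaging would let one treacherous moment hide behind
    several easy moves.
    """
    return min(position_eases)


# Hypothetical hard puzzle: an easy first move (0.53, from the table above)
# followed by made-up, harder follow-up positions.
print(line_ease([0.53, 0.31, 0.28]))  # 0.28
```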
Try It Yourself
You can try this metric on any given position in this Colab notebook.
Feel free to leave a comment on the notebook or in the Lichess forum if you would also appreciate this kind of metric being widely available. And obviously leave a comment if you know of a better way to achieve a similar metric.