A Chess Metric: Ease (for Humans)

@daniel921126 said in #20:

Another way I use to assess how easy a position is for humans is to check how many moves continue to maintain the draw according to Stockfish. I always analyze my games with at least 5 candidate moves given by Stockfish. If there's ONLY ONE MOVE that continues to maintain the draw AND that move isn't a check, a capture or a threat, I consider it difficult for humans.

If, on the contrary, all 5 candidate moves are draws, then I consider it easy.

That makes sense to me.
I suspect checks, captures, and threats will all have a high P (policy prior) according to Leela.
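
For what it's worth, that heuristic is easy to automate. Here's a minimal sketch using python-chess and a local Stockfish binary; the engine path, search depth, and the ±50 cp "draw window" are my assumptions, and threat detection is left out since that's a hard problem on its own:

```python
import chess
import chess.engine

DRAW_WINDOW_CP = 50  # |eval| at or below this counts as "maintains the draw" (assumed)

def is_difficult(fen: str, engine_path: str = "stockfish") -> bool:
    """Heuristic from #20: exactly one drawing move among the top 5
    candidates, and that move is neither a check nor a capture."""
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        infos = engine.analyse(board, chess.engine.Limit(depth=20), multipv=5)
    drawing = []
    for info in infos:
        score = info["score"].relative  # from the side to move's perspective
        if not score.is_mate() and abs(score.score()) <= DRAW_WINDOW_CP:
            drawing.append(info["pv"][0])
    if len(drawing) != 1:
        return False  # zero drawing moves, or several of them: not "difficult"
    move = drawing[0]
    return not (board.gives_check(move) or board.is_capture(move))
```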

@AnlamK said in #12:

Hi - thanks for the great post.

Here's an idea that might be some work to implement but is perhaps worth trying:

  1. From the Lichess database, take Stockfish-analyzed rapid games between players in the 2100-2400 rating range.
  2. For every position before a move marked as a blunder or mistake, see if your ease rating marks it as a difficult position. (Perhaps leave out blunders and mistakes made in time trouble by looking at how much time the player had before making the move.)
  3. Fiddle with the parameters so that the ease rating better predicts blunders or mistakes.
  4. Cross-validate and/or test on previously unseen positions.

I just thought of a much simpler idea to get a 'training dataset' for this ease metric. You already do this a bit.

Why not just use puzzles and their ratings, since a puzzle's rating directly indicates its difficulty? It would be fun to see if the ease metric, trained only on tactical puzzles, could also predict the difficulty of moves that are not tactical in nature.
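
If someone wants to try step 2 of that list, here's roughly how it might look with python-chess over a Lichess PGN dump; the file name, the 30-second time-trouble cutoff, and skipping the rating filter are simplifications on my part:

```python
import chess.pgn

NAG_MISTAKE, NAG_BLUNDER = 2, 4  # standard NAG codes for "?" and "??"

positions = []  # FENs of positions in which a mistake/blunder was then played
with open("lichess_db_standard_rated.pgn") as pgn:
    while (game := chess.pgn.read_game(pgn)) is not None:
        node = game
        while node.variations:
            child = node.variation(0)
            clock = child.clock()  # seconds left, parsed from [%clk] comments
            if child.nags & {NAG_MISTAKE, NAG_BLUNDER} and (
                clock is not None and clock > 30  # drop likely time-trouble errors
            ):
                positions.append(node.board().fen())  # position before the error
            node = child
```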

This is good work. It would be interesting to compare the human evaluation (probabilities of moves) with the Lichess rapid database.

But the method should be recursive, I think. As @HelloItsDmitri pointed out, engine evaluations ignore the difficulty of calculation. Suppose I consider two positions. Each has a move with a 100% probability that will result in a draw with perfect play: there's a forced move due to check, for example. By this metric, the two positions are equal. However, the resulting position may be considerably more complex in one case than the other. Even though the first move is of equal complexity (ease = 1), the next position will have very different ease values.

Perhaps this is by design: the metric is intended to consider only the very next move. But instead of using the evaluations of candidate moves, you could also consider using the ease values of candidate moves and do a recursive calculation, the recursion ending when the game is resolved or when the ease rises to some trivial value.

Another question: why did you choose "ease" (1 = forced) rather than "difficulty" (0 = forced)? Difficulty would simplify the formula further (removing the "1 -").
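
To make the suggestion concrete, here's a sketch of that recursion; ease() stands in for the one-ply metric from the blog post and candidate_moves() for whatever move list it is computed over (both are placeholders, not the author's code, and the 0.9 cutoff is an assumed "trivial" value):

```python
import chess

EASY_ENOUGH = 0.9  # ease at which the recursion stops (assumed cutoff)

def ease(board: chess.Board) -> float:
    """One-ply ease in [0, 1], 1 = forced; the blog's formula, stubbed here."""
    raise NotImplementedError

def candidate_moves(board: chess.Board):
    """Candidate moves to recurse over, e.g. an engine's top lines; stubbed."""
    return board.legal_moves

def recursive_ease(board: chess.Board, depth_left: int = 5) -> float:
    e = ease(board)
    if board.is_game_over() or e >= EASY_ENOUGH or depth_left == 0:
        return e
    child_eases = [e]
    for move in candidate_moves(board):
        board.push(move)
        child_eases.append(recursive_ease(board, depth_left - 1))
        board.pop()
    # A line is only as easy as its hardest position along the way.
    return min(child_eases)
```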

@AnlamK said in #23:

Hi - thanks for the great post.

Here's an idea that might be some work to implement but is perhaps worth trying:

  1. From the Lichess database, take Stockfish-analyzed rapid games between players in the 2100-2400 rating range.
  2. For every position before a move marked as a blunder or mistake, see if your ease rating marks it as a difficult position. (Perhaps leave out blunders and mistakes made in time trouble by looking at how much time the player had before making the move.)
  3. Fiddle with the parameters so that the ease rating better predicts blunders or mistakes.
  4. Cross-validate and/or test on previously unseen positions.

I just thought of a much simpler idea to get a 'training dataset' for this ease metric. You already do this a bit.

Why not just use puzzles and their ratings, since a puzzle's rating directly indicates its difficulty? It would be fun to see if the ease metric, trained only on tactical puzzles, could also predict the difficulty of moves that are not tactical in nature.

The MAIA research project showed that puzzles aren't reliable for measuring player skill, since beginners and experts may play the same move but intermediate players may play a different move.

@Toadofsky said in #25:

The MAIA research project showed that puzzles aren't reliable for measuring player skill, since beginners and experts may play the same move but intermediate players may play a different move.

Yes, but we are not trying to predict a player's skill.

We are trying to predict how hard a certain chess position is for a human to play. For that, I thought a puzzle's rating was a good indicator.
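
As a sketch of how that training set could be assembled: the Lichess puzzle dump (database.lichess.org) ships as a CSV with FEN, Moves, Rating, and RatingDeviation columns, so correlating an ease score against puzzle ratings is only a few lines. ease_of() below is a placeholder for the metric under discussion, and the deviation and sample-size choices are mine:

```python
import chess
import pandas as pd

# Decompress lichess_db_puzzle.csv.zst first; sample to keep this cheap.
puzzles = pd.read_csv("lichess_db_puzzle.csv")
puzzles = puzzles[puzzles["RatingDeviation"] < 100]  # keep settled ratings
puzzles = puzzles.sample(1000, random_state=0)

def puzzle_position(row) -> str:
    # In the dump, FEN is the position *before* the opponent's move;
    # the first move in Moves is applied to reach the actual puzzle.
    board = chess.Board(row["FEN"])
    board.push_uci(row["Moves"].split()[0])
    return board.fen()

puzzles["position"] = puzzles.apply(puzzle_position, axis=1)
puzzles["ease"] = puzzles["position"].apply(ease_of)  # placeholder metric
# If ease tracks human difficulty, it should correlate negatively with Rating.
print(puzzles["ease"].corr(puzzles["Rating"], method="spearman"))
```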

I guess a puzzle rating might be an OK indicator, although there are many forum posts about deficiencies in Lichess puzzles that don't occur on Chess Tempo and similar sites (puzzles terminating early, puzzles "throwing away" their queen instead of trying to defend, and so on). So if we are trying to assess how challenging a position is for a human to play, I would strongly hesitate to rely upon puzzle ratings.

@djconnel said in #24:

But the method should be recursive, I think. As @HelloItsDmitri pointed out, engine evaluations ignore the difficulty of calculation. Suppose I consider two positions. Each has a move with a 100% probability that will result in a draw with perfect play: there's a forced move due to check, for example. By this metric, the two positions are equal. However, the resulting position may be considerably more complex in one case than the other. Even though the first move is of equal complexity (ease = 1), the next position will have very different ease values. Perhaps this is by design: the metric is intended to consider only the very next move. But instead of using the evaluations of candidate moves, you could also consider using the ease values of candidate moves and do a recursive calculation, the recursion ending when the game is resolved or when the ease rises to some trivial value.

If you have a look at [this section](https://colab.research.google.com/drive/1LEXjH18A34lkZw2qwHIV0AwNuJrjLBGR#scrollTo=XNf7Rc58U9j9) in the Colab notebook, I implemented a 5-ply ease where I go through the full tree for the next five plies and pick the lowest ease as the metric. Other choices could be made (why 5, and why the lowest?), but I think it makes a lot of sense to look at the next few positions instead of settling on the current one.

People will believe anything as long as you throw enough fancy-looking math at them

@MarkosTaimanov said in #29:

People will believe anything as long as you throw enough fancy-looking math at them

I don't believe that
