Pernicious Computer Analysis Bug: Two-Fold Repetition?

I've noticed since two updates ago that the computer routinely yields an evaluation of "0" whenever it sees a position has occurred only twice, often triggering huge "Blunder" markers and messing up average pawn loss calculations. This error seems to be pernicious in the computer's internal analysis, since it will also e.g. fault the losing player for not repeating the position, as seen twice in my game here (moves 38 and 45):

https://en.lichess.org/cCnQ75m8

These 0- and 1-ply misevaluations are usually easy enough to spot, but I'm also worried that it could cause less obvious faults at deeper plies of calculation. Has this error been reported already?

This is a frequent question on this forum.
This is simply how Stockfish and many other engines choose to evaluate 2-fold repetition. I've read that the developers consider this behavior intentional.
I don't see that it faulted anyone for not repeating a position. In both of the cases you provided, it faulted Black for repeating the position, consistent with considering 2-fold repetition a draw.

I defer to this post: https://en.lichess.org/forum/lichess-feedback/stockfish-only-requires-twofold-repeatition-to-draw#10

But however you evaluate repetition, I can't see how that would affect evaluations in other positions.
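
To make the shortcut concrete, here's a minimal sketch in Python (my own illustration, not Stockfish's actual code) of the difference between calling a draw on the second occurrence of a position and waiting for the third:

def repetition_draw(history, current, twofold=True):
    # history: keys (e.g. Zobrist hashes) of positions seen earlier on the path;
    # current: key of the position being evaluated right now.
    needed = 1 if twofold else 2   # earlier occurrences required before scoring 0.00
    return history.count(current) >= needed

path = [0x111, 0xABC, 0x222]                         # 0xABC has been seen once before
print(repetition_draw(path, 0xABC))                  # True  -> the shortcut scores it 0.00
print(repetition_draw(path, 0xABC, twofold=False))   # False -> not yet an actual threefold repetition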

Hm, apparently that thread (and the sites it links to) does suggest this has been a deeper and more complicated problem with Stockfish for several years now, and that Lichess only provides unofficial patches for it. In that case I hope they manage to provide another patch, but if they've decided it's become too much of a hassle, I respect the decision.

With regard to affecting evaluations in other positions, here are misgivings someone else raised that I share:

"Suppose stockfish was white and losing in a certain position (-3.0) but black blundered and stockfish managed to improve the evaluation to -0,5 a few moves later. At this point stockfish could play the best move, or repeat the position in which it was losing, but stockfish considered repeating the position 0.0, and therefore ranked the repeat position as 0.0, which it would rank above than the other move. Now if SF's opponent was a computer it might make the same blunder, but maybe not e. g. the other computer cleared its hash table and played a better move. The opponent might also be a human and make a better move."

I don't think that will affect the moves played by Stockfish. I came up with an example: https://en.lichess.org/4SQvOqUj
I don't know why the initial position isn't evaluated higher (the first move is definitely a blunder, right?), but the computer analysis shows a value of 0 after 5. Nf5 and even calls it a blunder. Yet the move Lichess suggests is not 5...Re8, as would be predicted if Stockfish simply played whichever move resulted in the position with the highest evaluation. So, based on this, it definitely handles the move recommendations correctly.

I don't know what the Stockfish AI would actually play in this case: situations like this are infrequent, and it's difficult to contrive one when I'm only in control of half the moves.

Do you have an example of a game in which this occurred? For science.

Hm, this example does seem to show that Stockfish has some workaround so that it doesn't actually act on the zero evaluation it assigns to a two-fold repetition. My hypothesis is that the computer, when given a completely concrete position (and move history), never zeros out an evaluation unless 1) there actually is three-fold repetition, or 2) there is two-fold repetition /entirely within one of its imagined variations/. And really, this is (or would be) perfectly sound, for the same reasons that humans themselves take mental shortcut (2).
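
In code terms, my hypothesis would look roughly like this (just a sketch of the rule I'm describing, not anything checked against Stockfish's source):

def zero_out(game_history, variation, current):
    # game_history: positions actually played; variation: the line the engine is imagining.
    twofold_in_variation = variation.count(current) >= 1     # the line being calculated repeats itself
    actual_threefold     = game_history.count(current) >= 2  # this occurrence would be the third in the game
    return twofold_in_variation or actual_threefold

print(zero_out(game_history=[0xA1, 0xB2], variation=[0xC3], current=0xC3))  # True: repetition inside the imagined line
print(zero_out(game_history=[0xA1, 0xB2], variation=[0xC3], current=0xB2))  # False: only one earlier occurrence in the game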

However, this doesn't explain why the real-time Stockfish analysis on White's 5th move (where it suggests Nf5 as near optimal) disagrees with the post-game analysis, which triggers the zeroing and calls it a blunder. I can only venture the guess that—somehow—the "imagined variation evaluation" is being recorded in post-game analysis, while the "refresh memory/concrete game evaluation" is being shown in the real-time Stockfish analysis. Can anyone confirm or explain?
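
If anyone wants to poke at this locally, here's a rough sketch of the kind of experiment I mean, using the python-chess library and a local Stockfish binary (both of those, and the toy knight-shuffle game below, are my own assumptions; this isn't what Lichess actually runs):

import chess
import chess.engine

# Path to a local Stockfish binary -- an assumption for this sketch.
engine = chess.engine.SimpleEngine.popen_uci("./stockfish")

# A toy game where both sides shuffle their knights, so the starting position occurs twice.
board_with_history = chess.Board()
for san in ["Nf3", "Nf6", "Ng1", "Ng8"]:
    board_with_history.push_san(san)

# 1) Analyse the position together with the moves that led to it: the engine can
#    see the earlier occurrence and may already score the position as drawn.
info = engine.analyse(board_with_history, chess.engine.Limit(depth=18))
print("with game history:   ", info["score"])

# 2) Analyse the identical position handed over as a bare FEN, with no history:
#    no repetition is visible, so the score should be a normal evaluation.
board_bare = chess.Board(board_with_history.fen())
info = engine.analyse(board_bare, chess.engine.Limit(depth=18))
print("without game history:", info["score"])

engine.quit()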

The reason SF does that is that, in a computer vs. computer game, it doesn't hurt. If you analyze weak humans playing each other it can lead to those weird evaluations, but SF is built to reach the highest possible rating/playing strength, not to be specifically handy for analyzing games.
