lichess.org

Suspicious Eval?

Just had a 2+1 bullet game where my opponent (Black) hung a bishop on move 7. The eval goes from +0.5 to like +9.something. For a bishop? No way.

What gives, did Stockfish have a melt-down? Can anyone explain the massive eval change?

I find your question interesting. Because the depth used for that 9.x eval is quite shallow, I got curious. For comparison, current Stockfish (dev-20230911) at depth 46 with 16 GB hash calculates a jump from 0.65 to 5.69:

rnbqk2r/pp3pp1/2pbpn1p/3p4/3P4/3BPNB1/PPPN1PPP/R2QK2R b KQkq - 1 7 acd 46; acs 0; bm c5; ce -65; Ae "Stockfish dev-20230911-3f7fb5ac";
r1bqk2r/pp1n1pp1/2pbpn1p/3p4/3P4/3BPNB1/PPPN1PPP/R2QK2R w KQkq - 2 8 acd 46; acs 0; bm Bxd6; ce +569; Ae "Stockfish dev-20230911-3f7fb5ac";
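
The arithmetic behind those two numbers can be sketched as follows. This assumes the usual EPD convention that `ce` is reported from the point of view of the side to move, so -65 with Black to move means +0.65 for White. The helper name `epd_white_cp` is made up for illustration.

```python
import re

# The two EPD records quoted above, copied verbatim.
BEFORE = ('rnbqk2r/pp3pp1/2pbpn1p/3p4/3P4/3BPNB1/PPPN1PPP/R2QK2R b KQkq - 1 7 '
          'acd 46; acs 0; bm c5; ce -65; Ae "Stockfish dev-20230911-3f7fb5ac";')
AFTER = ('r1bqk2r/pp1n1pp1/2pbpn1p/3p4/3P4/3BPNB1/PPPN1PPP/R2QK2R w KQkq - 2 8 '
         'acd 46; acs 0; bm Bxd6; ce +569; Ae "Stockfish dev-20230911-3f7fb5ac";')


def epd_white_cp(epd: str) -> int:
    """Return the `ce` value of an EPD record, converted to White's view.

    Assumption: `ce` is given in centipawns for the side to move.
    """
    side_to_move = epd.split()[1]  # second field: "w" or "b"
    match = re.search(r"\bce\s+([+-]?\d+)\s*;", epd)
    if match is None:
        raise ValueError("no ce opcode found")
    cp = int(match.group(1))
    return cp if side_to_move == "w" else -cp


before_cp = epd_white_cp(BEFORE)
after_cp = epd_white_cp(AFTER)
jump_cp = after_cp - before_cp
print(f"before {before_cp / 100:+.2f}, after {after_cp / 100:+.2f}, "
      f"jump {jump_cp / 100:+.2f}")
```

So the jump at depth 46 is about five pawns, well short of the 9.x from the shallow analysis.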

Also, with the newer Stockfish the eval is stable and won't reach 7+:

$ /home/michael/opt/stockfish-20230911/bin/stockfish
Stockfish dev-20230911-3f7fb5ac by the Stockfish developers (see AUTHORS file)
setoption name Threads value 8
setoption name Hash value 16384
position fen "r1bqk2r/pp1n1pp1/2pbpn1p/3p4/3P4/3BPNB1/PPPN1PPP/R2QK2R w KQkq - 2 8"
go depth 30
info string NNUE evaluation using nn-1ee1aba5ed4c.nnue
info depth 1 seldepth 1 multipv 1 score cp 450 nodes 88 nps 88000 hashfull 0 tbhits 0 time 1 pv g3d6
info depth 2 seldepth 2 multipv 1 score cp 450 nodes 238 nps 238000 hashfull 0 tbhits 0 time 1 pv g3d6
info depth 3 seldepth 2 multipv 1 score cp 642 nodes 326 nps 326000 hashfull 0 tbhits 0 time 1 pv g3d6
info depth 4 seldepth 3 multipv 1 score cp 642 nodes 500 nps 500000 hashfull 0 tbhits 0 time 1 pv g3d6 c6c5
info depth 5 seldepth 4 multipv 1 score cp 636 nodes 1282 nps 1282000 hashfull 0 tbhits 0 time 1 pv g3d6 c6c5 d4c5
info depth 6 seldepth 4 multipv 1 score cp 586 nodes 2330 nps 2330000 hashfull 0 tbhits 0 time 1 pv g3d6 b7b5 d6c5 d7c5
info depth 7 seldepth 5 multipv 1 score cp 656 nodes 4028 nps 4028000 hashfull 0 tbhits 0 time 1 pv g3d6 d7b6 d6e5
info depth 8 seldepth 8 multipv 1 score cp 692 nodes 7614 nps 3807000 hashfull 0 tbhits 0 time 2 pv g3d6 b7b5 d6e5 d7e5 f3e5
info depth 9 seldepth 8 multipv 1 score cp 643 nodes 13456 nps 4485333 hashfull 0 tbhits 0 time 3 pv g3d6 b7b6 d6g3 f6h5 e1g1
info depth 10 seldepth 10 multipv 1 score cp 649 nodes 15808 nps 5269333 hashfull 0 tbhits 0 time 3 pv g3d6 c6c5 d4c5 d8a5 e1g1 d7c5
info depth 11 seldepth 10 multipv 1 score cp 642 nodes 19770 nps 4942500 hashfull 0 tbhits 0 time 4 pv g3d6 b7b6 d6g3 f6h5 g3e5 c6c5
info depth 12 seldepth 11 multipv 1 score cp 613 nodes 36738 nps 6123000 hashfull 0 tbhits 0 time 6 pv g3d6 f6e4 d6a3 e4d2 d1d2 b7b5 a3d6 a7a5 d6g3
info depth 13 seldepth 10 multipv 1 score cp 590 nodes 73494 nps 7349400 hashfull 0 tbhits 0 time 10 pv g3d6 f6e4 d3e4 d5e4 d2e4 d7f6 e4f6 g7f6 d6g3
info depth 14 seldepth 20 multipv 1 score cp 624 nodes 249313 nps 8310433 hashfull 0 tbhits 0 time 30 pv g3d6 a7a5 d6g3 c6c5 a2a4 c5d4 e3d4 f6e4 d2e4 d5e4 d3e4
info depth 15 seldepth 18 multipv 1 score cp 609 nodes 532656 nps 9028067 hashfull 0 tbhits 0 time 59 pv g3d6 b7b6 d6f4 c6c5 d3b5 c8b7 b5d7 f6d7
info depth 16 seldepth 18 multipv 1 score cp 596 nodes 1034368 nps 9235428 hashfull 0 tbhits 0 time 112 pv g3d6 b7b6 d6f4 c8b7 h2h3 c6c5 a2a4 c5d4 e3d4 e8g8 h3h4
info depth 17 seldepth 19 multipv 1 score cp 595 nodes 1421693 nps 9353243 hashfull 1 tbhits 0 time 152 pv g3d6 b7b6 h2h3 c6c5 d3b5 c8b7 c2c3 c5d4 e3d4 f6e4 d6a3 e4d2
info depth 18 seldepth 21 multipv 1 score cp 595 nodes 1931840 nps 9377864 hashfull 1 tbhits 0 time 206 pv g3d6 b7b6 h2h3 c6c5 d3b5 c8b7 c2c3 a8c8 d6h2 c5d4 c3d4 a7a6 b5d7 d8d7
info depth 19 seldepth 23 multipv 1 score cp 589 nodes 2628687 nps 9455708 hashfull 1 tbhits 0 time 278 pv g3d6 b7b6 h2h3 c6c5 d6h2 a7a5 a2a4 c5d4 e3d4 c8a6 d3a6 a8a6 c2c4 f6e4 e1g1 e8g8
info depth 20 seldepth 24 multipv 1 score cp 581 nodes 3823416 nps 9534703 hashfull 1 tbhits 0 time 401 pv g3d6 b7b6 h2h3 c6c5 d3b5 c8b7 e1g1 a7a6 b5e2 f6e4 d6h2 c5d4 e3d4 e4d2 d1d2
info depth 21 seldepth 27 multipv 1 score cp 579 nodes 6018523 nps 9583635 hashfull 3 tbhits 0 time 628 pv g3d6 b7b6 e1g1 c8b7 h2h3 f6e4 d6h2 c6c5 c2c3 e8g8 d3b1 d7f6 d2b3 b7a6 f1e1 a8c8
info depth 22 seldepth 31 multipv 1 score cp 583 nodes 8352530 nps 9545748 hashfull 3 tbhits 0 time 875 pv g3d6 b7b6 f3e5 c8b7 e5g6 h8g8 g6e5 f6e4 d2e4 d5e4 d3e2 d7e5 d6e5 c6c5 c2c3 e8f8 e5g3 a8c8 d4c5 d8d1 a1d1
info depth 23 seldepth 25 multipv 1 score cp 582 nodes 9295385 nps 9553324 hashfull 3 tbhits 0 time 973 pv g3d6 b7b6 f3e5 d7e5 d4e5 f6d7 d3e2 d7c5 a2a4 a7a5 e1g1 d8g5 f2f4 g5d8 c2c4 d5d4 e3d4 c5b7
info depth 24 seldepth 34 multipv 1 score cp 570 nodes 14351263 nps 9454059 hashfull 4 tbhits 0 time 1518 pv g3d6 b7b6 f3e5 c8b7 d1f3 d7e5 d4e5 f6d7 h2h4 d7c5 d3f1 b7a6 f1a6 c5a6 e3e4 a6c5 f3c3 c5b7 c3c6 d8d7 c6d7 e8d7 d6a3 a8c8 e1d1 b7c5 e4d5 e6d5
info depth 25 seldepth 40 multipv 1 score cp 581 nodes 23259054 nps 9233447 hashfull 5 tbhits 0 time 2519 pv g3d6 b7b6 f3e5 c8b7 e5g6 h8g8 g6e5 c6c5 c2c3 d7e5 d6e5 f6e4 d3e4 d5e4 d2c4 c5d4 c4d6 e8e7 d6b7 d8d5 e5d6 e7d7 d6g3 d4c3 d1a4 d5c6
info depth 26 seldepth 42 multipv 1 score cp 581 nodes 30474285 nps 9156936 hashfull 8 tbhits 0 time 3328 pv g3d6 b7b6 f3e5 d7e5 d4e5 f6d7 e1g1 d7c5 a2a4 c5d3 c2d3 c8a6 a1a3 a8b8 d1h5 c6c5 d6b8 d8b8 d2f3 e8g8
info depth 27 seldepth 37 multipv 1 score cp 580 nodes 49602945 nps 9071496 hashfull 15 tbhits 0 time 5468 pv g3d6 b7b6 c2c4 c8b7 c4d5 f6d5 e1g1 c6c5 d3e2 c5d4 f3d4 a7a6 d6g3 e8g8 a1c1 a8c8
info depth 28 seldepth 38 multipv 1 score cp 588 nodes 68674596 nps 9045652 hashfull 20 tbhits 0 time 7592 pv g3d6 b7b6 c2c4 c8b7 c4d5 f6d5 e1g1 d5e7 d1b1 c6c5 f1d1 c5d4 f3d4 e8g8 d3h7 g8h8 h7e4 b7e4 d2e4 e6e5 d4f3 f7f5 e4c3 e5e4 f3d4 d7c5 d6g3
info depth 29 seldepth 38 multipv 1 score cp 583 nodes 73928544 nps 9035510 hashfull 21 tbhits 0 time 8182 pv g3d6 b7b6 c2c4 d5c4 d2c4 a7a6 d3e2 c6c5 d4c5 d7c5 d6e5 d8d1 a1d1 c8b7 c4b6 a8d8 d1d8 e8d8 e1g1 d8e7 e5c3 f6e4 c3b4 a6a5 b4a5 e4f2
info depth 30 seldepth 43 multipv 1 score cp 569 nodes 113423954 nps 9023385 hashfull 38 tbhits 0 time 12570 pv g3d6 b7b5 c2c3 a7a5 e1g1 c8b7 f1e1 d7b6 d6a3 b6d7 f3e5 d7e5 d4e5 f6d7 a3d6 d8b6 e3e4 c6c5 e4d5 c5c4 d3c4 b5c4 d2c4
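
To check the "stable, never reaches 7" reading without eyeballing the log, the `info` lines can be reduced to a depth-to-score table. This is only a rough sketch: `scores_by_depth` is a hypothetical helper, and the sample keeps just a few lines from the log above with their principal variations cut down to the first move.

```python
import re

# A few lines from the session above; the pv is trimmed to "g3d6" for brevity.
SAMPLE_LOG = """\
info string NNUE evaluation using nn-1ee1aba5ed4c.nnue
info depth 1 seldepth 1 multipv 1 score cp 450 nodes 88 nps 88000 hashfull 0 tbhits 0 time 1 pv g3d6
info depth 8 seldepth 8 multipv 1 score cp 692 nodes 7614 nps 3807000 hashfull 0 tbhits 0 time 2 pv g3d6
info depth 20 seldepth 24 multipv 1 score cp 581 nodes 3823416 nps 9534703 hashfull 1 tbhits 0 time 401 pv g3d6
info depth 30 seldepth 43 multipv 1 score cp 569 nodes 113423954 nps 9023385 hashfull 38 tbhits 0 time 12570 pv g3d6
"""

# Matches UCI "info" lines that carry a centipawn score (mate scores are skipped).
INFO_RE = re.compile(r"^info depth (\d+) .*?\bscore cp (-?\d+)")


def scores_by_depth(log: str) -> dict:
    """Map search depth to centipawn score for every matching info line."""
    scores = {}
    for line in log.splitlines():
        match = INFO_RE.match(line.strip())
        if match:
            scores[int(match.group(1))] = int(match.group(2))
    return scores


scores = scores_by_depth(SAMPLE_LOG)
peak_depth = max(scores, key=scores.get)
print(f"peak {scores[peak_depth] / 100:.2f} at depth {peak_depth}, "
      f"final {scores[max(scores)] / 100:.2f} at depth {max(scores)}")
```

Feeding it the full log instead of the sample should give the same picture: the score peaks early at shallow depth and settles in the high 5s.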

Maybe it's hash-size related? No, I checked this, and it doesn't make any difference.

Now I wonder what is "wrong" with the Stockfish version that Lichess ships to our browsers :))
So we just found proof of why the (quite substantial) eval changes in newer Stockfish are better and were necessary.

Cheers go out to the Stockfish dev team :)
We could collect such weird eval jumps.

What about the following one I have in store:

lichess.org/analysis/standard/rnbqkb1r/3ppppp/p4n2/1PpP4/8/8/PP2PPPP/RNBQKBNR_w_KQkq_-_0_5#8

Lichess says 0.7 at depth 51 and recommends e3 as the next move.

If you play e3, the Lichess eval drops to 0.1, also at depth 51.

That makes a 0.6 difference in eval out of thin air.

So, yeah, something has melted down :/

For the record: e3 is correct, and I suggest you try this nice surprise :)

This time the crazy 0.1 eval was delivered by the Lichess cloud. If I didn't know better and found such a crazy eval in a commercial opening book, I would think the author had placed it there as a watermark to poison anyone who steals their database. Or maybe someone has corrupted the Lichess cloud with random values on purpose. I hope not.

@mrqwak said in #1:
> Just had a 2+1 bullet game where my opponent (Black) hung a bishop on move 7. The eval goes from +0.5 to like +9.something. For a bishop? No way.
>
> What gives, did Stockfish have a melt-down? Can anyone explain the massive eval change?

Black will be down a full bishop with no compensation, and can't even castle (you did allow him to castle, which was inaccurate, but didn't make much of a difference because Black is still down a bishop). The high evaluation makes sense to me. Even though for humans, White could still blunder (especially at faster time controls), this is already game over for engines.

Also, you should note that the piece values are merely guidelines and are relative to other pieces. If one side is down 3 points of material with no compensation, the engine evaluation will likely be much higher than +3.

@AsDaGo said in #5:
> Black will be down a full bishop with no compensation, and can't even castle (you did allow him to castle, which was inaccurate, but didn't make much of a difference because Black is still down a bishop). The high evaluation makes sense to me. Even though for humans, White could still blunder (especially at faster time controls), this is already game over for engines.
>
> Also, you should note that the piece values are merely guidelines and are relative to other pieces. If one side is down 3 points of material with no compensation, the engine evaluation will likely be much higher than +3.

Ah yeah, thanks. I noticed I should have brought my bishop back on the other diagonal (a3 maybe) to prevent castling (I'm a London player, though, and in the habit of keeping it on the h2-b8 diagonal); I made a few other mistakes too.

I guess I'm not really understanding what the eval number represents. I always thought it was a representation of how much better (or worse) you are, relative to the value of the pieces.

It's not only that Ba3 prevents castling; it's that Nb6 followed by Ba3 makes both e5 and c5 hard to achieve. It also makes it very hard to develop. His bishop is going to be a big pawn, not worth 3 points. Leaving the knight there is no picnic either, but perhaps he can play b6 and c5, or b6, a6 and Ba6. Either way, I think what is going on is that he is down a bishop and his position is suddenly a lot crappier than it was. I agree +9 or even +7 is very surprising, but I certainly would have guessed at least +4.5.

It's like the old saying about an utterly failed sacrifice/gambit, "you are down material and your opponent has compensation".

@mrqwak said in #7:
> I guess I'm not really understanding what the eval number represents. I always thought it was a representation of how much better (or worse) you are, relative to the value of the pieces.

In theory, that's what it's supposed to represent, but it really depends on the position. Being up a bishop could be much more of an advantage in some positions than in others. As @mortmann said, maybe the newer Stockfish's evaluations are closer to the material difference, but again, it all depends on the position.
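
One way to see why the exact number matters less once it gets large is to turn centipawns into rough winning chances. The sketch below assumes the logistic formula and constant that Lichess describes for its accuracy metric (quoted from memory, so treat the constant as an assumption). It is not what Stockfish computes internally, and `win_percent` is just an illustrative name.

```python
import math


def win_percent(centipawns: float) -> float:
    """Rough winning chances (0-100) for the side the eval favours.

    Assumption: logistic mapping with the constant 0.00368208, as recalled
    from the Lichess accuracy page. Other engines and sites use other curves.
    """
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)


# Evals mentioned in this thread, in centipawns from White's point of view.
for cp in (50, 300, 569, 900):
    print(f"{cp / 100:+.2f} -> {win_percent(cp):.1f}%")
```

Under that assumption, the gap between roughly +5.7 and +9 is only a few percentage points of winning chances, while the gap between +0.5 and +5.7 is huge. That fits the point above: past a certain size, the eval says "this is winning" rather than counting pieces.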
