Aren't puzzles supposed to feature perfect play?

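I have a question about the puzzles on this site: Are they supposed to always feature perfect play by the opponent/computer? If they aren't, won't you waste time calculating the best lines for your (puzzle) opponent only to be underwhelmed by their inaccuracy?

Consider this (relatively straightforward) puzzle, white to play: https://lichess.org/training/DsX7P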

Stockfish's evaluation (at depth 30) in the initial position after 13. ... Bc5 is +5.3 provided white finds the best move Nxe4.
The suggested solution of the puzzle goes like this: After 13. ... Bc5 play would continue 14. Nxe4 Qxe4 15. Re1 0-0 16. Rxe4 which concludes the puzzle.

According to Stockfish (at depth 30), white has an advantage of at least +7.6 in this final position (if black plays the best move 16. ... Bxf2+), and the evaluation climbs as high as +9.0 if black recaptures immediately with 16. ... dxe4.

White's material advantage after the best continuation 16. ... Bxf2+ 17. Kh1 dxe4 is three points of material (white is up a queen in exchange for a rook and a pawn).

But wait, how did the evaluation climb from an initial +5.3 to an eventual +7.6 in just three moves with perfect play?

Well, the solution suggested by the puzzle doesn't feature perfect play by black!

After white has correctly found 14. Nxe4 Qxe4 15. Re1 (pinning black's queen to the king) there is a better move for black! Stockfish (at depth 31) suggests this beautiful defensive sacrifice 15. ... Bxf2+ right away, followed by 16. Kxf2 0-0+ 17. Kg1 Qb4.

The combination of the bishop sacrifice on f2 (which MUST be accepted because the bishop also forks white's king and the rook that's pinning black's queen!) and the already half-open f-file means that castling short, which unpins the queen from the king, simultaneously comes with check, thereby saving the queen!!!

Stockfish (at depth 31) gives 15. ... Bxf2+ an evaluation of +5.2, whereas 15. ... 0-0 is evaluated at +7.8 (worse for black): it unpins the queen, but not with check, which allows white to win the queen.

In the final position after 15. ... Bxf2+ 16. Kxf2 0-0+ 17. Kg1 Qb4, white's material advantage is only two points (up a bishop for a pawn) instead of three, and black gets to keep the queen on the board. Stockfish (at depth 30) evaluates this position as +5.4, much more in line with the initial position's evaluation of +5.3 after 13. ... Bc5.
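The two material counts can be checked with the conventional point values (pawn 1, knight 3, bishop 3, rook 5, queen 9). This is only a minimal sketch: the helper name `material_balance` is made up for illustration, and the piece strings encode just the imbalance described in the text, not the full position.

```python
# Conventional piece values; a rule of thumb, not what an engine uses internally.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_balance(white_extra: str, black_extra: str) -> int:
    """Return white's material edge in points, given each side's unmatched pieces."""
    white_points = sum(PIECE_VALUES[piece] for piece in white_extra)
    black_points = sum(PIECE_VALUES[piece] for piece in black_extra)
    return white_points - black_points


# Puzzle line (16. Rxe4 Bxf2+ 17. Kh1 dxe4): white has a queen for a rook and a pawn.
puzzle_line = material_balance("Q", "RP")
# Sacrifice line (15. ... Bxf2+ 16. Kxf2 0-0+ 17. Kg1 Qb4): white has a bishop for a pawn.
sacrifice_line = material_balance("B", "P")

print(puzzle_line, sacrifice_line)
```

A plain point count like this ignores king safety and pawn structure, which is one reason it can disagree with an engine evaluation.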

Don't get me wrong, it's still an instructive puzzle as is. White's thematic 15. Qh5+ idea after a potentially misguided 14. ... dxe4 by black, forking king and bishop (only possible because the pawn vacated d5), is but one example. The blitz game that spawned this puzzle actually followed that line.

But aren't puzzles supposed to feature perfect play from both sides so as to challenge the player to calculate as accurately as possible?

Or is it deemed more important that the player always finds the best moves (which is the case here, so the puzzle is not wrong) whereas the (virtual) puzzle opponent is allowed to make mistakes on which the player can capitalise?

I wish to apologise in advance in case this question has already been answered or should have been posted in another forum.


MDA22

Hi, is it possible to tell us the exact model of your processor and the amount of RAM in your system? @Thalassokrator


ShiningDrongo

Puzzle "opponent" plays moves that stockfish at a certain depth evaluated as optimal.

Evaluations like +5 and more basically mean that if you let 2 strong engines play it out, white will score close to 100% wins. So what does "perfect play" even mean? Stockfish is like "you up a bishop bro, what else can I tell you" and gives you a random high number. +7 or +9, whatever, white is winning. You can change your engine, hardware, settings, depth and let it run for a different amount of time, and you'll get a different random high number.
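One way to see why +5 and +9 both amount to "white is winning" is to map the evaluation onto an expected score with a logistic curve. The sketch below is only an illustration: the coefficient is the one Lichess is believed to publish for its win-percentage conversion (an assumption, unverified here), and the function name `win_percent` is hypothetical.

```python
import math

# Assumed logistic coefficient per centipawn; believed to match the one
# Lichess publishes for its win-percentage conversion, unverified here.
K = 0.00368208


def win_percent(pawns: float) -> float:
    """Map an evaluation in pawns (white's view) to an expected score in percent."""
    centipawns = pawns * 100
    return 50 + 50 * (2 / (1 + math.exp(-K * centipawns)) - 1)


for evaluation in (0.0, 5.3, 7.6, 9.0):
    print(f"{evaluation:+.1f} -> {win_percent(evaluation):.1f}%")
```

On a curve of this shape the step from +5.3 to +9.0 changes the expected score far less than the step from 0.0 to +5.3, so differences between large evaluations carry little practical meaning.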


Imadechessgreatagain edited

The jump from +0.5 to +5.8 is very high (depth 22 and 30, Stockfish with NNUE), and that makes it easier for players to spot the blunder. So the puzzle is totally okay for its level of difficulty (Normal). But your understanding of what a puzzle is meant to be is flawed. A puzzle should teach you something. In this case it teaches you to win material (win the bishop or take their queen) IF the opponent doesn't see certain threats.

FrozenFractals

edited

the only situation where you can talk about really perfect play is when you calculate it AALLL the way down to a checkmate or draw (might require a computer the size of enceladus!)

anything other than that is just a heuristic probabilistic estimation.

an extra bishop for a doubled extra pawn + an open field with multitude of sexy long-winded attack possibilities
VS. a queen for a rook + a messed up pawn structure of the opponent with a lone over-advanced pawn

both are clearly winning. but which one is better? there is no objective way to solve this dilemma, other than by a brutal calculation run for aeons of time to establish the distance to the closest forced checkmate. yet no real-world stockfish or a nnue super-intelligence would be able to find that exact shortest line anyway (that is unless they are lucky and there are myriads of the best winning lines with the same number of moves)
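To make "calculating all the way down" concrete, here is a sketch of an exhaustive solver for a toy game, not for chess: players alternately take 1 to 3 stones and whoever takes the last stone wins. The game and the function name `solve` are assumptions chosen purely for illustration; the point is that only a full search yields a true result plus an exact distance to the end.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def solve(stones: int) -> tuple[bool, int]:
    """Return (side_to_move_wins, plies_until_end) under perfect play.

    Toy game: players alternately take 1-3 stones; taking the last stone wins.
    The winner aims for the quickest win, the loser for the longest resistance.
    """
    if stones == 0:
        return (False, 0)  # no stones left: the previous player just won
    winning, losing = [], []
    for take in range(1, min(3, stones) + 1):
        opponent_wins, plies = solve(stones - take)
        # A move wins for us exactly when it leaves the opponent in a lost position.
        (losing if opponent_wins else winning).append(plies + 1)
    if winning:
        return (True, min(winning))
    return (False, max(losing))


print(solve(10))
```

This works because the toy game has only a handful of positions; the same idea applied to a chess middlegame is what would need the moon-sized computer.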

/eZhpevk8 <-- the game in question from which the puzzle was generated. the extra bishop line is evaluated as +7.1 (at depth 22, at my end), while the puzzle main line is evaluated as only +5.6. guess that is why it is played by black in the puzzle.

but i agree that from a didactic perspective the line with an early bishop f2 sac-check and salvaging the queen would be more instructive, as well as better for black from the point of view of a simple intuitive material count (and with a bigger chance for a casual player to win it or draw)


Thalassokrator

@ShiningDrongo said in #3:

> Puzzle "opponent" plays moves that stockfish at a certain depth evaluated as optimal.

Yeah, that used to be my understanding as well. But this puzzle seems to somewhat contradict that notion, or at least to suggest that this "certain depth" is somewhat shallow at times:

When I ask Stockfish to analyse the position after 15. Re1 it first suggests 15. ... 0-0 as black's best response to the pin. But only very briefly. I've tried it several times and it seems (for my bad specs at least) Stockfish only likes short castles best right up to depth 8. From depth 10 upwards (and still at the lichess standard depth 20-21 and beyond) Stockfish realises that 15. ... Bxf2+ is a lot better for black. That's what got me wondering. Do all puzzles use such low depth (below 8)? Certainly not. So this one could be due to a bug right? Or is it by design? Short castles is not a bad move here. It's just not the best (according to Stockfish from depth 10 upwards).
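The observation above, that the engine's top choice flips somewhere between depth 8 and depth 10, can be expressed as a small check over a list of (depth, best move) pairs. This is a sketch only: the function name `first_change_of_best_move` is hypothetical, and the sample list is transcribed from the description in this post rather than taken from engine output.

```python
def first_change_of_best_move(history):
    """Return (depth, old_move, new_move) for the first time the top move changes.

    `history` is a list of (depth, best_move) pairs sorted by increasing depth.
    Returns None if the top move never changes.
    """
    for (_, previous_move), (depth, move) in zip(history, history[1:]):
        if move != previous_move:
            return (depth, previous_move, move)
    return None


# Transcribed from the description above, not from an engine log:
# 0-0 is preferred up to depth 8, Bxf2+ from depth 10 through at least depth 20.
reported = [(8, "0-0"), (10, "Bxf2+"), (20, "Bxf2+")]
print(first_change_of_best_move(reported))
```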

> Evaluations like +5 and more basically mean that if you let 2 strong engines play it out, white will score close to 100% wins. So what does "perfect play" even mean? Stockfish is like "you up a bishop bro, what else can I tell you" and gives you a random high number. +7 or +9, whatever, white is winning. You can change your engine, hardware, settings, depth and let it run for a different amount of time, and you'll get a different random high number.

I admit that my title was a bit provocative. I'm aware that even engines are not capable of absolutely perfect play (with the exception of engines having access to a tablebase in the endgame). In the context of my post I should have defined "perfect" play as play that features Stockfish's top moves at a certain, reasonable depth at least. I would have expected this reasonable depth to always exceed depth 8 in the case of puzzles but it appears as though that were not the case necessarily?


ShiningDrongo

I don't know and I have zero idea how engines work. I also still don't get why you care so much :)

In any case, my engine disagrees with yours. It says your variation sucks and the main line is slightly better, at a reasonable depth too. Again, I'm not a computer guy, I don't know how these things work.

All I understand is NxN and if pawn takes Qh5+ is a fork, if queen takes Re1 obviously, and your bishop desperado doesn't do anything, because I just take the bishop and I have a bishop and you don't have a bishop.

2 "points of material" I never ever count like this


FrozenFractals

edited

it is not only your specs that affect the evaluation. as far as i know, stockfish, like most chess engines, uses a RANDOMIZED approach to searching the game tree (the order in which it goes through the lines and deepens down already calculated ones). it also uses complex pruning and extra depth heuristics (i.e. it doesn't waste its brainpower on very bad moves, but looks much deeper than the stated limit if the king is exposed to very forcing check barrages). not to mention the nnue module smacked on top of all that.

that is why no two games are exactly the same, even if you play the same moves against it with the same time control. and that is why a mere reloading of the analysis page in your browser (and waiting until it reaches a given depth limit again) produces slightly different numbers every time you do that.

as for the puzzle position, here on my end, i consistently get the castling line (castling both ways actually) evaluated as better for black than the bishop-sac:

https://imgur.com/Iq02Sra.png

you can see on the screenshot ^^ it has reached the default 22/22 depth limit both times. but i think it doesn't really matter, for the modern stockfish even at a depth of only 8 plies (+searching for quiescence) is already pretty damn badass for all practical tactical didactical purposes.

Thalassokrator

@ShiningDrongo said in #7:

> I also still don't get why you care so much :)

I'm just curious. When something is confusing to me I want to find out how it works. Simple as that.

It turns out that you're correct, by the way: according to Stockfish 14+ NNUE the bishop-sac doesn't work as well as Stockfish 10+ WASM might have thought.
My confusion turns out to be the consequence of different engines unsurprisingly favouring different moves when several decent moves exist in a (sufficiently complicated) position. See my next post for details if you're curious about this anticlimactic ending as well ;-)


Thalassokrator

#10

@FrozenFractals said in #8:

> as for the puzzle position, here on my end, i consistently get the castling line (castling both ways actually) evaluated as better for black than the bishop-sac:

You're absolutely right!

The problem seems not to be with my specs necessarily, but with my choice of browser. My browser seems to be outdated, it still uses Stockfish 10+ WASM.

When I switch to your browser (Firefox) I basically get the same result as you do. On Firefox I can access Stockfish 14+ NNUE and it disagrees with the (presumably?) older Stockfish 10+ WASM. It indeed evaluates 15. ... 0-0 as better for black and dislikes 15. ... Bxf2+.

Evals (SF 14+ NNUE, depth 22) are at

+5.4 eval for 15. ... 0-0
+6.6 eval for 15. ... 0-0-0
+7.1 eval for 15. ... Bxf2+

Very comparable to your numbers (and they don't change much when depth is increased to 30).

On my old browser it looks like this:

Evals (SF 10+ WASM, depth 22) are at

+4.8 eval for 15. ... Bxf2+
+7.6 eval for 15. ... 0-0
+8.8 eval for 15. ... 0-0-0

It's not surprising to see two different engines disagree about the "best" move in a position in which several decent moves are available.
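The disagreement can be spelled out from the numbers above: the evaluations are from white's point of view, so with black to move each engine's preferred reply is the move with the lowest value. This is a small sketch using only the depth-22 figures quoted in this post; the helper name `best_for_black` is made up for illustration.

```python
# Evaluations at depth 22 as reported above, in pawns from white's point of view.
EVALS = {
    "SF 14+ NNUE": {"0-0": 5.4, "0-0-0": 6.6, "Bxf2+": 7.1},
    "SF 10+ WASM": {"Bxf2+": 4.8, "0-0": 7.6, "0-0-0": 8.8},
}


def best_for_black(evals_by_move):
    """Black to move: the preferred move is the one with the lowest white-side eval."""
    return min(evals_by_move, key=evals_by_move.get)


choices = {engine: best_for_black(moves) for engine, moves in EVALS.items()}
print(choices)
```

Both engines agree that white is winning after every candidate move; they only rank the candidates differently.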

Thanks for your help!
It appears that the puzzle opponent indeed plays the best move according to the latest version of Stockfish that's available on lichess. And that version seems to be Stockfish 14+ NNUE (or higher) at the moment. I used an outdated browser initially which is how the confusion arose. But at least it turns out that puzzles always play the "best" move as several of us thought :-)


This topic has been archived and can no longer be replied to.