jomega

Strategic Test Suite (STS): STS(v4.0) Square Vacancy.057

Strategy · Chess engine · Analysis · Chess
What is the "best move"? Why did Stockfish 14 vacillate?

A continuation of the discussion here:
jomega's Blog • Strategic Test Suite (STS): The EPD file's best and alternate best moves. • lichess.org

One of the interesting positions on which Stockfish 14 failed to find the "best move" (at depth 22) is position 057 from the Square Vacancy category. I promised in my last blog post to discuss this position.

https://lichess.org/study/DMOJQHcr/e7SLW2yV#0

Reminder of the Results
Recall from the above post that the supposed best move is Qf4 per the STS, and that Stockfish 14 produces that move at depth 15 and depth 24, but not at depth 22. Those results were obtained on my local machine. In the study's chapter, I explain how my browser is set up for Lichess, and I list the Stockfish 14 evaluations per Lichess for the best move and the alternate best moves.

Too Close to Call at Small Depths
All of these evaluations at approximately depth 23 are very close; the maximum spread is 0.4 points. If you are just relying on Stockfish to make the call of the best move, this result indicates that you need to give Stockfish more time. Stockfish was vacillating because it could not differentiate the terminal positions at low depth.

Extending the Search
On Lichess you could set the mode to 'infinite', or you could ask for a cloud evaluation. What I did was have my local machine evaluate to a depth of 47. This produced:

1.Qf4 2.14
1.a4 0.54
1.Re2 0.52
1.a3 0.43

At the tip (last position) of each of the above variations, if you have Stockfish 14 continue to evaluate, then 1.Qf4 is evaluated at 8.27/32, while the other variations are evaluated at 3.69/26, 0.0/23, and 0.0/37 respectively. These are extra half-moves beyond the initial 47!
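To make the contrast with the shallow results concrete, here is a minimal Python sketch that ranks the four depth-47 evaluations listed above and measures how far Qf4 is ahead. The function names are my own for illustration; only the numbers come from the post.

```python
# Depth-47 evaluations quoted in the post (in pawns, from White's side).
depth47 = {"Qf4": 2.14, "a4": 0.54, "Re2": 0.52, "a3": 0.43}


def rank_moves(evals):
    """Return (move, eval) pairs sorted from best to worst for White."""
    return sorted(evals.items(), key=lambda kv: kv[1], reverse=True)


def spread(evals):
    """Difference between the highest and lowest evaluation."""
    return round(max(evals.values()) - min(evals.values()), 2)


ranked = rank_moves(depth47)
best_move, best_eval = ranked[0]
runner_up_move, runner_up_eval = ranked[1]

# How far the top move is ahead of the second choice.
lead = round(best_eval - runner_up_eval, 2)

# How tightly the three alternatives are bunched together.
alternatives = {m: e for m, e in depth47.items() if m != best_move}
alt_spread = spread(alternatives)

print(f"best: {best_move}, lead over {runner_up_move}: {lead}")
print(f"spread among the alternatives: {alt_spread}")
```

At this depth the three alternatives are still bunched together, but Qf4 has pulled well clear of them, which is the separation the shallower searches could not show.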

Of course, it is possible that Stockfish 14 misevaluated some terminal node and so has come up with a wrong result. The saying "Long variation, wrong variation" comes to mind. Assuming that is not the case, Stockfish 14 now seems to agree with the EPD on the best move. However, the EPD score of Re2 should probably be set to 1, while a4 should get a higher score than 2.
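For readers who have not looked inside the EPD file: each STS position lists its scored moves in a `c0` operation of the form `"move=points, move=points, ..."`. The sketch below parses such a string and applies a revised score. The `c0` string here is a HYPOTHETICAL placeholder, not the actual entry for position 057, and I am assuming that a move missing from the list earns 0 points. The function names are also my own.

```python
def parse_move_scores(c0):
    """Parse an STS-style c0 string such as 'Qf4=10, a4=2' into a dict."""
    scores = {}
    for item in c0.split(","):
        move, _, points = item.strip().partition("=")
        scores[move] = int(points)
    return scores


def points_for(engine_move, scores):
    """Points awarded for the engine's move (assumption: unlisted moves get 0)."""
    return scores.get(engine_move, 0)


# HYPOTHETICAL c0 contents: placeholder values, NOT copied from the STS file.
c0_example = "Qf4=10, a4=2, Re2=3, a3=1"
scores = parse_move_scores(c0_example)

# Apply the revision suggested in the text for Re2 only; no concrete
# number is proposed for a4, so it is left unchanged here.
revised = {**scores, "Re2": 1}

print(points_for("Qf4", scores))   # full marks for the listed best move
print(points_for("Re2", scores), "->", points_for("Re2", revised))
print(points_for("h3", scores))    # a move not in the list
```

Keeping the revision as a separate dictionary leaves the original table untouched, so the two scorings can be compared side by side.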

The Human Perspective
Evaluating this position from a human perspective depends on many things, mostly experience. A strong player would judge within a few seconds that White probably has a strong kingside attack coming. A few more seconds and they'd say that if Black plays ...g5, then White can sacrifice the h4-Bishop for two pawns and a blistering attack. They'd make this assessment so fast that there is no way they are doing a 22 half-move search! It is probably more like a straight-line 5 half-move search of the Bishop sacrifice in reply to the threat of ...g5.

What are the clues in this position for such an assessment?
- Black has disturbed his Kingside pawns. His h-pawn is not guarding g6 which leaves that square unguarded because the f7-pawn is absolutely pinned.
- Black's dark-square Bishop has moved to f8 (probably from g7) and so is not guarding f6.
- Only Black's King is guarding f7, and once the White Queen gets a check at g5 (after the Bishop sac), White's b3-Bishop will come crashing in at f7.
- White's Knights are ready on the kingside to join the fight.

The above points are the words that express what I'm sure is subconscious in the mind of a strong player who has enough experience with these types of positions. The sac of a piece for several pawns and a kingside attack is a commonly known theme.

So why would a human consider Qf4? Probably because a) experience says to pile-on-the-pinned-piece (the f6-Knight), and b) the quick calculation of the sacrifice for several pawns and a kingside attack is extremely promising and so White does not fear 1...g5, which forks the Queen and Bishop.

Links

- The Strategic Test Suite (STS) home page.
https://sites.google.com/site/strategictestsuite/
- The STS-rating code.
https://github.com/fsmosca/STS-Rating