jomega

Strategic Test Suite (STS): The EPD file's best and alternate best moves.

Chess, Chess engine, Strategy, Software Development
What do 'best' and 'alternate best' moves mean for STS?

A continuation of the discussion on the STS here:
jomega's Blog • Strategic Test Suite (STS): Introduction • lichess.org

One of the interesting positions on which Stockfish 14 failed to find the "best move" is the following one, position 057 from the Square Vacancy category. I'll discuss this position in my next blog post. First we need to discuss what 'best move' and 'alternate best move' mean in the STS EPD.

https://lichess.org/study/DMOJQHcr/e7SLW2yV#0

How the STS EPD Works

The EPD entry for this position is the following:

2rqrbk1/1b1n1p2/p2p1npp/1p6/4P2B/1B3NNP/PP1Q1PP1/3RR1K1 w - - bm Qf4; c0 "Qf4=10, Re2=3, a3=1, a4=2"; id "STS(v4.0) Square Vacancy.057"; c7 "Qf4 Re2 a3 a4"; c8 "10 3 1 2"; c9 "d2f4 e1e2 a2a3 a2a4";

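The first part of the line is the position, given as the first four fields of a FEN (piece placement, side to move, castling rights, and en passant square); EPD omits the two FEN move counters. Next comes the op-code mnemonic 'bm', which stands for 'best move', followed by the best move in Short (Standard) Algebraic Notation (SAN), here Qf4. The op-code 'id' names the position, and the op-codes c0 through c9 are comments. The value of c0 lists the 'score' for the best move and for each other accepted move. These other moves are accepted as 'alternate best' but are given a smaller score than the best move. In this entry, c7 repeats the accepted moves in SAN, c8 repeats their scores, and c9 gives the same moves in coordinate notation. The scale is arbitrary, with the best move given a score of 10. Since the suite has 1500 positions, the maximum score is 15,000.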
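The structure described above can be sketched in a few lines of Python. This is only an illustration, not the STS-rating code: the function names `parse_epd` and `move_scores` are made up for this example, and the parser assumes that no quoted operand contains a semicolon, which holds for the entry shown here.

```python
import re

# The EPD entry for STS(v4.0) Square Vacancy.057, copied from above.
EPD = (
    '2rqrbk1/1b1n1p2/p2p1npp/1p6/4P2B/1B3NNP/PP1Q1PP1/3RR1K1 w - - '
    'bm Qf4; c0 "Qf4=10, Re2=3, a3=1, a4=2"; '
    'id "STS(v4.0) Square Vacancy.057"; c7 "Qf4 Re2 a3 a4"; '
    'c8 "10 3 1 2"; c9 "d2f4 e1e2 a2a3 a2a4";'
)


def parse_epd(line):
    """Split an EPD line into its position fields and its operations."""
    fields = line.split(None, 4)
    # Four position fields: placement, side to move, castling, en passant.
    position = " ".join(fields[:4])
    ops = {}
    # Each operation is an op-code, an operand (quoted or bare), then ';'.
    for m in re.finditer(r'(\w+)\s+("([^"]*)"|[^;]*);', fields[4]):
        opcode = m.group(1)
        quoted = m.group(3)
        ops[opcode] = quoted if quoted is not None else m.group(2).strip()
    return position, ops


def move_scores(ops):
    """Turn the c0 comment ('Qf4=10, Re2=3, ...') into a move -> score map."""
    scores = {}
    for item in ops["c0"].split(","):
        move, points = item.strip().split("=")
        scores[move] = int(points)
    return scores


position, ops = parse_epd(EPD)
scores = move_scores(ops)
print(position)   # 2rqrbk1/1b1n1p2/p2p1npp/1p6/4P2B/1B3NNP/PP1Q1PP1/3RR1K1 w - -
print(ops["bm"])  # Qf4
print(scores)     # {'Qf4': 10, 'Re2': 3, 'a3': 1, 'a4': 2}

# c7 and c8 carry the same information as c0, split into two lists.
assert scores == dict(zip(ops["c7"].split(), map(int, ops["c8"].split())))
```

The final assertion checks that, for this entry, the c7 and c8 comments agree with c0.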

It became clear to me, on reading various posts by the authors of STS, that the original intent of STS was that the positions should not have tactical refutations (an extreme example would be a missed mate), and that the solution should be what an engine developer should strive to obtain to play "best chess". The vetting method was to use the alpha/beta engines circa 2009. I assume that the suite was re-vetted for the STS-rating in 2019. I believe that the "best moves" and "alternate best moves" were first determined by human understanding of the positions, and then vetted by the engines of that day.

Of course, even humans disagree as to the "best move" for a position, for all sorts of reasons that I will not get into here. Whether an engine is programmed with code that we would identify as looking for the STS themes is up to the developers of the engine. From the point of view of the test suite as an evaluation of engine performance, either the engine produces the correct moves or it does not. The test suite neither cares about nor measures how the engine comes up with its moves. Note that there are no variations given for either the best move or the alternate best moves.

The Stockfish Results for STS(v4.0) Square Vacancy.057

In the original STS-rating test, Stockfish 14 produced Qf4 as the move. So Stockfish got the best move as listed in the EPD, which is a score of 10 on that position. That was at a depth of 15, which was determined indirectly by the time (0.082 s) that the STS-rating system allowed on my machine. In the longer test (3 s), Stockfish produced a4 as the move. This was at a depth of 22. So in the second test, Stockfish scored a 2 on this position.
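The scoring of the two runs can be written out as a small Python sketch. The score table is copied from the c0 comment of the EPD entry, and the moves are the ones reported above. The helper `score_move` is a hypothetical name for this example, and giving 0 points to a move that is not listed is an assumption here rather than something stated in the EPD entry.

```python
# Scores for STS(v4.0) Square Vacancy.057, taken from the c0 comment.
SCORES = {"Qf4": 10, "Re2": 3, "a3": 1, "a4": 2}


def score_move(move, scores):
    """Return the points for an engine move.

    Assumption: a move that is not listed in the EPD entry earns 0 points.
    """
    return scores.get(move, 0)


# The two Stockfish 14 runs described above: (test conditions, move played).
RUNS = [
    ("0.082 s per position, depth 15", "Qf4"),
    ("3 s per position, depth 22", "a4"),
]

for label, move in RUNS:
    points = score_move(move, SCORES)
    print(f"{label}: {move} scores {points} of {max(SCORES.values())}")

# 1500 positions at 10 points each give the maximum total for the suite.
MAX_TOTAL = 1500 * 10
print(MAX_TOTAL)  # 15000
```

An engine's total for the suite is then the sum of these per-position scores, out of a possible 15,000.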

In the study chapter for this position, I outline how the Stockfish evaluation works in my browser using Lichess. There, Stockfish is back to choosing Qf4 as the move at depth 24.

In my next blog post, I'll discuss this position and Stockfish's evaluations in detail.

Links

- The Strategic Test Suite (STS) home page.
https://sites.google.com/site/strategictestsuite/
- The STS-rating code.
https://github.com/fsmosca/STS-Rating
- The FEN standard.
https://www.chessprogramming.org/Forsyth-Edwards_Notation
- The EPD standard.
https://www.chessprogramming.org/Extended_Position_Description