lichess.org

How can Stockfish rate a position to be +50 or greater?

#1 I have a feeling that it has to do with the function "winnable" here:
github.com/official-stockfish/Stockfish/blob/master/src/evaluate.cpp#L869

This function winnable is called to adjust the evaluation value in the main evaluation routine. Without that call, the value after White's 46th move would be close to +6.62 as a static evaluation. One way to see that is to enter the FEN for the position into this page
hxim.github.io/Stockfish-Evaluation-Guide/
which is a JavaScript implementation of the main evaluation. Note, however, that it does not have anything corresponding to the winnable function.

If you let Stockfish work on the position long enough it says 46... Rxh8 is #25.
For reference, +100 means forced mate, so +50 or higher means that the engine knows there will be a mate but hasn't calculated all the way to it yet. It's likely that a strong enough computer in the future would be able to calculate to mate whenever a traditional computer evaluates that one side has blundered or made a mistake one more time than the opponent (+5 to +6). One mistake or blunder is required for a forced mate to exist. This all assumes perfect play after the evaluation, of course.
#4 vs #6
"one pawn or better in the position - more space, more active pieces."
I think the "or" before "better" takes precedence in the meaning of that phrase. It could have been clearer with more "rambling", but nowadays Twitter-style formatting expectations mean human buffer overrun. I do that myself with my own rambling; I overrun myself most of the time, like right now, on the verge of doing it again. Or am I? (Being giddy.)

Maybe an expanded formulation would not have made that phrase look like what #6 is referring to as material:
"'one pawn' or better score at that position, because of aspects other than the material count at that position, such as more space, more active pieces". Notice the redundancy to avoid imprecision from concision (we write and assume we write directly into minds, but there is a wall in between, the lighting is fluctuating in many directions, and it might be windy around, in my head at least).

#6 But I agree there is something half-baked in the hybridization with NNUE, and clinging to material count (pawn) as the baseline might be misleading. Even if some positional measure is now attributed to a few more nodes than in the past (without NNUE), my current evolving understanding is that the tree search still dominates, with more non-evaluated positions than evaluated ones. It may be that they are trying to combine 2 interpretations, and squeeze a disk into a square depression (like baby puzzles).

It is with questions such as these that we may help "them" really think about what they did that made their grey-box machine soar in engine tournaments (censoring my rambling here). Here it is technology first, science later. But me, I like science better, so that's a new puzzle...

There is definitely a "continuity" problem between scoring a mate position in a tree and scoring an evaluated non-mate position in the same tree with a scoring function that is still material first (oh yeah, I forgot, the very necessary condition for that new type of position evaluation score is still material only: whether the position is about to go material swinging or not).

From there I propose to "them": "Relax, you don't have to choose. Use both interpretations side by side, and let the community of chess-thinking humans make use of them, non-devs perhaps (but still using those scores and thinking chess; that must be worth something, use a bit of that lichess tea from the new puzzle system)." I suspect that there may be an apprehension against having 2 scales like that. But what if people started tuning their own internal judgment better, modulating their belief depending on game phase, for example, or putting more trust in one of the 2 scales whenever a mate shows up? This is not in-game anyway, so this is for analysis. Expect the user to want to think hard, or to understand at some level with a better measurement system. We already use many tools side by side (opening explorer, engine, human annotation); what is another scale about non-nebulous (at all) outcome probabilities... (bad word).
#15 What the programmers of Stockfish are doing with these high evaluation values is getting the engine to hill climb to a possible mate. If a mate is found, then that becomes the value (#n). If the engine is still running it will continue to search and can find shorter mates; I've seen that happen.
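
The "keeps searching and finds shorter mates" behaviour can be watched in the engine's own output. Here is a minimal sketch in Python, assuming the engine prints standard UCI "info" lines with a "score mate <n>" field. The function name shortest_mate and the sample lines are made up for illustration; they are not real Stockfish output.

```python
import re

# Matches the "score mate <n>" field of a UCI info line.
# n > 0: the engine's side mates in n moves; n < 0: it gets mated.
MATE_RE = re.compile(r"\bscore mate (-?\d+)\b")


def shortest_mate(info_lines):
    """Return the smallest positive mate distance seen, or None if none was reported."""
    best = None
    for line in info_lines:
        match = MATE_RE.search(line)
        if match is None:
            continue  # a cp score, or a line without any score
        n = int(match.group(1))
        if n > 0 and (best is None or n < best):
            best = n
    return best


# Hypothetical lines for illustration only, not copied from a real search.
sample = [
    "info depth 30 score cp 5210 nodes 1000000",
    "info depth 38 score mate 27 nodes 9000000",
    "info depth 41 score mate 25 nodes 20000000",
]
print(shortest_mate(sample))
```

While only "cp" scores have arrived, the function returns None; once mate scores start to appear it reports the shortest one so far, which mirrors how the displayed #n can shrink as the search goes on.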

The UCI protocol currently only allows the "info score ..." command to pass back to the GUI the cp, or mate (i.e. #n), or bound values. So if the "them" you are referring to is the Stockfish programmers, they are not going to be giving you multiple "interpretations" of the score. If the "them" you are referring to is the larger community of chess programmers, there is a current protocol that allows the engine to pass W/D/L probabilities as well as the cp/mate/bounds. That still may not be what you want. Also, I believe that this new W/D/L protocol is for the programmers' benefit in understanding what their program is doing, whether it be an engine like lc0, which currently has to convert W/D/L "Q-values" into cp values because of the UCI protocol, or the hybrid engines like Stockfish 13+NNUE that use W/D/L at leaf nodes but cp values up the tree.
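
To make the score fields concrete, here is a rough Python sketch of pulling them out of one "info" line. It assumes the standard UCI tokens "score cp <x>", "score mate <y>", "lowerbound" and "upperbound", and it assumes the W/D/L extension appears as "wdl <w> <d> <l>" with three integers. The function name parse_score and the sample line are hypothetical, not real engine output.

```python
def parse_score(info_line):
    """Extract the score fields from one UCI "info" line.

    Returns a dict with "kind" ("cp" or "mate"), "value" (int),
    "bound" ("lowerbound", "upperbound" or None) and "wdl"
    (tuple of three ints or None), or None if the line has no score.
    """
    tokens = info_line.split()
    if "score" not in tokens:
        return None
    i = tokens.index("score") + 1
    if i + 1 >= len(tokens) or tokens[i] not in ("cp", "mate"):
        return None
    result = {"kind": tokens[i], "value": int(tokens[i + 1]),
              "bound": None, "wdl": None}
    # Optional bound flag: the value is only a lower or upper bound.
    for flag in ("lowerbound", "upperbound"):
        if flag in tokens:
            result["bound"] = flag
    # Optional W/D/L extension (assumed form: "wdl <w> <d> <l>").
    if "wdl" in tokens:
        j = tokens.index("wdl")
        if j + 3 < len(tokens):
            result["wdl"] = tuple(int(t) for t in tokens[j + 1:j + 4])
    return result


# Hypothetical line for illustration only.
line = "info depth 20 score cp 662 wdl 998 2 0 nodes 12345 pv h8h1"
print(parse_score(line))
```

Either way the GUI gets a single cp or mate number per line; the W/D/L triple, where an engine sends it, rides alongside that number instead of replacing it.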

Also, I believe that, for the most part, engine programmers are much more interested in producing engines that win against other engines and have high rating values. Passing back values to allow people to "tune their own internal judgment" is not a priority for those programmers.

You know I sympathize with your position. As a chess teacher I applaud the efforts by those chess programmers who are applying techniques to get information out of the engines to help people improve at chess. There are a number of such efforts going on. On Lichess, for example, the new miniboard pop-ups when hovering the cursor on Stockfish's variations are a fantastic help in that regard.
