
Stockfish 14+ NNUE or Stockfish 16?

Hey, so yesterday I installed Stockfish 16 on my mobile device and I've noticed pretty big differences in position evaluations compared to Stockfish 14 NNUE on Lichess. For example, the analysis board on Lichess evaluates the following position as +2 for White, but Stockfish 16 on my phone evaluates it as only +0.8 (depth 45). So which engine should I trust?

[Event "Gandalf Gambit: Starting position"]
[Site "lichess.org/study/GkWkjZvi/7DywHE3P"]
[Result "*"]
[UTCDate "2021.10.19"]
[UTCTime "11:02:25"]
[Variant "Standard"]
[ECO "B01"]
[Opening "Scandinavian Defense: Icelandic-Palme Gambit"]
[Annotator "@Adam_Prikler"]

1. e4 (1. Nf3) 1... d5 2. exd5 Nf6 3. c4 e6 4. dxe6 Bc5 5. exf7+ Kxf7 *
I don't know so much about hardware, but I think analysis on a mobile may be less reliable than that on a reasonably modern desktop. On the other hand ...

I found the line in a database of openings analysed with Lc0, where it has an evaluation of 0.28. The database comes from a GitHub repository by jhorthos (a repository called lczero training). This is Lc0 analysis, which doesn't map exactly to SF analysis, but according to the database it is a roughly balanced position. On jhorthos's scale, 0-0.25 is balanced, 0.25-0.40 is unbalanced, and > 0.40 is very unbalanced, and according to him 0.25 maps to roughly 0.30 on the Stockfish scale. You might want to take a look there for some interesting data.
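A minimal sketch of the scale quoted above, in Python. The three bands are the ones from the post. Two things are assumptions made only for illustration: values exactly on a boundary are placed in the lower band, and the single quoted data point (0.25 on the Lc0 scale is roughly 0.30 on the Stockfish scale) is stretched into one linear factor, which the original database does not claim.

```python
# Band limits as quoted in the post (applied to the absolute Lc0 evaluation).
BALANCED_MAX = 0.25
UNBALANCED_MAX = 0.40

# Assumption: one linear factor derived from the single quoted pair 0.25 -> 0.30.
LC0_TO_SF = 0.30 / 0.25


def classify_lc0(evaluation):
    """Return the band name for an Lc0 evaluation (sign is ignored)."""
    magnitude = abs(evaluation)
    if magnitude <= BALANCED_MAX:
        return "balanced"
    if magnitude <= UNBALANCED_MAX:
        return "unbalanced"
    return "very unbalanced"


def lc0_to_sf(evaluation):
    """Rough Stockfish-scale equivalent under the linear assumption above."""
    return round(evaluation * LC0_TO_SF, 2)


if __name__ == "__main__":
    # The evaluation quoted for the gambit line in the database.
    print(classify_lc0(0.28), lc0_to_sf(0.28))
```

With these boundaries, 0.28 lands just above the 0.25 limit, so calling the position "roughly balanced" is a fair reading of a borderline value.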
I ran the position through SF16 and got an eval of +1.24 after 1 billion nodes. I also ran it through a cloud engine and got +1.10, although I don’t know which SF version it uses. Generally, your homefish (Stockfish running on your own machine) should perform better than Lichess Stockfish in almost every situation. My standard protocol for a position that I am unsure of is to run test games between engines and see the result.
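Once the test games from such a position are played, the results still have to be tallied. The sketch below is one way to do that in Python, assuming the games are available as standard PGN result tokens ("1-0", "1/2-1/2", "0-1"). The function name and the sample list are hypothetical and only illustrate the bookkeeping; they are not results from this thread.

```python
from collections import Counter

VALID_RESULTS = {"1-0", "1/2-1/2", "0-1"}


def summarize_results(results):
    """Tally PGN result tokens and return White's score as a fraction."""
    counts = Counter(results)
    unknown = set(counts) - VALID_RESULTS
    if unknown:
        raise ValueError(f"unexpected result tokens: {sorted(unknown)}")
    games = sum(counts.values())
    if games == 0:
        raise ValueError("no finished games to summarize")
    white_points = counts["1-0"] + 0.5 * counts["1/2-1/2"]
    return {
        "games": games,
        "white_wins": counts["1-0"],
        "draws": counts["1/2-1/2"],
        "black_wins": counts["0-1"],
        "white_score": white_points / games,
    }


if __name__ == "__main__":
    # Hypothetical sample, not real engine games.
    sample = ["1-0", "1/2-1/2", "1-0", "1-0"]
    print(summarize_results(sample))
```

A White score close to 1.0 over many games supports a large plus evaluation, while a score near 0.5 suggests the position is closer to balanced than the number implies.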
Thanks for replying. The thing is, before SF16 I was using SF 15 of course (on my phone), and the evaluations corresponded pretty well with what the analysis board on Lichess would show (on a PC).
And now that I've upgraded to SF 16, all of a sudden the evaluations are crazy off... Like in any position that SF 14 says is winning, like +4 or something, SF 16 is like chill, it's just +1.5... So I don't know what to think. Is anybody else experiencing this?
I don’t have that problem often, although it seems that you look at a lot of exotic, uncommon positions, where engine evals can vary greatly. In these sharp positions the eval can change a lot between SF versions, so as new Stockfish versions come out and Lichess stays on the same version, the difference can increase. For example: r2q1rk1/2pn1p1p/1p2pPp1/p5B1/1bpP1N1Q/2N5/PPB2P1P/3K4 b - - 0 16
The Lichess engine says that the position is winning for Black before going to 0.0, but my SF 16 with NNUE evaluates it as +2.71, and the engine games did show that White wins every time. The Lichess engine is good enough for most positions, but for the offbeat and sharp openings you look at, the evals are going to differ a lot.
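For a rough sense of how sharp that example is, here is a small Python sketch that counts material straight from the FEN. The piece values (1/3/3/5/9) are the conventional textbook ones, used here as an assumption; they are not what Stockfish uses internally.

```python
# Conventional piece values (assumption for illustration; kings count as 0).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material(fen):
    """Return (white_total, black_total) from the board field of a FEN."""
    board = fen.split()[0]
    white = 0
    black = 0
    for ch in board:
        if not ch.isalpha():
            continue  # digits are empty squares, '/' separates ranks
        value = PIECE_VALUES[ch.lower()]
        if ch.isupper():
            white += value
        else:
            black += value
    return white, black


if __name__ == "__main__":
    fen = "r2q1rk1/2pn1p1p/1p2pPp1/p5B1/1bpP1N1Q/2N5/PPB2P1P/3K4 b - - 0 16"
    print(material(fen))  # -> (27, 33)
```

By this count Black is ahead by six points, so an evaluation that favours White has to come from something other than material, which may be why evaluations of this position swing so much between engines and versions.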
I wonder if Lc0 would give more reliable evaluations in the opening phase, while SF would be better in positions requiring deep calculation. In general, I think that if an engine gives an evaluation of 3 or 4, it had better give concrete lines showing how the evaluation eventually translates into a material advantage, dangerous passed pawns, or a position that is reasonably easy to convert. Otherwise it is hard to believe.

@Adam_Prikler

As a bot creator: you could use any of them, but I would prefer Stockfish 16.
Real Stockfish 16 evaluates it as +1.6 for White, not +0.8. Chess System Tal 2 gives even more, +2.2. Berserk 11.1 gives only +1.3.

www.computerchess.org.uk/ccrl/4040/

All of these engines are much stronger than the latest versions of former top engines such as Fritz, Hiarcs or Shredder. So it really does not matter which engine you use. Just use the one you are most comfortable with.
If you're playing stuff like that, why even bother with engines?
