Leela Chess Zero traps Stockfish with BLACK

lmao, Leela's sense of humor: sacrificing 3 queens before checkmating
First game, move 25, where White places the rook: is there any analysis of what SF might have been seeing in its search tree at some depth?

For me the word "trap" makes sense only for things on the board, although there is also a meaning that concerns position sequences and sharpness. It might just have been SF's goggles that were narrow-minded in placing its own rook there (yes, I am also projecting human traits onto what is either programming or statistics brought to the board from many games).

So I think it would be nice to explore the two machines' thinking. For Lc0, that might mean characterizing the time series of its training set across its RL batches; for SF, it might mean having a dump of its search tree at that position (or nearby).

None of which are available.
@dboing said in #3:
SF16dev:
1. e4 c5 2. Nf3 Nc6 3. Bb5 Nf6 4. Bxc6 dxc6 5. d3 Bg4 6. h3 Bxf3 7. Qxf3 e5 8.
Nd2 Nd7 9. a4 Bd6 10. Qg3 Qf6 11. Nc4 Bc7 12. h4 Qg6 13. Qh3 Nf8 14. h5 Qf6 15.
Qf5 Qe7 16. Be3 f6 17. a5 Ne6 18. g3 O-O 19. Ke2 Nd4+ 20. Bxd4 cxd4 21. Ra4 Rf7
22. Nd2 Qc5 23. Rha1 Re8 24. R4a3 Qe7 25. Rb3 (25. Nf3) ({Stockfish
dev-20230613-7922e07a: 1)} 25. Ra4 Qf8 26. Kf1 Qc5 27. Rc4 Qd6 28. Nb3 Rd8 29.
Kg2 Rb8 30. Rh1 Rbf8 31. Ra4 Qe7 32. Rha1 Rd8 33. Rc4 Qd6 34. Kf1 Re8 35. Kg1
Ref8 36. Rc1 Re8 37. Ra4 Qe7 38. Nd2 Qc5 39. Ra3 Ref8 40. Kg2 Qe7 41. Ra4 Qd6
42. Rca1 Qc5 43. Nc4 {[%eval 62,45]}) ({Stockfish dev-20230613-7922e07a: 2)}
25. Rb3 Bd6 26. Nf3 Rd8 27. Nh4 Qd7 28. Kf3 Rb8 29. c4 dxc3 30. bxc3 Bf8 31.
Qxd7 Rxd7 32. Ke2 Bd6 33. Rbb1 Kf7 34. Nf3 Bc5 35. Ra4 Rbd8 36. Rd1 b5 37. axb6
axb6 38. Ra6 g6 39. Nh4 gxh5 40. Nf5 Kf8 41. Ra4 Ke8 42. Rd2 Kf7 43. Rc4 Re8
44. f3 Rb8 45. d4 Rbd8 46. f4 Kg8 47. Rd3 Kf7 48. Ke3 {[%eval 61,44]}) ({
Stockfish dev-20230613-7922e07a: 3)} 25. Kf1 Bd6 26. Ra4 Bb4 27. Nc4 Rb8 28.
Kg1 Re8 29. Kg2 Bd6 30. Rf1 Rd8 31. Raa1 Re8 32. Rh1 Bb4 33. Kf3 Qc5 34. Ra2
Qf8 35. Kg2 Qe7 36. Rb1 Ref8 37. Raa1 Re8 38. Ra4 Rb8 39. Rf1 Re8 40. Rc1 Ref8
41. Rh1 Rb8 42. f4 Bd6 {[%eval 56,44]}) ({Stockfish dev-20230613-7922e07a: 4)}
25. Kf3 Bd6 26. Ra4 Bb4 27. Nc4 Rb8 28. Rc1 Re8 29. Raa1 Rb8 30. Rf1 Re8 31.
Kg2 Bd6 32. Rh1 Bb4 33. Ra4 Bd6 34. Rha1 Bb4 35. Rf1 Bd6 36. Raa1 Ref8 37. Rfb1
Re8 38. Kf3 Bb4 39. Ra2 Rb8 40. Rf1 Re8 41. Kg2 Ref8 42. Ra4 Re8 43. Rfa1 Ref8
44. f4 Bd6 {[%eval 56,44]}) 25... Bd6 26. Nf3 Bc5 27. Qg4 b6 28. axb6 axb6 29.
Nh4 Qf8 30. Nf5 Kh8 31. Qh3 Rb8 32. Nh4 Qe8 33. Qf5 Kg8 34. c3 Re7 35. Nf3 Qf7
36. Nd2 Rd8 37. Qg4 b5 38. c4 Rb8 39. cxb5 cxb5 40. Ra6 b4 41. Nc4 Ra7 42. Rc6
Rc7 43. Ra6 Kh8 44. Kf1 h6 45. Kg2 Qe8 46. Qg6 Qb5 47. Ra1 Bf8 48. Qg4 Qc6 49.
Qd1 Ra8 50. Rb1 (50. Rxa8 Qxa8 51. Kh3 Ra7) 50... Re8 51. Qf3 f5 52. Re1 Qa4
53. Qd1 f4 54. g4 f3+ 55. Kxf3 Re6 56. Kg2 Kh7 57. Nd2 (57. Qf3 Qxb3 58. Qf5+)
57... Be7 58. Qa1 Qd7 59. Nc4 Ra7 60. Qd1 Ra8 61. Rf1 Rf8 62. f3 Qa4 63. Qc2
Bg5 64. Rc3 Qb5 65. Rb3 Ra6 66. Rh1 Rf7 67. Rb1 Qb8 68. Qe2 Kg8 69. Qe1 Be7 70.
Rc1 Raf6 71. Nd2 Rf4 72. Qd1 Qd6 73. Qe2 Qe6 74. Kg1 g6 75. hxg6 Qxg6 76. Qg2
Qg5 77. Ra1 R4f6 78. Kh1 Rg7 79. Rg1 h5 80. Qh2 Rfg6 81. f4 exf4 82. Nf3 Qc5
83. g5 Bxg5 84. Qg2 Be7 85. Qxg6 Rxg6 86. Rxg6+ Kf8 87. Rg2 Qc1+ 88. Rg1 Qe3
89. Rf1 h4 90. Ng1 Ke8 91. Rf3 Qc1 92. e5 Qd1 93. Kg2 Qxb3 94. Ne2 Qxb2 95. Rf2
b3 96. e6 h3+ 97. Kf3 Qd2 98. Rh2 b2 99. Rh1 Qxd3+ 100. Kxf4 Qxe2 101. Rb1 h2
102. Rh1 b1=Q 103. Rxb1 h1=Q 104. Rxh1 Qxe6 105. Rh8+ Kf7 106. Kf3 d3 107. Rh2
Bg5 108. Rf2 d2 109. Rf1 Qf5+ 110. Kg2 Qxf1+ 111. Kxf1 d1=Q+ 112. Kg2 Qe2+ 113.
Kh3 Bf4 114. Kh4 Qg2 115. Kh5 Qg5# *
@megaman7de said in #4:
> SF16dev:
> 1. e4 c5 2. Nf3 Nc6 3. Bb5 Nf6 4. Bxc6 dxc6 5. d3 Bg4 6. h3 Bxf3 7. Qxf3 e5 8.
.....
> Kh3 Bf4 114. Kh4 Qg2 115. Kh5 Qg5# *

Is that the full search tree? Thanks for trying. It looks like one PV. I would need something impossible to get: the full searched tree of all nodes considered, along with which heuristic base case each node was dispatched to. (Right now all heuristics in recursive form have equal syntactic weight in the source, but at execution we get the full picture, with potential for chess interpretability; it just gets systematically lost on demand by the algorithm to make room for the future.)

The debug mode I speak of would, I think, be an ELO regression at first; nowadays all engines of this type have given up spilling their entrails like that. But thanks for the PV. I might have meant another position too, where the idea of bringing out the rook felt better than the other branches. I think the PV might help; it seems to be a very deep search or extension ending with a mate... wow. So the question is: what was the rest of the tree made of, for this to be the best branch?

A legal mate is not a sure bet: the path to it needs to be min-max vetted, and guided somehow by past iterative depths (which the historically named TT might have had a say in). I can't explain further, I would get a headache. I just think that, whatever the source code intricacies and the difficulty of reading it and bringing it back to a chess-land interpretation, a full dump in some debug mode of what the TT forgets might spare us from having to disentangle the code, optimized as it is.
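To make the "full dump" idea concrete, here is a minimal sketch of a min-max search with alpha-beta pruning that records every node it visits instead of forgetting it. Everything in it is an illustrative assumption: the toy tree of nested lists, the `NodeRecord` fields and the `search` function are hypothetical names, and none of this is Stockfish code or its actual search.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Toy game tree (an assumption for illustration): an int is a leaf score from
# the root player's point of view, a list holds the positions after each move.
Tree = Union[int, List["Tree"]]


@dataclass
class NodeRecord:
    path: Tuple[int, ...]  # move indices leading from the root to this node
    score: float           # value the search assigned to the node
    kind: str              # "leaf", "exact", or "cutoff" (pruned early)


def search(tree: Tree, alpha: float, beta: float, maximizing: bool,
           path: Tuple[int, ...], log: List[NodeRecord]) -> float:
    """Alpha-beta min-max that appends a record for every visited node."""
    if isinstance(tree, int):
        log.append(NodeRecord(path, tree, "leaf"))
        return tree
    best = float("-inf") if maximizing else float("inf")
    kind = "exact"
    for index, child in enumerate(tree):
        score = search(child, alpha, beta, not maximizing, path + (index,), log)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:
            # The remaining siblings cannot change the result, so they are
            # skipped; the record keeps a trace that a cutoff happened here.
            kind = "cutoff"
            break
    log.append(NodeRecord(path, best, kind))
    return best


log: List[NodeRecord] = []
value = search([[3, 5], [2, 9], [0, 1]], float("-inf"), float("inf"), True, (), log)
for record in log:
    print(record)
```

In this toy run the leaves scored 9 and 1 never show up in the log, because the search cut them off. That is the kind of information a single PV cannot show. A real engine visits vastly more nodes per position, so such a log would have to be far more compact than a list of Python objects.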

Edit: I would not expect such output to be readable as a text file at first. It would need some binary format first, and nowadays plenty of modern statistics can help slice up such a peek into the engine's workings on just one position in many useful ways. That chain of manipulation could be made fully reproducible (well, I think one should try). This is not a magic layer, but such code intricacy calls for such attempts. At least there would be assurance that all the chess-interpretable data that made the engine decide would be at hand, not forgotten in exchange for ELO gains from uncharacterized pools that we currently accept as the sole satisfying performance feedback.

Edit: I did not check the details of the sequence above. Is this the same SF? I don't think so. Is it meant to say that the entrails the diviner works with do not matter, since SF 16 dev (which mutant or instance? A secondary question, but I welcome the ideas of those who want to use such language about SF's global optimization framework) is able to win from some related position (against what?)? Or is that the PV from some position, as I first thought?

In any case, I was not talking about a single sequence. (All engine types sometimes need to give some crumbs to chess land, but I think that would require some imagination in the top-level design of engine competition. What are they still optimizing for, really? Clunky robot wars destroying each other? Sorry, it slipped. It is getting late and I must end with something funny-looking... my bad.)
Impressive game indeed! I am glad that I am not the only one who doesn't know how to play well against the space advantage! :P
Reference post 4. @dboing
I analysed it using Lucas Chess to see the values of the moves by Elo rating.
White bad choices: (Elo rating of moves)
24. R4a3 (1942) Better was Nc4
34. c3 (1948) Better was Qg4
35. Nf3 (1940) Better was Kf1
39. cxb5 (961) Better was Kf1
There were many more....

The worst of the moves was move 74. Kg1 (Elo 320). Why would an engine blunder like that?
Did another application gain priority over the engine?

https://i.postimg.cc/0yky4Hmr/Eco-Rating-of-moves.png
@Toscani the problem with your analysis is that you are using a much weaker engine/hardware to determine the value of the much stronger engine's moves. If anything, it should open your awareness to how weak whatever you are using with Lucas chess is.
> (Elo rating of moves) .

At first this seemed to me to mean that each move in a game could have a different rating assigned to it.

The context is that Lucas Chess has a battery of engines of different code origins (we have no clue about their behavior per position, though; we can only infer from the ELOs attached to each engine, derived from full-game win ratios in whatever engine pool they were rated in).

So the battery of engines is used as a scale of sorts, by reviewing what each one preferred. When put like that I get what you are doing; however, I find "ELO rating of moves" a contraction. It is more "analysis by engine with ELO rating label", or some better English phrase to the same effect. Sorry to sound picky that way; I like to keep the engine sources of information in check. They should not spill over more than they seem to have done already (personal opinion).

Not a comment about the post content or chess meanings.

Edit: However, I just looked at the Lucas interface before shutting it down, and I can see that ELO and engine scores mingle a lot in the interface labelling. I might have missed some clue in the past year about ELO and position evaluation. I understand the notion of ELO performance: on lichess, for example, there are puzzle theme performance ratings (a restriction on the pool of puzzle IDs and the player pairings with them), and tournament performance versus the ongoing rating time series, again restricted to the tournament events (games and pools of players). But ELO performance within games... I would like to have some background links if anyone has them.
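
For reference, the usual notion of ELO performance over a pool of games can be sketched as follows. The expected-score formula is the standard Elo one. Defining the performance rating as the rating whose expected total equals the actual score is one common convention, assumed here. The function names and the bisection bounds are hypothetical choices for illustration, and this says nothing about how Lucas Chess derives its per-move numbers.

```python
from typing import Sequence


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(opponents: Sequence[float], score: float,
                       low: float = 0.0, high: float = 4000.0) -> float:
    """Rating whose expected total against `opponents` equals `score`.

    Solved by bisection, since the expected total grows with the rating.
    The bounds are arbitrary assumptions; a 0% or 100% score has no finite
    solution and simply returns a value near one of the bounds.
    """
    for _ in range(100):
        middle = (low + high) / 2.0
        if sum(expected_score(middle, o) for o in opponents) < score:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


# Example: 2.5 points out of 4 against this (made-up) pool of opponents.
print(round(performance_rating([1800, 1900, 2000, 2100], 2.5)))
```

The definition needs a pool of games with results, which is why a rating for a single move inside one game does not follow from it directly.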