@dboing said in #35:
>Regarding RL training: it is not clear to me how the mesh density has been kept constant across all game phases while the quality of play changes during training. That is, it is quite possible that the very random early exploration created very dense outcome feedback for short games, thereby exploring more opening accidents than endgame accidents. In RL, subsequent training batches would have smaller exploration potential and better quality of play, which I intuit is likely to produce longer games decided on more subtle differences, but not necessarily with the same mesh density over the accident-prone landscape. I may have lost some of the audience here, hopefully not the quoted ones. Low energy, being lazy. Sorry.
For those not familiar with the acronyms of neural net programming: RL is reinforcement learning, a learning paradigm in which the algorithm gets feedback on the outcomes of its own play, so future program behavior is shaped by past performance.
Tactics are everywhere in chess - they cannot be avoided.
Non-neural-net engines avoid tactical losses in positions far from game completion by using some sort of material evaluation system, such as the 9-5-3-1 scale humans sometimes use. Such engines are much more complicated than that, though, because they also carry hand-coded knowledge of positional features and special routines for certain endgames.
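The material-count idea can be sketched in a few lines of Python. The FEN parsing and the values here are just the standard human scale mentioned above (with bishops and knights both worth 3), not any particular engine's tuning:

```python
# Standard human material values: queen 9, rook 5, bishop/knight 3, pawn 1.
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}

def material_balance(fen: str) -> int:
    """Return White's material minus Black's for the piece field of a FEN.

    Uppercase letters are White pieces, lowercase are Black; digits and
    slashes (empty squares, rank separators) are simply skipped.
    """
    balance = 0
    for ch in fen.split()[0]:
        if ch.lower() in PIECE_VALUES:
            value = PIECE_VALUES[ch.lower()]
            balance += value if ch.isupper() else -value
    return balance

# The starting position is materially even.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_balance(start))  # 0
```

A real engine's evaluation is far richer, of course; this only captures the bare material term.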
Neural net engines have nothing like that - no programmed material values, etc. Instead, they propagate game outcomes backward through the positions that led to them, in a complex way, to arrive at a probability value for each position.
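That backward propagation of outcomes can be pictured with a toy MCTS-style value backup - a minimal sketch under my own naming, not lc0's actual code:

```python
# Toy sketch (my own simplification, not lc0's code) of backing a final game
# outcome up the path of positions that produced it, accumulating a running
# value estimate at each node.
def make_node(white_to_move: bool) -> dict:
    return {"white_to_move": white_to_move, "visits": 0, "value_sum": 0.0}

def backup(path: list, outcome: float) -> None:
    """outcome is +1 (White wins), -1 (Black wins), or 0 (draw), from White's view."""
    for node in path:
        node["visits"] += 1
        # Flip the sign so each node stores value from its side-to-move's view.
        node["value_sum"] += outcome if node["white_to_move"] else -outcome

path = [make_node(True), make_node(False), make_node(True)]
backup(path, +1)  # White won this playout
print([n["value_sum"] / n["visits"] for n in path])  # [1.0, -1.0, 1.0]
```

Averaged over millions of self-play games, these per-position estimates become the training targets for the net's value head.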
Stockfish is complex. It has code to evaluate terminal positions on the fly - either with a neural net or with its 'classical evaluation' - and its choice depends on the position. If it uses the neural net evaluation at a terminal position, it converts that output into the internal units used by the classical evaluation.
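The selection logic can be pictured roughly like this - a hedged sketch of the hybrid idea only, with illustrative names and an illustrative threshold, not Stockfish's actual code or tuning:

```python
# Hedged sketch: fall back to the classical evaluation when material is badly
# imbalanced, and use the net otherwise. The function names, the position
# representation, and the 800-centipawn limit are all assumptions for
# illustration, not Stockfish internals.
def hybrid_evaluate(position, classical_eval, nnue_eval, imbalance_limit=800):
    """Both evaluators are assumed to return centipawns from White's view."""
    if abs(position["material_imbalance_cp"]) > imbalance_limit:
        return classical_eval(position)
    # The net's raw output would be rescaled into the same internal
    # centipawn units that the classical evaluation uses.
    return nnue_eval(position)

# Toy evaluators standing in for the real ones.
classical = lambda pos: pos["material_imbalance_cp"]
nnue = lambda pos: pos["material_imbalance_cp"] + 15  # pretend the net sees more

print(hybrid_evaluate({"material_imbalance_cp": 100}, classical, nnue))  # 115
print(hybrid_evaluate({"material_imbalance_cp": 900}, classical, nnue))  # 900
```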
I think it is clear that the "mesh density", as dboing called it, is the issue behind lc0's poor showing on the STS-rating suite. I did not post my results from running tactical test suites, but lc0 shows the same poor performance there.
This is really easy to understand. Suppose, for the sake of argument, that the training set never contained a Knight fork that won material and subsequently decided the game. Then lc0 would probably not see a Knight-fork tactic unless its search examined enough nodes to reach, as terminal nodes, positions it already knew to be losing.
The upshot is that lc0 does poorly on tactical and positional suites because those positions lie outside its training, and, with the search parameters being used, they do not lead to positions it knows. In games against other engines, lc0 is more likely to steer toward positions it can handle.
Switching topics to using engines to help evaluate games and positions...
Clearly Stockfish is the best choice today, because its mixture of classical and neural net evaluation is more likely to give a good answer, and it runs well on typical hardware. If one also has time to run lc0, interesting things might pop up. However, most people want a quick analysis that points out the tactical mistakes in their game; they do not want to spend hours analyzing a speed-chess game.
Trying to get any engine to help understand positions in terms of strategic or positional features is a problem: engines don't speak that language, and instead give us variations with "scores". There has been, and continues to be, work on chess programs that output natural-language descriptions of positions. In the meantime, the following page can sometimes help extract strategic/positional information:
- Stockfish static evaluation: hxim.github.io/Stockfish-Evaluation-Guide/
  Under Graph, there are a *lot* of positional categories; each links to a page with the javascript code that approximates the corresponding Stockfish code.