Science of Chess - Candidate Moves, David Marr, and why it's so hard to be good.

I expected more from the article.

Isn't human chess mostly a game of patterns?
Why do we miss patterns we are aware of?
How do GMs store and recall hundreds of quite complicated patterns?
Why does chess aptitude stop really improving once we reach a certain age?

Those are the questions that came to mind while I was reading your article :)
@tricksya said in #2:
> I expected more from the article.

I, on the other hand, didn't expect this article at all! I also didn't expect one blog post to answer all my hard cognitive questions about chess.

Anyway, thanks @NDpatzer for the really interesting article!

About the Marr framework, didn't AlphaZero show that GM-level chess should easily be within human hardware constraints? I believe AlphaZero only uses a few thousand neurons, and even with minimal search depth still performs at an IM level. I thought the question became why humans (and especially me in particular!) are so bad at chess, when chess is computationally so easy.
I like these visual cognition blogs... science from the outside world, a field that keeps growing and updating itself, brought back into chess community awareness... Dare I say it: we might need that. I have not read it yet, this is just a general support comment, but I will read it for sure.
@Graque said in #3:
> I, on the other hand, didn't expect this article at all! I also didn't expect one blog post to answer all my hard cognitive questions about chess.
>
> Anyway, thanks @NDpatzer for the really interesting article!
>
> About the Marr framework, didn't AlphaZero show that GM-level chess should easily be within human hardware constraints? I believe AlphaZero only uses a few thousand neurons, and even with minimal search depth still performs at an IM level. I thought the question became why humans (and especially me in particular!) are so bad at chess, when chess is computationally so easy.

Thanks for reading, and great question! My best answer is that while our brain has billions of neurons, there are still constraints on what we can do with them in terms of memory, perception, and other cognitive processes. For example, visual working memory (the ability to hold visual information in mind for a short time) has a capacity of 4-6 items. It's trivial to get computer hardware to store much more visual data with perfect fidelity! That's why I think it's so important to think about cognitive processes and their limitations when we think about how humans play chess. The scale of chess is well-met by modern engines, but not well-met by the human mind.
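
To make that capacity limit concrete, here is a toy simulation (my own sketch, not from any particular paper) of the classic change-detection task under a simple "slot model": an observer who can hold only K items detects a change with probability roughly K/N when N items are on screen.

```python
import random

def change_detection_accuracy(k=4, n_items=8, trials=100_000):
    """Toy 'slot model' of visual working memory.

    The observer stores k of the n_items on screen. On half the trials
    one random item changes. The observer answers 'change' only if the
    changed item happened to be among the stored ones.
    """
    correct = 0
    for _ in range(trials):
        stored = set(random.sample(range(n_items), min(k, n_items)))
        change_trial = random.random() < 0.5
        detected = change_trial and random.randrange(n_items) in stored
        response = "change" if detected else "same"
        correct += response == ("change" if change_trial else "same")
    return correct / trials

for n in (4, 6, 8, 12):
    print(f"{n} items: {change_detection_accuracy(k=4, n_items=n):.2f}")
```

No amount of extra viewing time rescues the larger displays in this model; the bottleneck is the store itself, which is exactly the kind of constraint silicon doesn't share.
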
The weights are the things to count, not the neurons. But yes, one can blow up the size of a brain with machines. Yet one can still waste that size by not allowing associations where they matter, or by hard-wiring the assumption that every accumulated move decision produces an equally distinct board to think about when finding best moves (or human best moves).

Also, the benefit of an architecture (wet or silicon) is not only its size, but the wealth of representation it makes accessible a priori during training (and, as well, the training environment: its angle of exposure to information and its scope of exploration).

I still think humans do certain things faster, thanks to their high parallelism (the mighty visual cortex, and beyond... in there).
Machines are still crawling emulators of that. The intelligence is still in the programming of a well-formulated problem.

A0 would have had a hard time with PGN, bitboards, or Zobrist keys as input; maybe FENs or EPDs would have been workable...
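
To illustrate the representation point (my own toy sketch, not the actual A0/LC0 input stack, which also carries history, side-to-move, castling and rule-counter planes): a FEN unpacks naturally into one-hot 8x8 planes that a convolutional network can exploit, whereas a Zobrist key collapses the same position into one opaque integer.

```python
import numpy as np

PIECES = "PNBRQKpnbrqk"  # 6 white + 6 black piece letters, one plane each

def fen_to_planes(fen):
    """Encode the board field of a FEN as a 12x8x8 one-hot tensor.

    Toy simplification: only piece placement is kept; the real A0/LC0
    input adds history, side-to-move, castling and rule-counter planes.
    """
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    board_field = fen.split()[0]
    for rank, row in enumerate(board_field.split("/")):
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # run of empty squares
            else:
                planes[PIECES.index(ch), rank, file] = 1.0
                file += 1
    return planes

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(fen_to_planes(start).sum())  # 32 pieces -> 32 ones
```

A Zobrist key, by contrast, maps that same position to a single 64-bit integer: perfect for hash tables, hopeless for a convolution looking for spatial patterns.
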
> How to Choose a Chess Move (2005), by Andrew Soltis
It might be a recent enough complement to your references on existing chess models of thinking.
There is a chapter I focused on, to see where chess has gone with theories of thinking:
> 7: The Four Thinking Models
It includes and criticizes the Kotov model.

I also wonder about the impasse on (3): is it the size of the blog post, or that there is a lot of catching up to do, even at a higher level?

Also, I never quite got a sense of clarity about working memory versus short-term memory. It does not sound like there is only one fast storage module. I prefer a more dynamic model, where working memory, still with the same small number of slots, is more about juggling material from long-term memory and the short-term sensory buffers onto some decision stage (each item occupying one of the 4 or 5 slots).

Subjectively, trying to understand these few-module memory models, I find there are more layers in memory dynamics, and more time scales: seconds, minutes, hours, even the day versus the week, and so forth. I am not a professional in that field, but I have modeling abilities and can juggle abstractions, and I have followed some of that science in the past, mostly from the physiology side or the machine learning side (the connectionist view, I guess), not so much the high-level models.

In my time, the received idea taught in psychology undergraduate courses was that short-term memory was about retaining rehearsed perceived items, like an auditory phone number's digit sequence, and that its capacity was distributed between 5 and 7 across the population (or it could be that everyone fluctuates day to day, or topic to topic, within that range). That time scale was very short, and chunking, a long-term memory effect, was often introduced shortly after, as the way to memorize more than the limited one-digit-at-a-time short-term memory one was otherwise stuck with.

It now appears to me, reasoning from crumbs, that there might have been a missing high-level cog in that model, and it might be why I prefer the more numerous time scales of memory modules: the work done to chunk the very-short-term memory would make sense as that intermediate working memory calling on associative long-term memory processes, which could latch onto the short-term memory chunks so they become elements of the working memory slots. Something like that. Do you have clarity on this, while we have you here?
@NDpatzer said in #5:
> Thanks for reading, and great question! My best answer is that while our brain has billions of neurons, there are still constraints on what we can do with them in terms of memory, perception, and other cognitive processes. For example, visual working memory (the ability to hold visual information in mind for a short time) has a capacity of 4-6 items. It's trivial to get computer hardware to store much more visual data with perfect fidelity! That's why I think it's so important to think about cognitive processes and their limitations when we think about how humans play chess. The scale of chess is well-met by modern engines, but not well-met by the human mind.

But what about the AlphaZero-style neural network seems unrealistic for the human mind? That neural network basically looks at a board position, and a possible next move, and outputs a sense of how likely that move is, and also who's better in the game. You mentioned that humans can't store more than 6 items in memory; this isn't required for AlphaZero's neural net.

To me, AlphaZero's neural net seems similar to what chess masters can do, and also explains why they can get a basic feeling for a game at a glance, without calculation.
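
The shape of that computation is tiny to write down, too. Here's a toy sketch (my own, with made-up layer sizes, nowhere near the real AlphaZero residual tower) of a net that maps a position straight to a move distribution plus a who's-better scalar:

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_MOVES = 768, 256, 4096  # toy sizes: 12x8x8 input, flat move space

# Random (untrained) weights, just to show the two heads.
W1 = rng.normal(0, 0.02, (N_HID, N_IN))
W_policy = rng.normal(0, 0.02, (N_MOVES, N_HID))
W_value = rng.normal(0, 0.02, (1, N_HID))

def evaluate(position):
    """position: flat 768-vector encoding of the board.
    Returns (probability over moves, value in [-1, 1])."""
    h = np.maximum(0, W1 @ position)      # shared trunk, ReLU
    logits = W_policy @ h
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                # softmax over candidate moves
    value = np.tanh(W_value @ h)[0]       # who's better
    return policy, value

policy, value = evaluate(rng.random(N_IN))
print(policy.argmax(), float(value))
```

One forward pass, no search: that's the "feeling at a glance". The tree search gets stacked on top only when calculation is needed.
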
I agree. A0/LC0** is NN-based machine learning of the reinforcement learning kind, not the imitation learning kind that supervised learning amounts to when based on an oracle trainer (possibly like the training of SF NNue's big master network, the one that should be talked**** about first for chess interpretation, not the small efficient progeny we use to analyze our positions). All NN training is based on mathematical models that represent some sort of statistical learning. That is how we build associations (of many polarities, and across various scales of spacing in time between the objects being tested for association). The basic mathematical formalism that enabled the current mainstream awareness of the AI circus (someone has to take that stance sometimes; we can be critical past the surprise) rests on the basic "Once upon a time, there was a probability law".

So I do agree: for the intuition or pattern-internalization part of what a learned chess expert is, LC0 is probably our best model.

But it is not that good a model of the learning part. The dilemma between exploration and exploitation that each self-play batch trajectory has to deal with may lead to bubble expertise (see the recent A0 paper; at least they can do theory, while LC0 does the arena, reproducibly).

** Why do people stick with the "Kleenex" trademark imprint? We have a better open-source one in LC0, where scientific communication can actually happen, since everything from the bottom up is shareable and reproducible, though it lacks the mathematical model that A0 was forced to publish in lieu of the means of reproduction. With no source code, what else can you do to give some confidence that the reporting is not magic?

**** User documentation should start with that, and then, for the tech-savvy, talk about the juicy implementations or NN architecture prowess that let the speed primary directive stay within the potential for accuracy improvement... for the two are not always compatible.
@Graque said in #8:
> But what about the AlphaZero-style neural network seems unrealistic for the human mind? That neural network basically looks at a board position, and a possible next move, and outputs a sense of how likely that move is, and also who's better in the game. You mentioned that humans can't store more than 6 items in memory; this isn't required for AlphaZero's neural net.
>
> To me, AlphaZero's neural net seems similar to what chess masters can do, and also explains why they can get a basic feeling for a game at a glance, without calculation.

I think I may have misunderstood your initial comment. Re-reading it, I see that you started with the suggestion that GM-level chess should be within human hardware constraints based on AlphaZero's scale. I'd argue that human GMs already showed us that - they play as well as they do with their human hardware. Your next question seems to be why an ordinary player can't repurpose a few thousand neurons for playing chess, given that AlphaZero is as good as it is with a network of that scale.

I think my response about what seems unrealistic (though I wouldn't go that far - again, GMs exist!) is that there *are* things required for AlphaZero to work (to my understanding anyway) that can be and are compromised during ordinary chess play by a patzer like me. For example, you mentioned that AlphaZero "looks at a board position" but already that's an important difference between the network and humans - I certainly don't always accurately encode the configuration of all the pieces on the board, and this is one way I make mistakes. AlphaZero doesn't have to worry about getting its look at the board through a visual system that has processing bottlenecks, etc. Likewise, AlphaZero's weights (again to my knowledge) reflect storage of many, many games and positions in a manner that I think is essentially noise-free. That's also not the case for the human network.

The thing is, I guess on some level human players that are very strong eventually DO repurpose some fraction of their biological network to store patterns, etc., and that's part of what I think is interesting to understand. They have to do it with a nervous system that isn't just trying to do that task, however, which is where these cognitive and neural constraints come in.
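
To see the "noise-free storage" contrast concretely, here is a toy illustration (mine, and of course not a model of human memory): store some pattern-label pairs in a small linear network, then jitter the weights the way biology constantly does, and watch recall fall apart.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patterns, dim = 50, 256

# 'Store' 50 random pattern -> label pairs in a linear map (least squares).
X = rng.choice([-1.0, 1.0], (n_patterns, dim))
y = rng.choice([-1.0, 1.0], n_patterns)
W, *_ = np.linalg.lstsq(X, y, rcond=None)

def recall_accuracy(noise_std):
    """Fraction of stored patterns still recalled after jittering weights."""
    W_noisy = W + rng.normal(0, noise_std, W.shape)
    return np.mean(np.sign(X @ W_noisy) == y)

for sigma in (0.0, 0.05, 0.1, 0.3):
    print(f"weight noise {sigma}: recall {recall_accuracy(sigma):.2f}")
```

An engine's weights sit at zero noise forever; biological synapses, and the perception feeding them, do not, and that gap is (part of) where blunders come from.
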