@Wodjul said in #70:
> 2. Overlearning
The E. effect? See the NDpatzer blogs; I forget the spelling of the E.
Hammer and nails: hammering training.
There exists a more pragmatic and mathematical language, or formalism, for this: machine learning is built on exactly that question. But it needs two directions of error in generalization from the training set to an unseen test set of input data (positions).
So there are three different sources of thinking about this; I only knew the mathematical one, from ML.
There the problem has been made reproducible and testable, not only qualitative.
The "over" problem also comes with an "under" problem. One has both questions to consider, and finding the room in between is where the research, art, or creative effort of learning theory lives.
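Here is a minimal toy sketch of those two directions of error, not about chess at all; the sine target, the noise level, the split and the polynomial degrees are all invented for illustration, just to make the under/over pair concrete and measurable:

```python
# Toy illustration of the two directions of generalization error: fit
# "evaluators" of increasing capacity and compare training error with the
# error on an unseen test set. All numbers here are made up for illustration.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.1, x.size)   # target phenomenology + noise

x_train, x_test = x[:30], x[30:]                 # seen vs. unseen inputs
y_train, y_test = y[:30], y[30:]

for degree in (1, 5, 15):                        # too rigid, about right, too flexible
    model = Polynomial.fit(x_train, y_train, degree)
    mse_train = np.mean((model(x_train) - y_train) ** 2)
    mse_test = np.mean((model(x_test) - y_test) ** 2)
    print(f"degree {degree:2d}  train MSE {mse_train:.4f}  test MSE {mse_test:.4f}")

# degree 1 underfits (both errors stay high); degree 15 overfits (training
# error collapses while test error grows); the room in between is what a
# theory of learning is trying to find.
```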
The problem definition needs to be well formed. And I am sorry to say it, but I think the SF NNUE side either does not realize this or does not think it as important to share as the (to me meaningless) Elo numbers. If we don't ask... well, I do.
What is taken as known, what is then used to define the learning objective, how the dataset sampling and construction relate to it, and so on. I wonder whether having always developed the thing at code level, and having it work well enough, is not its own kind of training: one never needs to know why it works, or what it says about the flow of chess information through the learning problem.
One can get lost in pairs of words that mean the same thing in the end. My point is that there are two types of misgeneralization.
Sometimes people say that the NN parameter set obtained after some training is a case of underfitting. This would mean the chosen function space is not flexible enough to express the complexity of the phenomenology function (this is not combinatorial complexity, unless one is careful about what is being counted, and it is not about the number of positions, although that might have some effect in cases of misgeneralization).
The other case, with big computers and big NN sizes (number of layers, number of units per layer, or both, or something else), sometimes gets the opposite word in the pair: overfitting.
Overfitting the data actually means fitting even the quirks of the training data, quirks that may just be measurement error; but in chess we don't have that, at least not when the problem is well stated (and I am still waiting for SF to get its act together on stating it, though I am too lazy to dig it up myself and am waiting for it to come through the grapevine; I have been burned in the past seeking such basic information).
In chess, the generalization problem does not come from inputs having typos, or, in the SF NN case, from the training SF oracle score having error. The unstated truth there is that the phenomenology is, by definition, the SF score given as the target output: that is what the NN has to fit over the big set of input positions. I am basing this on SF blog crumbs. They have a wiki now. The exhaustive-search part has been nicely extracted from its encoding in the programming language (laughing) back up to a higher level, where chess users can hope to understand how the search part is independent of, or modular from, the leaf-evaluation part (the NN being one component).
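As I read those crumbs, the setup is plain supervised regression onto an oracle score. A sketch of that reading follows; this is NOT the real NNUE trainer, and the feature encoding (a flat 0/1 vector per position), the net shape, the fake data, and the hyperparameters are all placeholders I made up:

```python
# Sketch of "the NN fits the SF oracle score over a big set of positions".
# Not the real NNUE architecture or trainer; everything here is a stand-in.
import torch
import torch.nn as nn

N_FEATURES = 768                     # assumed: piece-on-square planes, flattened
net = nn.Sequential(
    nn.Linear(N_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, 32), nn.ReLU(),
    nn.Linear(32, 1),                # one scalar evaluation out
)

# Stand-in data: random "positions" and random "oracle scores".
positions = torch.rand(10_000, N_FEATURES).round()   # fake 0/1 position encodings
oracle_scores = torch.randn(10_000, 1)               # fake deep-search target scores

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for step in range(1_000):
    idx = torch.randint(0, positions.shape[0], (256,))   # minibatch of positions
    loss = loss_fn(net(positions[idx]), oracle_scores[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

# Whether the fitted net generalizes depends entirely on which positions were
# sampled and how representative they are, which is the point made below.
```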
I did not dare risk being disappointed again by the other wiki part. And I had glimpses that not the whole dev team is eager to cooperate with the writer, who did the more ubiquitous exhaustive-search engine model presentation pretty well (though I already had an understanding of it from a previous rabbit hole, which explains my current laziness). Source code is the worst surrogate for a user manual, and so are the readme files. As for the other, more obscure part of SF, which I gather is their current target of improvement: some orbiting repositories have kept earlier misleading sentences, so rumors persisted about NNUE using reinforcement learning, and then about it using Leela's data, in one sibylline sentence in the SF16 blog. Those two, sorry to say, sloppy communications from the SF team to the user population, in spite of the wiki effort, have made me waste ramblings. Something wrong with me.
Anyway, there is a lot of previous work on that question, and many tools; we might just be missing the cultural curiosity, or simply knowledge of their existence. We have lots of chess data sitting on servers, some of it in inert, less informative versions (like the puzzle database, which is missing a lot of the real data), but there is also the lichess opening sequences database (not the explorer, which uses that database internally to attribute a single name to an input position, based on a naming-priority policy over the branches containing it: when many opening sequences, or sequences of named connected segments, contain the position, the shorter one wins the name, or there is also a popularity rule). Anyway, the thing is that there are plenty of position datasets, some well restricted and many, mostly pre-lichess, obscure and non-reproducible.
So misgeneralization in chess is crucial here: repeating the same positions very often, if the training set is itself not representative of the actual wilderness probability of random encounter. (An aside: one might count on swarming convergence around the latest novelty, or argue that within the unspecified axiom-1 duration of the improvement problem, from not yet improved to improved by some notch, the positions in vogue in a given tournament, with its particular number of players, might not be as wild as all of chess. The chance of meeting some old, historical, playable but not currently hot position, one that is not a novelty but is also not a usual encounter nowadays, might be small.)
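To make "representative of the wilderness encounter probability" a bit more concrete, here is a toy reweighting sketch; the position keys and every frequency in it are invented for the example:

```python
# If the training set over-samples some positions relative to how often they
# are actually met, one can at least reweight them toward the "wild" frequencies.
from collections import Counter

training_counts = Counter({"pos_A": 900, "pos_B": 90, "pos_C": 10})   # what was trained on
wild_counts = Counter({"pos_A": 300, "pos_B": 300, "pos_C": 400})     # what is actually met

n_train = sum(training_counts.values())
n_wild = sum(wild_counts.values())

# Importance weight per position: (wild frequency) / (training frequency).
weights = {
    pos: (wild_counts[pos] / n_wild) / (training_counts[pos] / n_train)
    for pos in training_counts
}
print(weights)   # pos_A gets down-weighted, pos_C gets heavily up-weighted
```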
It depends on how much, and at which pragmatic level, one is thinking. So I find that we can get lost there,
and I prefer a clearer approach. That needs discussion, but it also needs another kind of pragmatism: specifying exactly what you put in #2, which is only half the story.
Actually, in chess there is also the notion of a learned pattern taught by examples, because training happens on a constructed or chosen set of many positions (which ones is part of the question, and I think it should be part of the discussion).
There is the first exposure, and then the interaction with others using language or pattern definitions. One can also be trained on the same set and have the same kinds of generalization problems, or actually a positive effect.
I guess there might be four problems in chess theories of learning, if one stops being gung-ho about one magic-bullet theory and actually considers the conscious, many-headed cooperative dwarf still surviving in the culture, which we could call the chess-theory rebuilding effort. (Another problem is being very shy about being critical AND constructive, or, when liking one book or one author, not being able to be surgical about it. In general, I find the lack of cooperation, and of a habit of discussion like here, might have been retarding chess theory and chess learning theory for a while. For some reason, I suspect the confusion between performance and learning.)
But what do I know. I would like to share more, but I think I would need questions. It is hard to know where to start when all we know of each other is that chess is our common interest, and even then, which chess?