
Analysed Games (Database for Deep Learning)

Hi there,

I am working on a Neural Network that can evaluate any chess position and explain its reasoning,
e.g. explain which features lead to the evaluation in a human-understandable form.

So far I have been successful training the Neural Network, but there is one catch: I could only train it on annotated games that use Stockfish at depth 20, both engine games and human games.

Do you know of a PGN database that has annotations, e.g. {@depth30: +0.2}, for each move?
The higher the depth the better. Both engine and human games are needed for optimal training.
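
For reference, here is the kind of per-move annotation I mean, and a minimal sketch (Python with the python-chess library, assuming lichess-style { [%eval 0.2] } comments) of how such a database could be read:

```
# Reads evals stored as { [%eval ...] } comments, as in the lichess exports.
import chess.pgn

with open("annotated_games.pgn") as pgn:            # hypothetical file name
    while (game := chess.pgn.read_game(pgn)) is not None:
        for node in game.mainline():
            score = node.eval()                     # None if the move has no [%eval] comment
            if score is not None:
                cp = score.white().score(mate_score=10000)  # centipawns from White's view
                print(node.ply(), node.move.uci(), cp)
```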

Cheers,
Richard
This might be interesting. But I would be curious about more specifics behind "successful training". That is not difficult to stipulate, and it does not require divulging implementation details; it just helps people help you design and construct databases that fit the training framework you have used or are ready to use.

Don't make the same mistake as others and make it hard to figure out the overall flow of information; machine learning is that before it is weight updating.

What would help is a reproducible database construction specification (not just how many games, but their statistical characteristics: how they were generated):

> 1) which means not only the position specs (the X),

> 2) but also the target (the Y: whether annotation classes, scores, or something else; the target specification will tell whether it is a classification, regression, or density-estimation problem),

> 3) AND finally the content of the objective function in chess-able words (no need for the equation details, just its intelligible content: how the Y data enters it).

You can leave out the cross-validation precautions; they are not the gist here (although they would need to be there under the hood).
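
To fix the idea, something as small as this would already do (a purely hypothetical sketch; every field and value here is made up):

```
# Hypothetical "reproducible construction spec" -- none of these values are
# Richard's, they only illustrate X, Y and the objective in words.
dataset_spec = {
    "source": "lichess standard rated games, one monthly dump",
    "X": "per-position feature vector (~100 handcrafted, human-readable features)",
    "Y": "Stockfish centipawn score at a stated fixed depth, from White's point of view",
    "objective": "minimize squared error between the NN output and Y (regression)",
    "split": "train/test disjoint by game, cross-validated",
}
```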

I think you have a good idea. A first post has to start somewhere. I hope I am helping.
There is

ccrl.chessdom.com/ccrl/4040/

but the time given to the engines to calculate is just not a lot.

@dboing I just have handcrafted features that humans can understand and learn from. The Neural Network uses these features to evaluate any given position. As of now I am confident that the Neural Network predicts the Stockfish evaluation from the features quite well (exceptions are positions with a lot of threats and attacking pieces).

I can then use the Neural Network to reverse-engineer the importance of the provided features and give the user feedback on which features are important in that exact game position. It's basically coaching from the computer.

I would just like to have a stronger ground truth to train the Neural Network on. The better the centipawn (CP) annotations from Stockfish, the more accurately the NN learns. I do not have the resources to analyse thousands of games myself. :))
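
For the "reverse engineering" part, one simple way is a sensitivity check: nudge or flip each feature and see how much the prediction moves. A rough Python sketch, with `model.predict` standing in for whatever the trained net actually exposes:

```
import numpy as np

def per_position_importance(model, x, eps=0.05):
    """Finite-difference sensitivity of the predicted eval to each feature.
    For binary features, flipping 0 <-> 1 instead of adding eps also works."""
    base = float(model.predict(x[None, :])[0])
    scores = np.zeros(len(x))
    for i in range(len(x)):
        nudged = x.astype(float)       # copy so the original vector is untouched
        nudged[i] += eps
        scores[i] = abs(float(model.predict(nudged[None, :])[0]) - base)
    return scores  # higher = this feature matters more in this exact position
```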
Your input vector and your output vector, as they enter the loss function you are minimizing during neural network training? Let me try.

Edit: it got too big for here. I will use your inbox, and maybe I can post something better here after.

Thanks for the link above. At first sight it is a ladder. I thought it would let me understand the database construction for your training; I think it would be nice to share that here (but we can talk in the inbox too).
<Comment deleted by user>
120 million games should be plenty for initial training, no?
To get some background I suggest reading the papers introducing Maia.

In particular, all the work prior to the neural network training, where they really do a good job of characterizing the data beyond just its size. The quality and the spread over many player pairing levels should give room for well-constructed data sets for training.

Database size is not the only thing to look at. Although a big size reduces sampling flukes, having a primary database well spread over the possible future experience of a player or engine might matter more (I don't know how to refer to the mother lode versus the subset database sufficient for proper cross-validated training).
If the primary database is not that well spread to start with, find methods of sampling toward such quality.
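
To make "sampling toward such quality" concrete, here is one toy way of doing it (Python with python-chess; the band width and counts are arbitrary, and it keeps everything in memory, so only for modest files):

```
# Toy stratified sampling: keep roughly the same number of games per
# rating band instead of taking the raw dump as-is.
import random
from collections import defaultdict
import chess.pgn

def stratified_sample(pgn_path, per_band=500, band_width=200, seed=0):
    bands = defaultdict(list)
    with open(pgn_path) as pgn:
        while (game := chess.pgn.read_game(pgn)) is not None:
            elo = game.headers.get("WhiteElo", "")
            if elo.isdigit():
                bands[int(elo) // band_width].append(game)
    rng = random.Random(seed)
    return [g for games in bands.values()
            for g in rng.sample(games, min(per_band, len(games)))]
```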

Both aspects actually matter, but one can waste a lot on repetition if blind to the "spread" characteristics: lots of work for little information to extract. One could wonder why it took self-play reinforcement learning for neural nets to become visible in the chess community; I suspect the distribution of positions in the human databases, even if huge, may have some answer to give. Fortunately, lichess is good on many counts.

Also, we don't have many ways to measure what spread might mean. We could say, oh yeah, that position has nothing to do with that other one, but that might not be enough to test for the "spread", distribution, or mesh-size aspect (these are sloppy synonyms; find the intersection and voilà).

I wonder though whether all the SF evaluations are updated to the same version.
You could keep track of the version along with your experiments.
@axlkit that's a good idea, I will check the database. It is just important that the Stockfish evaluation is at a high depth.
I require that because I do not want to train the NN on incorrect position evaluations.

@dboing nice to see that you are really interested in this. I read your message; I reply here so everybody can read it as well.

You are right, I am not using FEN positions as input. Indeed I calculate "magic" numbers, i.e. features.
These features are intended to be understandable by humans and rather simple. Otherwise the NN cannot coach us.
The goal is to make evaluations explainable. It is not to beat stockfish.

Actually I do not only compute the features from FEN positions, but also from the whole game history.

Since you are curious about what features I calculate, here are a few of the most important ones (in total I have over 100 features)

(by important I mean that if the values are changed slightly, the NN will predict different results)

["last move by"] // Can be 1 or 0 (white or black)
["Half-Open C File {black}"]
["{history} {castle} 0-0 {white}"]
["P == p"] // Means equal count of Pawns
["Open D File"]
["Fianchetto King Side {white}"]
["King Side Pawn Majority {black}"]
["{history} {castle} 0-0 {black}"] // Gives the NN information if black casteled in the past
["Half-Open E File {white}"]
["Open C File"]
["Half-Open D File {white}"]
["b == n"] // Means Black has an equal count of Bishops & Knights
...

The more useful features the better. Later the user will only be shown the relevant features that matter in the given position.
I am still on my way implementing new features; there are many features that Stockfish uses as well: www.chessprogramming.org/Evaluation
And there are features that I come up with on my own / put together from the internet.
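
To give a feel for how simple these features are, here is how two of them could be computed (a Python sketch using python-chess, just to illustrate, not my actual implementation):

```
import chess

def equal_pawn_count(board: chess.Board) -> int:
    """'P == p': 1 if both sides have the same number of pawns, else 0."""
    white = len(board.pieces(chess.PAWN, chess.WHITE))
    black = len(board.pieces(chess.PAWN, chess.BLACK))
    return int(white == black)

def half_open_file_for_black(board: chess.Board, file_index: int) -> int:
    """'Half-Open C File {black}' with file_index=2: the file has no black
    pawns but at least one white pawn, so black's heavy pieces can use it."""
    file_mask = chess.BB_FILES[file_index]
    black_pawns = board.pieces_mask(chess.PAWN, chess.BLACK) & file_mask
    white_pawns = board.pieces_mask(chess.PAWN, chess.WHITE) & file_mask
    return int(black_pawns == 0 and white_pawns != 0)
```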
@RichardRahl said in #8:

> You are right, I am not using FEN positions as input. Indeed I calculate "magic" numbers i.e features.
Just to make more precise what I meant by magic numbers: I did not mean the signal from the chess input, but the parameters in your algebraic or functional formulation. Also, since we are on an open-source platform, I think keeping your encoding secret does not make this discussion as interesting as it could be. At least you could give some list of the features (the set does not have to be optimal, but at least give us the scope of your experiments' dependent or controllable variables).

> These features are intended to be understandable by humans and rather simple. Otherwise the NN can not coach us.
> The goal is to make evaluations explainable. It is not to beat stockfish.

That is where I find your thread appealing, and why I say you are not alone in wanting this. I would call that an engine as an actual tool for post-game analysis by a human, not one that is optimized to beat a series of similar engines via some finite set of tournament formats with rigid constraints on hardware and competition factors, all of which are far from human play and experience.

However, I would put a caveat on your association of correct evaluation with evaluation depth. The SF design is built around a fixed material counting (1, 3, 3, 5, 9, ?) (? = W/D/L) for its static evaluations of the actual position information. Not all explored nodes are evaluated with full position information (the static evaluation that gives scores). So keep in mind that there is a bias toward only looking for material conversions at some remote depth. But in chess we do not really care about those (do we?); we care about the final conversion of any advantage, not only material. What is the material value of a mate? That can get complicated, and even when making it complicated, this discontinuity problem persists (just with many branches mixing the types of things to convert or not).


> Actually I do not only compute the features from FEN positions, but also from the whole game history.
Well, that is interesting and actually informative. I would like to know more. Are you of the open-source, open-data, open-science philosophy? I am.


> Since you are curious about what features I calculate, here are a few of the most important ones (in total I have over 100 features)
Ooops, sorry, I am impulsive; I should have read further... But this fits with not overwhelming the reader, well done (I am bad at that). Apologies.

> (with important I mean if the values are changed slightly the NN will predict different results)
>
> ["last move by"] // Can be 1 or 0 (white or black)
> ["Half-Open C File {black}"]
> ["{history} {castle} 0-0 {white}"]
> ["P == p"] // Means equal count of Pawns
> ["Open D File"]
> ["Fianchetto King Side {white}"]
> ["King Side Pawn Majority {black}"]
> ["{history} {castle} 0-0 {black}"] // Gives the NN information if black casteled in the past
> ["Half-Open E File {white}"]
> ["Open C File"]
> ["Half-Open D File {white}"]
> ["b == n"] // Means Black has an equal count of Bishops & Knights
> ...
>

> the more useful features the better. Later the user will only be shown the relevant features that matter in the given position.
This is the crux, and where my inbox ramble matters. Be careful how you tangle all your features together, from a quantitative point of view. That is also where the "magic numbers" problem arises (not in the features or the input values transmitted to the NN's first layer, but in the formulation's parameter values). Do you do any tuning, or any discussion of those parameters, that is, of how your features become ordered quantitative input to the neural net? (That is its "language": it only works with quantities, even if only the trivial binary order of 0/1, about which there is not much to tell.)

> I am still on my way implementing new features, there are many features that Stockfish uses as well. www.chessprogramming.org/Evaluation
> And there are features that I come up with on my own / put together from the internet.

SF has long been labeling some code variables with human feature names, but we can't make heads or tails of those because of the magic numbers accumulating since SF1. So I advise being careful with your feature set. Throwing all of them in will surely make the neural net fit the training set better, but watch out for the testing set.

The more parameters you put into your formulations of features (whether fixed or tuned), the less you can discriminate which labeled feature is actually a factor. Neural nets like A0, Lc0 and NNUE gave up on that in order to have maximal fitting and prediction ability over their training/testing partitioning of the sample database (that is where I need one more word that does not add confusion).

Mother-lode database = lichess (under some category constraints, or all of it).
???? database = a subset of the mother lode of the right size and "spread", from which cross-validation (the best way to do it) or simple random partitions (meaning disjoint ones) are taken to form the training set and the disjoint testing set.

Successful training means successful generalization (if you aim at SF as the target, then you need to predict Stockfish on input that was not part of training). This is a double (or more) optimisation problem: best training AND best testing under the objective function. That is why, besides the target, one often finds what is called a regularization term in the loss function (or whatever the chess-engine schools like to call it).
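
In code-ish terms, a generic sketch of those two pieces (nothing here is anyone's actual pipeline; the names are made up):

```
import numpy as np

def mse_with_l2(pred, y, weights, lam=1e-3):
    """Loss = squared error against the target Y, plus a regularization
    term on the weights that penalizes over-fitting the training set."""
    return np.mean((np.asarray(pred) - np.asarray(y)) ** 2) + lam * np.sum(weights ** 2)

def split_by_game(game_ids, test_fraction=0.2, seed=0):
    """Disjoint train/test partition by game, so test positions never
    come from games seen during training."""
    ids = np.unique(game_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    cut = int(len(ids) * (1 - test_fraction))
    return set(ids[:cut]), set(ids[cut:])
```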

So far I have insisted on the target definition. And your new statement about the history of positions being part of the input remains a further question.

Back to the X and the Y: NN(X(game)) should approach Y(X(game)) over the dataset (with a cross-validation measure as the criterion of fit), i.e. what matters is not how you fit the data, but how you predict outside of the data (hence the test partition). Sorry for the repeats; I am hoping it helps.

There is no ASCII (at least not on my keyboard, short of some contorted non-touch-typing dance of the fingers) for "converges to", "approaches", or "fits"... math, where are you?

Since you don't have a mathematical formulation, perhaps sharing your code, as a last resort, could help some of us make a high-level model for you. That would let us see your overall information flow, which would benefit you, and help me explain the warning that it matters what you put, and how you put it, into the salad of features. I disagree with "the more the better".

Even without engines, having a flat set of features is what makes human theory look shaky and only good for hindsight (which I don't believe it is), because experienced players have already pruned the feature salad intuitively and only mention the winners in their annotations when going into positional arguments (some prefer to also forget the words and keep their intuition as a secret to be used only in their own games... ;)
@dboing my code is public under github.com/Philipp-Sc/learning and almost everything is allowed, except commercial use. This app is very early and unpolished; it is indeed only intended for my own needs. I think that is the best way to make progress. Once my chess improves using it, I might promote it.

The NN just looks at the feature vector once. I do not use a search tree like Stockfish. One could be added, but the goal is not to get the perfect evaluation score; it is to find out which features contribute to the position's evaluation.

In contrast to Stockfish, I can do expensive feature calculations, because each feature is only calculated once, not for every node.
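
Roughly, the whole evaluation path is just: compute the feature vector once, then one forward pass. A sketch with placeholders (`extract_features` and `model` are stand-ins, not real code from the repo):

```
import numpy as np

def extract_features(game_history):
    # placeholder for the ~100 handcrafted features computed from the
    # position plus its history (the expensive part, done once)
    return np.zeros(100)

def evaluate_position(game_history, model):
    """One-shot evaluation: feature extraction once, then a single
    forward pass -- no search tree as in Stockfish."""
    x = extract_features(game_history)
    return float(model.predict(x[None, :])[0])   # predicted centipawn score
```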

I still need to implement some more complicated features, namely (a sketch for one of them, passed pawns, follows the list):

- Overprotection by pieces and pawns
- Outposts: number of squares that can never be attacked by an opponent's pawn, are free, and are protected by my own pawns
- Backward pawns: number of squares occupied by a pawn that can never be protected by another pawn
- Connected passed pawns
- Connected pawns
- Passed pawns
- Longest diagonal pawn chain (left to right, right to left)
- Bishop-pins-knight patterns: nf6/Bg7, nc6/Bb7, Nf3/bg4, Nc3/bb3 (h7)
- Knight on the a/h file
- Pawn fork, knight fork, rook double attack, ...
- King safety (see Wikipedia: pawns in front of the king, etc.)
- Pinned direction
- Knight attack
- Bishop x-ray attack
- Rook x-ray attack
- Queen attack
- Pawn attack
- King attack
- Attack
- Queen attack diagonal
- Pinned
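
For example, passed pawns could be counted like this (a Python/python-chess sketch, illustrative only, not my actual implementation):

```
import chess

def passed_pawns(board: chess.Board, color: chess.Color) -> int:
    """Count pawns of `color` with no enemy pawn ahead of them on the
    same or an adjacent file (the classic passed-pawn definition)."""
    enemy_pawns = board.pieces(chess.PAWN, not color)
    count = 0
    for square in board.pieces(chess.PAWN, color):
        file, rank = chess.square_file(square), chess.square_rank(square)
        ahead = range(rank + 1, 8) if color == chess.WHITE else range(rank - 1, -1, -1)
        blocked = any(
            chess.square(f, r) in enemy_pawns
            for r in ahead
            for f in (file - 1, file, file + 1)
            if 0 <= f <= 7
        )
        if not blocked:
            count += 1
    return count
```

For the starting position this returns 0 for both colours, as expected.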

Some will make less sense and not help much in evaluating the position, but the good thing with Neural Networks is that one can find out which features are of no use and then drop them.

I was just thinking: are there grandmaster databases with Stockfish evaluations? I think there should be, but they are not easy to find, I guess.

I already have an NN model that kinda works; it predicts about 70% of positions correctly.
But with all the features I still have to add and test, I will wait before publishing something you can play with. It takes a lot of brain power to implement all of these features. github.com/Philipp-Sc/learning/tree/main/src/js/eval

This topic has been archived and can no longer be replied to.