Using Lichess's Public Data To Find The Best Chess 960 Position

Comments on lichess.org/@/rdubwiley/blog/using-lichesss-public-data-to-find-the-best-chess-960-position/GCpB9WLH

Assumption 2, that people hate draws, might be shaky.
Or it is unneeded.

Is it because the very interesting question of how outcome odds depend on the initial position is easier to study away from drawish positions?

Because I find all the odds equally interesting. I also like the drawish ones... that is the whole point of 960, no? To keep people from using opening knowledge to gain an advantage via rote memory, for example.

I would then rather treat the drawish 960 positions as the real aim of 960, and the others as failures...

But I think this is a good blog question. I would need to read it carefully... I just stumbled on assumption 2.

BradyIsSystemQB

Are these results even statistically significant?

Even if all positions were equal, just by chance one of them would have the worst performance over the sample, and another the best performance.

rdubwiley

@BradyIsSystemQB said in #3:
> Are these results even statistically significant?
>
> Even if all positions were equal, just by chance one of them would have the worst performance over the sample, and another the best performance.

Sample size is definitely an issue as we're doing 960 comparisons. In a perfect world, I would put these proportions into PyMC or something similar and do the comparisons that way, but I don't currently have the time for that, and I don't know that something as silly as this really warrants it. As such, I wouldn't take any of this as a scientific evaluation of anything.
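To make the "just by chance" point concrete, here is a minimal simulation sketch. It is not the analysis from the blog: it assumes every one of the 960 positions has the same true white win rate (0.49, borrowed from the 49/47 figure mentioned in this thread) and a made-up sample of 200 games per position, then reports the spread between the luckiest and unluckiest position.

```python
import random


def simulate_equal_positions(n_positions=960, games_per_position=200,
                             p_white=0.49, seed=0):
    """Observed white win rates when every position has identical true odds.

    games_per_position and seed are illustrative assumptions, not values
    taken from the blog's data.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_positions):
        # Each game is an independent draw with the same white win chance.
        white_wins = sum(rng.random() < p_white
                         for _ in range(games_per_position))
        rates.append(white_wins / games_per_position)
    return rates


rates = simulate_equal_positions()
print(f"lowest observed white win rate:  {min(rates):.3f}")
print(f"highest observed white win rate: {max(rates):.3f}")
```

Even though no position is actually better here, the best and worst observed rates end up well apart, which is why a raw ranking of 960 positions needs some correction for multiple comparisons before it means much.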

There's also a discrepancy between the opening explorer and what I see in my aggregated data. I believe Lichess is doing some additional filtering for the opening explorer. I did some additional processing and removed things like abandonments and looked at removing tournament games but couldn't match the reported proportions.

Even with thousands of games, most of the positions that show a higher win % for black converge to 49/47 in white's favor once we look back to 2012. Here's an example where, if you look at July 2021-July 2022, it shows an advantage for black, but if you look back further it converges to 49/47: lichess.org/analysis/fromPosition/brknrbnq/pppppppp/8/8/8/8/PPPPPPPP/BRKNRBNQ

However, you can still find positions that have better starting engine evaluations for white and yet equal winning chances for white and black (in games from 2012 onward). Here's an example: lichess.org/analysis/fromPosition/rnbkrbnq/pppppppp/8/8/8/8/PPPPPPPP/RNBKRBNQ

dboing

Thanks for providing the notebooks and the intermediate results files (ipynb, txt, csv). I just noticed them. They should help those wanting to reproduce some of your results without having to redo everything from the source data, right?

rdubwiley

@dboing said in #5:
> thanks for providing the notebooks and intermediate results data, files (ipynb, txt, csv). I just noticed them. Should help those wanting to reproduce some of your results, without having to redo everything from the source data, right?

Yeah, this was my idea. The only thing someone would have to do is download the pgn files from the Lichess database and rerun the notebook.
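For anyone curious what the first step of such a rerun might look like, here is a rough standard-library sketch. It is not the code from the notebook: it assumes the downloaded PGN carries `FEN` and `Result` header tags for each Chess960 game, and the two sample games below are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Matches a PGN header tag line such as: [Result "1-0"]
TAG_RE = re.compile(r'^\[(\w+) "(.*)"\]$')


def tally_results(pgn_lines):
    """Count results per starting position, keyed by FEN piece placement."""
    tallies = defaultdict(Counter)
    headers = {}

    def flush():
        # Record the finished header block, then reset for the next game.
        if "FEN" in headers and "Result" in headers:
            placement = headers["FEN"].split()[0]
            tallies[placement][headers["Result"]] += 1
        headers.clear()

    for raw in pgn_lines:
        match = TAG_RE.match(raw.strip())
        if match:
            headers[match.group(1)] = match.group(2)
        elif headers:
            # First non-tag line after a header block ends that block.
            flush()
    flush()  # handle input that ends right after a header block
    return tallies


# Hypothetical sample input, not real Lichess data.
sample = '''[Event "Rated Chess960 game"]
[Result "1-0"]
[Variant "Chess960"]
[FEN "brknrbnq/pppppppp/8/8/8/8/PPPPPPPP/BRKNRBNQ w KQkq - 0 1"]

1. e4 e5 1-0

[Event "Rated Chess960 game"]
[Result "1/2-1/2"]
[Variant "Chess960"]
[FEN "brknrbnq/pppppppp/8/8/8/8/PPPPPPPP/BRKNRBNQ w KQkq - 0 1"]

1. d4 d5 1/2-1/2
'''.splitlines()

counts = tally_results(sample)
for placement, results in counts.items():
    print(placement, dict(results))
```

For a real monthly dump you would pass an open file object instead of `sample`, since the function only needs an iterable of lines and never holds the whole file in memory.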

dboing

@rdubwiley said in #6:
> Yeah, this was my idea. The only thing someone would have to do is download the pgn files from the Lichess database and rerun the notebook.

Just a tip kind of question... Is ipynb the same thing as Jupyter notebooks, or are they compatible (or convertible with minimal rules)?
I have been used to their ancestors in Mathematica (not Python, but the cell concept). Lately, for data analysis stuff, I have noticed a lot of Jupyter things... Anaconda. I have not yet been serious with this, so I would have liked your perspective... sorry if this is a tangent.

rdubwiley

@dboing said in #7:
> Just a tip kind of question... Is ipynb the same thing as Jupyter notebooks, or are they compatible (or convertible with minimal rules)?
> I have been used to their ancestors in Mathematica (not Python, but the cell concept). Lately, for data analysis stuff, I have noticed a lot of Jupyter things... Anaconda. I have not yet been serious with this, so I would have liked your perspective... sorry if this is a tangent.

Technically, ipynb is the file format for the notebook and Jupyter is a local server that lets you interact with and run notebooks, but practically speaking they're the same thing (for example, VS Code and nteract can also open notebooks without requiring you to run a local Jupyter server). I like notebooks because I can work with the data iteratively and not have to worry so much about the structure of my code until I really understand what I want to do. It's definitely something worth checking out if you're in the data space.
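To illustrate the "ipynb is just a format" part: an .ipynb file is a JSON document with a list of cells, so it can be inspected without any Jupyter server at all. A minimal sketch, where the tiny notebook below is made up for illustration and `summarize_notebook` is a hypothetical helper name:

```python
import json


def summarize_notebook(nb_json_text):
    """Return (cell_type, source_text) pairs from raw .ipynb JSON text."""
    nb = json.loads(nb_json_text)
    summary = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The format allows source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        summary.append((cell.get("cell_type"), source))
    return summary


# A made-up two-cell notebook, built here so the example is self-contained.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Chess960 results\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hello')"]},
    ],
})

cells = summarize_notebook(example)
for cell_type, source in cells:
    print(cell_type, repr(source))
```

With a real file you would read its contents with `open(path, encoding="utf-8").read()` and pass that text in; any tool that understands this JSON layout can work with the same notebook.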

CharmingMoose

For further analysis, maybe it would be helpful to look at the pawn structure for every starting position after, e.g., 15 moves and compare it with the results, to find the best play in different positions.

Is there a possibility to create heat maps with this data?

I am not a coder... so just take my ideas :)
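As a rough idea of what the data behind such a heat map could look like, here is a small sketch that counts how often a white pawn sits on each square across a set of positions. It assumes you already have the piece-placement part of a FEN for each game at the move of interest; the two placements below are made up, and `pawn_counts` is a hypothetical helper, not something from the blog's notebook.

```python
def pawn_counts(placements, pawn="P"):
    """Count pawn occupancy per square; grid[0] is rank 8, grid[7] is rank 1."""
    grid = [[0] * 8 for _ in range(8)]
    for placement in placements:
        for row, rank in enumerate(placement.split("/")):
            col = 0
            for ch in rank:
                if ch.isdigit():
                    col += int(ch)  # a digit means that many empty squares
                else:
                    if ch == pawn:
                        grid[row][col] += 1
                    col += 1
    return grid


# Hypothetical positions: one after 1. e4, one after 1. d4.
positions = [
    "rnbkrbnq/pppppppp/8/8/4P3/8/PPPP1PPP/RNBKRBNQ",
    "rnbkrbnq/pppppppp/8/8/3P4/8/PPP1PPPP/RNBKRBNQ",
]

grid = pawn_counts(positions)
for rank_number, row in zip(range(8, 0, -1), grid):
    print(rank_number, " ".join(str(count) for count in row))
```

The resulting 8x8 grid of counts is the kind of input a plotting library can draw as a heat map, and splitting the positions by game result first would give one grid per outcome to compare.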

dboing

#10

I think being a coder or data analyst is not an initiation requirement for thinking about chess in light of such data analysis. So, I don't know about the OP, but I for one would welcome all walks of chess to contribute to the questions addressed by data analysis or by coding-dependent tools used to understand the chess world.

A forum.. might be the right place to have many skills with common interest in chess being represented in a cooperative bridging kind of way.. That's my philosophy. :)