Which openings are related?
Before reading this post
The games in this analysis were lichess-annotated 10+0 rapid games from August 2023. More about the data can be found at the lichess open database. I have not had the chance to refresh my database recently. The study includes players with more than 30 games as both White and Black. I did not filter players based on their ratings.
The theory behind this post
The techniques and theory behind this post can be found in a published article. This is an excellent article that I wish more people would take a peek at. I only reproduced the relatedness graph part, but the paper goes further: it predicts players' future opening repertoires and analyzes the complexity of openings. I might dive deeper into some of these topics in a later post.
In this specific study, two chess openings are considered related when many players have both of them in their repertoire.
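As a rough illustration of this definition, the sketch below counts how many players have both openings of a pair in their repertoire. The player names and repertoires are made-up toy data, and `co_occurrence` is a hypothetical helper, not code from the paper or from my analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy repertoires: player name -> set of openings they play.
repertoires = {
    "alice": {"French Defense", "Sicilian Defense"},
    "bob": {"French Defense", "Sicilian Defense", "Caro-Kann Defense"},
    "carol": {"Caro-Kann Defense", "Sicilian Defense"},
}

def co_occurrence(repertoires):
    """Count, for every pair of openings, how many players play both."""
    counts = Counter()
    for openings in repertoires.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(openings), 2):
            counts[(a, b)] += 1
    return counts

pair_counts = co_occurrence(repertoires)
```

Raw counts like these are only the starting point: popular openings co-occur with almost everything, which is why the pairs still need a statistical validation step before they become edges in the graph.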
I changed a few details in my data compared to the paper.
- Used 10+0 games: I assumed people would play better-prepared openings in the longer time format.
- Required more than 30 games with each color: my data is relatively small (gathered in one month) compared to the paper's data, which was gathered over more than a year.
- Did not filter based on rating: some of my games were played by low-rated players, while the paper only used ratings above 2000.
- The ECO codes and descriptions I gathered are from a former lichess post: the link. The ECO codes added up to 500 types of openings.
As a result, my data had 2,724,686 games from 3,495 players, covering 897 openings when the same opening is counted separately from the White and the Black perspective.
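The filtering described above can be sketched as follows. This is a minimal sketch, assuming each game has already been reduced to a `(white_player, black_player, eco_code)` tuple; that layout and the function name `build_repertoires` are assumptions for illustration, not the actual pipeline.

```python
from collections import defaultdict

MIN_GAMES = 30  # threshold from the post: more than 30 games with each color

def build_repertoires(games, min_games=MIN_GAMES):
    """Build per-player repertoires from (white, black, eco) tuples.

    The tuple layout is an assumption. Each opening is stored together
    with the color it was played from, so the same ECO code counts as
    two different openings for White and for Black.
    """
    white_games = defaultdict(int)
    black_games = defaultdict(int)
    repertoire = defaultdict(set)
    for white, black, eco in games:
        white_games[white] += 1
        black_games[black] += 1
        repertoire[white].add((eco, "white"))
        repertoire[black].add((eco, "black"))
    # Keep only players with more than min_games games with each color.
    return {
        player: openings
        for player, openings in repertoire.items()
        if white_games[player] > min_games and black_games[player] > min_games
    }
```

Splitting each ECO code by color is why 500 codes can produce up to 1000 distinct openings, of which 897 appeared in my data.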
After the validation step with bicm, some nodes did not have a significant connection to any other node. An example was Bird's Opening: some people played it, but its co-occurrence with other openings in players' repertoires was not statistically significant. Furthermore, the dataset I used for this study was gathered over a relatively short period, so some openings were not played by many people, and the wide range of Elo ratings might have added noise. You can open the image in another tab to view it better.
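To give a feel for what "significant connection" means in the validation step, here is a simplified stand-in. It is not the BiCM null model that bicm uses (which constrains the degrees of both players and openings); it is a plain hypergeometric test that asks how likely an overlap at least this large would be if the two groups of players were drawn at random. The function name and arguments are hypothetical.

```python
from math import comb

def overlap_p_value(n_players, n_a, n_b, n_both):
    """P(overlap >= n_both) under a hypergeometric null.

    n_players: total number of players
    n_a, n_b:  number of players who play opening A / opening B
    n_both:    number of players who play both
    """
    total = comb(n_players, n_b)
    # Sum the probability of every overlap size from n_both upward.
    tail = sum(
        comb(n_a, k) * comb(n_players - n_a, n_b - k)
        for k in range(n_both, min(n_a, n_b) + 1)
    )
    return tail / total
```

A small p-value means the two openings share more players than chance would suggest. An opening like Bird's Opening can have plenty of players and still end up with no pair that clears the threshold.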
I decided to remove a lot of text from the image. If you want to see the graph with all the text, it is in this link.
You probably saw the thumbnail and thought the White Queen was wrong, since I think everyone knows the Caro-Kann and the French are different. This can be seen in the connectivity graph. If I filter the graph down to the French and the Caro-Kann, they appear on opposite sides. The Caro-Kann is colored red and the French is colored blue. The White Queen is on the weirder end of the spectrum, playing both openings. Most people who play the French don't play the Caro-Kann, and vice versa.
There are limitations to this study. When people call openings "related", they mean a combination of different components: strategic ideas, pawn structure, positional ideas, and more. There probably exists an absolute similarity between positions that depends only on the chessboard. The graph made with these techniques is instead based on the consensus of players' repertoires, so here "related chess openings" means "many players play both openings". Despite these limits, the graph still provides insight into which combinations of openings people tend to pick.
It has been a long time since I last wrote a proper post, and I would love to know what people want to see in my future posts. Thank you for reading my blog.