Linking top players based on openings
I am sharing some experiments that I have been conducting recently. The games were taken from MegaDatabase 2021, restricted to slow games from 1990 to 2021. I kept only games in which at least one of the players was rated 2700+ Elo, and only players with more than 100 games with each color. The graph only features players who had more than 30 games with a rating over 2750.
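The filtering steps above can be sketched roughly as follows. This is only an illustration: the record layout and field names (`white`, `black`, `white_elo`, `black_elo`) and the helper names are assumptions made up for the example, not the database's actual schema, and the toy call uses a threshold of 0 instead of 100 so that the tiny sample is not filtered away.

```python
from collections import Counter

# Hypothetical game records; field names are assumptions for illustration only.
games = [
    {"white": "A", "black": "B", "white_elo": 2710, "black_elo": 2650},
    {"white": "B", "black": "A", "white_elo": 2655, "black_elo": 2712},
    {"white": "C", "black": "B", "white_elo": 2600, "black_elo": 2650},
]

def filter_games(games, min_elo=2700):
    """Keep games in which at least one player is rated min_elo or higher."""
    return [g for g in games if max(g["white_elo"], g["black_elo"]) >= min_elo]

def eligible_players(games, min_games_per_color=100):
    """Return players with more than min_games_per_color games with each color."""
    white = Counter(g["white"] for g in games)
    black = Counter(g["black"] for g in games)
    return {
        p for p in white
        if white[p] > min_games_per_color and black[p] > min_games_per_color
    }

kept = filter_games(games)
# Threshold lowered to 0 only because the toy sample has three games.
players = eligible_players(kept, min_games_per_color=0)
print(len(kept), sorted(players))
```

The order of the two filters (rating first, then games per color) follows the order in which they are described above; applying them the other way around could give a different set of players.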
The method is an extension of the method I used in the previous post. The research article can be accessed through this link and my previous post can be accessed here. This is based on the fact that the player-opening relationship can be formulated into a bipartite graph. However, I decided to use a weighted bipartite graph (adding weights to the edges), such that I can account for the proportion of players playing certain openings.
This part is included only for the curious. If you are not interested in the whole process, feel free to skip it. I generated a player-opening matrix as follows.
| | A00 as white | A00 as black | ... | E99 as black |
| Player A | games A played A00 as white / number of white games played by A | | | |
| Example | 0.1 (1 out of 10 games) | | | |
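A minimal sketch of how such a matrix could be built, assuming each game has already been reduced to a `(player, color, ECO code)` record. The record format and the function name `opening_matrix` are hypothetical. The toy data reproduces the example in the table: one A00 game out of ten white games gives 0.1.

```python
from collections import Counter, defaultdict

# Hypothetical records: one (player, color, ECO code) tuple per game played.
records = (
    [("A", "white", "A00")]
    + [("A", "white", "C65")] * 9
    + [("A", "black", "E99")] * 4
)

def opening_matrix(records):
    """Map each player to {"<ECO> as <color>": share of that player's games with that color}."""
    totals = Counter((player, color) for player, color, _ in records)
    counts = Counter(records)
    matrix = defaultdict(dict)
    for (player, color, eco), n in counts.items():
        matrix[player][f"{eco} as {color}"] = n / totals[(player, color)]
    return dict(matrix)

M = opening_matrix(records)
print(M["A"]["A00 as white"])
```

Storing each row as a dictionary keeps the matrix sparse, which is convenient because most players never play most of the ECO codes.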
The relatedness of two players was determined using the following equation, which I am leaving here for reference. It can be obtained by multiplying the previous matrix M by its transpose M^T and applying some row and column operations.
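To make the M times M^T step concrete, here is a small sketch that computes only the raw product for each pair of players. It does not include the additional row and column operations mentioned above, so it is not the full relatedness measure. The matrix values and the function name `raw_overlap` are made up for the example.

```python
# Hypothetical player-opening matrix (rows are players, values are proportions).
M = {
    "A": {"A00 as white": 0.1, "C65 as white": 0.9, "E99 as black": 1.0},
    "B": {"C65 as white": 1.0, "E99 as black": 0.5, "B90 as black": 0.5},
    "C": {"D37 as white": 1.0, "B90 as black": 1.0},
}

def raw_overlap(M):
    """Off-diagonal entries of M * M^T: the dot product of two players' rows."""
    players = sorted(M)
    return {
        (p, q): sum(w * M[q].get(col, 0.0) for col, w in M[p].items())
        for i, p in enumerate(players)
        for q in players[i + 1:]
    }

overlap = raw_overlap(M)
for pair, value in overlap.items():
    print(pair, round(value, 3))
```

In this toy data, A and B share two openings and get the largest value, while A and C share none and get zero.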
Then I used biwcm from the bicm package to obtain statistically meaningful links.
I decided to use an interactive network visualization. I generated 4 networks with increasing significance thresholds, so each successive network admits links between less closely related players.
Link 1 contains only 4 connections, so if you want to see a more connected graph, start from link 2 onward.
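The idea of several networks at increasing thresholds can be sketched as below, assuming the null model yields a p-value for each pair of players. The p-values, the threshold values, and the function name `networks_by_threshold` are all invented for illustration; they are not the values used for the four links.

```python
# Hypothetical p-values per player pair (made up for the example).
p_values = {
    ("A", "B"): 0.0005,
    ("A", "C"): 0.02,
    ("A", "D"): 0.004,
    ("B", "C"): 0.2,
}

# Hypothetical thresholds, from strictest to loosest.
alphas = [0.001, 0.01, 0.05, 0.1]

def networks_by_threshold(p_values, alphas):
    """For each threshold, keep the pairs whose p-value does not exceed it."""
    return {
        alpha: sorted(pair for pair, p in p_values.items() if p <= alpha)
        for alpha in sorted(alphas)
    }

networks = networks_by_threshold(p_values, alphas)
for alpha, edges in networks.items():
    print(alpha, edges)
```

Because every edge that passes a strict threshold also passes a looser one, the networks are nested: the sparsest one is a subgraph of all the others.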
I'll be using link 3 as my example. There are basically 3 uses for this interactive graph.
1. You can filter for a certain player by using the top filter. This will allow you to see the player (node) and all its neighbors.
2. If you click on a player node, it will show that player's top 5 openings for each color.
3. If you click on the connections, it will show you which ECO openings influenced the connection between the two players.
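One simple way to see which openings drive a connection is to rank the openings both players share by the product of their proportions. This is an assumption about how such a breakdown could be computed, not necessarily what the interactive graph shows; the matrix values and the function name `link_contributions` are hypothetical.

```python
# Hypothetical player-opening proportions (made up for the example).
M = {
    "A": {"A00 as white": 0.1, "C65 as white": 0.9, "E99 as black": 1.0},
    "B": {"C65 as white": 1.0, "E99 as black": 0.5, "B90 as black": 0.5},
}

def link_contributions(M, p, q, top=3):
    """Rank openings played by both p and q by the product of their proportions."""
    shared = {col: M[p][col] * M[q][col] for col in M[p] if col in M[q]}
    return sorted(shared.items(), key=lambda kv: kv[1], reverse=True)[:top]

contributions = link_contributions(M, "A", "B")
for opening, weight in contributions:
    print(opening, round(weight, 3))
```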
I wasn't able to find a way to let users filter a single graph by significance, so I decided on 4 separate links. I also regret not providing an interactive version in the previous post. Have fun playing around with the networks.
The database itself had some issues. There were some games where the event date (when the game happened) and the reported date (when the game was added to the database) were far apart, and this may have caused some details to be misrecorded. I found that some games had wrong ratings for players, so there may be wrong values for other variables as well. If you feel that a correlation looks wrong, you may well be right. It is really hard to gather correct data on chess games over an extended period, so please understand. For later posts about professional players, I will consider collecting PGN files from the official FIDE website.
Also, this doesn't account for why players choose certain openings over others. For example, player A might use opening O against weaker players while player B uses it only against stronger players. Or player A might have played opening O when they were young but moved on to opening P later. These tendencies and changes are not reflected at all in this graph. There might be ways to deepen this analysis, but that will need more time. Also, since the filter was set at 2700 in the first place, games played when a player was rated below 2700 are only included if the opponent was rated above 2700.
ECO classification also has its limits. Some ECO codes are very similar and could arguably belong to the same category for this analysis. If you hover over the node Artemiev, you can see that the top 2 most played are A13 and A14, both named the "English Opening". There is a fine line between where you should and should not distinguish these openings. For classification purposes, I decided to keep the original ECO classification. Some ECO codes cover fewer plies (half moves) than others, and this might result in completely different opening preparations being classified as one. However, it is still the most reasonable opening classification that exists. I may test alternatives in another post.
Feel free to leave comments with your thoughts on this post and on what I should do next. I want to see lots of people participating in the comment section, so please don't leave multiple long comments in the discussion. If you have feedback, keep it short and precise. Thank you for reading!