As a coder who recently became a chess enthusiast, I think one of the best aspects of Lichess is its open-source ethos and the fact that all games played on the platform are freely available to download. This enables many interesting applications, such as developing machine-learning-based chess engines, analyzing win/loss/draw frequencies at different time controls and player ratings, and measuring how often certain types of positions (like opening systems or endgames) show up in games between players of different rating classes. The Lichess database lets you download all games played on the platform in a given month. The most recent file (September 2020) contains over 68 million games: 16.8 GB compressed and over 130 GB uncompressed.
Files this large are difficult to work with unless you have a substantial amount of storage that you can dedicate to chess-related projects. However, these files include games at all time controls, and in particular a massive number of bullet games. For most applications it probably makes sense to focus on a specific time control (e.g. rapid and/or classical for higher-quality games, or blitz for its popularity). Because of this, I would like to ask the Lichess devs whether it would be possible to provide downloadable files split by time control, so that more people can make use of this data.
Hello,
You might be interested in the "lichess elite database" project, which filters all (standard) games from Lichess to keep only games by players rated 2400+ against players rated 2200+, excluding bullet games: https://lichess.org/team/lichess-elite-database
Otherwise you could use this very fast PGN parser, which works on compressed databases, to filter out the games that interest you: https://github.com/niklasf/rust-pgn-reader
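If you would rather not set up a Rust toolchain, the same filtering idea can be sketched with the Python standard library alone. This is only a rough illustration, not how rust-pgn-reader works, and it will be much slower. It assumes the monthly dump is bzip2-compressed, that every game starts with an `[Event ...]` tag and carries a `[TimeControl "base+increment"]` tag, and that speed classes follow the usual Lichess rule of estimated duration = base + 40 × increment. The names `speed_of`, `split_games`, `filter_by_speed` and `filter_file` are made up for this example.

```python
import bz2
import re
from typing import Iterable, Iterator, List, Set

# Matches e.g. [TimeControl "300+3"]; correspondence games use "-" and never match.
TIME_CONTROL = re.compile(r'^\[TimeControl "(\d+)\+(\d+)"\]')


def speed_of(base_seconds: int, increment_seconds: int) -> str:
    """Classify a clock setting (thresholds are an assumption, see above)."""
    estimated = base_seconds + 40 * increment_seconds
    if estimated < 30:
        return "ultrabullet"
    if estimated < 180:
        return "bullet"
    if estimated < 480:
        return "blitz"
    if estimated < 1500:
        return "rapid"
    return "classical"


def split_games(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group PGN lines into games; a new game starts at each [Event ...] tag."""
    game: List[str] = []
    for line in lines:
        if line.startswith("[Event ") and game:
            yield game
            game = []
        game.append(line)
    if game:
        yield game


def filter_by_speed(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield the full text of each game whose speed class is in `wanted`."""
    for game in split_games(lines):
        for line in game:
            match = TIME_CONTROL.match(line)
            if match:
                base, increment = int(match.group(1)), int(match.group(2))
                if speed_of(base, increment) in wanted:
                    yield "".join(game)
                break  # only the first TimeControl tag of a game matters


def filter_file(src_path: str, dst_path: str, wanted: Set[str]) -> None:
    """Stream a .pgn.bz2 dump and write matching games without unpacking it to disk."""
    with bz2.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for game_text in filter_by_speed(src, wanted):
            dst.write(game_text)
```

Calling something like `filter_file("games.pgn.bz2", "rapid_classical.pgn", {"rapid", "classical"})` (file names are placeholders) reads the archive as a stream, so the 130 GB uncompressed file never has to exist on disk; only the games you keep are written out.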
Have fun!