LiChess Monthly Database

The monthly database, at over 60 GB, is totally unmanageable. Reducing it to bare games (no comments or annotations, and just a standard PGN header) takes literally days on my i7.

It's a valuable resource, and it's a pity that only a handful of owners of very high-end machines get to reduce the database to just the raw games.

The number of games played is increasing every month, so it might be an idea for LiChess to publish on a weekly or even daily cycle rather than the current monthly one.

Maybe you're using the wrong PGN reader?

This one went through lichess_db_standard_rated_2018-10.pgn (24,784,600 games, 52,750 MB uncompressed) in less than 2 minutes, on an SSD (Samsung 850) with an Intel i7-6850K CPU @ 3.60 GHz:

github.com/niklasf/rust-pgn-reader#benchmarks-v0120
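
To put those figures in perspective, here is a quick back-of-the-envelope calculation. It treats "less than 2 minutes" as exactly 120 seconds (so the rates are lower bounds) and assumes 1 MB = 1,000,000 bytes; both are assumptions, not figures from the benchmark page.

```python
# Figures quoted in the post above.
games = 24_784_600
size_mb = 52_750

# Assumption: "less than 2 minutes" taken as an upper bound of 120 s.
seconds = 120

mb_per_second = size_mb / seconds
games_per_second = games / seconds
# Assumption: 1 MB = 1,000,000 bytes.
bytes_per_game = size_mb * 1_000_000 / games

print(f"at least {mb_per_second:.0f} MB/s")
print(f"at least {games_per_second:,.0f} games/s")
print(f"about {bytes_per_game:.0f} bytes per game on average")
```

At that rate the run is close to what a SATA SSD can deliver sequentially, so reading the file is probably not where the days are going.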

My problem is not reading the uncompressed file. It's removing the comments etc., as I stated above, to reduce the database to just the moves, which vastly reduces its size. I know because, via the command line, I reduced the September and October databases to mere megabytes.
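
For illustration, the kind of reduction described here can be sketched in a few lines of Python using only the standard library. This is a rough sketch, not the poster's command-line method: it assumes that "a standard PGN header" means the Seven Tag Roster, and that no movetext line starts with `[` (true when each game's moves sit on one line). The names `KEEP_TAGS`, `strip_movetext` and `reduce_pgn` are made up for this example.

```python
import io
import re

# Assumption: "standard PGN header" = the PGN Seven Tag Roster.
KEEP_TAGS = ("Event", "Site", "Date", "Round", "White", "Black", "Result")

COMMENT = re.compile(r"\{[^}]*\}")            # { ... } including [%eval] / [%clk]
VARIATION = re.compile(r"\([^()]*\)")         # innermost ( ... ) variation
NAG = re.compile(r"\$\d+")                    # numeric annotation glyphs, e.g. $1
BLACK_MOVE_NUMBER = re.compile(r"\d+\.\.\.")  # "1..." repeated after a comment
SUFFIX = re.compile(r"[!?]+")                 # move suffixes such as ?? or !?


def strip_movetext(movetext):
    """Return the movetext without comments, variations, NAGs or suffixes."""
    text = COMMENT.sub(" ", movetext)
    previous = None
    while previous != text:  # peel nested variations from the inside out
        previous = text
        text = VARIATION.sub(" ", text)
    text = NAG.sub(" ", text)
    text = BLACK_MOVE_NUMBER.sub(" ", text)
    text = SUFFIX.sub("", text)
    return " ".join(text.split())


def reduce_pgn(lines, out):
    """Copy games from `lines` to `out`, keeping KEEP_TAGS and bare moves."""
    movetext = []

    def flush():
        if movetext:
            out.write("\n" + strip_movetext(" ".join(movetext)) + "\n\n")
            movetext.clear()

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            flush()  # a tag line after movetext starts the next game
            if line[1:].split(" ", 1)[0] in KEEP_TAGS:
                out.write(line + "\n")
        elif line:
            movetext.append(line)
    flush()


# Tiny demonstration on a made-up in-memory game.
sample = io.StringIO(
    '[Event "Rated Blitz game"]\n[WhiteElo "1500"]\n[Result "1-0"]\n\n'
    "1. e4 { [%eval 0.2] } 1... e5?! 2. Nf3 $1 (2. Nc3 Nc6) 1-0\n"
)
reduced = io.StringIO()
reduce_pgn(sample, reduced)
print(reduced.getvalue())
```

Because it works line by line, it never holds more than one game in memory, so file size only affects run time. It will still be far slower than a compiled reader.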



You could use rust-pgn-reader to do it in minutes instead of days. You'd have to learn Rust to write the reducer, but you could probably learn the Rust you need in less time than it takes your current job to run.

One way to reduce the size of the database would be to delete the games of sandbaggers, and there are a lot of them. Example: lichess.org/SEzRQYyZ/black
(Good: Lichess identified one of them as a sandbagger. Unfortunately, they missed the other one.)
Here they played 9 games like this, but I have seen some players play 50 games like that.

wait how are they not banned yet lmao

@aryadew you are probably missing the point that some people do not care only about the raw games; they need the comments as well. Do not assume that if the information is useless to you, it is useless to everyone.

Also, on a pretty ordinary machine with a Python script, I am able to parse the games the way I need in about a day. But even if it took 5 days, it would not be such a big deal.
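
As a rough idea of what such a Python script can look like (this is not the poster's script), here is a sketch that streams games straight from a compressed dump without unpacking it to disk first. It assumes the dump is bz2-compressed and that movetext lines never start with `[`; `iter_games` and `count_games` are names invented for this example.

```python
import bz2
import io


def iter_games(stream):
    """Yield (headers, movetext) for each game in a PGN text stream."""
    headers, moves = {}, []
    for raw in stream:
        line = raw.strip()
        if line.startswith("["):
            if moves:  # a tag line after movetext starts the next game
                yield headers, " ".join(moves)
                headers, moves = {}, []
            name, _, value = line[1:-1].partition(" ")
            headers[name] = value.strip('"')
        elif line:
            moves.append(line)
    if headers or moves:
        yield headers, " ".join(moves)


def count_games(source):
    """Count games in a bz2-compressed PGN (a path or a binary file object)."""
    with bz2.open(source, "rt", encoding="utf-8") as stream:
        return sum(1 for _ in iter_games(stream))


# In-memory demonstration with two made-up games.
sample = (
    '[Event "Rated Bullet game"]\n[Result "1-0"]\n\n1. e4 e5 1-0\n\n'
    '[Event "Rated Blitz game"]\n[Result "0-1"]\n\n1. d4 d5 0-1\n'
)
compressed = io.BytesIO(bz2.compress(sample.encode("utf-8")))
print(count_games(compressed))
```

For a real file you would pass the path of the downloaded dump to `count_games` instead of the in-memory buffer. Whatever you need per game (comments included) goes in the loop over `iter_games`.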

OK but why parse games in one day when you could do it in 3 minutes?

#5 No, because sandbag games are extremely rare, and I invite you to prove me wrong using the database.

@thibault
Well, it takes a very long time to do manually. Here is what I've found so far, but it's not complete.
Last month, 31,000,000 games were played, about 1 million per day.
I expected to find 1% of a day's games (10,000), but I found only about 150. So I must admit you're right. BUT I've not finished, it takes a very long time…

lichess.org/2grY5T10 9 games
lichess.org/QWXXqoB2 4 games
lichess.org/Cteq3I8A 9 games
lichess.org/PGjGIUu3 5 games
lichess.org/gmWLHkq5 61 games
lichess.org/cFrITjOE/black 15 games
lichess.org/PGGVEH6W 61 games
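
For scale, the counts listed above can be tallied against the monthly total. This is only arithmetic on the numbers in this post; the per-day figure uses the "1 million per day" approximation given earlier.

```python
# Game counts from the list above, keyed by the game ID in each link.
found = {
    "2grY5T10": 9,
    "QWXXqoB2": 4,
    "Cteq3I8A": 9,
    "PGjGIUu3": 5,
    "gmWLHkq5": 61,
    "cFrITjOE": 15,
    "PGGVEH6W": 61,
}

monthly_games = 31_000_000
daily_games = 1_000_000  # approximation used in the post

total_found = sum(found.values())
share_of_month = total_found / monthly_games
share_of_day = total_found / daily_games

print(f"{total_found} games found")
print(f"{share_of_month:.6%} of the month's games")
print(f"{share_of_day:.4%} of one day's games")
```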

Please don't do this manually!

By the way, I don't think a quick checkmate means sandbagging. I only checked the last game, and it looks like a legit mate in an ultrabullet game: Black was caught premoving.