lichess.org

Open big pgn files

I have a big problem. I downloaded the October games from the Lichess database. It's 29 GB! Which program can open such big PGN files?
But then why is this database available on Lichess? I think I used to have a program on my PC that opened big PGN files like this, but now I can't find it anymore. Scid will not open it. Right now I'm extracting the database with 7-Zip (zst), and it takes 1 hour 10 minutes. Can anybody please help me open this PGN file? The file name is: lichess_db_standard_rated_2022-10.pgn.zst
PGN files are just text files. Search for 'Large File Text Reader' apps; most of these can slice the file up into smaller output files.

You can then load the PGN files into a database app such as Arena, although there may also be limits on the maximum size of a database.

These things are best left on the web.
@nicolachess The best way to deal with this, assuming adequate software skills, is to grab some PGN-reading code in whatever flavor you like, or write your own. There are plenty of PGN readers out there, complete with source, in pretty much any language you like. Instead of Legos, now it's software blocks: just plug together what you need.
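For the "write your own" route, the key point is that a reader never needs the whole 29 GB in memory. Here is a minimal Python sketch, assuming Lichess-style exports in which every game begins with an [Event ...] tag line at the start of a line; iter_games is a hypothetical name, not part of any library. It holds only one game at a time.

```python
from typing import Iterable, Iterator


def iter_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one PGN game at a time as a string, reading line by line.

    Assumption: each game starts with a line beginning with '[Event ',
    as in the Lichess database exports. Only the current game is kept
    in memory, so this works on files far larger than RAM.
    """
    current: list[str] = []
    for line in lines:
        if line.startswith("[Event "):
            # A new game starts: emit the previous one unless it was only blank lines.
            if any(part.strip() for part in current):
                yield "".join(current)
            current = []
        current.append(line)
    # Emit the last game in the file, if any.
    if any(part.strip() for part in current):
        yield "".join(current)
```

Any open text file works as the input. For example, sum(1 for _ in iter_games(sys.stdin)) counts the games when the file is piped in with zstd -dc lichess_db_standard_rated_2022-10.pgn.zst (assuming the zstd command-line tool is installed), which avoids writing the extracted file to disk at all. If you need full move parsing rather than raw game text, the python-chess library provides chess.pgn.read_game.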
@nicolachess said in #1:
> I have a big problem. I downloaded the October games from the Lichess database. It's 29 GB! Which program can open such big PGN files?
Well, what I did was prioritize the games. Rather than keeping all 90M+ games, grab only what you need. Currently my setting is that at least one side has to be rated 2400 or higher. I've been lazy, but after applying that setting you can go even further and delete, say, every game involving a player rated below 1200. The games kept would then have one side at 2400+ and a minimum rating of 1200, for example. I use ChessBase 17, and so far it handles the 31M games from the beginning of Lichess until now that meet these criteria.
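The post applies these criteria inside ChessBase 17, but the same rule is easy to express in code if you want to pre-filter the file before importing it. A minimal Python sketch, assuming Lichess-style tag lines such as [WhiteElo "2435"]; keep_game, parse_elo, top and floor are hypothetical names, and dropping games with a missing or non-numeric rating is a choice of this sketch, not something from the post.

```python
import re
from typing import Optional

# Matches PGN tag pairs such as [WhiteElo "2435"] at the start of a line.
TAG_RE = re.compile(r'^\[(\w+) "([^"]*)"\]', re.MULTILINE)


def parse_elo(value: Optional[str]) -> Optional[int]:
    """Return the rating as an int, or None if it is missing or not a number."""
    if value is None or not value.isdigit():
        return None
    return int(value)


def keep_game(game: str, top: int = 2400, floor: int = 1200) -> bool:
    """Keep a game if at least one side is rated >= top and both are >= floor.

    Assumption: games with a missing or non-numeric rating (e.g. "?")
    are dropped.
    """
    tags = dict(TAG_RE.findall(game))
    white = parse_elo(tags.get("WhiteElo"))
    black = parse_elo(tags.get("BlackElo"))
    if white is None or black is None:
        return False
    return max(white, black) >= top and min(white, black) >= floor
```

It looks at one game's text at a time, so it can be applied while streaming through the file, and only the kept games ever need to reach the database program.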
You could use the Linux split command to split the file into as many pieces as you want. The simplest usage is something like the following (in a terminal), which splits a file into 1000 parts:

$ split -n 1000 file.pgn

Do this after extracting. I think zip itself has a split option, so other modern zip/unzip tools probably have it too. Check "man split" to see what other options split offers; for example, you can specify the number of bytes per file (-b) instead of the number of parts. I hope you won't face memory issues with this tool.

You will face a minor problem: the split may happen in the middle of a game. You can fix this manually after the split. For example, check the tail of each file, say file 37, to see whether it was split in the middle of a game. If so, look at the head of the next file, 38.pgn, take the remaining part of the game from it, append it to the end of 37.pgn, and remove it from the beginning of 38.pgn. Once the split command has run, the rest is easy with small scripts (e.g. in bash, sed, awk, perl or python). Alternatively, you can simply ignore the broken games: at most 999 of them if you split the file into 1000 parts.
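Another option is to avoid the broken games altogether by cutting only between games. This is a swapped-in approach rather than the split command itself: a minimal Python sketch, assuming each game starts with an [Event ...] line as in the Lichess exports. split_pgn, assign_parts and the part0000.pgn naming are hypothetical. It streams the file line by line, so memory use stays small.

```python
import os
from typing import Iterable, Iterator, Optional, TextIO, Tuple


def assign_parts(lines: Iterable[str], target_bytes: int) -> Iterator[Tuple[int, str]]:
    """Yield (part_number, line) pairs, starting a new part only at a game start.

    A new part begins at the first '[Event ' line after the current part
    has reached target_bytes, so no game is ever cut in two.
    """
    part, written = 0, 0
    for line in lines:
        if line.startswith("[Event ") and written >= target_bytes:
            part, written = part + 1, 0
        written += len(line.encode("utf-8"))
        yield part, line


def split_pgn(path: str, parts: int, prefix: str = "part") -> int:
    """Split a PGN file into about `parts` files (part0000.pgn, ...); return the count."""
    target = max(1, os.path.getsize(path) // parts)
    out: Optional[TextIO] = None
    current = -1
    try:
        with open(path, encoding="utf-8") as src:
            for part, line in assign_parts(src, target):
                if part != current:
                    # Close the finished part and open the next output file.
                    if out is not None:
                        out.close()
                    out = open(f"{prefix}{part:04d}.pgn", "w", encoding="utf-8")
                    current = part
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return current + 1
```

For example, split_pgn("lichess_db_standard_rated_2022-10.pgn", 1000) should produce roughly 1000 files. The exact count can differ slightly from the request, because a part is only closed between games.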

After doing all this, you can use pgn-extract to sort the games from all these files by ECO code. For example,
"pgn-extract -E3 *.pgn" will create 500 files, A00.pgn to E99.pgn. You can also apply other filters, such as a minimum rating, in the same command. For example, create a file called 'tagfile' and write the following in it:

WhiteElo >= "2400"
BlackElo >= "2400"

and then run

pgn-extract -E3 -ttagfile *.pgn

and you will get 500 files by ECO code, each containing only games in which both players are rated at least 2400. See www.cs.kent.ac.uk/people/staff/djb/pgn-extract/help.html. This is a very powerful tool.
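To show roughly what that command does, here is a small Python approximation: bucket games by ECO code and keep only those where both ratings are at least 2400. It is a sketch under assumptions, not pgn-extract's actual logic. It assumes Lichess-style [ECO "B12"] tags, skips games whose ECO is "?" or missing, and eco_bucket and write_by_eco are hypothetical names. For real use, pgn-extract remains the better tool.

```python
import re
from typing import Dict, Iterable, Optional, TextIO

# Matches PGN tag pairs such as [ECO "B12"] at the start of a line.
TAG_RE = re.compile(r'^\[(\w+) "([^"]*)"\]', re.MULTILINE)
# Valid codes run from A00 to E99, i.e. 500 possible buckets.
ECO_RE = re.compile(r"^[A-E]\d\d$")


def eco_bucket(game: str, min_elo: int = 2400) -> Optional[str]:
    """Return the game's ECO code if both players are rated >= min_elo, else None."""
    tags = dict(TAG_RE.findall(game))
    elos = [tags.get("WhiteElo", ""), tags.get("BlackElo", "")]
    if not all(e.isdigit() and int(e) >= min_elo for e in elos):
        return None
    eco = tags.get("ECO", "")
    return eco if ECO_RE.match(eco) else None


def write_by_eco(games: Iterable[str], min_elo: int = 2400) -> Dict[str, int]:
    """Append each kept game to <ECO>.pgn (A00.pgn ... E99.pgn); return counts per code."""
    handles: Dict[str, TextIO] = {}
    counts: Dict[str, int] = {}
    try:
        for game in games:
            eco = eco_bucket(game, min_elo)
            if eco is None:
                continue
            if eco not in handles:
                handles[eco] = open(f"{eco}.pgn", "a", encoding="utf-8")
            handles[eco].write(game)
            counts[eco] = counts.get(eco, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

It accepts any iterable of game strings. Because the files are opened in append mode, running it over each split part in turn collects everything into the same A00.pgn to E99.pgn files. Note that it can keep up to 500 files open at once.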
