Free chess game database with over 11 million games (Scid vs. PC database format)

Hello out there,

I've created a database with over 11 million games. I started from several existing databases, some of which were already several years old. The collection is built from the following sources:

- The Week in Chess (One file per week with all possible tournaments)
- PGN Mentor (Extensive archive with individual files for players, openings, opening variations and various tournaments)
- Millionbase (Database until approx. 2017)
- Kingbase (A database project that was discontinued in 2019 and is now only available in the Internet Archive)
- DATABASE4U (A database by a chess.com user)

The data preparation process

After merging the databases, I took a number of steps to clean up and shrink the database:

  • All games with fewer than 10 half-moves were deleted.
  • All player names were corrected using Scid’s maintenance function, as far as Scid was able to do so.
  • All tournament locations and names were corrected using Scid’s maintenance function, as far as Scid was able to do so.
  • All games in which both players have an Elo rating below 1800 were deleted.
  • All games in which neither player has an Elo rating were deleted.
  • ECO codes were added to all games.
  • All remaining games were checked for duplicates. The following parameters had to match for a game to be declared a duplicate:
    • The same player names (exact match).
    • The same player colors.
    • The same place.
    • The same year.
    • The same moves.
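
For readers who want to reproduce these rules outside Scid, here is a minimal Python sketch of the filtering and duplicate check described above. It is not what Scid does internally: the `Game` record, the function names and the default thresholds are hypothetical, and it assumes that a game with only one known rating is kept when that rating is 1800 or higher.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Game:
    # Hypothetical in-memory record; Scid's own storage format is different.
    white: str
    black: str
    site: str
    year: int
    white_elo: Optional[int]  # None when the game carries no rating
    black_elo: Optional[int]
    moves: Tuple[str, ...]  # half-moves in SAN, e.g. ("e4", "e5", "Nf3")


def keep(game: Game, min_plies: int = 10, min_elo: int = 1800) -> bool:
    """Apply the length and rating rules from the list above."""
    if len(game.moves) < min_plies:
        return False
    ratings = [r for r in (game.white_elo, game.black_elo) if r is not None]
    if not ratings:
        return False  # neither player has a rating
    # Assumption: the game is dropped only if every known rating is too low.
    return max(ratings) >= min_elo


def duplicate_key(game: Game) -> tuple:
    """Exact names in fixed color order, plus place, year and moves."""
    return (game.white, game.black, game.site, game.year, game.moves)


def clean(games: Iterable[Game]) -> Iterator[Game]:
    """Yield the games that pass the filters, keeping one copy per key."""
    seen = set()
    for game in games:
        if not keep(game):
            continue
        key = duplicate_key(game)
        if key in seen:
            continue
        seen.add(key)
        yield game
```

Because the key uses exact strings, two copies of the same game with differently spelled names or places will both survive, which is why the spelling corrections come before the duplicate check.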

Contents of the database

- 11,494,169 Games
- 434,475 Players
- 93,713 Events
- 24,177 Locations
- 5,963 Rounds

The database is available for free at https://LumbrasGigaBase.de

How can you support me?

I love coffee! You are welcome to buy me a coffee!

Creating the database initially, cleaning it up and researching sources was quite time-consuming. That work is now done, so ongoing maintenance is no longer a major effort. But of course I also pay for this website, so if you like the database and want to help keep the site going, please feel free to support me on my website.

Regards,
Lumbra/Michael

I corrected the URL to the download website. Sorry for the inconvenience.

So basically the database doesn't include any games prior to the introduction of the FIDE rating?

It includes a lot of old games. Morphy, Capablanca, Euwe...

I will recheck this in the next few days and may import all games from PGN Mentor again. Then most of the old games should be in there.

I might have imported and cleaned up in the wrong order.

I plan to update the database every two to three months.

Many of the ratings of the old players are historical Elo ratings that were calculated after the Elo system was introduced.

I've updated the database file: I imported all PGN Mentor games again and cleaned up the database again. This means that there are approx. 200,000 more games in the database. The latest TWIC update (1525) is included as well.

Wow, this is really cool. I had searched for something like this; I remember finding a 2.1 million game database on the Rebel website that went up to 2013. With PGN Mentor's help I added modern players' games and created an opening book of over 100,000 games. 11 million is fantastic, great work! I can use this to add many games to my opening book, so thank you.

I have searched the database for the number of games of some well-known, historical and current players and listed them on a separate page: https://lumbrasgigabase.de/notable-players-in-the-pgn-database/

Good effort, but there are many duplicates.
For example the game between Frisk and Hobber played on 2011.01.05 is present 3 times.

These games come from different sources, so having no duplicates at all is almost impossible. I will have a look at this tomorrow. Maybe some of the values (name, location or tournament) are spelled differently.

Scid uses these parameters by default to decide what counts as a duplicate... And checking about 450,000 players manually is pretty much impossible :D

There are many duplicates in the players' names too.
For example, you have 3 duplicate games played on 2002.12.08 between Diulger, Alexey and the same opponent, listed as "Chirila, C.", "Chirila, C.." and "Chirila, Ioan-Cristian".

The best approach would be to spell-check the player names and events before deleting the games.
If you search for duplicate games with the options "same moves", "same year, month, day" and "First 4 letters only", more than 2 MILLION duplicate games are found.
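
To illustrate why the looser search finds so many more matches, here is a rough Python sketch of a key built from the first 4 letters of each name, the full date and the moves. It only mimics the options named above and is not Scid's implementation; `loose_key`, `group_duplicates` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def loose_key(white, black, date, moves, prefix_len=4):
    """Build a key from name prefixes, the full date and the move list."""
    def short(name):
        # Compare only the first few letters, ignoring case.
        return name.strip().lower()[:prefix_len]
    return (short(white), short(black), date, tuple(moves))


def group_duplicates(games):
    """Return lists of indexes of games that share the same loose key.

    `games` is a sequence of (white, black, date, moves) tuples.
    """
    groups = defaultdict(list)
    for index, (white, black, date, moves) in enumerate(games):
        groups[loose_key(white, black, date, moves)].append(index)
    return [ids for ids in groups.values() if len(ids) > 1]
```

With this key, "Chirila, C.", "Chirila, C.." and "Chirila, Ioan-Cristian" all reduce to "chir", so the three copies from 2002.12.08 land in one group as long as their moves are identical. The trade-off is that two different players whose names share the first 4 letters are merged as well, so requiring the same date and the same moves is what keeps false positives down.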

This topic has been archived and can no longer be replied to.