I've taken a look at this site, and yes, it is very different from my version of a database.
But I don't think the content really fits my plans. For instance, I deleted all computer games from the database, as far as I could find them.
My main focus is on classical games. Yes, I added the Lichess Elite Database, but mainly because it only contains players rated above 2400. These guys normally play their openings in their sleep, regardless of the time control.
But in general, this site is worth building a database from. The provided files are all PGN, so this shouldn't be a problem for anyone.
A new version of the database (version 2024-02-27) was uploaded yesterday.
The database will be updated weekly, usually Tuesdays, after the release of the most recent TWIC file. The following files will be uploaded:
- Database files (si5, si4 format)
- A differential PGN-file containing the new games since release of the last database.
- A monthly PGN-file, containing the new games
For instance: the current database was released on 02/27/2024. The last database of March will be released on 03/26/2024. That is 4 weeks, so the monthly update file will contain all weekly updates between the two releases.
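The four-week arithmetic can be checked with a small Python sketch. The function name `weekly_releases` is made up for illustration, and the code simply assumes one release every seven days; it does not reflect how the files are actually generated.

```python
from datetime import date, timedelta


def weekly_releases(start: date, end: date) -> list[date]:
    """Return the weekly release dates after `start`, up to and including `end`."""
    releases = []
    current = start + timedelta(weeks=1)
    while current <= end:
        releases.append(current)
        current += timedelta(weeks=1)
    return releases


# Dates taken from the post: current release and last release in March.
current_release = date(2024, 2, 27)
last_march_release = date(2024, 3, 26)

updates = weekly_releases(current_release, last_march_release)
print(len(updates))  # 4 weekly updates go into the monthly file
```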
@Lumbra74 said in #32:
> A new version of the database
Hey man, thanks for the awesome work. How can I remove online games from the database in Scid vs. PC? Thanks again.
@Lumbra74 said in #1:
> Hello out there,
>
> I've created a database with over 11 million games. I started based on several existing databases, some of which were already several years old in this form. A game collection has now been created from the following sources:
>
> - The Week in Chess (One file per week with all possible tournaments)
> - PGN Mentor (Extensive archive with individual files for players, openings, opening variations and various tournaments)
> - Millionbase (Database until approx. 2017)
> - Kingbase (A database project that was discontinued in 2019 and is now only available in the Internet Archive)
> - DATABASE4U (A database of a user of chess.com)
>
> The data preparation process
>
> After merging the databases, a number of measures were taken to compress the database:
>
> - All games with fewer than 10 half-moves have been deleted.
> - All player names were corrected using Scid’s maintenance function, as far as Scid was able to do so.
> - All tournament locations and names have been corrected using Scid’s maintenance function, as far as Scid was able to do so.
> - All games in which both players have an Elo rating below 1800 have been deleted.
> - All games without any Elo rating for both players have been deleted.
> - ECO codes have been added to all games.
> - All remaining games were checked for duplicates. The following parameters had to match for a game to be declared a duplicate:
>   - Exact matches for player names.
>   - The same player colors.
>   - The same place.
>   - The same year.
>   - The same moves.
>
> Contents of the database
>
> - 11.494.169 Games
> - 434.475 Players
> - 93.713 Events
> - 24.177 Locations
> - 5.963 Rounds
>
> The database is available for free at LumbrasGigaBase.de
>
> How can you support me?
>
> I love coffee! You are welcome to buy me a coffee!
>
> The initial creation of the database, the cleansing, research for sources etc. was quite time-consuming. However, this has now been completed, so further maintenance is no longer a major problem. But of course I also pay for this website – so if anyone likes the database and wants to help keep this site going, please feel free to support me on my website.
>
> Regards,
> Lumbra/Michael
Can you create a blog post about this instead of a thread so it can be more easily accessible? (it will have its own thread too)
Also, before deep-diving into anything else, I want to know: are the latest games from 2023 and 2024 already in the database, or are you still working on that?
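For anyone who wants to apply the cleanup rules from the quoted post to their own data, here is a minimal Python sketch. It is not the author's actual Scid workflow: the `Game` structure and all function names are hypothetical, and two readings are assumptions on my part (a game is dropped for missing ratings only when neither player is rated, and a game with a single rated player is kept).

```python
from typing import Optional, TypedDict


class Game(TypedDict):
    white: str
    black: str
    white_elo: Optional[int]
    black_elo: Optional[int]
    site: str
    year: int
    moves: list[str]  # one entry per half-move


def keep_game(game: Game) -> bool:
    """Apply the length and rating filters described in the post."""
    if len(game["moves"]) < 10:
        return False  # fewer than 10 half-moves
    elos = [e for e in (game["white_elo"], game["black_elo"]) if e is not None]
    if not elos:
        return False  # assumption: dropped only if neither player is rated
    if len(elos) == 2 and max(elos) < 1800:
        return False  # both players rated below 1800
    return True


def duplicate_key(game: Game) -> tuple:
    """Names in colour order, place, year and moves must all match."""
    return (game["white"], game["black"], game["site"],
            game["year"], tuple(game["moves"]))


def clean(games: list[Game]) -> list[Game]:
    """Filter the games, then keep the first copy of each duplicate."""
    seen = set()
    result = []
    for game in games:
        if not keep_game(game):
            continue
        key = duplicate_key(game)
        if key in seen:
            continue
        seen.add(key)
        result.append(game)
    return result
```

Because the key requires exact name matches, it only works well after the names have been normalised, which is presumably why the name correction step comes first in the list.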
I added a blog post:
https://lichess.org/@/Lumbra74/blog/lumbras-giga-base/fK37nLCn
@bartroth said in #34:
> Hey man, thanks for awesome work. How can I remove online games from database in Scid vs Pc, thanks again.
You're welcome!
You can remove most of the online games by searching for the extra tag "SOURCE" with the value "LichessEliteDatabase". The search result can then be deleted via the maintenance menu.
Came here to say how I did it: general search, PGN contains the text "blitz", ignore case; then in the maintenance window, delete the filtered games and compact the database. I am now down to about 8.000.000 games, hopefully all standard. :) Thanks again.
@bartroth said in #38:
> Come here to say how I did it. Search general, pgn contain text blitz , ignore case, then maintenance window, delete filter games, compact database. I am now down to about 8.000.000 games, hopefully all standard. :) Thanks again.
The problem is that games from many sources don't say blitz or rapid anywhere; they just have a TimeControl tag. The PGN format has a specification, but I think it should be updated, and then people should follow it, which is not going to happen. The same game from different sources may have even the seven main tags written differently. Such things make curating any database a difficult task. It is admirable that OP is trying to do that.
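To illustrate what working from the TimeControl tag could look like, here is a rough Python sketch. It only understands the simple `base` and `base+increment` forms in seconds and returns "unknown" for everything else. The function names are hypothetical, and the thresholds are an assumption modelled on how Lichess buckets games (base time plus 40 times the increment); OTB sources may need different rules.

```python
import re
from typing import Optional


def estimated_seconds(time_control: str) -> Optional[int]:
    """Estimate the game length from a 'base' or 'base+increment' value."""
    m = re.fullmatch(r"(\d+)(?:\+(\d+))?", time_control.strip())
    if m is None:
        return None  # e.g. "-", "?", or multi-period controls
    base = int(m.group(1))
    increment = int(m.group(2) or 0)
    return base + 40 * increment  # assumes a game of about 40 moves


def speed_category(time_control: str) -> str:
    """Map a TimeControl value to a speed category (assumed thresholds)."""
    seconds = estimated_seconds(time_control)
    if seconds is None:
        return "unknown"
    if seconds < 30:
        return "ultrabullet"
    if seconds < 180:
        return "bullet"
    if seconds < 480:
        return "blitz"
    if seconds < 1500:
        return "rapid"
    return "classical"
```

With these assumptions, `speed_category("180+2")` gives "blitz" and `speed_category("5400+30")` gives "classical", while a value such as "40/7200:3600" stays "unknown" and would need manual review.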
@kajalmaya said in #39:
> The problem is games from many sources don't say blitz or rapid anywhere
Just noticed "Titled Tuesday" .. Will sort this out eventually.