Snapshot of the Lichess Arena Rankings

https://lichess.thijs.com/rankings

Lichess Arena Rankings

As we reach the milestone of 10 years of (regular) Lichess arenas, this blog post reflects on the hundreds of thousands of arenas that have taken place in that time, and on the various rankings and statistics that I have been gathering over the past few years on the Lichess Arena Rankings webpages.

Introduction

Have you ever wondered who has won the most arena tournaments on Lichess? Or who has scored the most points in a single hourly arena? Or maybe you'd like to know what the berserk rates are in these events, or how the average rating of participants evolved over time? These and many more statistics are at your fingertips when you visit the Lichess Arena Rankings webpages, which have now been running for several years, bringing you all the data you never knew you needed.

Today marks the 10-year anniversary of the first regular Lichess arena, which took place on April 10, 2014. That makes it a good opportunity to reflect on all the arenas that have taken place since then, and specifically on all the records and statistics related to these events that can be found on the arena rankings webpages.

What started out as a hobby coding project, which I hoped might interest some people in the community, ended up as quite a massive project in terms of time to run/maintain and data to store. At the same time, it also led to various nice interactions with people from the community who appreciated these rankings, making all the efforts worth it.

In this blog post, I will first go over the "boring" details of how these rankings came to be, and the coding process that went into generating the website that you can now find online. Then I will discuss the results of this coding project: the website, the lists, the graphs, the records, fun facts, and anything else that may be worth mentioning. Finally I will give an outlook on the future, and what may happen in the next decade of arenas.

Creating the rankings

As a large part of the world was sitting at home during Covid-19, pondering what to do with their free time, at some point in 2020 I thought: wouldn't it be nice if we had statistics and data about the many, many arenas that have been held on Lichess so far? For instance, to see who has won the most arenas, who has achieved the highest scores of all time, and how many people participated in all these arenas? As Lichess was not making such data available to the public themselves (which I understand now, as it's quite resource-consuming to maintain), a coding project was born: to build cumulative arena rankings, making use of the data of all the regularly scheduled arenas, and showing all this information for everyone to see. And unlike various other hobby coding projects, which get started but never properly get finished, with this one I actually pushed through to make sure it got done.

The Data

While I won't bore you with all the ugly details of how the rankings were built, and how the scripts could definitely use some refactoring to improve the code quality, maintainability, and efficiency, it may be useful to give some idea of how the script works. For this, let me first describe all the data that is currently stored locally on my external SSD. The data consists of three main folders:

  • The arena data (1.7M files, 17.8GB). For each arena included in the rankings, I locally store the .json file one gets when querying the Lichess API endpoint /api/tournament/<id>, and the .ndjson file one gets when querying the Lichess API endpoint /api/tournament/<id>/results. This data is sorted into categories (bullet, blitz, chess960, crazyhouse, ...) and events (hourly, daily, titled, marathon, ...), similar to how the rankings are eventually organized on the website. Besides all this data, this folder also contains lists of arena IDs for which the data exists locally, so that the updater can compare against this list to see for which arenas the data still needs to be downloaded.
  • The arena rankings (118K files, 17.7GB). For each combination of category and event (e.g. "HyperBullet" + "Shield"), and for each type of ranking as on the Lichess Arena Rankings website (e.g. "Players by trophies"), this folder contains rankings based on events that happened so far. Generally this list of events is in sync with the folder of arena data, but it locally tracks which events have been included in these rankings so far to be able to robustly resolve any inconsistencies (e.g., in case the data downloading succeeded but the rankings updater crashed). Each of these rankings is complete, in the sense that it contains all users and all arenas that happened so far, rather than say just the top 1000 users. Apart from these rankings, this folder also contains cumulative user scores needed to generate plots such as this one; for the top users, the folder stores their progress over time, so that these graphs can be generated from this data.
  • The HTML webpages (11K files, 0.7GB). The website is hosted on GitHub, and each time the script runs, it generates fresh HTML pages with lists and figures to be uploaded to the servers, so that they become visible to the public. For each combination of category, event, and type of ranking, this includes (1) a webpage with the top list, and (2) a webpage with a graph visualizing the data (see this link for an example). As these webpages only show the top 1000 users with the most trophies, it is impossible for others to update these rankings based solely on the information on this webpage, as users outside the top 1000 are not listed there but could make it to the top 1000 later.
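To make the arena data folder a bit more concrete, here is a minimal Python sketch of how one arena could be mapped to its two API endpoints and its two local files. This is not the actual code behind the rankings: the two endpoint paths are the ones named above, but the `https://lichess.org` base URL, the `category/event/<id>` folder layout, and all function names are assumptions made for illustration. No network request is made here.

```python
import json
from pathlib import Path

# Assumed base URL; the two endpoint paths are the ones named in the post.
API_BASE = "https://lichess.org"


def endpoint_urls(arena_id: str) -> dict[str, str]:
    """Return the two API URLs that are queried for one arena."""
    return {
        "info": f"{API_BASE}/api/tournament/{arena_id}",
        "results": f"{API_BASE}/api/tournament/{arena_id}/results",
    }


def local_paths(root: Path, category: str, event: str, arena_id: str) -> dict[str, Path]:
    """Hypothetical layout: <root>/<category>/<event>/<id>.json and .ndjson."""
    folder = root / category / event
    return {
        "info": folder / f"{arena_id}.json",
        "results": folder / f"{arena_id}.ndjson",
    }


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Keeping the two responses as separate files per arena means a crashed update can be resumed by simply checking which files already exist.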

Although I would be happy to share the data with anyone who wants to do some of their own data analysis, as illustrated by the folder sizes (around 18GB each for the data and the rankings) there is no easy way for me to share all this data online. If anyone is interested and seriously wants to work towards transferring the data for their own analysis, feel free to contact me by DM.

The Code

With the above data structure in mind, the code flow of the rankings can be summarized roughly as follows, assuming a current ranking is being updated to incorporate new events that have occurred since the last update:

  1. Obtain the list of new arenas from Lichess. This involves (1) loading the lists of arenas for which data is stored locally in the data folder, (2) asking Lichess which arenas have taken place recently, (3) comparing this stream of events against the list of events for which data already exists locally, and then (4) recording the list of new events. Unfortunately, for this part there is no nice Lichess API endpoint to get the latest events in a proper way, so this currently requires scraping the various arena history pages and extracting the tournament IDs manually from those webpages.
  2. Fetch new arena data from Lichess. After acquiring the new arena IDs of all the new events, the scripts proceed with downloading the tournament information for these events. Concretely, this means querying the API endpoints /api/tournament/<id> and /api/tournament/<id>/results for all new events. The resulting json and ndjson files are then stored locally in the data folder, and the list of events in the data folder is updated to reflect these newly added events. (After finishing this step, the arena data folder should be up-to-date.)
  3. Update the arena rankings lists. For each ranking stored in the rankings folder, we retrieve the list of arenas included in this ranking, and compare it against the list of arenas for which we (now) have data in the arena data folder. If any new events were added, this means that the arena rankings must be updated, and this is done in the way you might expect: updating trophy counts, point counts, maximum scores, etc., based on these new arenas. Note that the time for these updates is mostly determined by the size of the rankings: for the hourly bullet arenas, the rankings file contains >700K unique users, so even updating based on one event may already cost quite some time, as the participants of this one arena may be anywhere in this list of users, and may move up and down this list as well. In comparison, updates to the monthly racing kings arena rankings are relatively fast, since no one plays those events.
  4. Update the arena rankings graphs (optional). Although updating the rankings lists is already quite expensive, this is a breeze compared to updating the graphs pages. To explain: if one looks at e.g. this graph, one can see the progress of the top 10 players in this category over the years. That means that for each of these users, we must figure out their cumulative scores over all events in the past. If a user is stable in the top 10 for this plot, then updating their cumulative scores is not so time-consuming, as we already have their cumulative data stored somewhere and we just need to update it based on the new events. But occasionally players from outside this top 10 (or even top 100) will enter the top 10, and no such cumulative data is stored locally yet. The script then needs to go over all the past arenas, look for the user in all these arenas, and generate cumulative rankings accordingly. For big rankings this takes ages to run, and therefore updating the graphs is only done very sporadically.
  5. Update the webpages. After the previous step, the data and rankings should both be up to date, and the script can start generating the updated webpages. For any list requiring updating, this is done from scratch; it just throws away the old HTML file, and builds the new HTML pages using the data in the rankings folder. Updating the webpages is very fast compared to the other steps, as we only upload a small part of the rankings to the servers. (Note that I cannot add more data to the website, as GitHub has limits in terms of the sizes of their websites, and the sizes of commits to the associated repository.)
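The core of steps 1, 3, and 4 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the real updater: it assumes each result row is a dict with `username`, `rank`, and `score` keys, it counts only first places as victories, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def new_arena_ids(known_ids, scraped_ids):
    """Step 1: keep the scraped IDs we have no local data for, in order."""
    seen = set(known_ids)
    fresh = []
    for arena_id in scraped_ids:
        if arena_id not in seen:
            seen.add(arena_id)  # also drops duplicates within the scrape
            fresh.append(arena_id)
    return fresh


def new_ranking():
    """One entry per user; unseen users start from zero."""
    return defaultdict(lambda: {"wins": 0, "points": 0, "max_score": 0})


def apply_arena(ranking, results):
    """Step 3: fold the results of one arena into a cumulative ranking."""
    for row in results:
        entry = ranking[row["username"]]
        entry["points"] += row["score"]
        entry["max_score"] = max(entry["max_score"], row["score"])
        if row["rank"] == 1:
            entry["wins"] += 1


def top_players(ranking, key, n):
    """Top n usernames by the given counter, ties broken alphabetically."""
    ordered = sorted(ranking.items(), key=lambda kv: (-kv[1][key], kv[0]))
    return [name for name, _ in ordered[:n]]


def cumulative_points(arenas, username):
    """Step 4: rebuild one user's running total by rescanning every arena."""
    per_arena = [
        sum(row["score"] for row in results if row["username"] == username)
        for results in arenas  # arenas must be in chronological order
    ]
    return list(accumulate(per_arena))
```

Note that `cumulative_points` has to look at every past arena for a single user, which illustrates why backfilling the graph data for a newcomer to the top 10 is so much slower than the incremental update in `apply_arena`.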

After completing these steps, the website is up to date as of the moment the script started gathering the list of new arenas from the Lichess servers. As the script generally takes longer than an hour to run, this means that by the time the website finally gets updated, it is already out of date again due to new arenas that took place in the meantime. This however is inevitable due to the frequency of arenas taking place, and the (lack of) speed of the Lichess API for gathering all the necessary data to update the rankings.

Going Live

After all the scripts were in a decent shape to set up the above code flow, some time in September 2020 I started running the script to fetch all the arena data up to that point, starting from scratch. As by that point we had already approached the 400K arena mark, and the API quickly rate-limits you when you try to get too much data from the servers, just running the initial script took about a month(!) to get all the data for a first version of the website.
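As a rough back-of-the-envelope check of these numbers, the following Python sketch estimates the average request pace implied by the initial download. The inputs are assumptions for illustration only: exactly 400K arenas, exactly two requests per arena (one per endpoint), and a month of exactly 30 days of continuous running. The actual rate limits and the actual pacing of the script are not stated in this post.

```python
def seconds_per_request(arenas: int, requests_per_arena: int, days: int) -> float:
    """Average time available per request when spreading them evenly."""
    total_requests = arenas * requests_per_arena
    total_seconds = days * 24 * 60 * 60
    return total_seconds / total_requests


# Assumed inputs: 400K arenas, 2 endpoints each, 30 days of running.
pace = seconds_per_request(arenas=400_000, requests_per_arena=2, days=30)
print(f"about one request every {pace:.2f} seconds")
```

Under these assumptions the download averages roughly one request every three seconds, which gives a feel for why the initial run took so long.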

On October 1, 2020, the initial version of the GitHub repository went live, with the first version of the rankings appearing online not long after. Through word of mouth, the site slowly but steadily got more viewers, and the website got polished further based on feedback and requests from others. Over time, some new features were added to the rankings webpages as well, including:

  • The graphs. The progress graphs of the top 10 users (such as this one), as well as visualizations of the arenas themselves (such as this one), were not available at the start, and were only added later. This mainly required getting familiar with matplotlib for generating plots, formatting them nicely, and making sure they automatically get formatted and displayed in a nice manner.
  • Titled arenas and marathons. As these events are not as regular as other events, and are planned manually by the Lichess team, including these events in the rankings also required some manual changes to the code on my end. The statistics about titled arenas and marathons are still some of the most popular pages on the website, judging from feedback received over time.
  • The Lichess Bundesliga. During the Covid-19 quarantines, the Lichess Bundesliga took off in popularity, and at some point these events were also included in the webpages. The statistics are based on including all different divisions from every event, and are only about individual performances; for team statistics, the Rochade Europe website (in German) has all the statistics you may ever want.
  • Additional rankings per event/category. Did you know you can also find plots of, for instance, average ratings in certain arena categories over time? Or find out other completely useless information, such as a list of the arenas with the highest berserk rates? Some of these additional lists, besides just "players with the most trophies/points", were also added later on, based on what could be done with the data, and what might be interesting to add.

Recently, no updates have been made to the website, other than running the update script to update it based on the latest tournaments.

The Lichess Arena Rankings

With all the introductory stuff out of the way, let us now take a look at some of the rankings themselves. To quote the main page, which includes all arenas:

This ranking is based on 872K events held between Apr 10, 2014 and Apr 10, 2024. In total, these arenas featured 493M games (with 32B moves), and the 181M participants (3.0M unique players) scored a total of 1.2B arena points. In these games, white scored 50% wins, 3% draws, and 47% losses. The average berserk rate is 10.2%, and the average rating is 1686.

That is quite a lot of data! After hundreds of millions of games played in all these arenas, the data has spoken and concluded that indeed, averaged over all these games, white has a slight edge, scoring 50% wins, 3% draws, and 47% wins for black. Including all these arena events, a player has roughly a 1/10 chance of seeing their opponent go berserk, and the average rating faced in all arenas is slightly below 1700.

Statistics per category

Of course, all these statistics vary wildly between different variants, time controls, and types of events. To get an idea of the breakdown of the games/tournaments into these categories, the statistics page has quite a lot of data. To highlight a few of these plots here, the following plot shows how the arenas are distributed over the different variants/time controls. The inner ring further shows the distribution over different categories (hourly, daily, weekly, shield, titled, ...), with color coding similar to the colors used on the website.

As one can see, Bullet, SuperBlitz, and Blitz together cover more than half of all arenas. If we then further add all the "classical" chess variants with different time controls (Rapid, Classical, HyperBullet), we get around 75% of all arenas, meaning that variants which are not really "chess" (including UltraBullet) together make up about 1/4 of all the arenas. Out of these variants, CrazyHouse is by far the most popular with the most events.

To elaborate on the 50% win rate for white, it is further interesting to note that this win percentage varies quite wildly between time controls and variants. The following plot summarizes how well white scores for each combination of a variant and an arena category.

In the left 7 columns, covering the main chess variants, we see a fairly steady score for white around 51-52%, with only a slightly higher score for white in the Elite and Titled arenas; at the higher rating bands, the white advantage seems to matter more than in the lower rating bands. For variants however, the white (dis)advantage varies from one variant to another, with white scoring best in Atomic and Three-Check, and white scoring less than 50% in Horde. Apparently only having a bunch of pawns to storm at your opponent's king does not always pay off.

Finally, let us take a look at one of the plots from the lower part of the statistics page, illustrating how long games last for different variants.

As one can see, the average regular chess game lasts about 35 moves per player (except for UltraBullet which isn't really chess). Out of the variants, the games in Horde last the longest, as it presumably takes a while for black to capture all white pawns and win the game. Games end the quickest in Racing Kings (wait, is that actually a real variant?), Atomic, and King of the Hill.

Amazing Achievements

Apart from the aggregate statistics discussed above (see this page for more of that), let us take a look at some of the most astonishing records on these rankings pages achieved by Lichess users.

Overall Victories: @Hoegi

First of all, we have to congratulate @Hoegi for his amazing tournament victory streak over the past few years, amassing over 4600(!) tournament victories so far. Perhaps the clearest way to visualize how far ahead he is of the rest (in terms of tournament victories) is the following plot, showing the top 10 players with the most arena victories over all the >800K arenas included in the rankings to date:

Whereas the battle for 2nd place and below is still quite close, it is clear that no one can match Hoegi's unparalleled, strong, and consistent performance, scoring victory after victory, day after day. In the last few years, he has averaged over 1000 arena victories per year (mostly in Hourly HyperBullets, and some in Hourly UltraBullets), which corresponds to about 3 arena victories per day, every single day, for the entire year! Even if he stopped now, and others were somehow able to match his pace of tournament victories, it would take them years to catch up and beat his record.

Marathon Victories: LM @Lance5500

Another legend of Lichess is of course @Lance5500, who has managed to score 11 victories in Lichess marathons. With no one else winning more than 2 marathons, as the following graph shows, this is a record which will likely never be beaten.

Although these marathon victories were achieved in the "early days" of Lichess, when the battle for first place was not as strongly contended as it is now, winning 11 marathons shows dedication and perseverance. Apart from these 11 victories, he has also scored 7 silver medals, 3 more top-10 finishes, 11 more top-50 finishes, and 1 top-100 finish, giving him a total of 33 marathon trophies on his profile. Although this definitely looks very impressive on his profile, trying to fit all these trophies on his profile must be quite a challenging task for Lichess developers.

Arena Participations: @german11

No list of Lichess legends would be complete without perhaps the greatest legend of them all, @german11. Having played the most games ever on Lichess, with 632K games (and counting), it is no surprise that he takes the top spot in some of the rankings as well. Among others, he holds the record of the most arena participations out of everyone on Lichess.

His presence is mostly felt in the faster time controls, as he takes the top spot overall in terms of Bullet and SuperBlitz participations. With a modest rating, he has only scored 6 tournament victories in the >45K arenas he has participated in. Chess-wise, perhaps his biggest achievement in these arenas came 8 years ago, when in the Daily HyperBullet Arena he managed to score a major upset, defying the odds of a rating gap of more than 1300 points and checkmating FM JusticeBot in just 15 moves! (link to the game)

Titled Trophies: GM Magnus Carlsen

Having scored a combined total of 19 titled arena victories on his various accounts (@DrNykterstein (10), @DrDrunkenstein (4), @manwithavan (1), @DrGrekenstein (1), @DrChampionstein (1), @DannyTheDonkey (1), @damnsaltythatsport (1)), Magnus Carlsen must be satisfied to know that, although he no longer holds the World Champion title in classical chess, he still holds the equally prestigious title of having scored the most Titled Arena victories on Lichess. Should he wish to "defend" this title however, he will need to return to Lichess soon to win some more titled arenas, as others are approaching his record and are looking to take his title.

Quality Trophy Hunting: GM @Zhigalko_Sergei

Another legend of Lichess, GM Sergei Zhigalko needs no introduction. Besides sporting high ratings in almost all variants and time controls, he also does very well in the more prestigious arenas. Among his achievements are a clear first place in Yearly Arena victories, topping the list of Shield Arenas victories, and being in the top 5 of players with the most trophies in the Monthly and Elite Arenas. He is still waiting for his first Titled Arena victory, scoring 2 silver medals and 10 bronze medals.

Honorable mentions

While there are many more achievements to highlight, let me wrap up this section with a few more honorable mentions of people who have achieved amazing results, and who deserve praise for their accomplishments as well:

  • CM @Kingscrusher-YouTube has been an icon on Lichess for over a decade, as a chess streamer and as a very active player on Lichess. He is currently in 2nd place in overall trophies in all arenas (behind the unstoppable @Hoegi), he has many top 10 places in various rankings, and he has been racking up marathon trophies on his profile as well.
  • IM @opperwezen has been around on Lichess for a long time, and has managed to achieve some exceptional and unbelievable scores in arenas. Specifically, the list of HyperBullet Hourly Highscores is dominated by his results, often achieving perfect scores of ~30 berserked victories in less than half an hour of play!
  • @decidement has the most points in all arenas combined in the rankings, and with >420K games played on Lichess, he is in third place overall on Lichess in the total number of games played. In terms of overall arena participations, he is just behind the legendary @german11. He has definitely earned legendary status on Lichess, and deserves credit for his active presence and high scoring ability in the arenas he participates in.
  • GM @Alexsur81 is one of the few grandmasters in the top of the arena rankings in terms of trophies, having scored most of them in SuperBlitz arenas, where he is far ahead of the rest in terms of tournament victories. He does not shy away from a challenge either, as he has managed to score the most tournament victories in elite arenas, which are inherently tough arenas to win due to the rating requirement to join these events.
  • @tamotdons has been consistently dominating the three-check arena scene, scoring over 2100 three-check arena victories, which is far ahead of the rest (with the number two scoring fewer than 700 arena victories). That impressive number of victories also puts him in the 3rd place of overall tournament victories, behind the aforementioned @Hoegi and @Kingscrusher-YouTube.
  • GM @Night-King96 has the most titled arena victories if we don't add up Magnus' wins over his many accounts: with 17 titled arena victories, GM Oleksandr Bortnyk "just" needs two more victories to match Carlsen's record of 19 wins. Moreover, he has scored the most points in all titled arenas combined, having participated in 97 of the 120 titled arenas to date, showing that he is a loyal participant in these events and always one of the main contenders for the $500 first place.

As the author of these rankings, I am pleased to say I currently hold one record as well: the record of the highest score in Hourly Chess960 Arenas. Recently the record was tied by @LollaRossa, but the record still stands at 76 points. As records exist to be broken: does anyone think they can beat this record?

The Future

Looking ahead, I must admit that I have not had the time to properly maintain the code or to consider adding new features, and my efforts recently have been limited to running the automatic update scripts and occasionally debugging any changes on Lichess' end that broke the scripts. Moreover, the script takes quite a while to run, and as the rankings site is completely free with no revenue, I only lose money in paying for the energy used by the laptop running the scripts.

On the other hand, not updating the pages is not really an option either, as I understand it's a nice page to have for the community. As my updates have been rather sporadic recently, I have often received requests to please update the rankings again. It then often took me a while to get around to running the script again, and since the script itself usually takes a few days to catch up after not updating the rankings for a while, I can imagine that it must have been somewhat frustrating for users who achieved a new record and were waiting to see their name and new record appear on the website.

In the near future I do not expect the situation to change much: I will try to occasionally run the updater to keep the lists up to date. Beyond that, I will probably focus most of my spare time (that is: the spare time I wish to spend on such coding projects) on other hobby projects, of which I will highlight two related to Lichess which are already live:

  • The Lichess Knockout script allows you to run your own knockout events on Lichess, by having a script keep track of the KO results and send pairings to the Lichess API via a "Swiss event". This still requires testing on a larger scale, but should be (almost) bug-free at this point.
  • The Lichess Ladder tries to objectively rank the top Lichess players based on their overall performance against one another, rather than based on day-to-day form. Unsurprisingly, Magnus' accounts rank highly in this list as well, despite having only played relatively few games on most of these accounts.

For those who are still interested in more statistics, don't forget to read my other blog post dedicated to Lichess Marathon Statistics. And if you want to be informed of any future blog posts, feel free to give me a follow!