
Drawing by Gia - https://www.pixiv.net/member.php?id=34624
Post-Mortem of our Longest Downtime
From 14:54 UTC on the 12th September until 01:33 UTC on the 13th September, Lichess experienced the longest downtime in its history. Surprisingly, this downtime was even longer than the time the datacenter housing some of our servers burned down.
We’re a community platform, funded solely by our community through small charitable donations. So you, our beneficiaries and stakeholders who support and encourage us, deserve clarification on what happened, what we did, and what we’ll do to mitigate this risk in the future.
What happened
Essentially, our main server (known as manta) lost connection to our private network at 14:54 UTC. Thibault, our founder and main developer, and Lucas, our charity President and main system administrator, immediately began investigating.
Within minutes, they had both attempted our usual failsafes and identified that the cause was likely a physical hardware issue in the data centre of our provider, OVH. After that issue was finally fixed by OVH technicians, a secondary issue cropped up on the same server, this time affecting access to the internet, which could also only be fixed physically by a technician in the data centre. From start to finish, the interventions lasted approximately 10 hours.
What we did
With Thibault and Lucas trying multiple fixes, the content team also quickly stepped into motion. The timing of the downtime was particularly unfortunate: the Chess Olympiad was in progress and our daily commentary was live, with hours of broadcast still to go.
When it became clear the downtime would last more than a few minutes, they began setting up our commentator, GM Illia Nyzhnyk, on a private sandbox version of Lichess. Illia, streaming solo for Lichess for the first time, handled the situation exceptionally professionally and continued his excellent commentary almost seamlessly.
With our own options quickly exhausted on the technical side, Lucas submitted a support ticket to our provider at 15:33 UTC.
There wasn’t much more we could do other than wait for OVH to send a technician to fix the physical issue with our server. Thibault, Lucas, and by now several other developers and systems administrators were discussing increasingly outlandish (and exceptionally expensive) ideas to get Lichess temporarily back online.
Around two hours later, after a follow-up on our ticket, OVH let us know that a network connector had been replaced and that the intervention was deemed successful.
However, we were almost immediately hit by an additional issue: now it looked like the server's other network connector was out of order! We had to send another support ticket, but with it coming up to 19:00 UTC (21:00 in Europe) and no further replies from OVH, it seemed our next intervention had been pushed back to the morning.
By now, virtually all of our systems administrators and developers were in the same channel, with over 1,000 messages being sent in the infrastructure channel of our Zulip. Unfortunately, there was once again nothing more we could do other than wait.
Coming up to 22:00 UTC, and having heard nothing further from OVH, we assumed the issue likely would only be looked at in the morning. Lucas decided to call it a night (with call alerts set up) an hour later. Thibault stayed up.
At 00:41 UTC it seemed action was being taken on the server, judging from the logs and some bumpy restarts. In the end, Lichess was up and stable from 01:25 UTC.
What we’ll do in the future
We’re in the process of getting further information from OVH to understand which component(s) had to be physically replaced, and why the intervention seemed to take an unusually long time.
Beyond that, we are investigating failsafes we can set up for these issues. The issue is not solely with OVH — to some extent, we also should have further failsafes set up. However, these failsafes take a considerable amount of time and expertise to set up, dealing with very sensitive parts of Lichess. With the time and expertise it takes, and on the budget that we have, we must ruthlessly prioritise.
While our developers and system administrators are incredibly talented, they are generally part-time volunteers; Lucas has a day job as a system administrator. Another of our sysadmins is a CTO at an innovative big geo startup. Another is in aeronautics for a living.
This is not shared to excuse us, but to highlight that the skills and resources we already rely on are often stretched thin, with other obligations — and that these individuals have all contributed their time because they play on Lichess and like the site.
FAQ
I lost rating points — will these be refunded?
Unfortunately, any rating lost will not be refunded. We appreciate this is disappointing, but it is technically very difficult to refund rating after such a crash.
On the bright side, rating is not a currency, it automatically adjusts to your skill. It will return to normal after just a few more rated games.
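For the curious, here is a minimal sketch of why a dented rating corrects itself. It uses a plain Elo-style update with a hypothetical K factor purely for illustration; Lichess actually uses Glicko-2, which corrects faster because it also tracks rating deviation, but the underlying asymmetry is the same: while your rating sits below your real strength, wins gain more than losses cost.

```python
# Toy Elo-style illustration; the K factor and numbers are hypothetical and
# this is not how Lichess computes ratings (Lichess uses Glicko-2).

K = 32  # hypothetical update step for this toy model

def expected_score(rating: float, opponent: float) -> float:
    """Win probability implied by the rating gap (standard Elo curve)."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

rating = 1770.0         # a rating dented by the downtime
true_strength = 1800.0  # the skill level the rating will drift back towards
opponent = 1800.0

gain_on_win = K * (1.0 - expected_score(rating, opponent))
loss_on_defeat = K * (0.0 - expected_score(rating, opponent))
average_drift = K * (expected_score(true_strength, opponent)
                     - expected_score(rating, opponent))

print(f"gain on a win      : {gain_on_win:+.1f}")    # about +17.4
print(f"loss on a defeat   : {loss_on_defeat:+.1f}")  # about -14.6
print(f"average drift/game : {average_drift:+.1f}")   # positive, back towards 1800
```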
I’ve only been getting games as white / black. Is that connected to the crash?
We recently made a change so that players can no longer choose to play as white or black, except in direct challenges. Under the new system, some players have built up a debt to a certain colour, and the matchmaker assigns them that colour to rebalance the games they owe. These issues are not linked to the crash. However, we will be adding a small change so that the rebalancing is more gradual, rather than 20+ games of one colour in a row.
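To make the rebalancing idea concrete, here is a minimal sketch of colour-debt pairing with a cap on consecutive games of the same colour, roughly the "more gradual" behaviour described above. All names, fields, and thresholds are hypothetical and do not reflect the actual Lichess matchmaking code.

```python
# Hypothetical sketch of colour rebalancing with a streak cap; everything here
# is invented for illustration and is not Lichess's real pairing logic.
from dataclasses import dataclass

MAX_STREAK = 3  # hypothetical cap: at most 3 consecutive games with one colour

@dataclass
class ColourHistory:
    imbalance: int = 0    # whites played minus blacks played (negative = owed whites)
    streak: int = 0       # consecutive games with last_colour
    last_colour: str = ""

    def record(self, colour: str) -> None:
        """Update the imbalance and streak after a colour is assigned."""
        self.imbalance += 1 if colour == "white" else -1
        self.streak = self.streak + 1 if colour == self.last_colour else 1
        self.last_colour = colour

    def capped(self, colour: str) -> bool:
        """True if assigning this colour again would exceed the streak cap."""
        return self.last_colour == colour and self.streak >= MAX_STREAK

def assign_colours(a: ColourHistory, b: ColourHistory) -> tuple[str, str]:
    """Give white to whoever owes more white games, unless that creates a long streak."""
    a_gets_white = a.imbalance <= b.imbalance  # lower imbalance = owed more whites
    if a_gets_white and a.capped("white") and not b.capped("white"):
        a_gets_white = False
    elif not a_gets_white and b.capped("white") and not a.capped("white"):
        a_gets_white = True
    a_colour, b_colour = ("white", "black") if a_gets_white else ("black", "white")
    a.record(a_colour)
    b.record(b_colour)
    return a_colour, b_colour
```

With a cap like this, a player who had been over-choosing one colour is steered towards the other over their next games, but never for an unreasonably long run in a row.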