Post-Mortem of our Longest Downtime

@OctoPinky said in #52:

> This is 100% right but also can become quite expensive, eating a fair share of the budget in order to prevent a rare event. Not sure it is efficient unless you can't really afford some downtime.

I think that's correct, BUT we're talking about a major online chess community. Let's raise the money to prevent this from happening again.

@rushkeldon said in #13:

> I would have thunk that Lichess was running on AWS or Google Cloud.
> Is it perhaps the cost that makes that path less interesting / viable?
> I would think that most of the "required physical action in the datacenter" types of issues would not be present in a cloud environment.
> If it is a cost issue that keeps Lichess from running on AWS or similar I would be happy to reach out to the cloud computing platform of Lichess' choice and see if special pricing could be arranged for such an esteemed .org as ourselves!
>
> I love Lichess and appreciate the transparency re the outage - and the quick action(s) to rectify.
>
> Best regards,
> Keldon Rush
> keldon@spiral9.com

Just go for it, Keldon. Make it happen, if that's going to help.

Many thanks for this explanation and for all the work put into the mitigation. These things happen and have happened to some of the largest companies in the world (Facebook, Amazon). The Lichess team does an amazing job.

This attitude makes me say, "East or West, Lichess is the best."
Thank you, Lichess, for giving so much to us without expecting anything in return.

@OctoPinky said in #52:

> This is 100% right but also can become quite expensive, eating a fair share of the budget in order to prevent a rare event. Not sure it is efficient unless you can't really afford some downtime.

Fair point. So, you think this is just a budget concern?

<Comment deleted by user>

Thanks for the detailed explanation. Downtimes like this are bound to happen in any non-distributed system. It was not a human error, but a hardware failure. Most people have no idea how much money it would cost to host a server with this many users, especially when it comes to scaling with distributed systems and load balancers. As a free platform, Lichess understandably doesn't have the financial resources to implement such infrastructure. I encourage everyone to keep this in mind when comparing Lichess to paid alternatives.

A huge thank you to the entire Lichess team for your hard work and for continuing to offer such a fantastic service for free!

Lol, that gave me a big laugh; that's all I have to say. But seriously now, go get a life and stop trying to get some kind of 'advantage' over everything and everyone you can, especially when it comes to money.

WOW????!?!?!?!??!?!?!?!????????????

amazing post THANKS!!!!
