Lag spike at game start. Unable to move and game is aborted after time limit is reached.

I play a bunch of 1/0 bullet, and every 3 to 8 games I'll start a game, but I can't make my first move.

I believe this is the result of a lag spike that usually occurs at the beginning of games, as this is the only time I ever have problems with connection.

Sometimes at the start of games, and at no other time, there is a massive disconnect. It often says "reconnecting" right after I notice that, although my piece has moved, it doesn't yet say that it is my opponent's turn. It will then do one of a few things.

1. It may get back connection after 5 or so seconds and I can play as if nothing happened.
2. It will go down to the wire, and at the last second I'll get a connection.
3. It won't ever reconnect between game start and abort timer end. In other words, I can't make my move and the game is aborted.

Of the three above, it seems like 3 is the most likely. 2 almost never happens, and 1 happens almost as much as 3.

When starting a game, the first step seems to be connecting the two players. Once they are connected, I believe the abort timer begins. In my mind, there is an additional network step between when I can move and when the timer begins, and I think it is this step where the problem arises. I imagine it would also have to occur in an operation that is expected to take almost no time at all and to be very consistent, and thus would be difficult to find. This is all speculation on my part; I don't really have a clue about the systems involved, but I still hope my ponderings can provide some inspiration.

It may also be a "me" problem rather than a server problem: when I play bullet on my phone, with low network speed and on the app, I don't have any problems, but when I play on my computer I sometimes get a major lag spike (a temporary total disconnection of the computer or the server), and only at the start of games. I imagine this makes the problem much harder to find, so here is a bit of computer info for you.

Windows 10, lots of RAM, Brave browser (like Chrome but with ad blockers and tracker blockers). I'm pretty far from France, I think, and I don't use a VPN.

I give this problem a criticality rating of:
5/10
I get peeved when the site allows me to start a game but I am actually just disconnected.
@viktor-laslow said in #2:
> I get peeved when the site allows me to start a game but I am actually just disconnected.
If you are disconnected how can you start a game?
"there is an additional network step between when I can move and when the timer begins, and I think it is this step where the problem arises"-#1

No, the problem arises because you get disconnected or take a lot of time to load the chess board.
Both are problems with your connection or your PC.

Let's first test the network.
Stop all downloads and bandwidth-heavy tasks,
then
open cmd/terminal/PowerShell and
type ping lichess.org -n 100
where -n 100 means 100 times (this is the Windows flag; on Linux or macOS use -c 100 instead).
And WAIT for the stats at the end.

What your average ping is does not matter
Range does:
the lower your range the better your connection
over 100 is bad over 200 difference is really bad
you should aim for less than 0 dropped packets
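
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize_pings` is made up for this example, the thresholds are the ones from this post (assumed to be in milliseconds) and not any official standard, and the input is a list of round-trip times you would copy from your own ping output.

```python
def summarize_pings(rtts_ms, sent):
    """Summarise a ping run.

    rtts_ms: round-trip times (ms) of the replies that came back.
    sent:    total number of pings that were sent.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    received = len(rtts_ms)
    loss_pct = 100.0 * (sent - received) / sent
    if not rtts_ms:
        return {"range_ms": None, "loss_pct": loss_pct, "verdict": "no replies"}
    # The range (max minus min) is what the post cares about, not the average.
    range_ms = max(rtts_ms) - min(rtts_ms)
    # Thresholds are the rule of thumb from the post, not a standard.
    if range_ms > 200:
        verdict = "really bad"
    elif range_ms > 100:
        verdict = "bad"
    else:
        verdict = "ok"
    return {"range_ms": range_ms, "loss_pct": loss_pct, "verdict": verdict}
```

For example, `summarize_pings([40, 45, 50], sent=3)` reports a range of 10 ms with no loss, which this rule of thumb would call fine.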
I ran 1000 pings to lichess.org

Results
155ms min
689ms max
0% loss
256-byte packets
@for_cryingout_loud said in #4:
> the lower your range the better your connection
> over 100 is bad over 200 difference is really bad
It seems like I may have a terrible internet. But nothing in the range of 20 second disconnects. My problem can be described as a somewhat frequent, extremely specific period of time where connection to lichess fails entirely.

How can a 600ms response time turn into a 20 second response time and an aborted match?

And sorry for the slow response. I should have posted expecting a quickish response.
@Klelik said in #5:
> I ran 1000 pings to lichess.org
>
> Results
> 155ms min
> 689ms max
> 0% loss
> 256-byte packets
>
> It seems like I may have a terrible internet. But nothing in the range of 20 second disconnects. My problem can be described as a somewhat frequent, extremely specific period of time where connection to lichess fails entirely.
>
> How can a 600ms response time turn into a 20 second response time and an aborted match?
>
> And sorry for the slow response. I should have posted expecting a quickish response.

For the ping test: you have a high range, which means your connection is not good (but not bad enough to cause this).
There is nothing wrong on the Lichess side, and the test showed that your connection would not cause that to happen.

As for what could cause that 20-second period: maybe your browser. Try using a different one.
@for_cryingout_loud said in #6:
> maybe your browser. Try using a different one.

I tried Chrome, but there was seemingly no change from Brave. That only means that it wasn't something Brave did, as Brave is essentially Chrome + ad-blocker/anti-tracker.
I would try other browsers, but it looks like I uninstalled them...

I haven't had many games lag spike abort recently. But nothing seems to be different between then and now. If anything, I should have more lag spikes with all the processes I'm running.

Since it seems to vary in intensity without my changing anything, it is probably not something so tied to software that I'm using.
@Klelik said in #7:
> I tried Chrome, but there was seemingly no change from Brave. That only means that it wasn't something Brave did, as Brave is essentially Chrome + ad-blocker/anti-tracker.
> I would try other browsers, but it looks like I uninstalled them...
>
> I haven't had many games lag spike abort recently. But nothing seems to be different between then and now. If anything, I should have more lag spikes with all the processes I'm running.
>
> Since it seems to vary in intensity without my changing anything, it is probably not something so tied to software that I'm using.

I did read that you would expect more spikes, but it's the only factor I can think of that is left, as we covered everything from the server to you.
server: no, because more people would have it and server lag would probably be high
connection: no, because we tested it
browser: you tried a different one and nothing changed
software: next time you get the lag, see what your PC is doing; maybe that's the problem
This is a real problem now, and it has gotten worse in the last few weeks.
OP describes the issue very well.

I can confirm a similar issue with those symptoms, mostly affecting gameplay in 1:0 bullet tournaments:

The tournament starts. It often takes a long time to get all the players who are present paired and into their games.
I do not mean the players who went for a sandwich or didn't finish peeing on time and so missed the game start.

It doesn't matter much whether there are 800 players in the bullet tournament or just 70.

What I mean is that there seems to be a temporary difficulty when tournaments and games start.

The "lag issue" tends to be more prominent at the starting countdown and at random subsequent game starts.
This issue results in "disconnected" and "reconnecting..." messages on the client side, in the browser interface.

It doesn't depend on which browser is used (Chrome, Vivaldi, Firefox, or Brave, as OP mentioned); all of them show the same behaviour.
Those reconnects can also happen at any time during a game in progress, sometimes several times in one game. Turning the ad blocker on or off does not fix the issue either.

Some days this "feature" is not too prominent; some days it is really bad.
Going by +2 standard time (EET), the server is mostly more playable from 8-10, and it really freezes more after 11-12.

So talking about lag may not point to the actual problem.

From what people have already said, most of them have a more or less stable internet connection, and a few hundred milliseconds should not be the root cause of disconnects causing 5-20 seconds of lost time.

For example, I have run ping to lichess.org or manta.lichess.ovh and to my ISP simultaneously, as I am doing while writing this.

My pings to my ISP are a stable 3-4 ms for hours, or days if you wish (I am on an optical connection).
Pings to the Lichess server's outer interfaces are a smooth 48-51 ms, depending on the day.

Pings remain stable while the disconnects occur, both to my ISP and to Lichess.

Now, lichess moderators will argue, that server internal ping is 1 ms?

What does that all mean?

First, this is not an issue with people's internet connections as such, so there is no need to treat it as "classic lag".
The disconnect occurs despite a good connection with zero packets dropped, while no other processes are affecting the client side of the machine. I have 0.1-0.5% load on my Ryzen 5800 CPU. Also, downloading ISOs or a heavy amount of content at the same time does not change anything in the chess client-server behaviour.

That means this is not a client-side load problem. With 32 GB of RAM, no swap, and a good enough video card, there are plenty of resources.

As I have said previously in another post, there is possibly some kind of internal lag.

But the most prominent thing now, which you cannot dispute, is connection loss. Clients need to reconnect while still being well connected to the internet.
I do not know whether you use a keepalive feature, but the browser loses its connection and reconnects afterwards, and then the premove that was already made goes through. That usually takes 5-12 seconds when a disconnect happens.
And I repeat: there is NO connection loss to the internet, nor to lichess.org or OVH as such.

That means the connection is interrupted in the server system.
This can be because of routing in the hosted system, throttling on switches or on some of their ports, routers, network interfaces, or any step of the backend down to the filesystem or the database itself.

As what most people have in common is trouble starting a game, you should check not only the "ping to server internally" but also the LOAD on the different server components and possible queues in all parts of the server.

It can be anything: the Scala database and its optimisation, the backend server CPU, the hosting provider's load balancer and its scaling optimisations, file system load, caches, etc.
At some point, clients are cut off from this "perfect" and maybe very mighty system.
If you are using hosting, you also have to take into account the load on the hosting system, that is, on OVH's side.

Domain Name: lichess.org
Registry Domain ID: d921d94e6ae34f1fbf0969b704a27a4a-LROR
Registrar WHOIS Server: http://whois.ovh.com
Registrar URL: http://www.ovh.com
Updated Date: 2021-10-29T14:57:12Z
Creation Date: 2010-06-14T09:06:45Z
Registry Expiry Date: 2027-06-14T09:06:45Z

And please, do not make any more assumptions like "clients should have packet loss less than 0".
How can packet loss be less than 0??

And this is not just "you get disconnected": it means you will often lose a possible winning streak, and it affects the overall tournament result and ruins the overall gameplay experience.

Just for stats, while I was writing:
--- lichess.org ping statistics ---
8160 packets transmitted, 8160 received, 0% packet loss, time 8166533ms
rtt min/avg/max/mdev = 47.190/48.299/52.292/0.309 ms
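
For anyone who wants to compare runs like the one above, here is a minimal Python sketch that pulls the numbers out of a ping summary. It assumes the Linux-style summary format shown in this post (other systems print a different layout, so the patterns would need adjusting), and the function name `parse_ping_summary` is made up for this example.

```python
import re


def parse_ping_summary(text):
    """Extract packet counts and RTT figures from a Linux-style ping summary."""
    counts = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", text)
    rtt = re.search(
        r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", text
    )
    if counts is None or rtt is None:
        raise ValueError("unrecognised ping summary format")
    sent, received = int(counts.group(1)), int(counts.group(2))
    rtt_min, rtt_avg, rtt_max, rtt_mdev = (float(x) for x in rtt.groups())
    return {
        "sent": sent,
        "received": received,
        # Loss is recomputed from the counts rather than read from the text.
        "loss_pct": 100.0 * (sent - received) / sent if sent else 0.0,
        "min_ms": rtt_min,
        "avg_ms": rtt_avg,
        "max_ms": rtt_max,
        "mdev_ms": rtt_mdev,
        # Spread between slowest and fastest reply.
        "range_ms": rtt_max - rtt_min,
    }
```

Fed the three summary lines above, it gives 8160 sent, 8160 received, 0% loss, and a spread of roughly 5 ms between the fastest and slowest reply.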
"Now, lichess moderators will argue, that server internal ping is 1 ms?"
They are not arguing; they are just relaying the info provided at lichess.org/lag.
The only argument to be made is whether the method, which can be found on the GitHub, accurately tracks lag.
How can we, who don't have access to the console and are not running our own Lichess, say whether that 1 ms is right?
It does sound very low, but there is currently no easy way to prove it unless we set up a controlled test where we ping or play a game through Lichess and measure the exact lag, accounting for our own lag as well.
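
As a rough sketch of what "accounting for our own lag" could look like: if you had a list of how long each move took to be acknowledged and a list of plain ping times taken in the same session, subtracting one from the other gives a crude estimate of the time spent on the server side. Everything here is an assumption for illustration, including the function name `estimate_server_lag_ms` and the idea that the two lists are measured by hand or by some tool of your own; it is not how lichess.org/lag computes its figure.

```python
from statistics import median


def estimate_server_lag_ms(move_round_trips_ms, ping_rtts_ms):
    """Crude estimate: typical move round trip minus typical network round trip.

    Medians are used so that a single spike does not dominate the result.
    """
    if not move_round_trips_ms or not ping_rtts_ms:
        raise ValueError("both lists need at least one measurement")
    estimate = median(move_round_trips_ms) - median(ping_rtts_ms)
    # A negative value only means measurement noise, so clamp it to zero.
    return max(0.0, estimate)
```

This only bounds the problem: whatever is left after subtracting the network time also includes browser and client overhead, not just the server.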

"Going by +2 standard time (EET), the server is mostly more playable from 8-10, and it really freezes more after 11-12."

This seems to be an opinion and not a fact, which is fine, but we can't use it to track down the source of the problems. Also, those times are not peak times; peak is normally 4 pm UTC onwards, based on what I have seen.

"I am on an optical connection"
"Optical connection" just means a light connection;
you are on an optical fibre connection.

"First, this is not an issue with people's internet connections as such, so there is no need to treat it as "classic lag". Also, downloading ISOs or a heavy amount of content at the same time does not change anything in the chess client-server behaviour.

That means this is not a client-side load problem. With 32 GB of RAM, no swap, and a good enough video card, there are plenty of resources.

As I have said previously in another post, there is possibly some kind of internal lag.

But the most prominent thing now, which you cannot dispute, is connection loss. Clients need to reconnect while still being well connected to the internet.
I do not know whether you use a keepalive feature, but the browser loses its connection and reconnects afterwards, and then the premove that was already made goes through. That usually takes 5-12 seconds when a disconnect happens.
And I repeat: there is NO connection loss to the internet, nor to lichess.org or OVH as such."
"And please, do not make any more assumptions like "clients should have packet loss less than 0".
How can packet loss be less than 0??"

The reason for stopping downloads and other bandwidth-heavy stuff is to remove factors that could cause problems or produce meaningless data, which is also why we use the terminal: it removes the browser as a factor. Removing as many factors as possible makes it easier to point exactly at what the problem is. The reason I said "less than 0" is not that I literally want it to be less than 0, but to emphasise that we want that number to be as low as possible.

"The disconnect occurs despite a good connection with zero packets dropped, while no other processes are affecting the client side of the machine. I have 0.1-0.5% load on my Ryzen 5800 CPU."

Well, then we have ruled out the browser, the connection, and the PC as problems.

"I do not know whether you use a keepalive feature, but the browser loses its connection and reconnects afterwards, and then the premove that was already made goes through. That usually takes 5-12 seconds when a disconnect happens.
And I repeat: there is NO connection loss to the internet, nor to lichess.org or OVH as such.

That means the connection is interrupted in the server system.
This can be because of routing in the hosted system, throttling on switches or on some of their ports, routers, network interfaces, or any step of the backend down to the filesystem or the database itself.

As what most people have in common is trouble starting a game, you should check not only the "ping to server internally" but also the LOAD on the different server components and possible queues in all parts of the server.

It can be anything: the Scala database and its optimisation, the backend server CPU, the hosting provider's load balancer and its scaling optimisations, file system load, caches, etc.
At some point, clients are cut off from this "perfect" and maybe very mighty system.
If you are using hosting, you also have to take into account the load on the hosting system, that is, on OVH's side."

I would love to check the server load
and see if that's a factor, but I don't have console access.
You are right that it can be anything under the sun, but the devs can't search under the sun because they will get burnt.
So how can we narrow down what the problem is, or what causes the disconnect, even further?

Maybe set up a bot that requests stuff from the API and see if that has disconnect issues, or message the devs on Discord so we can start finding ways to narrow it down.
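
A minimal sketch of such a bot, in Python using only the standard library. The function names (`fetch_once`, `probe`, `longest_outage_s`) are made up for this example, and the URL is left as a parameter because which endpoint to poll is an open choice; nothing here is an official Lichess tool. Note also that a plain HTTP probe like this may not take the same path as the live game connection in the browser, so a clean result would not rule that path out.

```python
import http.client
import time
import urllib.request


def fetch_once(url, timeout_s=5.0):
    """Return the request's round-trip time in seconds, or None if it failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        # Covers timeouts, DNS errors, refused/reset connections and HTTP errors.
        return None
    return time.monotonic() - start


def probe(url, count, interval_s, fetch=fetch_once, sleep=time.sleep):
    """Request `url` `count` times, pausing `interval_s` between requests."""
    results = []
    for i in range(count):
        results.append(fetch(url))
        if i + 1 < count:
            sleep(interval_s)
    return results


def longest_outage_s(results, interval_s):
    """Longest run of consecutive failed probes, expressed in seconds."""
    longest = run = 0
    for r in results:
        run = run + 1 if r is None else 0
        longest = max(longest, run)
    return longest * interval_s
```

Something like `longest_outage_s(probe(url, 600, 5.0), 5.0)` would then show whether any multi-second gaps appear over about 50 minutes. Keep the interval generous so the probe itself does not add load or trip rate limits.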
