@for_cryingout_loud said in #12:
> But start wider and narrow problems down if needed."
> No, I will abuse the sun until it gives me answers.
> Wider takes longer, narrow will be faster; only narrow if we are 100% sure based on facts.
For analyzing such issues, you have to consider the whole picture, i.e. think wider.
Then narrow down potential bottlenecks: via monitoring, first exclude, quite fast, the areas and factors that are irrelevant.
You can of course use the "abuse the SUN method" if you wish; the general public and I will wait till the Sun answers you.
> "How long have the servers been on OVH hosting now? At least several (6 or more) months ago there were no such problems, or they were not so prominent.
> So what else has changed? The Scala version itself, but besides that?"
>
> Not easy to link it to one change, as users have complained for years (search "lag" or "disconnect" on the forum and scroll down).
> Even harder to say if it's gotten worse because of new updates, since no one has been measuring it in any way besides feeling.
>
> But maybe we can ask for load stats before and after that update (and other ones) to see if we can spot overload.
> Maybe we can also see if the server system can run at full throttle with no problems in terms of heat, power, etc.
>
Who said it must be easy? If no one is able to say whether server performance has gotten worse or not, it means someone should implement measuring standards to be able to say such things in the future. Interesting, who might that be?
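
Making "has it gotten worse?" answerable mostly means keeping timestamped load samples and comparing the periods before and after a given update. A minimal Python sketch, assuming hypothetical (timestamp, CPU %) samples exported from the hosting provider's monitoring; the function name and the sample values are illustrative only:

```python
from statistics import mean

# Hypothetical sample format: (unix_timestamp, cpu_load_percent).
# Real numbers would have to come from the hosting provider's monitoring.
Sample = tuple[float, float]


def compare_before_after(samples: list[Sample], update_ts: float) -> dict[str, float]:
    """Split samples at an update timestamp and compare the average load."""
    before = [load for ts, load in samples if ts < update_ts]
    after = [load for ts, load in samples if ts >= update_ts]
    if not before or not after:
        raise ValueError("need samples on both sides of the update")
    avg_before, avg_after = mean(before), mean(after)
    return {
        "before": avg_before,
        "after": avg_after,
        "change_pct": (avg_after - avg_before) / avg_before * 100,
    }


samples = [(100, 40.0), (200, 42.0), (300, 55.0), (400, 57.0)]
print(compare_before_after(samples, update_ts=250))
```

A raw before/after average can mislead if player numbers also changed between the two periods, so the comparison would ideally be made at similar concurrency, for example the same peak hours.
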
> "So access to hosted system monitoring is crucial, and server operators should have some kind of service mechanism: persons to contact who can provide such a task or access if needed. There is no need "to guess"; just create such an inquiry to the hosting providers."
>
> Lichess should be able to see it from the console, so just get Lichess to ask and see.
> But what data exactly are we looking for?
>
> - avg CPU usage
> - temps
> - requests for games
> - requests for everything
> - disk usage % and read and write
> - mem usage, mem free and memcached
>
> Maybe we can ask them to save CPU and other values over time, so we can plot them and see if any meaningful trend arises.
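
Before plotting, a trend in the saved values can be quantified as a least-squares slope per metric. A minimal Python sketch, assuming a hypothetical `MetricSample` record that holds a subset of the metrics above, sampled at a fixed interval; the field names and values are illustrative, not an existing OVH or Lichess export format:

```python
from dataclasses import dataclass, fields


@dataclass
class MetricSample:
    # One hypothetical snapshot of some of the metrics listed above.
    cpu_pct: float
    temp_c: float
    game_requests: int
    total_requests: int
    disk_used_pct: float
    mem_used_pct: float


def trend_slope(values: list[float]) -> float:
    """Ordinary least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def trends(samples: list[MetricSample]) -> dict[str, float]:
    """Slope of every metric, in units per sampling interval."""
    return {
        f.name: trend_slope([getattr(s, f.name) for s in samples])
        for f in fields(MetricSample)
    }


hourly = [
    MetricSample(50.0, 60.0, 900, 5000, 70.0, 80.0),
    MetricSample(55.0, 60.0, 900, 5000, 70.0, 80.0),
    MetricSample(60.0, 60.0, 900, 5000, 70.0, 80.0),
]
print(trends(hourly))  # cpu_pct rises by 5.0 per hour, the rest stay flat
```

A slope is only a first filter: it says which metric is drifting, while the plot is still needed to see whether the drift is steady or comes from a few spikes.
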
All of them, from the very beginning, and, even worse, in a timely manner. And not only the parameters you listed; there are more.
But let every person do the task they are skilled at.
1. OVH staff can report the overall load/performance of the hosted servers, manage their load balancing, etc., and provide performance graphs.
2. A network specialist from OVH should be able to answer questions related to networks, firewalls, redirections, network equipment, their loads, queue times, and possibly even attacks, if any are happening on the systems.
3. A system/database architect can look into and measure the overall load between the frontend and backends and their interactions: database measurements, optimisations, etc., database access times, response times, etc.
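
When database access and response times are measured, percentiles say more than an average, because occasional slow responses are what a bullet player feels. A minimal Python sketch using the nearest-rank percentile; the timings are hypothetical, not measured Lichess data:

```python
import math
from statistics import mean


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, with p in (0, 100]."""
    if not values or not 0 < p <= 100:
        raise ValueError("need values and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical database response times in milliseconds.
timings_ms = [3, 4, 4, 5, 5, 6, 7, 9, 40, 250]
print(f"mean: {mean(timings_ms):.1f} ms")
for p in (50, 95, 99):
    print(f"p{p}: {percentile(timings_ms, p)} ms")
```

Here the median is 5 ms while the tail reaches 250 ms; the mean of 33.3 ms describes neither the typical request nor the worst ones.
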
You want facts? Generate them by analyzing and measuring.
At this point, I would ask whether those problems affect only bullet or ultrabullet, or how prominent the "lag/throttle" issues are in slower tournaments. Are they on the same virtual machine/cluster/network, etc.? What is common, and what is different?
Differentiate your findings to narrow things down, and then start the detailed search.
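
"What is common, what is different" can be checked by grouping the same lag reports along more than one dimension. A minimal Python sketch, assuming hypothetical lag reports tagged with a time control and a `host_group` label for the VM/cluster that served the game (Lichess does not necessarily expose such a label); all values are invented for illustration:

```python
from collections import defaultdict
from statistics import median
from typing import NamedTuple


class LagReport(NamedTuple):
    time_control: str  # e.g. "bullet", "blitz", "rapid"
    host_group: str    # hypothetical label for the VM/cluster that served the game
    lag_ms: float


def median_lag_by(reports: list[LagReport], attr: str) -> dict[str, float]:
    """Group reports by one attribute and return the median lag per group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for report in reports:
        groups[getattr(report, attr)].append(report.lag_ms)
    return {key: median(lags) for key, lags in groups.items()}


reports = [
    LagReport("bullet", "host-a", 180), LagReport("bullet", "host-a", 220),
    LagReport("bullet", "host-b", 50), LagReport("blitz", "host-a", 190),
    LagReport("blitz", "host-b", 45), LagReport("rapid", "host-b", 40),
]
print(median_lag_by(reports, "time_control"))  # bullet looks worst...
print(median_lag_by(reports, "host_group"))    # ...but host-a is the common factor
```

In this invented data, grouping only by time control would blame bullet, while the second grouping shows that the slow games share a host; that is the kind of split that narrows the detailed search.
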
What I can tell, just by "feeling": at peak times there are around 90,000 clients on the server and some 45,000-46,000 games in progress, and that has not changed much.
About lag and the forums: there is a need to differentiate real lag problems from load problems. And, you know, this too can only be done by measuring.
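
One simple measurement for separating load problems from other lag is to correlate a load series with a lag series over the same time windows. A minimal Python sketch, assuming hypothetical per-minute counts of games in progress and median move lag; the numbers are invented:

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no correlation")
    return cov / (sx * sy)


# Hypothetical per-minute series: games in progress and median move lag (ms).
games = [30000, 35000, 40000, 45000, 46000]
lag_ms = [60, 70, 95, 140, 150]
print(f"load/lag correlation: {pearson(games, lag_ms):.2f}")
```

A coefficient close to 1 points towards load-related lag, while lag spikes at flat load point elsewhere (network, client). Correlation alone does not prove the cause, so it only decides where to look next.
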
> "That bot thing can help too, especially if it counts disconnects for the bot client. But I would start from overall system performance stats. But bots can't play 1:0 tournaments, can they?"
> Bots sadly can't play tournaments according to this (lichess.org/api#tag/Bot), but
> tournament games differ from normal games only in pairing and points.
> If we just have bot A play bot B,
> we should still find the issue.
> Just make both bots run ping to see whether the whole connection drops or just the server's response to our requests, so we can be certain of it.
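
The ping idea in the quote amounts to pairing a plain connectivity check with a real API request in each time window and labelling the combination. A minimal Python sketch; it uses a TCP connect as a stand-in for ping, since raw ICMP sockets usually need elevated privileges, and the host, port, window length and labels are hypothetical choices:

```python
import socket
from collections import Counter


def tcp_reachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Rough stand-in for ping: can a TCP connection be opened at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(link_ok: bool, request_ok: bool) -> str:
    """Label one probe window from a connectivity check and an API request."""
    if link_ok and request_ok:
        return "healthy"
    if link_ok:
        return "server-side"      # host reachable, but requests fail or time out
    if request_ok:
        return "probe-filtered"   # connectivity check blocked, requests still work
    return "connection-drop"      # nothing gets through


# Hypothetical probe results (link_ok, request_ok), one pair per 10 s window.
windows = [(True, True), (True, False), (False, False), (True, True)]
print(Counter(classify(link, req) for link, req in windows))
```

Such probes only describe the path between one client and the server, so a clean result from two bots says little about the rest of the system.
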
Incorrect. You may find something, and you may not.
You might find bot A vs bot B issues if you are lucky enough to run them on the affected part of the system.
To be more sure, you have to run the bot challenge on the same server the bullet system runs on (because bullet is probably the most complained about), and better make plenty of those bots, I mean some hundreds, to mimic a normal tournament startup. And make them move during the game more like normal humans do, with random move delays (I mean no opening-book play with 30 moves in the first 0.001 s).
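
The two timing ingredients of such a test, staggered joins of hundreds of bots and human-like think times, can be generated up front. A minimal Python sketch, assuming a log-normal think-time model whose median, spread and clamping bounds are invented parameters, not values fitted to real players; the bots themselves and the test arena are out of scope here:

```python
import math
import random


def move_delays(n_moves: int, rng: random.Random, median_s: float = 0.8,
                sigma: float = 0.6, min_s: float = 0.15, max_s: float = 5.0) -> list[float]:
    """Human-like think times: log-normal around median_s, clamped to [min_s, max_s]."""
    mu = math.log(median_s)  # the median of a log-normal is exp(mu)
    return [min(max(rng.lognormvariate(mu, sigma), min_s), max_s) for _ in range(n_moves)]


def join_offsets(n_bots: int, window_s: float, rng: random.Random) -> list[float]:
    """Spread bot joins over window_s seconds, like players arriving at an arena start."""
    return sorted(rng.uniform(0.0, window_s) for _ in range(n_bots))


rng = random.Random(42)  # fixed seed so a test run can be repeated
offsets = join_offsets(300, 10.0, rng)
delays = move_delays(30, rng)
print(f"first join at {offsets[0]:.2f} s, last at {offsets[-1]:.2f} s")
print(f"30 moves take {sum(delays):.1f} s of thinking in total")
```

The lower clamp is what rules out "30 moves in 0.001 s"; a fixed seed makes a slow run reproducible when comparing server behaviour between two test sessions.
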
But this bot test is more meaningful if the overall server performance analysis has already been done.
And who prevents making a bot-only test arena for performance testing purposes?
How wide or narrow one starts depends on the human resources you have.
>
> "Wish you good luck on solving."
> You can't run away, I am roping you into this.
Don't worry, I just bought a good custom M390 steel knife against roping.