Kramnik's Current Study Of Cheating In On-Line Chess

@boilingFrog said in #41:
> Go hijack somebody else's thread ...
This is a forum, a place to share ideas, and also to see whether they hold up or are fallible. On a forum, you should try to listen to people's reasoning instead of swearing at them because they don't agree with you; you only know you have a better idea if your ideas hold up in real life!

What I have said is on topic and relevant to the assertions, and if you have no counter-argument other than name-calling, then maybe it's you who shouldn't be on this thread.
@AlexiHarvey said in #26:
> I have looked through the report; although there are a number of interesting points, there are aspects that make the analysis weak.
>
> (1) The report does not state the source of the 'ratings'. If chess.com's blitz rating is being used, this would make the report very dubious, as there would clearly be a circularity.
>
> (2) Defining the 'best move' based on only 1 sec of engine analysis is weak, imo. As a ~1500 player using a fairly powerful modern PC, I would analyse my own games at a minimum of 10 seconds per move - and those are games of the throwaway type; for rated OTB games I use 60 seconds, with even longer times of 1+ hour at key junctures. The report makes no attempt to indicate whether 1 second is sufficient. Does the analysis change if 2 seconds etc. are used? Who knows. At the very least some sort of qualification check should have been performed, possibly using a larger time interval with a correspondingly smaller dataset.
>
> (3) The report's key metric is the average deviation over a game, and it ignores just how powerful selective use of an engine can be on the result of a game. As many elite chess players have pointed out, just knowing the engine's evaluation of the position would provide a significant advantage, let alone the recommended move in a given position.
>
> The principal basis of Kramnik's analysis is a comparison between on-line blitz play and OTB ratings, with the reasonable assumption that the level of cheating in the latter is likely to be extremely small compared to on-line. This is a very solid way of detecting cheating, as the main purpose of cheating in this context is money based on results. Caruana, in a recent c2squared podcast, stated he would not expect there to be any difference in the quality of play of an elite player whether playing OTB or Titled Tuesday - i.e. there is no such thing as a 'blitz specialist' at this level - he put the difference at below 25 rating points at worst. If this is true then clearly even a 1SD difference over sufficient games would be suspicious. From my understanding of the mental differences between OTB and blitz, I would actually expect to see a bias towards the higher OTB-rated player compared to strict Elo probabilities - in short, events like Titled Tuesday should have an inherent bias towards higher OTB-rated players.

============

Agreed. Dorian Quelle made some aggressive claims and put up some fancy-looking charts and graphs, but his reasoning was superficial and sloppy. He carefully analyzed the wrong numbers while ignoring the right numbers--a classic Straw Man fallacy.

1 second of engine analysis per move is not nearly enough to judge grandmaster chess.
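To illustrate the point about analysis time, here is a minimal sketch (my own, not anything from the Quelle report) of how one could check whether 1 second per move is enough: re-analyse the same positions at 1 second and at 10 seconds and see how often the engine's choice changes. It assumes python-chess and a local Stockfish binary; the engine path and PGN filename are placeholders.

```python
# Minimal sketch: compare the engine's preferred move at 1s vs 10s per position.
# Assumes python-chess and a local Stockfish binary; path and PGN are placeholders.
import chess
import chess.engine
import chess.pgn

ENGINE_PATH = "/usr/local/bin/stockfish"  # placeholder path

def best_move_at(engine, board, seconds):
    """Move the engine picks when given `seconds` of thinking time."""
    return engine.play(board, chess.engine.Limit(time=seconds)).move

def agreement_rate(pgn_path, short_s=1.0, long_s=10.0):
    """Fraction of positions where the short and long searches pick the same move."""
    same = total = 0
    with open(pgn_path) as f, chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            for move in game.mainline_moves():
                if best_move_at(engine, board, short_s) == best_move_at(engine, board, long_s):
                    same += 1
                total += 1
                board.push(move)
    return same / total if total else 0.0

if __name__ == "__main__":
    print(f"1s vs 10s best-move agreement: {agreement_rate('games.pgn'):.1%}")
```

If the agreement rate turns out to be low, then any metric built on 1-second "best moves" is measuring something noisy.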

The discrepancies between some players' OTB FIDE tournament performance numbers and their chess.com Titled Tuesday performance numbers are among the most obvious red flags. Quelle ignores such red flags.

Among the players with the most suspicious-looking discrepancies there is one who seems extremely suspicious. Online he is among the Top 10 Titled Tuesday players of all time in multiple categories, including total event wins and highest score/winrate in a single event. He's a Titled Tuesday SuperGM. OTB he has never been in the top 100, never won a single major international event, and his FIDE ratings have always been under 2570 at every time control--classical, rapid, and blitz. He is over age 25, so he is not an underrated junior--his main FIDE rating has stayed in the 2500-2570 range over the past 5 years. How do you explain such a difference between OTB and online results? A freak genius-level mouse-clicking talent in an otherwise average GM?

some Titled Tuesday stats:
www.chess.com/article/view/titled-tuesday
@sosumisai said in #43:
> Among the players with the most suspicious-looking discrepancies there is one who seems extremely suspicious. Online he is among the Top 10 Titled Tuesday players of all time in multiple categories, including total event wins and highest score/winrate in a single event. He's a Titled Tuesday SuperGM. OTB he has never been in the top 100, never won a single major international event, and his FIDE ratings have always been under 2570 at every time control--classical, rapid, and blitz. He is over age 25, so he is not an underrated junior--his main FIDE rating has stayed in the 2500-2570 range over the past 5 years. How do you explain such a difference between OTB and online results? A freak genius-level mouse-clicking talent in an otherwise average GM?

When in university, I met people who performed great in written exams but struggled when they had to face the professor in an oral exam. Others performed great in long-term work at home but had trouble focusing in the foreign environment of a classroom. One of my friends at work gets frustrated whenever a cleaning lady moves his monitor by a few centimeters up or down. And I also find it much harder to focus when I have to work on someone else's keyboard or monitor (not to mention if I have to use a laptop), while most people have no problem with it and don't understand why I do.

So what you pointed out may be a result of cheating - but it may just as well be a sign of someone who feels far more comfortable in the familiar environment of his home but not nearly as comfortable when playing an OTB tournament, and it shows in his results. People have all kinds of psychological issues, and it doesn't always have to be as extreme as in the Rain Man movie. After all, 2500-2570 FIDE Elo doesn't sound like anything mediocre to me...
@SaltWaterRabbit said in #31:
> Fair enough
> The Dorian Quelle report includes some choices in the method and assumptions that can be objectively questioned.
>
> What about the Kramnik report? Are there any objective assessments of its method and assumptions? How does this approach relate to current literature / best practice?
>
> It is well beyond my personal skill set to objectively assess the statistical methods/ assumptions used in these reports. I have to rely on others.
> Thanks for your analysis.

I have only seen a podcast - c2podcast, I think - where Kramnik outlined his basic method. What I can say without knowing the exact details is that Kramnik's method of comparing OTB ratings with on-line performance has the potential to be far more statistically robust than on-line cheat-detection methods, of which the Dorian Quelle report is one simple variety.
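As a rough illustration of that kind of comparison (my own sketch, not Kramnik's actual procedure), one could take a player's OTB rating, compute the score it predicts against the opposition actually faced online, and look at the gap. The numbers below are hypothetical.

```python
# Sketch: how much does an online result exceed what the OTB rating predicts?
# Standard Elo expectation formula; all numbers below are hypothetical.
def elo_expected_score(player_rating: float, opponent_rating: float) -> float:
    """Expected score for a single game under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - player_rating) / 400.0))

def overperformance(otb_rating, opponent_ratings, actual_score):
    """Actual online score minus the score the OTB rating predicts.

    `opponent_ratings` is a list of the opponents' OTB ratings;
    `actual_score` is the total points scored against them online.
    """
    expected = sum(elo_expected_score(otb_rating, r) for r in opponent_ratings)
    return actual_score - expected

# Hypothetical: a 2520-rated GM scoring 8.5/11 against ~2650 opposition online.
print(overperformance(2520, [2650] * 11, 8.5))  # roughly +5 points above expectation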
@mkubecek said in #44:
> When in university, I met people who performed great in written exams but struggled when they had to face the professor in an oral exam. Others performed great in long-term work at home but had trouble focusing in the foreign environment of a classroom. One of my friends at work gets frustrated whenever a cleaning lady moves his monitor by a few centimeters up or down. And I also find it much harder to focus when I have to work on someone else's keyboard or monitor (not to mention if I have to use a laptop), while most people have no problem with it and don't understand why I do.
>
> So what you pointed out may be a result of cheating - but it may just as well be a sign of someone who feels far more comfortable in the familiar environment of his home but not nearly as comfortable when playing an OTB tournament, and it shows in his results. People have all kinds of psychological issues, and it doesn't always have to be as extreme as in the Rain Man movie. After all, 2500-2570 FIDE Elo doesn't sound like anything mediocre to me...

Very true. And you highlight a common error people make when dealing with statistics. Statistics is about testing hypotheses and has nothing to say about individual data points.

In short, all the statistical comparisons between OTB, on-line and engine play can't be used to identify any cheating individual. If the hypothesis is, say, "Are some players cheating on Titled Tuesday?", you can't point the finger at any of the 'outlying' data points - you would have to build a large set of data points around a given individual's performance and ask "Is this person likely to be cheating, with a probability of being wrong in roughly 1 in 20 cases#?". Therein lies the rub: in order to home in on an individual you need a heck of a lot of data points and very solid assumptions - and as few of them as possible. A single game can almost never be used as evidence of cheating, except in very extreme cases, only likely to occur with 'very noob' players.
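As a toy illustration of that "1 in 20" threshold (my own sketch, with hypothetical numbers, assuming scipy is available): treat each game as an independent win/loss with the win probability the OTB rating predicts, and ask how many games it takes before an elevated score becomes statistically significant.

```python
# Toy illustration of the p < 0.05 ("1 in 20") threshold discussed above.
# Requires scipy; the win probabilities and scores below are hypothetical.
from scipy.stats import binomtest

def flag_overperformance(wins: int, games: int, expected_win_prob: float,
                         alpha: float = 0.05) -> bool:
    """One-sided binomial test: is the observed score implausibly high?"""
    result = binomtest(wins, games, expected_win_prob, alternative="greater")
    return result.pvalue < alpha

# A 65% score over 20 games is not enough evidence against a 50% expectation...
print(flag_overperformance(13, 20, 0.5))    # False (p ~ 0.13)
# ...but the same 65% rate sustained over 200 games is.
print(flag_overperformance(130, 200, 0.5))  # True  (p < 0.001)
```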

Caruana recently said something very important: he stated he had only become suspicious of Titled-Tuesday-like events over the last three years - since covid. A very simple test would be to compare statistics from the last year with those of, say, five years ago. Again, you would only be able to indicate whether cheating had recently been occurring, not who was cheating. Such a test would be capable of resting on fewer assumptions than, say, Kramnik's, and be even more robust.

#This is not strictly correct - the technical meaning is: if you repeated the exact same analysis a significant number of times, with different data points each time, you would get the 'wrong' result with a probability of roughly 1/20. In short, there can be no certainty with statistics; however, given that a decision has to be made, statistical techniques give an objective method of making such decisions in complex environments/systems, where time, cost, and benefit considerations are vital. The alternative is for people to argue till they are blue in the face, with considerably less utility!
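For the "compare eras" idea a couple of paragraphs up, a minimal sketch (mine, with made-up numbers, assuming scipy) would be a one-sided two-sample test on some per-game quality metric - for example, engine best-move match rate - from the two periods:

```python
# Sketch of the era comparison: is the recent distribution of a per-game quality
# metric shifted upward relative to an older one? Requires scipy; the data
# loading is omitted and the numbers below are hypothetical.
from scipy.stats import mannwhitneyu

def era_shift_pvalue(old_scores, new_scores):
    """One-sided Mann-Whitney U test: are recent per-game scores higher?"""
    stat, pvalue = mannwhitneyu(new_scores, old_scores, alternative="greater")
    return pvalue

# Hypothetical per-game best-move match rates (fractions), pre- and post-2020.
old = [0.52, 0.48, 0.55, 0.50, 0.47, 0.53, 0.49, 0.51]
new = [0.58, 0.61, 0.55, 0.63, 0.57, 0.60, 0.59, 0.62]
print(f"p-value for an upward shift: {era_shift_pvalue(old, new):.4f}")
```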
@mkubecek said in #44:
> When in university, I met people who performed great in written exams but struggled when they had to face the professor in an oral exam. Others performed great in long-term work at home but had trouble focusing in the foreign environment of a classroom. One of my friends at work gets frustrated whenever a cleaning lady moves his monitor by a few centimeters up or down. And I also find it much harder to focus when I have to work on someone else's keyboard or monitor (not to mention if I have to use a laptop), while most people have no problem with it and don't understand why I do.
>
> So what you pointed out may be a result of cheating - but it may just as well be a sign of someone who feels far more comfortable in the familiar environment of his home but not nearly as comfortable when playing an OTB tournament, and it shows in his results. People have all kinds of psychological issues, and it doesn't always have to be as extreme as in the Rain Man movie. After all, 2500-2570 FIDE Elo doesn't sound like anything mediocre to me...

Yes, it is possible that some top players might play much better online than OTB--almost anything is possible. But how common are such players today? And how common were they in the era before affordable chess engines became superhuman? Before it became easy to cheat your way to the top of online chess?

24/7 internet chess servers with Elo/Glicko rating lists have been around since 1992 (casual online chess games were played long before that). Easily available chess engines did not reach super-GM level until around 1997-1998, and did not reach superhuman levels until around 2004-2006. So we have a window of time (1992-1997, and maybe also 1998-2003) during which top OTB players were playing rated games online but chess engines were not yet strong enough to consistently outplay the top 10 human grandmasters. During that period, were there any human players who were ranked among the top 10 online but could never make the top 100 OTB? I don't know of any from the 1990s.

Does anyone know?

Another issue is that certain players seem to play much better than their OTB rating in SOME online events, and then play roughly at their OTB level in OTHER online events. One player I have in mind sometimes plays at 2800-2900 Magnus level, and other times plays at ordinary 2500-GM level. I saw him get a 7.0-to-7.0 score against Magnus in one series of online games, and then I saw him get blown off the board in another series of online games. He has won 8 Titled Tuesdays (chess.com) and 10 Titled Arenas (Lichess) and has 41 podiums (top-3 finishes) in these events, beating many super-GMs in the process, sometimes hitting 9-win streaks or going undefeated. That puts him in the top 10 for all-time best results in online Titled Tuesday/Arena events with cash prizes. At other times he plays online just as you would expect from his ~2500 FIDE rating.

So it's not just a difference between online and OTB performance, but a difference between some online performances and other online performances. The issue is players with performance swings of 300-600 rating points, up and down: a dozen games at 2500-2600, followed by a dozen games at 2800-3000, followed by a dozen games back at 2500-2600. How common are such swings, and how common were they BEFORE chess engines reached super-GM strength?
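One way to make that concrete (a sketch of my own, with hypothetical events and the usual linear approximation for tournament performance rating) is to compute a performance rating per event and flag jumps between consecutive events:

```python
# Sketch: per-event performance rating plus a check for large swings between
# consecutive events. Uses the linear approximation
# TPR ~= average opponent rating + 800 * (score/games - 0.5); data is hypothetical.
def performance_rating(opponent_ratings, score):
    """Rough tournament performance rating for one event."""
    avg_opp = sum(opponent_ratings) / len(opponent_ratings)
    return avg_opp + 800 * (score / len(opponent_ratings) - 0.5)

def flag_swings(events, threshold=300):
    """Yield consecutive-event pairs whose performance differs by >= threshold points."""
    perfs = [performance_rating(opps, score) for opps, score in events]
    for i in range(1, len(perfs)):
        if abs(perfs[i] - perfs[i - 1]) >= threshold:
            yield i - 1, i, perfs[i - 1], perfs[i]

# Three hypothetical 11-round events against ~2600 opposition.
events = [([2600] * 11, 5.0),    # ~2560 performance
          ([2600] * 11, 10.0),   # ~2930 performance
          ([2600] * 11, 5.5)]    # ~2600 performance
for a, b, pa, pb in flag_swings(events):
    print(f"event {a} -> event {b}: {pa:.0f} -> {pb:.0f}")
```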
I would not look at all moves between the first novelty and the endgame, because a top player will likely find something good when there are multiple good choices. Rather, it would be more indicative to look only at those positions where:
- there are very few good moves
- there are many plausible-looking moves (as calculated at a very shallow depth, maybe?)
- the player took suspiciously little time to figure out the best move
This would be looking for "engine moves"... or just players who are faster/luckier at finding the winning sequence.
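A rough sketch of that filter (assuming python-chess, a local Stockfish binary at a placeholder path, and untuned threshold values) might look like this:

```python
# Sketch of the filter described above: keep only positions with few "good" moves
# at depth but several plausible-looking moves at a shallow search. Assumes
# python-chess and a local Stockfish binary; path and thresholds are placeholders.
import chess
import chess.engine

ENGINE_PATH = "/usr/local/bin/stockfish"  # placeholder path

def critical_position(engine, board, deep_s=10.0, shallow_s=0.1,
                      good_margin=50, plausible_margin=50):
    """True if the position has few near-best moves at depth but many moves that
    look plausible at a shallow search -- the kind of spot worth checking."""
    deep = engine.analyse(board, chess.engine.Limit(time=deep_s), multipv=5)
    shallow = engine.analyse(board, chess.engine.Limit(time=shallow_s), multipv=5)

    def near_best(infos, margin):
        best = infos[0]["score"].relative.score(mate_score=10000)
        return sum(1 for info in infos
                   if best - info["score"].relative.score(mate_score=10000) <= margin)

    few_good_deep = near_best(deep, good_margin) <= 1
    many_plausible_shallow = near_best(shallow, plausible_margin) >= 3
    return few_good_deep and many_plausible_shallow

if __name__ == "__main__":
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        board = chess.Board()  # starting position, just to exercise the function
        print(critical_position(engine, board))
```

The move-time criterion would come from the game's clock annotations: where critical_position() is true, check whether the player found the top engine move in suspiciously little time.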
