
Analysis of Lichess' cheating detection with Machine Learning (ML) - a mis-use of ML & doesn't work

Sorry, I forgot to reply to this one and it is a really good point:

@phobbs5 said in #13:
> Regarding the model training on its own labels, how would you go about reframing the problem to avoid the predict-your-labels problem? Given that cheating detection has a massive scale problem and confessions are rare, would you rely on supplying human arbiters with heuristics and deriving labels from the arbiter's decisions? Could you augment the dataset with "artificial" cheating to make the training data more robust?

Disclaimer: I already had a beer, so this may be flawed, but thinking about how to get labels for cheating detection properly is a good thought.

Unless Lichess proves me wrong, I would say the percentage of confessions among flagged cheaters is negligible. Relying on confessions to get labels is not going to work.

Here is how I would do it, and it's wild - ASK PEOPLE TO CHEAT. No, don't stop reading here to write a response calling me crazy; hear me out and let me describe how this could work:

Lichess could invite players to sign up for an "anti-cheating program". The selection of players would have to be as random as possible (this is important!), though some restrictions would probably be needed (e.g., only accounts that have existed for more than a year), and I'd trust Lichess to make the right call here. When a player who has signed up for the program queues for a new game, there would be a low-percentage probability that a pop-up window tells them their next game is a "cheating game" and asks them to "prepare their engine(s)" (or asks whether they'd be OK playing this game for the anti-cheating program). Once they confirm readiness, they play the game and afterwards confirm that they have cheated. You could even have a post-game labeling step asking them "how did you cheat?" (e.g., "used a separate mobile", "clicked on my other open browser tab with Stockfish", etc.) and potentially ask them to mark the moves where they cheated (this would address the issue of occasional cheaters).

The players playing against those "programmatic cheaters" would obviously get a rating refund (and the programmatic cheater would gain no rating), along with a message telling them that Lichess is grateful they played their last game against a cheater to improve Lichess's mechanisms against actual cheating. It's important that they do not know during the game that they are playing a programmatic cheater. You could even reward those people with some icon or whatever (I am not a marketing/gamification person, so maybe someone has a better idea for rewards...). The mechanism would also allow two programmatic cheaters to play against each other with different cheating methods.
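To make the proposal concrete, here is a minimal sketch of the opt-in flow and the label it would produce. Everything here is hypothetical: the probability, the class names, and the survey fields are my own illustration, not anything Lichess actually runs.

```python
import random
from dataclasses import dataclass, field

# Assumed, illustrative parameter: low chance a queued game becomes a "cheating game".
CHEAT_GAME_PROBABILITY = 0.02

@dataclass
class CheatLabel:
    """Ground-truth label produced by a consenting programmatic cheater."""
    game_id: str
    player_id: str
    method: str                                         # e.g. "second browser tab"
    cheated_moves: list = field(default_factory=list)   # ply numbers the player marked

def maybe_assign_cheat_game(player_id: str, enrolled: set) -> bool:
    """When an enrolled player queues, rarely turn the next game into a cheating game."""
    return player_id in enrolled and random.random() < CHEAT_GAME_PROBABILITY

def record_label(game_id: str, player_id: str, method: str, cheated_moves: list) -> CheatLabel:
    """Post-game survey: the programmatic cheater reports how and where they cheated."""
    return CheatLabel(game_id, player_id, method, cheated_moves)

# Example: a player confesses per the post-game survey described above.
label = record_label("abc123", "playerX", "second browser tab", [14, 15, 22, 23])
```

The per-move marks are what makes this more useful than a binary account-level flag: a model could then be trained on move-level labels, which is exactly what occasional cheating requires.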

This is not fleshed out in every detail, but if you asked me to address this, the approach outlined above would be my first starting point for obtaining better cheating labels. If you collect sufficiently large quantities of labels from programmatic cheaters, even cases where signed-up players try to pollute your data set by deliberately not cheating would not significantly affect the model.

This is just a proposal, but once you have read this, feel free to down-vote me and throw all kinds of insults ;-)
I can't express how glad I am that @IrwinCaladinResearch has posted about the inadequacies of cheat detection and the unfair treatment of false positives here with a solid foundation.

Thank you for that!
@odoaker2015 Of course, you're absolutely ignoring the mod response.

> Alright, let's talk about your post. Unsurprisingly, we strongly disagree with your central claims that our systems are "fundamentally flawed" and that "it is very likely that [Lichess] punishes a lot of non-cheating players as well" - especially if the latter refers to decisions taken about accounts.
> More generally, we think your claims are rather strongly stated given the limited details that you provide to support them, and even those details need to be probed further. For example, just from what you've posted, we'd have concerns about the inferences you have drawn from the available data, the assumptions and logic behind your 'false positive' estimate, and whether your analysis has fully accounted for our ML systems' primary role to inform a multifactorial decision process.
> In addition, your characterisation of the feedback loops in our models is completely off the mark, because we have taken proactive measures to avoid exactly what you describe. You really ought to give us a bit more credit! We know we're dealing with applied ML here, where "ground truth" data never truly reflects actual ground truth, and perfect labels are the exception, not the norm.
because all you do is cheer for bold claims that suit you, for obvious reasons, even when they've already been contested by the team and the poster has realized they need to reconsider their perspective.

Lichess doesn't suck at cheat detection. Period.

Btw, what's described in #21 could just be done among staff members themselves (or users who know they're engaged in cheat tests); no need to involve unaware players.
@Cedur216 said in #26:
> because all you do is cheering for bold claims that suit you for obvious reasons,...
Don't you do the same? You just don't like what @IrwinCaladinResearch has to say. That's all.

>Lichess doesn't suck at cheat detection. Period.
How do you know that? Are you a Moderator who is involved in cheat detection?
Do you have any expertise in cheat detection?
Unlike you, @IrwinCaladinResearch has a clue what he's talking about.
To the Lichess moderators: Please confirm that @Cedur216 is not a Lichess moderator and is not involved in cheat detection. And that he only speaks for himself.
> How do you know that? Are you a Moderator who is involved in cheat detection?

I've communicated with them every once in a while, and I know what they say about it. Sources like the video I linked again in this thread are enough to give you a picture of what the websites do, even if you don't know the details.

I'm not experienced in detecting smart cheaters; I just have some ideas of how smart cheating can be pulled off (and that anyone can cheat), and I know some basic patterns and anomalies that are solid enough confirmation of cheating, e.g. an excessive series of almost-perfect games. Of course Lichess bans on more sophisticated grounds, but it makes absolutely no sense to doubt a ban that shows even clearly visible indications.
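The "excessive series of almost-perfect games" anomaly mentioned above can be sketched as a simple streak check. The thresholds and the input format are my assumptions for illustration; real detection is multifactorial and nothing like this crude heuristic alone would justify a ban.

```python
# Assumed thresholds, for illustration only.
NEAR_PERFECT = 0.95   # fraction of moves matching a top engine choice
STREAK_LEN = 8        # how many consecutive near-perfect games look anomalous

def longest_near_perfect_streak(match_rates):
    """match_rates: per-game engine-match rates (0.0-1.0), most recent last."""
    best = current = 0
    for rate in match_rates:
        current = current + 1 if rate >= NEAR_PERFECT else 0
        best = max(best, current)
    return best

def looks_anomalous(match_rates):
    """Flag an excessive series of almost-perfect games for human review."""
    return longest_near_perfect_streak(match_rates) >= STREAK_LEN

rates = [0.72, 0.81] + [0.97] * 9 + [0.85]
print(looks_anomalous(rates))  # prints True: a nine-game near-perfect run trips the check
```

Note that such a heuristic only flags games for review; it deliberately ignores move times, rating trajectory, browser behavior, and everything else a real multifactorial process would weigh.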

The claim that there are innocent banned players left and right just doesn't stand up to my observation.
Your observation is obviously incomplete. And the fact that your observation doesn't show innocent banned players left and right doesn't mean that there aren't any. That's a faulty conclusion.

@Cedur216
