Uhm, has anyone actually read this? It explains exactly why there's little chance of cheating "intelligently" in the long run.
See posting #5. For example: "Is black suddenly playing like a GM whenever there's one really important move?"
www.reddit.com/r/chess/comments/9s9ml8/how_does_chesscom_and_lichess_deal_with_gms/e8n9uwg/
"... Playing the "second best move every time" suggests that you're thinking of this detection system as an "expert system". This is old-school AI, and not very effective. You might think there are two top moves a chess engine would make, and every time a player makes one, they get one "cheating" point. Above some threshold the detector goes off. If detectors worked like this, you'd be right: there'd be a huge number of false positives or false negatives (depending on the threshold).
Instead these systems are based more on probability. Big websites have millions of games to analyse - including games with confirmed GMs to learn from. A fraud detection algorithm can consider unlimited questions like:
If black has a 3-pawn lead but has lots of unguarded pieces and is down a knight, how likely are they to aggressively recapture instead of defending?
If black has been playing aggressively, how likely are they to play a defensive move that is very difficult to calculate?
When black has 15 available moves, 10 of which are pretty good, how long will it take for black to play a move?
Black has played 20 straight GM-level moves and has a strong material lead. What are the odds that black will now play an amateur blunder? A "look I'm not cheating" move?
Is black suddenly playing like a GM whenever there's one really important move?
Does black suddenly play very well mid game if they are slightly behind? How likely is a human to do that?
Is black playing hard-to-calculate moves just as fast as easy-to-calculate recaptures?
Note how in all those cases, whenever I said "mid game" it could have been 10 turns in, 15, 17, whatever, each with different probability calculations. When I said "engine" it could have been any one of twenty engines. When I said 3 pawns and 1 knight, it could have been X pawns and Y knights.
If you analyse millions of games and programmatically construct probabilistic answers to thousands, even millions of questions like this, you can make a "threshold" system that is far more accurate than anything a human could come up with by comparing naively to an engine. Some questions turn out to be poor predictors of cheats. Some turn out to be great predictors - a human does not decide which are the best questions to ask, or with what parameters. I'm basically describing machine learning - which is definitely what the better systems must be doing.
Setting a threshold still leaves a fundamental problem of false positives (too sensitive) and false negatives (too cautious), though."
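To make the threshold trade-off from that last paragraph concrete, here's a rough Python sketch. It is not how chess.com or lichess actually do it (neither the quote nor I know their internals). It assumes each observation gets a probability under a "human" model and under an "engine-assisted" model, sums the log-likelihood ratios into one score, and then flags anything at or above a threshold. The function names and every number below are made up for illustration.

```python
import math

def log_likelihood_ratio(observations):
    """Sum of log(P(obs | assisted) / P(obs | human)) over all observations.

    Each observation is a (p_human, p_assisted) pair. In a real system these
    would come from models fitted on millions of games; here they are invented.
    """
    return sum(math.log(p_assisted / p_human) for p_human, p_assisted in observations)

def error_rates(honest_scores, cheater_scores, threshold):
    """Return (false_positive_rate, false_negative_rate) when flagging score >= threshold."""
    false_pos = sum(s >= threshold for s in honest_scores) / len(honest_scores)
    false_neg = sum(s < threshold for s in cheater_scores) / len(cheater_scores)
    return false_pos, false_neg

# Hypothetical scores for players whose status is assumed to be already known.
honest_scores = [-2.0, -1.0, 0.0, 0.5, 1.5]
cheater_scores = [0.5, 1.0, 2.0, 3.0, 4.0]

for threshold in (0.0, 1.0, 2.0):
    fp, fn = error_rates(honest_scores, cheater_scores, threshold)
    print(f"threshold={threshold}: false positives={fp:.0%}, false negatives={fn:.0%}")
```

With these invented scores, a low threshold flags most honest players and misses no cheaters, while a high threshold does the opposite. Because the two groups overlap, no threshold gets both error rates to zero, which is the problem the quote ends on.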