
A Tool to Analyse Intermittent Cheating Patterns Using Public Heuristics (ChatGPT Project Template)

Hi everyone,

I’ve been experimenting with a structured way to analyse behavioural patterns in Lichess games using publicly known heuristics.
To be clear: this is NOT an accusation system, and it is not a replacement for Lichess moderation or engine detection.
It’s simply an educational project that helps players understand common red flags discussed in research by Ken Regan, Chess.com transparency reports, academic papers, and public interviews about cheating detection.

I’m sharing the full ChatGPT Project Instruction below.
Anyone can paste this into ChatGPT (Projects → Create Project → Instructions) and get a consistent cheat-risk analysis of their own games.

This tool focuses on intermittent engine behaviour, where only some moves are assisted, making it harder to detect.
The system produces:

  • A Cheat-Risk Scoring Report (CRS)
  • An optional follow-up message template if the score is high
  • A strict reminder that this is NOT proof of cheating

Please use this for self-education, pattern awareness, and private review only.
Never name or shame users publicly — Lichess moderation is the only authority that can make a cheating determination.


# PROJECT INSTRUCTION (FINAL VERSION)

(Copy/paste into ChatGPT Projects)

TITLE: Lichess Intermittent-Cheating Detector & Follow-Up Message Generator

You are a tool that performs intermittent-engine cheating analysis for Lichess games, using public heuristics inspired by Ken Regan, Chess.com behavioural detection, Lichess transparency reports, and academic research.
You must always remain neutral, objective and non-accusatory.


## 1️⃣ When the user provides a game link or PGN with timing information, you must produce:

# Cheat-Risk Scoring Report (CRS)

Always generate this FIRST.
It must include the five categories below, each scored 0–10:

### • Move Difficulty Score (MD)

How engine-like and humanly improbable the difficult moves were, especially if played instantly.
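
Since "engine-like" is fuzzy in prose, here is a minimal offline sketch, not part of the prompt to paste, of one way an MD-style signal could be computed: the share of a player's moves matching Stockfish's top choice via python-chess. The Stockfish binary name, the depth, and the match-rate metric itself are assumptions, not anything the prompt specifies.

```python
# Hypothetical MD-style signal: engine top-move match rate (not part of the prompt).
# Assumes python-chess and a local Stockfish binary on PATH; depth 12 is arbitrary.
import io
import chess.pgn
import chess.engine

def engine_match_rate(pgn_text: str, color: chess.Color, depth: int = 12) -> float:
    """Fraction of `color`'s moves that coincide with Stockfish's first choice."""
    game = chess.pgn.read_game(io.StringIO(pgn_text))
    board = game.board()
    matches = total = 0
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        for move in game.mainline_moves():
            if board.turn == color:
                best = engine.play(board, chess.engine.Limit(depth=depth)).move
                matches += int(move == best)
                total += 1
            board.push(move)
    return matches / max(total, 1)
```

Note that a high match rate alone is weak evidence: strong players match engines routinely in forced positions, which is why Regan-style methods weight move difficulty in the first place.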

### • Timing Pattern Score (TP)

Identify “inverse complexity timing”: instant hard moves + long pauses on simple moves.
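
The raw per-move timings this score depends on need not be guessed by the model; they can be read deterministically from the PGN. A minimal sketch, not part of the prompt, assuming a Lichess export with [%clk] annotations (increment is ignored, so times are slight underestimates on increment time controls):

```python
# Extract per-move think times from [%clk ...] comments in a Lichess PGN.
import io
import chess
import chess.pgn

def move_times(pgn_text: str, color: chess.Color):
    """Yield (ply, san, seconds_spent) for every move played by `color`."""
    game = chess.pgn.read_game(io.StringIO(pgn_text))
    board = game.board()
    prev_clock = None
    for node in game.mainline():
        mover = board.turn
        san = board.san(node.move)     # SAN must be built before the push
        board.push(node.move)
        if mover == color:
            clock = node.clock()       # remaining seconds after this move
            if prev_clock is not None and clock is not None:
                yield node.ply(), san, prev_clock - clock
            prev_clock = clock

with open("game.pgn") as f:            # hypothetical file name
    for ply, san, spent in move_times(f.read(), chess.WHITE):
        print(f"{ply:3d}  {san:8s} {spent:6.1f}s")
```

Sorting this output by time spent makes "instant hard moves + long pauses on simple moves" visible at a glance, without trusting the model's arithmetic.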

### • Evaluation Volatility Score (EV)

Whether the evaluation remained unusually stable in sharp positions.
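
This, too, is measurable rather than guessable. A hypothetical EV-style sketch, not part of the prompt, scoring the spread of Stockfish evaluations across the game; the depth and the plain standard deviation are arbitrary choices:

```python
# Hypothetical EV-style signal: spread of engine evaluations over the game.
import io
import statistics
import chess.pgn
import chess.engine

def eval_volatility(pgn_text: str, depth: int = 12) -> float:
    """Standard deviation, in pawns, of Stockfish's evaluation after each move."""
    game = chess.pgn.read_game(io.StringIO(pgn_text))
    board = game.board()
    evals = []
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        for move in game.mainline_moves():
            board.push(move)
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            evals.append(info["score"].white().score(mate_score=10000) / 100)
    return statistics.pstdev(evals)  # low spread in sharp positions is the anomaly EV targets
```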

### • Performance Consistency Score (PC)

Compare the quality of the game to the player’s usual level.

### • Human Fear / Heuristic Behaviour Score (HF)

Check if the player ignored dangerous counterplay with non-human confidence.

Final CRS = sum of the five category scores (0–50). A minimal scoring sketch follows the risk levels below.

Risk Levels:

  • 0–15 Low
  • 16–29 Moderate
  • 30–50 High
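
The arithmetic above is simple enough to pin down exactly. A minimal sketch of the sum and the bucketing, not part of the prompt; the dict keys mirror the five category abbreviations:

```python
# Minimal sketch of the CRS arithmetic described above.
def crs_level(scores: dict[str, int]) -> tuple[int, str]:
    """Sum five 0-10 category scores and map the 0-50 total to a risk level."""
    assert set(scores) == {"MD", "TP", "EV", "PC", "HF"}
    assert all(0 <= s <= 10 for s in scores.values())
    total = sum(scores.values())
    level = "Low" if total <= 15 else "Moderate" if total <= 29 else "High"
    return total, level

print(crs_level({"MD": 7, "TP": 8, "EV": 5, "PC": 6, "HF": 4}))  # (30, 'High')
```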

Always state:

“This report identifies behavioural anomalies, not proof of cheating.”


## 2️⃣ Threshold Logic for Lichess Message Generation

After generating the CRS:

  • If CRS < 30, do NOT generate a message.
    Output instead:

    “CRS below threshold — follow-up message optional.”

  • If CRS ≥ 30, automatically generate a Lichess follow-up message (sketched in code below).
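
In code form the branch is a one-liner. The sketch below reuses `crs_level` from above; `build_followup_message` and the game link are hypothetical placeholders, and the length guard anticipates the 3000-character rule in section 3:

```python
# Hypothetical threshold branch; build_followup_message is an illustrative stub.
def build_followup_message(game_link: str, anomalies: list[str]) -> str:
    lines = [game_link, ""] + [f"  • {a}" for a in anomalies]
    lines += ["", "A deeper review of this game would be appreciated."]
    return "\n".join(lines)

total, level = crs_level({"MD": 7, "TP": 8, "EV": 5, "PC": 6, "HF": 4})
if total >= 30:
    message = build_followup_message("https://lichess.org/abcd1234",  # hypothetical link
                                     ["inverse complexity timing on several moves"])
    assert len(message) < 3000  # absolute rule from section 3
    print(message)
else:
    print("CRS below threshold — follow-up message optional.")
```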


## 3️⃣ Follow-Up Message Requirements (IMPORTANT)

If the threshold is reached, the assistant must generate a formal message:

The message MUST be under 3000 characters (absolute rule).

  • Polite, neutral and analytic
  • No direct accusation
  • No certainty language
  • Highlight timing anomalies, move difficulty, performance spikes, and evaluation stability
  • End by requesting deeper review
  • Use bullet points
  • Include the game link at the start
  • Must be well structured


## 4️⃣ Output Structure — ALWAYS in this order

  1. SECTION A — Cheat-Risk Scoring Report
    (detailed, numeric scores + interpretation)

  2. SECTION B — Lichess Follow-Up Message

    • ONLY if CRS ≥ 30

    • Must be < 3000 characters

    • If CRS < 30 output:

      “CRS below threshold — follow-up message optional.”


## 5️⃣ Absolute Rules

  • Never exceed 3000 characters in the Lichess message.
  • Never accuse the opponent of cheating.
  • Never claim certainty.
  • Always present anomalies neutrally.
  • Always follow the 2-section output structure.

END OF PROJECT INSTRUCTION


If anyone wants help refining this or running sample analyses, feel free to reply!

Screenshots
  • lichess_chatgpt_1.png
  • lichess_chatgpt_2.png

Download the annotated PGN
  • lichess_download_pgn.png

In the chat, just provide an annotated PGN like this:
  • Screenshot 2025-12-09 at 23.27.17.png


I just tried to evaluate a game with ChatGPT 5.1. It told me I hung Be4. I asked it "How was Be4 capturable?". Its response: "pawn on f5 takes e4".

There was no pawn on f5. Ever. In the whole game.

Chat apologized and said it had miscalculated. Miscalculated what? The PGN was in the chat. All it had to do was read the notation and see there was never a pawn on f5, and yet it told me I blundered.

I'm not so sure it is useful for this intended purpose.


@Mpernest said in #2:

I just tried to evaluate a game with ChatGPT 5.1. It told me I hung Be4. I asked it "How was Be4 capturable?". Its response: "pawn on f5 takes e4".

There was no pawn on f5. Ever. In the whole game.

Chat apologized and said it had miscalculated. Miscalculated what? The PGN was in the chat. All it had to do was read the notation and see there was never a pawn on f5, and yet it told me I blundered.

I'm not so sure it is useful for this intended purpose.

You're right — large language models can hallucinate and occasionally invent impossible chess sequences like “a pawn on f5 taking on e4” when no such pawn ever existed.
This happens because they don’t maintain an internal chessboard and don’t validate moves the way an engine does. Instead, they predict text patterns, which can lead to confident but incorrect answers.

This experiment is meant to test how far an LLM can be guided with strict instructions to analyse behavioural patterns rather than exact board states. It’s not a replacement for a real chess engine or for Lichess/Chess.com moderation.
So yes — sometimes it will make these kinds of errors, and part of the process is identifying where the model behaves reliably and where it doesn't.
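
One concrete way to catch exactly this class of hallucination is to replay the PGN with a real board representation and test the claim mechanically. A minimal sketch with python-chess; the file name is hypothetical:

```python
# Replay the game and ask whether a black pawn ever stood on f5, which
# mechanically falsifies a "pawn on f5 takes e4" claim.
import io
import chess
import chess.pgn

def piece_ever_on(pgn_text: str, square: chess.Square, piece: chess.Piece) -> bool:
    game = chess.pgn.read_game(io.StringIO(pgn_text))
    board = game.board()
    if board.piece_at(square) == piece:
        return True
    for move in game.mainline_moves():
        board.push(move)
        if board.piece_at(square) == piece:
            return True
    return False

with open("game.pgn") as f:  # hypothetical file name
    print(piece_ever_on(f.read(), chess.F5, chess.Piece(chess.PAWN, chess.BLACK)))
    # False confirms no black pawn was ever on f5 at any point in the game.
```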


ChatGPT? It can't even play chess without cheating xxx lol


@SimonBirch said in #4:

ChatGPT? It can't even play chess without cheating xxx lol

I tried the same with Grok.com, and it takes a lot longer (prompted with "Please use specialist"), but it seems to give better results.

Screenshot 2025-12-10 at 00.58.37.png

grok.png

https://grok.com/


I was literally joking @ilidiomartins. I am a 53-year-old technophobe and I haven't a Scooby Doo what your thread is about. I'm sorry, I was teasing. You carry on though, I'm sure it's very interesting and informative to those that care xxx


That's a great idea, there's nobody better equipped to decide who is cheating than a language model.


This experiment is meant to test how far an LLM can be guided with strict instructions to analyse behavioural patterns
rather than exact board states.

Interesting.

My question, @ilidiomartins: I suspect that if you put the same game into the "tool" that arises from your prompt twice, you get a different answer each time.
From what I see here, those answers may contradict each other heavily for the very same input data given to the "tool" several times.

My conclusion comes from not seeing a deterministic way, let alone a "deterministic algorithm", to create the score values.

My best guess would be that each run on the same game produces a different score.
Am I correct?

Generally, there are certain tasks at which LLMs always fail.
The most important example: if you ask them to explain their reasoning,
they produce reasonable-sounding text that will never tell you how particular numbers were actually crunched to produce an answer,
and if that explanation does not involve any number crunching at all, then it is very obviously just a lie.

A way around this problem for me is so-called "vibe coding"
("LLM, please show me some code to analyse some data this way or another").
When an LLM gives me a code example, then:
1) I can check it on my own ("human in the loop").
2) When I run this code, it will always give me the same answer for the same input (even if the answer is still wrong).
Hint:
Be careful when you assume that the LLM is actually running some code for you just because you asked it to;
maybe it does, or maybe it just produces statistically plausible text that may be false or true to a "random" degree.
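
To make the determinism point concrete, here is a minimal sketch: a crude z-score outlier check over think times. The heuristic itself is illustrative, not a vetted detector, but every run on the same list prints the same answer, which no sampled LLM response guarantees:

```python
# Deterministic outlier check over per-move think times (illustrative only).
import statistics

def timing_outliers(seconds: list[float], z: float = 2.0) -> list[int]:
    """Indices of moves whose think time deviates strongly from the mean."""
    mu = statistics.mean(seconds)
    sd = statistics.pstdev(seconds) or 1.0   # guard against a zero spread
    return [i for i, s in enumerate(seconds) if abs(s - mu) / sd >= z]

times = [4.1, 3.8, 0.4, 5.0, 0.3, 4.6, 31.2, 4.0]
print(timing_outliers(times))  # [6] on every run, at every temperature
```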


Please don't fill the forum up with LLM-generated trash.
