On the limits of engine analysis for cheating detection in chess
Computers & Security
Volume 48, February 2015, Pages 58-73
David J Barnes & Julio Hernandez-Castro
School of Computing, The University of Kent, Canterbury, Kent CT2 7NF, United Kingdom
Received 12 July 2014, Revised 14 September 2014, Accepted 10 October 2014, Available online 22 October 2014.
https://doi.org/10.1016/j.cose.2014.10.002
Abstract
The integrity of online games has important economic consequences for both the gaming industry and players of all levels, from professionals to amateurs. Where there is a high likelihood of cheating, there is a loss of trust and players will be reluctant to participate — particularly if this is likely to cost them money.
Chess is a game that has been established online for around 25 years and is played over the Internet commercially. In that environment, where players are not physically present “over the board” (OTB), chess is one of the most easily exploitable games by those who wish to cheat, because of the widespread availability of very strong chess-playing programs. Allegations of cheating even in OTB games have increased significantly in recent years, and even led to recent changes in the laws of the game that potentially impinge upon players’ privacy.
In this work, we examine some of the difficulties inherent in identifying the covert use of chess-playing programs purely from an analysis of the moves of a game. Our approach is to deeply examine a large collection of games where there is confidence that cheating has not taken place, and analyse those that could be easily misclassified.
We conclude that there is a serious risk of finding numerous “false positives” and that, in general, it is unsafe to use just the moves of a single game as prima facie evidence of cheating. We also demonstrate that it is impossible to compute definitive values of the figures currently employed to measure similarity to a chess-engine for a particular game, as values inevitably vary at different depths and, even under identical conditions, when multi-threading evaluation is used.
On the limits of engine analysis for cheating detection in chess
Computers & Security
Volume 48, February 2015, Pages 58-73
David J Barnes & Julio Hernandez-Castro
School of Computing, The University of Kent, Canterbury, Kent CT2 7NF, United Kingdom
Received 12 July 2014, Revised 14 September 2014, Accepted 10 October 2014, Available online 22 October 2014.
https://doi.org/10.1016/j.cose.2014.10.002
Abstract
The integrity of online games has important economic consequences for both the gaming industry and players of all levels, from professionals to amateurs. Where there is a high likelihood of cheating, there is a loss of trust and players will be reluctant to participate — particularly if this is likely to cost them money.
Chess is a game that has been established online for around 25 years and is played over the Internet commercially. In that environment, where players are not physically present “over the board” (OTB), chess is one of the most easily exploitable games by those who wish to cheat, because of the widespread availability of very strong chess-playing programs. Allegations of cheating even in OTB games have increased significantly in recent years, and even led to recent changes in the laws of the game that potentially impinge upon players’ privacy.
In this work, we examine some of the difficulties inherent in identifying the covert use of chess-playing programs purely from an analysis of the moves of a game. Our approach is to deeply examine a large collection of games where there is confidence that cheating has not taken place, and analyse those that could be easily misclassified.
**We conclude that there is a serious risk of finding numerous “false positives” and that, in general, it is unsafe to use just the moves of a single game as prima facie evidence of cheating.** We also demonstrate that it is impossible to compute definitive values of the figures currently employed to measure similarity to a chess-engine for a particular game, as values inevitably vary at different depths and, even under identical conditions, when multi-threading evaluation is used.