AI unmasks anonymous chess players • page 2/3 • General Chess Discussion • lichess.org

Since privacy issues were a topic addressed in the paper, I assume the data were anonymized before the analysis (for example by assigning a random number to each account and stripping the identity information). However, as I understand it, basically all our games are accessible to anyone in a publicly accessible database. It's up to you to protect your identity on your lichess account.

An interesting implication is that one could create AIs playing in the style of anyone whose games you have access to and 'impersonate' them. Also, one could link public and private accounts (which are allowed by the TOS under some circumstances). This would allow private accounts of super GMs to be revealed.

reidmcy

#12

@dampooo , @EmaciatedSpaniard We used the Lichess database, which is under a license that allows research use. The only identifying information is the usernames, which we just use as unique identifiers. We haven't released the model, code or data, so we're hoping this acts as a warning about the privacy risks.

@Algernon12 What I meant is that we are using behavior (i.e. moves in a game) without anything else. This is a different type of task than previous work, so we gave it a name, "Behavioral Stylometry", to distinguish it from other types of behavior identification.

@EmaciatedSpaniard We already have a paper looking at making models of specific people: "Learning Personalized Models of Human Behavior in Chess" (https://arxiv.org/abs/2008.10086)

dampooo

#13

@reidmcy thx for your openness.

Did you consider asking Lichess for permission?

@Lichess
What is your stand on using your data to profile anonymous people?

EmaciatedSpaniard edited

#14

@dampooo The data from lichess is freely available and lichess permits its use in research. Look at the paper. They only discuss their success in identifying players in a general statistical sense. Individual players' data are not revealed, and anyway your games are already available and identified with your account. They also make no links between multiple accounts, only stating that this would be feasible to do with the algorithm.

@reidmcy As I understand it from reading the paper, the algorithm does not do so well at identifying top level players (though much better than previous methods). They seem to inhabit a different region of 'stylometry space', so it would be unlikely that you could say some amateur played like a given strong professional player. In fact this seems to indicate that amateurs generally play in a completely different style than the strong professional.

Generally, I don't understand why you can't identify individuals with similar stylometry. You are identifying the players by 100 of their games which tells you a location in the mapped space and then you look for the person whose candidate games place them most closely to this location and you predict the candidate is that person ~85% of the time correctly. So if two persons had similar styles wouldn't they map to locations which were closer to one another? Just as a simple example: you show in Figure 3c of the paper that e4 and d4 players inhabit different spaces even if you look at games starting from move 15 (k=15).
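
To make the matching step described above concrete, here is a toy sketch of nearest-match identification in an embedding space. It is not the paper's model (which has not been released): the per-game vectors are random stand-ins, and the names `centroid`, `identify`, `sample_games`, `player_a` and `player_b` are made up for illustration. It only shows the idea of averaging a batch of games into one location and picking the closest known player.

```python
import numpy as np


def centroid(game_vectors):
    """Average per-game vectors into one unit-length player vector."""
    mean = np.mean(game_vectors, axis=0)
    return mean / np.linalg.norm(mean)


def identify(query_games, reference_players):
    """Return the reference player whose centroid has the highest
    cosine similarity to the centroid of the query games."""
    query = centroid(query_games)
    scores = {
        name: float(np.dot(query, centroid(games)))
        for name, games in reference_players.items()
    }
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(0)

# Hypothetical "style" directions; a real system would get per-game
# vectors from a trained model instead of this toy generator.
styles = {
    "player_a": 3.0 * np.eye(8)[0],
    "player_b": 3.0 * np.eye(8)[1],
}


def sample_games(style, n=100):
    """Simulate n noisy per-game vectors around a player's style."""
    return style + rng.normal(scale=1.0, size=(n, style.size))


reference = {name: sample_games(style) for name, style in styles.items()}
query = sample_games(styles["player_a"])  # 100 unseen games by player_a

best, scores = identify(query, reference)
print(best, scores)
```

In a setup like this, two players with similar toy styles would land close together by construction. Whether distances in a learned space carry that meaning is exactly the question being asked here.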

reidmcy

#15

@dampooo Lichess has a very clear licence that allows research use. We have not released and do not plan on releasing our data, and we don't do analysis at the individual level. I'm the only person who's looked closely at the raw data, and that's so that I can create summaries (or for debugging).

@EmaciatedSpaniard We were hoping that people would be close to other people with the same style. My main research goal is to build systems that can teach people. We found that there was not a strong similarity between people who are close together (besides weak clustering by Elo and preferred opening). This is an annoyingly common result of this type of machine learning model (transformer); they often create nonlinear spaces. So we focused the paper on the one thing the model could measure instead of our original goal.

tedwong

#16

@reidmcy Any plans to reuse the methods to create a human-style AI chess playing algorithm? For example, one that imitates a particular chess player on Lichess. Also, what're your views on the application of your paper to anti-cheating in chess? Would it be possible to detect similarity to computer chess algorithms? It would be beautiful if we could tell an engine from a player with 80%+ accuracy!

EmaciatedSpaniard

#17

@tedwong
re. creating AIs playing in the style of a certain human player:
see @reidmcy 's answer to me in response #12 of this thread.

@reidmcy , so the ML transformer algorithm creates a nonlinear space, making clustering unreliable, but as I understand it the previously used algorithm, which gave less accurate identification, is linear. Are you planning to pursue that to cluster different decision making styles?

reidmcy

#18

@tedwong That was our original goal, but we've not been able to get it working without using lots (20,000+) of games. Anti-cheat is an interesting application, but it's difficult for me to look at. Most anti-cheat techniques are kept secret, which makes publishing results difficult. I'm a PhD student so I don't have the time (or funding) to work on something that won't lead to a publication. The openness of Lichess is a big benefit of using their data as I want to be open about my work as much as possible.

@EmaciatedSpaniard The previous method is basically comparing histograms of moves, so it's mostly a comparison test of which openings the player prefers. We also have a method that trains a complete deep neural network for the player, but that's the one which takes 20,000+ games and is even less linear than the transformer. I think the best approach would be to modify the transformer to make the space more linear/useful, but that would be a complete research paper on its own.
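
As a rough illustration of what "comparing histograms of moves" can look like, here is a minimal sketch. It is not the previous method's actual code: the choice of the first four plies, the cosine similarity measure, and the names `move_histogram` and `cosine_similarity` are assumptions made for the example, and the games are invented.

```python
import math
from collections import Counter


def move_histogram(games, plies=4):
    """Relative frequency of each (ply index, move) pair over the
    first `plies` plies of every game."""
    counts = Counter()
    for moves in games:
        for ply, move in enumerate(moves[:plies]):
            counts[(ply, move)] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def cosine_similarity(h1, h2):
    """Cosine similarity between two sparse histograms stored as dicts."""
    dot = sum(value * h2.get(key, 0.0) for key, value in h1.items())
    norm1 = math.sqrt(sum(v * v for v in h1.values()))
    norm2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (norm1 * norm2)


# Invented game fragments (moves in SAN), for illustration only.
e4_player = [["e4", "e5", "Nf3", "Nc6"], ["e4", "c5", "Nf3", "d6"]]
e4_again = [["e4", "e5", "Nf3", "Nf6"], ["e4", "c5", "Nc3", "Nc6"]]
d4_player = [["d4", "d5", "c4", "e6"], ["d4", "Nf6", "c4", "g6"]]

same = cosine_similarity(move_histogram(e4_player), move_histogram(e4_again))
diff = cosine_similarity(move_histogram(e4_player), move_histogram(d4_player))
print(same, diff)
```

Because the histograms are dominated by the first few moves, a comparison like this mostly measures opening preference, which matches the limitation described above.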

tedwong edited

#19

@reidmcy Thanks. For the anti-cheating, you probably don't need their secrets. What about taking PGN files from known bots and known cheaters (marked on their profile as a violation of TOS)? Can your method distinguish between regular players and bots/semi-bots? Cheat detection would be a very useful application for your paper. That'd be something super relevant for chess.com & lichess, which are currently spending enormous resources on manual reviews. If successful, less volunteer time would be needed for reviewing lichess's cheaters. Even a 50% success rate would cut the number of cheat reports our mods need to manually review.

reidmcy

#20

@tedwong Yes, that would be a good start. I don't know if there's a whole research paper in it without being able to show a real world result. I agree that getting better anti-cheat is a good thing, and if a developer from Lichess contacted us I'd happily give them access and help. It's not something that I'm likely to do on my own; I've got many more research ideas than hours in a day.