
Identifying the Style of Different Players

I'd like to see something about risk-taking worked into the 'attacking' Component 2, or the test of it; for example, Topalov is known for being very open towards sacrificing the exchange, so I think if you put some variable that tests for material disadvantage for long-term compensation (maybe just the evaluation staying generally steady), you'd start to see more from him. Great article and, as someone who doesn't understand this stuff at all, it seems to me a great achievement.


@Gusonian said in #21:

> I'd like to see something about risk-taking worked into the 'attacking' Component 2, or the test of it; for example, Topalov is known for being very open towards sacrificing the exchange, so I think if you put some variable that tests for material disadvantage for long-term compensation (maybe just the evaluation staying generally steady), you'd start to see more from him. Great article and, as someone who doesn't understand this stuff at all, it seems to me a great achievement.

I test for material disadvantage, but I don't use the engine evaluation, as evaluating all the games with engines would take weeks.
For a future version, I'd like to include more attacking metrics, but I'll need to find some that can be easily checked.
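As an illustration of the kind of engine-free material check meant here, a minimal sketch with the python-chess library (the function names and the two-pawn threshold are made up for the example, not the code actually used):

```python
import chess
import chess.pgn

# Standard piece values in pawns; kings are ignored.
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def material_balance(board: chess.Board, color: chess.Color) -> int:
    """Material of `color` minus material of the opponent, in pawns."""
    balance = 0
    for piece_type, value in PIECE_VALUES.items():
        balance += value * len(board.pieces(piece_type, color))
        balance -= value * len(board.pieces(piece_type, not color))
    return balance

def share_of_moves_behind(game: chess.pgn.Game, color: chess.Color,
                          threshold: int = 2) -> float:
    """Fraction of `color`'s moves played while down `threshold` or more pawns of material."""
    board = game.board()
    behind, total = 0, 0
    for move in game.mainline_moves():
        if board.turn == color:
            total += 1
            if material_balance(board, color) <= -threshold:
                behind += 1
        board.push(move)
    return behind / total if total else 0.0
```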


Great idea!
I'd definitely be interested to see where modern players would land on that 2D graph. For instance, it is said that Carlsen plays similarly to Karpov. Will this get confirmed based on your variables and PCA? Or will an unexpected player turn out to be similar to Carlsen? What about Nakamura, Gukesh, Pragg, Firouzja?

Other than that, I think it would be nice to actually output the top correlations of the variables with the two components, for instance as a table.


The choice of these so-called "components" is confusing and seems arbitrary to me. I think you could choose different weights, switch a couple of these components from the x-axis to the y-axis or vice versa, and get a different graph.


@Hagredion said in #24:

> The choice of these so-called "components" is confusing and seems arbitrary to me. I think you could choose different weights, switch a couple of these components from the x-axis to the y-axis or vice versa, and get a different graph.

The components were determined using principal component analysis, not some arbitrary choice from me.
Principal component analysis works by analysing the dataset with the original variables and finding how they are correlated. The first principal component is then a combination of these original variables that captures most of the variance in the dataset. The second component explains most of the variance of what is left over and so on. So I used PCA to reduce the original 40+ variables to just 2 that capture most of the information from the original variables.
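As a minimal sketch of that reduction step (assuming the per-player style variables have already been collected into a matrix; the shapes and data here are made up, not the actual dataset):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per player, one column per style variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 40))              # e.g. 30 players, 40+ variables

X_std = StandardScaler().fit_transform(X)  # PCA is scale-sensitive, so standardise first

pca = PCA(n_components=2)
coords = pca.fit_transform(X_std)          # (30, 2): each player's position on the 2D map

print(pca.explained_variance_ratio_)       # share of the variance kept by PC1 and PC2
print(pca.components_)                     # (2, 40) loadings: how each original variable
                                           # contributes to the two components
```

The rows of `pca.components_` are also what could be tabulated to show the top correlations of the original variables with the two components.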


@dboing said in #19:

> @jk_182
>
> When you talk about distances, is it only the king-to-king distance in endgames, or, in any phase, could any piece on one side have its own distance (king distance, Manhattan, or another metric) to the opponent's king, with the minimum taken over all pieces?
>
> I might be missing some context (and it can always be considered my reading fault, my great reading fault :)))

I calculate the relative number of moves that reduce the distance to the enemy king. I take all moves, so no specific game phase or piece. My hypothesis was that attacking players often move their pieces towards the opponent's king.
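A rough sketch of how such a metric could be computed with the python-chess library (an illustration of the idea using Chebyshev/"king" distance, not the exact code behind the article):

```python
import chess
import chess.pgn

def king_approach_rate(game: chess.pgn.Game, color: chess.Color) -> float:
    """Fraction of `color`'s moves that bring the moved piece closer to the
    enemy king, measured with Chebyshev ("king") distance."""
    board = game.board()
    closer, total = 0, 0
    for move in game.mainline_moves():
        if board.turn == color:
            enemy_king = board.king(not color)
            if enemy_king is not None:
                before = chess.square_distance(move.from_square, enemy_king)
                after = chess.square_distance(move.to_square, enemy_king)
                total += 1
                if after < before:
                    closer += 1
        board.push(move)
    return closer / total if total else 0.0
```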


@jk_182 said in #25:

> The components were determined using principal component analysis, not some arbitrary choice from me.
> Principal component analysis works by analysing the dataset with the original variables and finding how they are correlated. The first principal component is then a combination of these original variables that captures most of the variance in the dataset. The second component explains most of the variance of what is left over and so on. So I used PCA to reduce the original 40+ variables to just 2 that capture most of the information from the original variables.

So basically, if I understand correctly, one axis captures most of the variance in the dataset and the other axis captures most of the variance of what is left over?


@Hagredion said in #27:

> So basically, if I understand correctly, one axis captures most of the variance in the dataset and the other axis captures most of the variance of what is left over?

That's exactly right. You can imagine each player as a point in space, so with several players in the data you have something like an oddly-shaped point cloud. PCA gives a sequence of unit vectors, with the first pointing along the axis of most variation. The second vector maximizes variance once you've removed everything in the direction of the first, and so on.

The second vector has to be perpendicular to the first, the third perpendicular to the first two, and so on. For intuition, say you want to describe a bunch of people based on measurements of height and weight. Tall people tend to be heavier, so the first component captures that correlation in a single variable representing "largeness": for the variables height and weight, PC1 = [0.71, 0.71], pointing equally in the height and weight directions. PC2 then points along [-0.71, 0.71] (or the opposite), capturing something like "stockiness" vs "lankiness".
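A quick numeric version of that height/weight picture (synthetic data; the 0.71 entries are just 1/sqrt(2)):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
n = 500
height = rng.normal(170, 10, n)                            # cm
weight = 70 + 0.9 * (height - 170) + rng.normal(0, 5, n)   # kg, correlated with height

X = np.column_stack([height, weight])
X = (X - X.mean(axis=0)) / X.std(axis=0)                   # standardise both variables

pca = PCA(n_components=2).fit(X)
print(pca.components_)                # rows close to [0.71, 0.71] ("largeness")
                                      # and [-0.71, 0.71] ("stockiness" vs "lankiness")
print(pca.explained_variance_ratio_)  # PC1 carries most of the variance
```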


Most variance among linear combinations of subsets of the input-space dimensions.
It does not consider other types of combination (see the next post for that).
Moreover, it has no clue about which combination would best explain what we would call style, although we could use the OP's guesses, or existing guesses in the literature, as guiding hypotheses.
Any projection from a big space into a small subspace can lose information relevant to the purpose.

This is for now, an exploration.

Idea (in case I am not mistaken about the current situation):
Perhaps it might be better to consider games, or time series of games (segments of a player's sequence of games, periods?), as the basic unit. Putting all the games of a player under the same data point might already be a confounding factor for the clustering. Did that get discussed already?

Question: in PCA, where does the notion of independence come in? And what is the difference, if any, between linear independence of the components and statistical independence assumptions?

I might have confused myself while trying to read the links below on ICA, where I possibly understand things better; but then it showed me I might never have been curious enough about the logic of PCA... or this is just time doing its thing, or words needing a refresh.
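As a hedged illustration of the distinction that question gestures at (my reading only): PCA produces components that are orthogonal and uncorrelated, a second-order, linear-algebra kind of "independence", while ICA explicitly aims at statistical independence of the recovered signals. A toy sketch with scikit-learn:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
n = 2000

# Two independent, clearly non-Gaussian sources...
s1 = rng.uniform(-1, 1, n)
s2 = np.sign(rng.normal(size=n))
S = np.column_stack([s1, s2])

# ...observed only through a linear mixture.
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T

pca_scores = PCA(n_components=2).fit_transform(X)                      # uncorrelated components
ica_scores = FastICA(n_components=2, random_state=0).fit_transform(X)  # aims at statistical independence

# Both outputs are (near-)uncorrelated column-wise, but only ICA tends
# to recover the original independent sources.
print(np.corrcoef(pca_scores.T).round(2))
print(np.corrcoef(ica_scores.T).round(2))
```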


leads for me to read further:
https://en.wikipedia.org/wiki/Independent_component_analysis#Defining_component_independence (ICA)
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction (e.g. SOM)
https://en.wikipedia.org/wiki/Factor_analysis (e.g. PCA)

To clean up my attic, and to have the new nomenclature so I don't share my ramblings in vain.
I am perhaps most likely not to be wrong about the second one. But even that must be more than one small brain can fully capture.

just saw this:
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction#Nonlinear_PCA
It uses a neural network as the basis of functions to transform the input space into another one where better "separation" and "congruence" are sought at the same time. I wonder if that is also guided by dispersion. My words are in quotes because they might not be terms of art.

Quoting the small paragraph there:

> Nonlinear PCA (NLPCA) uses backpropagation to train a multi-layer perceptron (MLP) to fit to a manifold. Unlike typical MLP training, which only updates the weights, NLPCA updates both the weights and the inputs. That is, both the weights and inputs are treated as latent values. After training, the latent inputs are a low-dimensional representation of the observed vectors, and the MLP maps from that low-dimensional representation to the high-dimensional observation space.

This paragraph might need work to understand the "latent" thing as applied to changing the inputs, but that would correspond, for linear PCA, to considering new combinations of the raw dimensions of the input space, while the NN weights would correspond to the linear-combination weights being explored in PCA.
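If that reading is right, here is a toy PyTorch sketch of the idea (my interpretation only, not a reference NLPCA implementation): an MLP decoder whose low-dimensional inputs are themselves free parameters, optimised together with the weights against reconstruction error.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Observed high-dimensional data: 200 samples, 10 features.
X = torch.randn(200, 10)

# Low-dimensional "latent inputs": one 2-vector per sample, treated as trainable values.
Z = nn.Parameter(0.1 * torch.randn(200, 2))

# The MLP maps from the latent representation back to observation space.
decoder = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 10))

# Both the latent inputs and the network weights are optimised
# against the reconstruction error.
optimizer = torch.optim.Adam([Z, *decoder.parameters()], lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(decoder(Z), X)
    loss.backward()
    optimizer.step()

# After training, Z is a 2D representation of each observed row,
# analogous to PCA scores but reached through a nonlinear map.
print(Z.shape, loss.item())
```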

I am wondering about the task or objective function being optimized and backpropagated from (though, now that I think of it, I wonder the same about PCA; part of my self-critical looping tendencies: am I BS-ing myself and others?).

But the lingo here is about optimizing that objective functional by exploring a bigger space... I have no experience there, but I am curious.

Latent might just mean, here, being searched for during optimization. Maybe not in the same phases; there might be some alternation, given the nature of the problem. But it might be a full hypercubic set of latent variables as well. Would it matter?

Latent might be in the eye of the beholder: it depends on what we give the monster algorithm and what we target the NN to spit out and, in turn, optimize. But clearly we won't just use either of those non-"hidden" variable sets (the inputs and such outputs); we will look for whichever of the new "latent" input variables we end up with that optimize the objective function.

In more typical MLP training, latent just means the hidden layers between the input layer and the last layer (the decision layer in a classification task).

It seems, though, that they call the NN/MLP weights latent "values". This might be my misreading: in my understanding those weights determine the transformed variables, so maybe the slight slip is warranted, or we end up like me, rambling for hours on end (and steam out of readers' ears might ensue). The latent things are the transformed input variables produced in the NN's entrails, and so are the weights of each unit determining those variables (as each unit's output). "Values being searched for during the process" seems a good enough interpretation of "latent".
