lichess.org

Calculating the Sharpness of Different Players

I like that the questions "flow". That is key for me for getting into the research plan. (Fast fly-by; I will come back.)
In other words, I read the headers. When the headers help build some map of the presentation, that is a good thing for readers like me, who are curious about the reasoning first and the details later. So I know I want to read the details later.

Now is a bit later than the above: (jumped to the conclusion, about sharing more of the higher-level plan).

> Whenever testing something new like this, ... (clipped, used as a back-reference pointer)
Yes, I agree: simpler first, and then, if not sufficient, another layer of nuance. I am not against that.

I do think that one can share that process; it helps in accepting the proposal as a hypothesis, not a done deal. It might be more productive to treat it as the real deal worth giving a chance, through imagination or research, while it still has working-hypothesis status: working hard on it, to see what it can give as the simple (but not simpler) working hypothesis for its duration as one (how long? it might be good to try very long for a population, and to end of life for an individual; see below, as this is inline-edit thinking).

BTW: I also find that in chess theory there is not enough of that attitude with respect to illustrious sources of working hypotheses (did anyone check for proof that they were really meant in the "I have spoken" sense?).
@dboing said in #2:
> I like that the questions "flow". That is key for me for getting into the research plan. (Fast fly-by; I will come back.)
> In other words, I read the headers. When the headers help build some map of the presentation, that is a good thing for readers like me, who are curious about the reasoning first and the details later. So I know I want to read the details later.

That's exactly how I'm structuring my posts right now, so I'm happy that you like it. My current approach is to test many different things with engines to see what kind of things are working. Later I want to refine them and combine different measures I've tested in the blog posts.
I should not edit my post in real time... sorry. I did not change the quote, though.

I guess I may have a hunch already, or have been professionally deformed in a past life into looking at distributions, for they can be considered as metrizable spaces (beyond even the mere probability invariant under the curve), with their many dimensions of shapeable and signature potential.

That might be simpler to me, with its associated tooling I would love to swim within, than a finite, one-at-a-time sequence of 1D features. And then there is the frustration of getting an engine analysis tool that acts like it knows the answer is ultimately "42", but never shares its question flow with us (or that might be the past 5 years, as SF is a-changing, it seems).

I agree with your approach, though. My hunch, that the chess board, the chess player, and the other chess player in front, with their fogs and antifogs, need more dimensions, might look like twilight-zone stuff, and it needs some illustration that it is an avenue out, not pulling on one's bootstraps ever harder: a demonstration.

And perhaps in the process we (you) find that it was not needed to look at something bigger but simpler (for twisted internal models of science like mine), and that one-dimension-at-a-time walks in the big-dimensional world of chess suffice. I am glad, though, that you are sharing this path by mentioning the ambient "universe" restriction.

I also wanted to share why I pull in another direction: I find a huge gap of intellectual traditions in the science of chess. I am glad someone has started taking the definition bull by the horns; I will follow. First ask the questions, as you did. Thank you.
I love seeing posts like this. I like how simple the calculation is to explain, but I'm wondering whether there's a metric that is more resistant against long endgames skewing the average. What I'm thinking of is to only or mainly look at moves that lead to objectively worse positions, i.e. to look at moves that don't match the engine's top choice: those are the positions where a player is willing to give up some advantage in order to make the resulting position more or less sharp.
Hi, I found your post interesting, so I traced it back to your definition of sharpness.

Indeed, Leela may be "hiding" some information when 50% win and 50% loss looks the same as 100% draw, so it is worth thinking about how to form a single metric that accurately represents what is going on.
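A tiny sketch of that point: collapsing a WDL triple into an expected score (the usual single-number summary; the helper below is hypothetical, not from the blog post) makes a razor-sharp position and a dead draw indistinguishable.

```python
def expected_score(w, d, l):
    """Expected score from win/draw/loss probabilities: a win scores 1, a draw 0.5."""
    return w + 0.5 * d

# 50% win / 50% loss vs. 100% draw: wildly different positions, same number.
print(expected_score(0.5, 0.0, 0.5))  # 0.5
print(expected_score(0.0, 1.0, 0.0))  # 0.5
```

Any sharpness metric therefore has to look at the WDL components themselves, not just the expected score.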

Any function will do. You could, for example, use something akin to the date convention used by software engineers. We don't bother with mm.dd.yyyy versus dd.mm.yyyy, or think of using two digits for the year instead of four; we just construct a "monotonic" number that we can easily sort by and that is guaranteed to preserve the sequence. So we use yyyymmdd: an increasing number, where a bigger number means a date further in the future.

However, there are gaps: we will never get 20240145; it jumps from 20240131 to 20240201. One day suddenly becomes 70 units by this metric, and then drops back to 1 unit per day at 20240202.

For sorting and filing this is fine. However, if I were to use this for more nuanced things, say sensitivity, then a change in some other value I observe daily, say chess games played, becomes skewed, since on a per-day basis the metric adds 70 going from Jan 31 to Feb 1 instead of just one.
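The gap is easy to demonstrate (a minimal sketch; the `yyyymmdd` helper is just the encoding described above):

```python
from datetime import date, timedelta

def yyyymmdd(d):
    # Encode a date as the monotonic integer yyyymmdd.
    return d.year * 10000 + d.month * 100 + d.day

jan31 = date(2024, 1, 31)
feb1 = jan31 + timedelta(days=1)
feb2 = feb1 + timedelta(days=1)

print(yyyymmdd(feb1) - yyyymmdd(jan31))  # 70: one real day across a month boundary
print(yyyymmdd(feb2) - yyyymmdd(feb1))   # 1: one real day inside a month
```

Sort order is preserved, but differences of the encoded number are meaningless as elapsed time.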

Long story short (sorry, I do not have the time to delve deeper into this, and if I am wrong, sorry about wasting your time too): by using a sigmoid you perhaps assume, for example, that wins and losses become more certain, as the curve is flattish there, whereas near the draw the slope is very steep.
That is possibly not the case in chess: a blunder can happen and turn the game around even near the end, swinging evaluations sharply. I'd try something linear to start with. Anyway, as long as you know this, you know how to compensate too.
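To make the shape contrast concrete, here is a minimal sketch; both mappings below are hypothetical illustrations, not the formula from the blog post. The sigmoid variant of a decisiveness-based score is steep around its midpoint and flat toward the extremes, while the linear variant weights every change in the non-draw probability equally.

```python
import math

def sharpness_sigmoid(w, d, l, k=10.0):
    # Hypothetical sigmoid-shaped mapping: steep near the midpoint,
    # flat once one side is clearly winning or the position is all draw.
    # k controls the steepness.
    decisiveness = w + l  # probability the game does not end in a draw
    return 1.0 / (1.0 + math.exp(-k * (decisiveness - 0.5)))

def sharpness_linear(w, d, l):
    # Linear alternative suggested above: every point of draw
    # probability lost counts the same, anywhere on the scale.
    return w + l

wdl = (0.30, 0.40, 0.30)
print(sharpness_sigmoid(*wdl))  # roughly 0.73: the sigmoid amplifies mid-range values
print(sharpness_linear(*wdl))   # 0.6
```

Which shape is right depends on whether sharpness should saturate for near-decided positions, which is exactly the assumption questioned above.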

Again, thank you, and hopefully this does tell something about the players and can possibly help them improve too.
As long as there is a layer on top of what you are doing, and it is shared, you and your peers can follow along with how to compensate (or not); many heads can be recruited onto the same task that way.

The previous post's last statements also fold individual fallibility into the reasoning I just proposed. Thanks to the last poster for joining; however sustained that is, I am sure it is food for the OP (as it is for others).

To op:
Differential sharpness through depth? (Finished the conclusion.) Now there is a question, and a hook for me to go hunting through the body of the blog post. I might have to go back to the first sharpness post as well; Sharpness_WDL_LC0, that is.

You also make a claim or conjecture that some interactions are negligible. I would like to understand your reasons for saying so; perhaps I do not understand which things are being said not to interfere with each other. (Give me some time to try it myself after doing the above.)

I definitely agree that more heads (and engines have those behind their designs, if not literally in the design... :)) are better than one, even if by silicon intermediaries. One number to rule all life-timescale learning trajectories leaves me skeptical anyway, be it per position.
@sjcjoosten said in #5:
> I love seeing posts like this. I like how simple the calculation is to explain, but I'm wondering whether there's a metric that is more resistant against long endgames skewing the average. What I'm thinking of is to only or mainly look at moves that lead to objectively worse positions, i.e. to look at moves that don't match the engine's top choice: those are the positions where a player is willing to give up some advantage in order to make the resulting position more or less sharp.

That's an interesting idea. But only looking at moves that reduce the evaluation might disregard moves that head for complicated positions while also being the best move on the board. Also, by only looking at moves that aren't objectively best, players will have very different numbers of moves in the sharpness average, which might make comparisons more difficult.
But it's certainly something I'll keep in mind.
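The filtering idea can be sketched in a few lines (assuming a hypothetical per-move record with the played move, the engine's top choice, and a sharpness value; none of this is from the blog post's actual code):

```python
def filtered_sharpness(moves):
    """Average sharpness over only the moves deviating from the engine's top choice.

    moves: list of dicts with hypothetical keys 'played', 'engine_best'
    (move strings) and 'sharpness' (float, sharpness after the move).
    """
    deviations = [m['sharpness'] for m in moves if m['played'] != m['engine_best']]
    if not deviations:
        # The caveat raised above: the sample of deviating moves can be
        # tiny or empty, making averages noisy or undefined.
        return None
    return sum(deviations) / len(deviations)

game = [
    {'played': 'e4', 'engine_best': 'e4', 'sharpness': 0.9},  # matches: excluded
    {'played': 'h4', 'engine_best': 'd4', 'sharpness': 0.7},  # deviates: included
    {'played': 'a3', 'engine_best': 'c4', 'sharpness': 0.5},  # deviates: included
]
print(filtered_sharpness(game))  # 0.6
```

The `None` branch makes the sample-size problem explicit: a player who always plays the engine's first choice yields no data under this filter.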
@dboing said in #7:
> You also make a claim or conjecture that some interactions are negligible. I would like to understand your reasons for saying so; perhaps I do not understand which things are being said not to interfere with each other. (Give me some time to try it myself after doing the above.)

I meant that there certainly is a correlation between sharpness change and accuracy of play, but I think it isn't a big problem, since I would look at both metrics anyway and keep this possible relation in mind.
It would certainly be better if there were no interaction between the metrics, but I don't think that is really possible. If one side makes a mistake in a boring drawn position, which gives the other player chances, the sharpness will certainly increase, so it's hard to separate a sharpness change caused by a mistake from a sharpness change due to taking more risks.
Me seeing the image at the top of the post:

Is this a hit piece on Firouzja???