
Which chess engine is best in a particular phase?

@EvilChess said in #30:

Ok, I hope my comments help. I'm done with this topic for now, because it can take time and I don't want to get even shorter on time than I already am.

Me too, no time for that yet. However, at some point I will try to figure out what you meant. I was curious about my mental block in understanding the usual explanations for MCTS and PUCT in A0 tree searches. It can be a can of worms, finding out what is wrong in one's understanding. I am not sure what you mean by convolution at the input layer; I suspect we were not talking about representation in the same way, as most of the whole NN magic comes from transforming the input-layer representation internally, so that the heads get to find their kittens in that transformed view of the input. My question was taking the NN as a black box, since the explanations are still made at that level, as is the TS in MCTS. But I will contact you by inbox later (no ETA).


Testing the engines' strength with books and without.

https://chess.massimilianogoi.com/benchmarks/#:~:text=Testing%20the%20engines%20strength%20with%20books%20is%20legit%3F

With book (Polyfish vs Honey): Honey wins; either Poly could not exploit that ECO opening, or Poly is weaker in the middlegame phase.
No book (Polyfish vs Honey): Polyfish wins; Honey is weaker in the opening phase that Polyfish created.

An opening may look balanced, but once it's played out things tend to look different.
I would assume the results depend on the opening used, like getting a head start in a race: one engine may be able to exploit a position better than another. Self-playing the ECO codes may help show the ± sway on the graph, or the weight factor of a given opening.

The quantity of pieces is not the same all the way through a game.
So some engines may deal better with a full chessboard (opening phase) than other engines,
and some can probably calculate or analyse a middlegame better than others.

The solution is sorting the engines not by won games, but by phase advantages. The only way I know how is using centipawn values, or the results from the analysis in the Lucas Chess software.

Centipawn values, shown here in pawn units (add a plus or a minus to make it good or bad):
0.0: position considered balanced (=);
0.5: slight advantage, like an interesting move (!?) or a dubious one (?!);
1.5: moderate advantage (!) or a poor move (?);
3.0: decisive advantage (!!) or a blunder (??);
5.5: game considered won (++). In an arena tournament, if you are at -5.5 it's time to think about resigning.

Values like the above help with making decisions.
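Here is a minimal Python sketch of how those bins could be applied mechanically; the thresholds come straight from the list above (read as pawn units), and the function name is mine:

```python
def advantage_category(score_pawns: float) -> str:
    """Map an evaluation in pawn units to the rough bins listed above."""
    magnitude = abs(score_pawns)
    if magnitude >= 5.5:
        label = "game considered won (++)"
    elif magnitude >= 3.0:
        label = "decisive advantage (!!) / blunder (??)"
    elif magnitude >= 1.5:
        label = "moderate advantage (!) / poor move (?)"
    elif magnitude >= 0.5:
        label = "slight advantage (!?) / dubious (?!)"
    else:
        return "balanced (=)"
    side = "plus side" if score_pawns > 0 else "minus side"
    return f"{label}, favouring the {side}"

print(advantage_category(-5.8))  # game considered won (++), favouring the minus side
```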


Well, I wonder: if an engine's single root search from a position already finds an advantage per its scoring, isn't self-play from there, with equal strength on both sides, just going to keep implementing that search tree at each update, and thus bound to fructify that advantage, even if the early score was not an accurate evaluation of the outcome odds under perfect chess?

If the engine is biased against an opening, even slightly, in some position, I would expect self-play to keep amplifying it.

Unless that engine score difference is within some margin of error of the engine's scoring function (about which we don't have a clue).

Although repeating the engine experiment over many different openings that humans trust to be drawish might give us some clue about that precision.
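A sketch of that experiment with python-chess could look like the following; the engine path, the search depth, and the opening FEN (here just the starting position) are placeholder assumptions:

```python
import chess
import chess.engine

# Placeholder: substitute a FEN for a human-trusted, drawish opening position.
OPENING_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

def root_score_then_self_play(fen, engine_path="stockfish", depth=12):
    """Record the engine's single root-search score, then let the same
    engine play both sides to a terminal outcome."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        limit = chess.engine.Limit(depth=depth)
        # Single root search: the engine's early opinion of the position.
        info = engine.analyse(board, limit)
        root_cp = info["score"].white().score(mate_score=10000)
        # Non-interfered self-play: equal strength on both sides.
        while not board.is_game_over():
            board.push(engine.play(board, limit).move)
    finally:
        engine.quit()
    return root_cp, board.result()

root_cp, outcome = root_score_then_self_play(OPENING_FEN)
print(f"root score: {root_cp} cp, terminal outcome: {outcome}")
```

Repeating this over many opening FENs and checking how often the terminal outcome matches the sign of the root score would be one way to probe the hypothesis.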

From the lazy eye of statistics, we can forget** that the FEN is a strong factor, and act as if experiments with different FENs were the same event over which to do frequency statistics.

In that context (perhaps needed if self-play does not produce enough diversity of outcome), we could figure out what score difference makes for a sure outcome in its direction, which would mean 100% odds (forget the human error bins or glyph advantage categories; keep it quantitative at first, engines do it, why not us).

Getting a sense of the variability of outcome for the same FEN under non-interfered self-play (with replicates) would help decide whether one FEN is enough. Then we could include FENs among the statistical factors...
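One way to get such replicates, sketched below, is to use per-move time limits, since fixed-depth single-threaded self-play is usually deterministic; the tally in the final comment is hypothetical, not a real result:

```python
import collections
import chess
import chess.engine

def outcome_distribution(fen, n=20, engine_path="stockfish", movetime=0.05):
    """Self-play the same FEN n times under a per-move time limit (a source
    of nondeterminism) and tally the terminal outcomes."""
    tally = collections.Counter()
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        for _ in range(n):
            board = chess.Board(fen)
            while not board.is_game_over():
                board.push(engine.play(board, chess.engine.Limit(time=movetime)).move)
            tally[board.result()] += 1
    finally:
        engine.quit()
    return tally

# Hypothetical output: Counter({'1/2-1/2': 14, '1-0': 4, '0-1': 2})
print(outcome_distribution(chess.STARTING_FEN))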

Recap of the hypothesis: shouldn't an engine, self-playing from a position where that same engine's single root search finds some advantage, at least keep that advantage during the self-play continuation, both sides being of equal strength? And given that chess has an attrition bias (eventually either a terminal outcome or material attrition, maybe even under random play), isn't an engine that already saw an advantage at some deep leaf of its root legal-tree search going to keep fructifying that advantage?

Now, how to refute that with experiments; or, more likely, how to find the conditions under which it is true. Can we construct, for engine self-play, the conversion curves that lichess uses for accuracy (which are supported by its own database on the relation between engine score and outcome odds, conditional on the pair's average rating)? See the FAQ on accuracy (I also dug out a Maia paper figure that might be the basis for that FAQ).

That is what the previous post raises as background questions, to figure out in parallel or to give it some support. One might also need to use the starting position depth as a potential factor. So: starting position depth, the position FEN itself, the engine's single root search at that position, engine parameters, and replicated experiments run to terminal outcome. This could be feasible and might allow putting something quantitative into the context of the above post.
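As a sketch of the conversion-curve idea (not lichess's actual fit), one could regress terminal expected score on the root centipawn score with a logistic curve; the data points below are hypothetical placeholders standing in for replicated-experiment results:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(cp, k):
    """Expected score (0..1) for White as a function of root centipawns."""
    return 1.0 / (1.0 + np.exp(-k * cp))

# Hypothetical results: root score in centipawns and terminal expected
# score for White (1 = win, 0.5 = draw, 0 = loss), averaged over replicates.
root_cp = np.array([-300.0, -120.0, -40.0, 0.0, 30.0, 90.0, 180.0, 350.0])
outcome = np.array([0.0, 0.1, 0.4, 0.5, 0.5, 0.6, 0.9, 1.0])

(k_hat,), _ = curve_fit(logistic, root_cp, outcome, p0=[0.004])
print(f"fitted slope: {k_hat:.5f} per centipawn")
# Invert the curve: what root score would correspond to ~99% odds?
print(f"score for 99% odds: {np.log(99) / k_hat:.0f} cp")
```

If the fitted curve never gets close to 100% for any realistic score, that would itself argue against a "sure outcome" threshold existing at all.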

** This may seem unorthodox, as statistics often don't spell out factors and then ask you to forget one of them... But philosophically, classical statistics have always been doing that. Under the "let the data speak" motto, where one should not even think about what internal structure the statistical model should have, lest it inject some bias into the data analysis, they would actually inject the notion that all non-considered factors were equally small, and that a normal distribution would be the Occam's razor first choice of data experiment and analysis. Then learn from the outliers, or from a normality test failing, if that is the case. Everything can be assumed "random" at first, if you don't want to look inside (which might itself be a conclusion...).


This topic has been archived and can no longer be replied to.