Which chess engine is best in a particular phase?

If chess engines were tested only on their performance in one phase of the game, which engines would take 1st, 2nd, and 3rd place in each phase for a particular ECO code?

Which engine is best in the openings?
Which engine is best in the middle games?
Which engine is best in the end games?

Which engine is 2nd best in the openings?
Which engine is 2nd best in the middle games?
Which engine is 2nd best in the end games?

The 7-man endings (and meanwhile parts of the 8-man ones) are solved perfectly, hence there is no difference.

Assuming Lc0 is still the top NN engine, I would prefer it for openings and Stockfish for the end games. For middle games, it depends on how open or closed the game is. Usually Stockfish, but for closed positions probably Lc0.

Well, I think in any part of the game it really depends on how open or closed the position is. If the position is fairly open, then deep analysis is preferred, so Stockfish is preferred. Though if the position is not open and positional play matters more, then Lc0 should do a better job. End games are usually quite open, thus Stockfish will usually do much better.

I think that by combining both engines well, it could be possible to create an engine even stronger than Stockfish. A simple way of combining them is this: whenever Stockfish is not really sure about the best move, let Lc0 make the choice. If Lc0 is pretty sure which move is best, just play it. If Lc0 is also not so sure, then choose the best Lc0 move from among the best moves predicted by Stockfish.

Well, this seems fairly easy to develop and test with a computer that has a good CPU and a good GPU for Lc0 :-)
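
To make the idea concrete, here is a minimal sketch of that arbitration rule using python-chess, assuming Stockfish and Lc0 are installed at the paths shown and treating "sure" as the top MultiPV move beating the second one by some margin (the 50 centipawn threshold is an arbitrary guess, not a tuned value):

```python
# Sketch only: engine paths, time limits and the 50 cp "sure" margin are assumptions.
import chess
import chess.engine

SF_PATH = "/usr/bin/stockfish"   # adjust to your install
LC0_PATH = "/usr/bin/lc0"
SURE_MARGIN_CP = 50              # "sure" = best move beats the 2nd best by 50 centipawns

def top_moves(engine, board, n=3, seconds=1.0):
    """Return [(move, score_cp)] for the n best moves, best first."""
    infos = engine.analyse(board, chess.engine.Limit(time=seconds), multipv=n)
    return [(i["pv"][0], i["score"].relative.score(mate_score=100_000)) for i in infos]

def combined_move(board, sf, lc0):
    sf_moves = top_moves(sf, board)
    if len(sf_moves) < 2 or sf_moves[0][1] - sf_moves[1][1] >= SURE_MARGIN_CP:
        return sf_moves[0][0]                    # Stockfish is sure: play its move
    lc0_moves = top_moves(lc0, board)
    if len(lc0_moves) < 2 or lc0_moves[0][1] - lc0_moves[1][1] >= SURE_MARGIN_CP:
        return lc0_moves[0][0]                   # Lc0 is sure: play its move
    sf_candidates = {m for m, _ in sf_moves}     # neither is sure: pick the best Lc0 move
    for move, _ in lc0_moves:                    # among Stockfish's candidates
        if move in sf_candidates:
            return move
    return sf_moves[0][0]

if __name__ == "__main__":
    with chess.engine.SimpleEngine.popen_uci(SF_PATH) as sf, \
         chess.engine.SimpleEngine.popen_uci(LC0_PATH) as lc0:
        print(combined_move(chess.Board(), sf, lc0))
```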

@Sarg0n said in #3:

> The 7-man endings (and meanwhile parts of the 8-man ones) are solved perfectly, hence there is no difference.

I think the question is about letting the engine loose with no crutches, to see its programmed biases and holes.

Using the EGTB is cheating; a TB-assisted engine misses the opportunity to really look at engine preferences over the many solved legal starting positions you point out. One could even build a well-meshed response "surface" and have solid ground for a distance/metric between that and the function obtained on the same input set from the EGTB. Hint: it is kind of like the objectives that NN networks learn to minimize during training, or any global optimization setup.

Endgames should be where the building blocks of chess play are tested. If the engine needs the EGTB as a crutch there, how can we trust its output in more complex positions? A leap of faith? Elo is the proof? If so, then it should show in the endgames too.

Is there any data analysis using the EGTB as an external reference baseline, the thing to approximate, to get some sense of how far off the engine is when on its own? Sorry for the repeat; I just think this is a blind spot in engine performance measures, so two paraphrasing attempts might be better than one.
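
For what it's worth, here is one way such a comparison could be sketched with python-chess, assuming Stockfish and a local Syzygy directory; the FENs are just placeholder endgame positions, and the engine is run without tablebase access so we see its own judgement:

```python
# Sketch only: the engine path, the Syzygy directory and the FEN list are assumptions.
import chess
import chess.engine
import chess.syzygy

ENGINE_PATH = "/usr/bin/stockfish"
SYZYGY_DIR = "/path/to/syzygy"          # must contain tables for these piece counts
FENS = [
    "4k3/8/8/8/8/8/8/4K2Q w - - 0 1",   # KQ vs K
    "8/8/8/4k3/8/8/8/4K2R w - - 0 1",   # KR vs K
]

def sign(x):
    return (x > 0) - (x < 0)

with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine, \
     chess.syzygy.open_tablebase(SYZYGY_DIR) as tb:
    agree = 0
    for fen in FENS:
        board = chess.Board(fen)
        wdl = tb.probe_wdl(board)        # -2..2, from the side to move's point of view
        info = engine.analyse(board, chess.engine.Limit(depth=20))
        cp = info["score"].relative.score(mate_score=100_000)
        # crude "distance": does the engine's win/draw/loss judgement match the table?
        agree += sign(cp) == sign(wdl)
    print(f"engine agreed with the tablebase on {agree}/{len(FENS)} positions")
```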

How much does SF know about how sure it is of its best move? Shouldn't this be based on some legal-tree exploration (in-game) statistics, and some external reference somewhere? I am curious about shared, preferably reproducible, statistics of that kind.

An engine that has its own way of giving a score and a confidence in that score... that would be great. Does it already exist? What are the assumptions under which such a feature makes sense?
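
As far as I know, UCI engines only report a score, not a confidence, but a crude confidence proxy can be manufactured from a MultiPV search: the closer the candidate scores are, the less "sure" the engine is. A rough sketch, where the softmax temperature is an arbitrary assumption:

```python
# Sketch only: engine path and the 100 cp "temperature" are assumptions.
import math
import chess
import chess.engine

ENGINE_PATH = "/usr/bin/stockfish"
TEMPERATURE_CP = 100.0    # how fast confidence rises with the gap to the other moves

def move_with_confidence(engine, board, n=5, seconds=1.0):
    infos = engine.analyse(board, chess.engine.Limit(time=seconds), multipv=n)
    scores = [i["score"].relative.score(mate_score=100_000) for i in infos]
    best = max(scores)
    weights = [math.exp((s - best) / TEMPERATURE_CP) for s in scores]  # stable softmax
    confidence = weights[0] / sum(weights)    # 1.0 = completely sure, ~1/n = clueless
    return infos[0]["pv"][0], confidence

if __name__ == "__main__":
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        move, conf = move_with_confidence(engine, chess.Board())
        print(move, round(conf, 3))
```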

But yes, such a measure might help in combining sub-optimal expert "systems" so that each compensates for the others' biases or poor coverage (flagged by lower confidence), if we are lucky and the different engines complement each other.

I have seen a recent result about such sub-optimal expert combinations beating even stronger solo experts: the sum being better than the parts, because the weak regions of expertise are not the same for all parts of the combination. I hope I did not claim more than the paper does; I am trying to make this generally meaningful.

It might be better than Lc0, which has more endpoints from its "bad games" in the early phases (lots of fool's mates), and whose deep games might come from better play batches, which might not explore the legal set as much (the RL dilemma). Maybe that explanation is not right, but it is something I have heard too: that Lc0 is not as good in the endgame as in the opening, and that it has tactical holes. But that does not make SF good at endgames...

I trust human Elo ratings to represent better coverage of chess, because there are more humans playing chess than engine programmers (who will also share a winning recipe across compatibly designed engines... LMR, etc.).

There are more different biases of experience confronting each other in well-mixed pairing systems. And we can count on human error, on not being able to sustain cooperative and conformist play for too long, so that the biases are likely all represented in an Elo measure over many games in such human player pools.

For engines, given the small design "gene" pool, we would not know whether all of them are competing over the same restricted region of chess space, unless some outlier engine were fast enough not to have that same small-region predilection. The engine would have to be both less biased AND fast enough to beat the other type of design, which has made a career out of speed improvements that are enough for Elo to keep improving.

This does not mean we should not try to get better by combining the two very different species of engine, based on the above hypothesis. We might get lucky...

Is there software to estimate the popularity of the next move?
1.d4 Nf6 2.c4 e6: is it really the normal order at all rating levels?

To see an approximate chance of White winning for any particular move, I use the Nibbler GUI.
Example: (Depth: 5, Q: 0.035, WDL: 55 925 20). With this info I get to see the win, draw, and loss rates before playing the move.

Is there other software that can display a matrix of info before playing a move?
If it exists, others could run multiple engines on the same opening line and see what the expected normal order for the next move would be. In a way it could become a main game line if it becomes popular.

I think a popular line does not mean it's following the best moves. It might just have an opening plan to gain a bishop.
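
One place such popularity numbers already exist is the Lichess opening explorer, which can be queried per rating band. A rough sketch, assuming the public endpoint and parameter names are still as documented (check the Lichess API docs if this has changed):

```python
# Sketch only: endpoint and parameter names follow the public Lichess explorer API
# documentation at the time of writing; adjust if they have changed.
import requests

EXPLORER_URL = "https://explorer.lichess.ovh/lichess"

def popular_moves(uci_moves, ratings):
    """Return [(san, games)] for the next move, most played first, for one rating band."""
    params = {
        "variant": "standard",
        "play": ",".join(uci_moves),                  # moves from the start position, UCI
        "speeds": "blitz,rapid,classical",
        "ratings": ",".join(str(r) for r in ratings),
    }
    data = requests.get(EXPLORER_URL, params=params, timeout=10).json()
    moves = [(m["san"], m["white"] + m["draws"] + m["black"]) for m in data["moves"]]
    return sorted(moves, key=lambda x: -x[1])

if __name__ == "__main__":
    line = ["d2d4", "g8f6", "c2c4"]                   # 1.d4 Nf6 2.c4
    for band in ([1600], [2200]):                     # compare two rating bands
        print(band, popular_moves(line, band)[:3])
```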

@Toscani said in #5:

> BanksiaGui does engine combinations.

Nice, but what logic does it use to decide which engine to follow on a given move? If the criterion is just the game stage (opening, middlegame, and endgame), then that's not what I suggested.

To test the hypothesis I brought up, you really need some code development, but just a little.

BanksiaGui shows a drop-down menu of choices for combining engines; a rough sketch of the "highest vote" idea is below the list.
Range by scores
Range by move order
Range by piece total
Random
Sequence
The highest vote
The lowest vote
The highest score
The lowest score
The highest depth
The most nodes
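
For reference, here is my own guess at what the "highest vote" option amounts to, sketched with python-chess: each engine proposes one move and the most-voted move is played. This is not BanksiaGui's actual code, and the engine paths are placeholders.

```python
# Sketch only: a guess at the "highest vote" combination, not BanksiaGui's implementation.
from collections import Counter
import chess
import chess.engine

ENGINE_PATHS = ["/usr/bin/stockfish", "/usr/bin/lc0"]   # placeholders; add more engines

def highest_vote_move(board, engines, seconds=1.0):
    votes = Counter()
    for engine in engines:
        votes[engine.play(board, chess.engine.Limit(time=seconds)).move] += 1
    return votes.most_common(1)[0][0]    # ties fall back to the first engine's choice

if __name__ == "__main__":
    engines = [chess.engine.SimpleEngine.popen_uci(p) for p in ENGINE_PATHS]
    try:
        print(highest_vote_move(chess.Board(), engines))
    finally:
        for e in engines:
            e.quit()
```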
