Compressing Chess Moves Even Further, To 3.7 Bits Per Move

@JosephChessKing6 said in #10:
> There is a mistake in your article. You said "3.7 Bits Per Move". You must have meant 3.7 bytes per move, or something similar.
> 3.7 bits (or any fractional value of bits) is impossible because a bit is a binary value: 1 or 0, represented by a switch in a computer being either 'on' or 'off', like a light switch in a house. A switch cannot be 'seven-tenths on'. You can only have a whole number of bits.
>
> If I am completely wrong, which is very very likely, please explain what is actually going on because I am confused.

I need to read that article again but i think its just as saying that white pieces win 52% times in a standard classical TC chess game :) -white can either win, lose or a draw/stalemate can be achieved - no 52% not even 50% - its either a win or not :)
Statistics is partly a lie :)

Polish philosopher once said that there are three truths:
- holy truth
- indeed / "that's also truth"
- BullSheet truth [typoed intentionally]

I really am impressed about the topic and goal :).
EDIT: My impact is that for 90% of games there are only 4 most popular moves being played - and all the other moves comprehend only 10% of plys, so the enthropy limit for a "best" algorithm is more than 2 bits and i think it lies between 2,3 and 2,5 bit per ply ;)

I can only recommend, input the approach the CombSort algo authors have done: take the real data and run it through the algo parameters. The LLM aproach doesnt help after ply 20 as Chatgpt example shows :) but my hints are:

A. Create the list of rules with our help - tell us which logical predicates for ordering possible moves does the shakmaty provide in each position , i recall there was a library called skidd (or similar chess skid)
Later create a genetic algorithm with it with the goal function being a compression achieved with the latter part you wrote and check which rules are promoted on standard (mostly bullet / blitz) games available in lichess db archive.

B. access the opening book just to check out the statistics tree - just to know where does the opening precedence end, on which ply - for example - untill which move there are only 4-5 most favourite plys in reply? This TOP tree stat can be encoded in some "fraunhoffer laboratory like" table (a matrix :)

PS - i have just run through the standard chess / Lichess games / Opening book and clicking the first move the (2-4 most favourite plys represent > 90% of possible games lasted up to move 20 (ply 40) with only around 10 games left.
(the position after 20 moves was FEN: r3r1k1/pp1q2p1/2p2pQB/3p4/2nP4/2P2P2/P5PP/R3R1K1 w - - 3 21

I have also run the opening book chosing the 3rd popular ply and the opening tree for 3rd move ended after move 7 with only 5 games left in the book. I observed that only at 1 place there were 5 possible popular plys, 4 popular consisted only of 80% of all next games - still very large number.

[ my book settings are ultrabullet and correspondence games disabled (my settings but its not meaningfull for stats i hope, not sure if there are more ultrabullet than justbullet games) and rating ranges from 1000-2500 ]

#24

TheTap

#25

Did not understand what the blog writer was trying to do?

To encode a chess move you need between 10 and 12 bits.
Proof:
Upper bound of 12: there are 64 from squares, and 64 to squares - enabling us to represent any move with two squares of 6 bits each. No more than 12 bits is needed.
Lower bound of 10: a rook can stand on any of the 64 squares, and it has in theory 14 available moves -> 896 moves. Add to that a bishop on any square has at least 7 moves -> 64*14 + 64*7 = 64*21 = 1344 combinations, which is larger than 10 bits can represent.
QED

There is no way to do better in general.

Another thing you could try is to find the "plan" the player is executing. Some examples of plans:

Player is likely to castle early (high score first time). But if they have castling available and choose not to do it, then maybe they don't want to castle so on subsequent moves you can lower the score for castling.

In KRvK there are two common strategies. You can try to identify which one they are using and they will likely keep using it. The player with the king on the other hand is likely to try to maintain as much space as possible for as long as possible - with occasional curveball to see if the other player is awake.

If a position has occured twice it's likely to occur a third time.

Attacker / defender stacking: If the last moved attacked some chessman and it's possible to add another attacker it's likely it will be done. If it's possible to add another defender it's likely to be done.

If a knight has just been moved to a weird square likely it is being manouvered and will move again on next move.

Fianchettoing the bishop is likely after pushing the relevant pawn.

Poosotasty

#29

@TheTap said in #25:
> Lower bound of 10: a rook can stand on any of the 64 squares, and it has in theory 14 available moves -> 896 moves. Add to that a bishop on any square has at least 7 moves -> 64*14 + 64*7 = 64*21 = 1344 combinations, which is larger than 10 bits can represent.

Look at the binary tree: lichess.org/@/lichess/blog/developer-update-275-improved-game-compression/Wqa7GiAA

The origin square is not being stored as "1 of 64". Instead, all possible legal moves are first calculated in terms of probability according to a bare-mininum heuristics. That creates a binary tree that will will be traversed to find the actual move played, which can be as short as 00 or 01 (in other words, 2 bits). This 00/01 is the index of the tree. Assuming the last 10 moves were all the most likely moves as well (00 times 10), and that the game terminates on move 10, an entire game can be stored as 00000000000000000000 (20 zeros).

Poosotasty

#30

From the article:

"compression works in two steps:

1. Map moves to integer indexes. The smaller on average the better.
2. Encode those integers as a stream of bits."

In my example, each 00 is the encoded integer index of the binary tree. The integer index there would be 0 (the first leaf that is reached). If unencoed index were 2 (the third leaf reached) the encoded index would be 10 (binary for 3). In my example the entire game represented by 00000000000000000000 becomes a bitstream that will find and hit one node after another reconstructing the game.

It's just like how you know that when someone says "e4e5" you know white played a pawn to e4 and black played a pawn to e5. None of that "from what square" or "which piece moved" information needs to be described because *you* provide that information by using the chess board. In the same way, some interface which has access to the binary tree also provides the board and interprets the binary stream.