The Gruelling Road to 2800 ELO
The grind is slow, but it continues. You may remember my blog from a few months back describing my journey to code a strong chess engine. Well, we're back once again!
Last time, we left off with the addition of NNUE to my engine, which pushed it past 2400. Today, I'll be detailing how we gradually went from 2400 to 2800 (on Lichess; by CCRL standards it's more like 2700 to 3200)!
Much of it boiled down to a few adjustments. Stick along if you want to understand how each of these work and how they made my engine stronger!
Please note that the ELO figures below are only rough estimates, included to make the blog more accessible! It would be more accurate to say each change "passed SPRT" or "failed SPRT". If you sum up all the estimates, it won't quite be 400; a ton of smaller adjustments and re-trainings of the NNUE with different parameters piled up to make up the remainder.
Proper Move Ordering
Okay, yeah, I messed up before. Using only the static evaluation of the position to order moves is, frankly, a horrible heuristic.
After properly adding MVV-LVA, Killer moves, history heuristic, and counter-move history, the engine easily passed SPRT, gaining approximately 50 ELO.
I won't go too deep on what these heuristics are, but in essence, they are different ways of keeping track of moves that were consistently good in our search. If a move is a good counter to an opponent move, chances are, that same move is still going to be a good counter to a variety of other opponent moves.
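To make the combination concrete, here is a minimal sketch in Python. The piece values, move fields, and table shapes are made up for illustration and are not my engine's actual code:

```python
# Illustrative piece values (centipawns)
PIECE_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

def order_score(move, killers, history):
    """Score a move for ordering: captures by MVV-LVA, then killer moves,
    then quiet moves by their history score. `move` is a dict with
    hypothetical fields."""
    if move["captured"]:
        # MVV-LVA: prefer capturing a big victim with a small attacker.
        return 1_000_000 + 10 * PIECE_VALUE[move["captured"]] - PIECE_VALUE[move["piece"]]
    if move["uci"] in killers:
        return 900_000                      # killers go right after captures
    return history.get(move["uci"], 0)      # quiets ordered by history

moves = [
    {"uci": "e4e5", "piece": "P", "captured": "Q"},
    {"uci": "g1f3", "piece": "N", "captured": None},
    {"uci": "d2d4", "piece": "P", "captured": None},
]
killers = {"g1f3"}
history = {"d2d4": 50}
ordered = sorted(moves, key=lambda m: order_score(m, killers, history), reverse=True)
```

Capturing the queen with a pawn comes first, the killer move second, and the plain quiet move last.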
Reverse Futility Pruning
Also known as "static null-move pruning": if our static evaluation says we are winning by so much that we could give back roughly a pawn for every remaining ply of depth and still be above beta, we stop searching right there and return early. This gained around 15 ELO.
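A sketch of the idea in Python; the margin and depth limit are illustrative values, not my exact ones:

```python
RFP_MARGIN = 100  # roughly "a pawn" per remaining ply (illustrative)

def reverse_futility(static_eval, depth, beta, in_check, max_depth=6):
    """If the static eval minus a per-ply margin is still above beta,
    return it as a cutoff score; otherwise return None and keep searching."""
    if in_check or depth > max_depth:
        return None  # only prune shallow, quiet-ish nodes
    if static_eval - RFP_MARGIN * depth >= beta:
        return static_eval
    return None
```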
Null-move Pruning
Although I implemented this in version 1.0, it would often cause problems in endgame positions. This is because the null-move observation breaks down in Zugzwang: in those positions, passing our turn really would be our best option. I mitigated this (albeit crudely) by disallowing null-move pruning when there were fewer than 8 pieces on the board. +8 ELO.
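In code, the guard might look like this. The `board` and `search` interfaces and the reduction `R` are hypothetical, just to show where the piece-count check sits:

```python
def try_null_move(board, depth, beta, search, R=3):
    """Let the opponent move twice: do a reduced-depth, null-window search
    after passing our turn. Returns beta on a cutoff, None otherwise."""
    if board.in_check or board.piece_count < 8:
        return None  # zugzwang guard: never null-move in sparse endgames
    board.make_null()
    score = -search(board, depth - 1 - R, -beta, -beta + 1)
    board.unmake_null()
    return beta if score >= beta else None
```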
EGNN (deprecated)
This idea was something I came up with on a whim. Since the evaluation NNUE is very general, and endgame positions are often very complicated, it seemed helpful to have a completely separate network for endgames. So, I trained a second NNUE specifically for positions with fewer than 12 pieces. +20 ELO.
Aspiration Windows
When we do iterative deepening, the evaluation usually doesn't change much between depths. So, instead of searching the next depth with a full window, we use a narrow window centred on the previous depth's score, which speeds things up quite a bit. However, if the actual evaluation falls outside this window, we have to re-search, which takes time. In my previous implementation, I went directly to a full-window re-search; by changing it to gradually widen the window instead, I gained around 17 ELO.
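A sketch of the widening loop; the initial window size and doubling schedule are illustrative, and `search` stands in for the engine's real alpha-beta search:

```python
INF = 10**9

def aspiration_search(search, prev_score, depth, initial_delta=25):
    """Search with a narrow window around the previous depth's score,
    widening the failed side of the window each time until the score
    fits inside it."""
    delta = initial_delta
    alpha, beta = prev_score - delta, prev_score + delta
    while True:
        score = search(depth, alpha, beta)
        if score <= alpha:              # fail low: widen downward
            alpha = max(alpha - delta, -INF)
        elif score >= beta:             # fail high: widen upward
            beta = min(beta + delta, INF)
        else:
            return score                # score inside the window: done
        delta *= 2                      # grow the window gradually
```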
Time management
Previously, I just allocated a set time for every search depending on how much time was left. However, we can be more efficient with our time. The branching factor of my engine was around 2.8, meaning each new depth takes around 2.8x as long as the previous one. So, if after finishing a depth we have already used most of our time (meaning we probably won't finish the next one), we stop. +20 ELO
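One simple way to write that rule down (an assumption on my part, not necessarily the exact formula): estimate the next iteration as branching-factor times the time used so far, and stop if that would overshoot the budget.

```python
def should_stop(elapsed_ms, allocated_ms, branching_factor=2.8):
    """Stop iterative deepening if the next depth, estimated at
    branching_factor x the time used so far, likely won't fit."""
    estimated_next = elapsed_ms * branching_factor
    return elapsed_ms + estimated_next > allocated_ms
```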
Futility pruning
If we are towards the end of our search and our static evaluation is really bad (far below alpha), odds are we won't be able to improve it and we can just stop. +10 ELO
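A sketch with illustrative margins; real engines also skip the pruning for captures and checks, since those can swing the eval:

```python
FUTILITY_MARGIN = [0, 100, 200, 300]  # illustrative margins per remaining depth

def futile(static_eval, depth, alpha, is_capture, gives_check):
    """Near the leaves, skip quiet moves whose static eval plus an
    optimistic margin still can't reach alpha."""
    if depth >= len(FUTILITY_MARGIN) or is_capture or gives_check:
        return False
    return static_eval + FUTILITY_MARGIN[depth] <= alpha
```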
SPSA Tuning
OpenBench has a really nice tool that does statistics black magic to determine the best values for parameters in your engine. Plugging these values in, I got a +30 ELO gain!
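The "black magic" is SPSA (simultaneous perturbation stochastic approximation): nudge all parameters at once in a random direction, play matches with both the `+` and `-` versions, and step against the estimated gradient. A toy sketch (the gains `a` and `c` are illustrative, and `loss` here is any callable, whereas OpenBench would use match results):

```python
import random

def spsa_step(params, loss, a=0.05, c=0.5):
    """One SPSA iteration: perturb every parameter simultaneously by +-c,
    estimate the gradient from two loss evaluations, and step against it."""
    delta = [random.choice((-1, 1)) for _ in params]
    plus  = [p + c * d for p, d in zip(params, delta)]
    minus = [p - c * d for p, d in zip(params, delta)]
    g_hat = (loss(plus) - loss(minus)) / (2 * c)
    # Because delta_i is +-1, dividing by delta_i equals multiplying by it.
    return [p - a * g_hat * d for p, d in zip(params, delta)]
```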
History Gravity
Previously, our history updates were very simple (adding depth * depth to the entry). By using a better formula that pulls entries back toward zero and scales each update by how far the entry already is from its cap, surprising results move the score more than expected ones, and our history values become more accurate. +16 ELO
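The standard gravity formula (the cap value is illustrative): the further an entry already is toward the cap, the more of the bonus gets eaten, so entries saturate instead of growing without bound.

```python
MAX_HISTORY = 16384  # illustrative cap

def history_update(old, depth, good):
    """History update with 'gravity': subtract a fraction of the entry
    proportional to the bonus, so values stay within +-MAX_HISTORY and
    surprising results move the score more than expected ones."""
    bonus = depth * depth if good else -depth * depth
    return old + bonus - old * abs(bonus) // MAX_HISTORY
```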
Capture History
We can also do history on captures, which actually changes the move ordering formula from MVV-LVA to MVV+CaptHist. +14 ELO.
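In other words, the static attacker-value term gets replaced by a learned score. A sketch with hypothetical table keys:

```python
PIECE_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

def capture_score(victim, capt_hist, key):
    """MVV + CaptHist: rank a capture by victim value plus its learned
    capture-history score, instead of the static attacker value."""
    return 10 * PIECE_VALUE[victim] + capt_hist.get(key, 0)

# Hypothetical (attacker piece, target square) keys and learned scores:
capt_hist = {("N", "e5"): 400, ("B", "e5"): -100}
nxp = capture_score("P", capt_hist, ("N", "e5"))  # knight takes pawn, good history
bxp = capture_score("P", capt_hist, ("B", "e5"))  # bishop takes pawn, bad history
```

Two captures of the same victim can now be ordered differently based on how they have performed in the search so far.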
Output Buckets
Instead of having the NNUE output just one evaluation, we can make it output multiple evaluations that specialize in the number of pieces on the board. Sort of like a tapered eval system. +14 ELO.
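Selecting the head is just a piece-count-to-index mapping. The (count - 2) // 4 scheme below is one common choice, not necessarily my exact one:

```python
def output_bucket(piece_count, num_buckets=8):
    """Pick which of the network's output heads to use, based on how
    many pieces (2..32) are on the board."""
    return min((piece_count - 2) // 4, num_buckets - 1)
```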
King Input Buckets
Input buckets allow for a lot more specialization, since adding more input buckets increases the number of trainable parameters far more than adding more output buckets does. The most common method is to split the board into regions and have a bucket for the king being in each region. For example, one region might be the castled areas (a1, b1, c1, g1, h1) and another one might be the endgame areas (you get the point). Although updates become a lot slower when the king crosses a region (the whole accumulator has to be refreshed), the strength increase makes it worth it. +80 ELO.
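The bucket choice itself is just a lookup on the king square. This 4-region split is purely illustrative, not my actual layout:

```python
def king_bucket(square):
    """Map a king square (0 = a1 .. 63 = h8, from white's view) to an
    input bucket. Illustrative regions: queenside back rank, kingside
    back rank (castled), near the back, and advanced (endgame) king."""
    rank, file = divmod(square, 8)
    if rank == 0:
        return 0 if file < 4 else 1   # back rank: queenside vs kingside
    if rank <= 2:
        return 2                      # still near the back: middlegame-ish
    return 3                          # king marching up: endgame region
```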
CorrHist
Often, in positions with similar features (e.g. same pawn structure), the static evaluation is off from the true evaluation by around the same amount. So, we can maintain a history of "corrections" to make to the static evaluation to make it closer to the true evaluation. +7 ELO (and more to come when I implement more features!)
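A sketch of the two halves, keyed by (for example) a pawn-structure hash. The scale and update weight are illustrative; the update is an exponentially weighted nudge toward the observed error:

```python
CORR_SCALE = 256  # table entries are stored in fixed-point units

def corrected_eval(static_eval, corr_table, pawn_key):
    """Adjust the static eval by the stored average error for positions
    sharing this pawn structure."""
    return static_eval + corr_table.get(pawn_key, 0) // CORR_SCALE

def update_corr(corr_table, pawn_key, static_eval, search_score, weight=16):
    """Nudge the stored correction toward the observed error
    (search score minus static eval)."""
    entry = corr_table.get(pawn_key, 0)
    error = (search_score - static_eval) * CORR_SCALE
    corr_table[pawn_key] = entry + (error - entry) * weight // 256
```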
There are still a ton of other improvements I'm missing, but I just wanted to post a progress update :)
