lichess.org

Leela PPO?

I wish I knew..... even the formulas and equations for Algorithm 1 on the first page are mind-boggling to me.

The last page all seemed like a different language to me. I find Leela Chess Zero a fascinating project, but I would need "Reinforcement Learning for Dummies" or something.

Maybe someone else here has an answer? A programming forum might be a better place to ask, though.
All I can remember is delta, log, and the sigma function. How they got there, why they are there, and what they are meant to represent, I have no idea. When turning such algorithms into code, I just copy and paste the formulas.
It appears I'm not the only person to have questions about Proximal Policy Optimization:
stackoverflow.com/q/46422845

Intuitively, PPO differs from other policy-gradient methods by making only small (proximal) policy changes with each update.
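To make that intuition concrete, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. This is an illustration only, not Leela's actual training code; the function name, toy probabilities, and advantages below are made up for the example:

```python
def ppo_clip_loss(old_probs, new_probs, advantages, eps=0.2):
    """Clipped surrogate objective from PPO (to be maximized).

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - eps, 1 + eps], which is what keeps each policy update
    "proximal" to the previous policy.
    """
    total = 0.0
    for p_old, p_new, adv in zip(old_probs, new_probs, advantages):
        ratio = p_new / p_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Take the minimum of the clipped and unclipped terms: a
        # pessimistic bound that removes any incentive to push the
        # ratio far outside [1 - eps, 1 + eps] in a single update.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Toy batch: the new policy raised the probability of a good action
# (positive advantage) by 40%, but the objective only credits a 20%
# increase, because the ratio 1.4 is clipped to 1.2.
old = [0.5, 0.4]
new = [0.7, 0.3]
adv = [1.0, -0.5]
print(ppo_clip_loss(old, new, adv))
```

In practice this objective is computed over minibatches of self-play data and maximized by gradient ascent; the clipping is the whole "proximal" idea in one line.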
It's not impossible for a computer scientist to do it. However, it is imperative to have knowledge of machine learning.

The biggest problem is financial. A project of this kind requires significant computing power. For example, $25 million was spent to train AlphaGo Zero (Source: en.wikipedia.org/wiki/AlphaGo_Zero). In Leela Zero's case, the developers provide software that lets users run the bot while also contributing to its training: the more Leela Zero users there are, the more the bot is trained. This is how Leela gets trained for free.

Translated with www.DeepL.com/Translator
Continuity seems to be a big problem to solve. The model strikes me as better suited to real-time simulations, and I do not know whether the gains from mapping the discontinuity and complexity of chess onto something the model can handle outweigh the effort, without distorting the model in the process.

This topic has been archived and can no longer be replied to.