We have 2.5 million puzzles, from which Lichess picks the most popular for us. Even assiduous puzzle users only play a few thousand puzzles, which is still only around a thousandth of what's available. Hence the puzzle popularity variable is quite important. Today it has a fairly simple formula, the net share of upvotes:
Popularity = (upvotes - downvotes)/(upvotes + downvotes)
[ EDITED ] There is a caveat: "although votes are weighted by various factors such as whether the puzzle was solved successfully or the solver's puzzle rating in comparison to the puzzle's." - source: https://database.lichess.org/#puzzles
But there are a few issues with this kind of ranking. They particularly affect the harder range, i.e. puzzles rated above 2500.
Issue 1) The puzzle trainer UI shows the thumbs up/down options too early. If you get the very first move wrong, you can click the "View Solution" button. That brings up the "Puzzle Complete!" message with the big thumbs up/down (the voting options). The problem is that the player has not yet seen the remaining moves of the puzzle. The moves are added to the move list, but that is not what catches the user's attention. So the UI tends to capture votes that reflect only the very first move of the puzzle. Example:
https://i.imgur.com/myY7r2R.png
Expected behavior:
- The voting options should not be shown until the user reaches the puzzle's end position.
- All existing votes that came from unfinished puzzles should be removed from the database (if it is possible to identify them), so that popularity can be recalculated.
Issue 2) The popularity formula may be producing a distribution that is too skewed toward the high end:
https://i.imgur.com/bF2WVIm.jpg
Note: the right-side charts use a log scale.
As soon as a new puzzle gets its first upvote, I think it goes straight to #1 in the ranking. That may explain why I have found a significant deterioration in puzzle quality after the introduction of Stockfish 15 (which is another issue, related to the puzzle generator).
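To illustrate that point, here is a minimal Python sketch of the simple formula quoted above. It ignores the vote weighting that Lichess applies, the puzzle names and vote counts are made up, and returning 0 for a puzzle with no votes is my own assumption:

```python
def popularity(upvotes: float, downvotes: float) -> float:
    """Simple formula from this post: (upvotes - downvotes) / (upvotes + downvotes).

    Vote weighting is ignored. Returning 0.0 when there are no votes
    is an assumption made for this sketch.
    """
    total = upvotes + downvotes
    if total == 0:
        return 0.0
    return (upvotes - downvotes) / total


# Hypothetical puzzles: (upvotes, downvotes)
puzzles = {
    "new_puzzle": (1, 0),          # a single upvote so far
    "veteran_puzzle": (950, 50),   # many votes, mostly positive
}

# Rank by popularity, highest first.
ranking = sorted(puzzles, key=lambda name: popularity(*puzzles[name]), reverse=True)

for name in ranking:
    print(name, popularity(*puzzles[name]))
```

With these numbers the puzzle with a single upvote reaches the maximum value and ranks above the one with 950 upvotes out of 1000 votes, because the formula does not take the number of votes into account.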
Issue 3) Other very important variables that certainly influence the vote may not be taken into account by the popularity formula, such as how many rating points the user gained or lost. That can certainly affect the user's mood, and thus their vote. I could analyze the relationship between those variables by plotting more charts with the Python seaborn library, but the data I need is not publicly available, I guess. [ EDITED ]
Issue 4) As a matter of fact, improvement is needed both in the popularity formula and in the puzzle picking criteria. One suggestion is to create an automated puzzle quality evaluation variable, then mix it with the puzzle popularity variable when picking new puzzles for a user. I will expand on that idea later. It's about using the progression of the Stockfish evaluation to estimate how solid the win calculated by the engine really is in the puzzle's end position.
Good puzzles:
https://i.imgur.com/YKDDtix.png
(from the puzzle's end position, Stockfish can see an advantage right away)
(the third one still has some complexity going on though)
Bad puzzles:
https://i.imgur.com/BgZQ3bN.png
(from the puzzle's end position, Stockfish still needs to look 6 plies ahead before it can see an advantage)
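A rough Python sketch of how the "good" vs "bad" distinction above could become a number and be mixed with popularity. Everything in it is an assumption for illustration: the function names, the 200 centipawn threshold, the 1/(1 + plies) quality score, the 50/50 weighting, and the evaluation lists, which are made up rather than real engine output. Each list holds the evaluation of the puzzle's end position when the engine looks 0, 1, 2, ... plies ahead:

```python
from typing import Optional, Sequence


def plies_until_advantage(evals_cp: Sequence[int], threshold_cp: int = 200) -> Optional[int]:
    """Return how many plies of lookahead are needed before the evaluation
    (in centipawns, from the solver's side) reaches the threshold.

    evals_cp[i] is the evaluation when looking i plies ahead.
    Returns None if the threshold is never reached.
    """
    for plies, cp in enumerate(evals_cp):
        if cp >= threshold_cp:
            return plies
    return None


def quality(evals_cp: Sequence[int], threshold_cp: int = 200) -> float:
    """Hypothetical quality score in 0..1: 1.0 if the advantage is visible
    right away, lower the deeper the engine has to look, 0.0 if never."""
    plies = plies_until_advantage(evals_cp, threshold_cp)
    if plies is None:
        return 0.0
    return 1.0 / (1 + plies)


def pick_score(popularity: float, quality_value: float, quality_weight: float = 0.5) -> float:
    """Blend popularity (-1..1, as in the simple formula) with quality (0..1)."""
    popularity_01 = (popularity + 1) / 2  # rescale to 0..1
    return quality_weight * quality_value + (1 - quality_weight) * popularity_01


# Made-up evaluation progressions for two end positions.
good_puzzle = [350, 360, 355]                  # advantage visible right away
bad_puzzle = [10, 20, 15, 30, 25, 40, 310]     # advantage only visible 6 plies ahead

print(plies_until_advantage(good_puzzle), quality(good_puzzle))
print(plies_until_advantage(bad_puzzle), quality(bad_puzzle))
print(pick_score(1.0, quality(good_puzzle)), pick_score(1.0, quality(bad_puzzle)))
```

Even with identical popularity, the second puzzle gets a lower pick score here, which is the effect I am after. The exact threshold and weighting would need tuning against real data.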
Please leave your feedback here too.

