Popularity slope of chess openings

Chess openings gain traction for numerous reasons. But what's behind their surge in popularity, and what methods can we use to track the most trendy openings each day?

Let's start with this one: lichess.org/opening/Rapport-Jobava_System

Since there was no opening description for that opening, I started Stockfish in the terminal:
$ ./stockfish
position fen rnbqkb1r/ppp1pppp/5n2/3p4/3P1B2/2N5/PPP1PPPP/R2QKBNR b KQkq - 3 3
eval

The output gave the NNUE-derived piece values and said it was Black to move. It also printed a bucket table and picked bucket 7. I don't know why it picked that line, but the end result was:
NNUE evaluation -0.04 (white side)
Final evaluation -0.05 (white side) [with scaled NNUE, ...]

When I pasted the complete eval output into an AI, I got an interesting description.
www.perplexity.ai/search/the-rise-in-chess-opening-popu-p0jRHESPT2GejwTPgHUaYg

Then I looked for other sources to see how an opening trends in popularity.
en.chessbase.com/support-kb/content/details/1319/Reference_search_in_ChessBase

Is there a way to sort the Lichess openings by popularity? One approach is to use the a.tsv to e.tsv files and a bash script.
github.com/lichess-org/chess-openings
I asked an AI to make me a script that automatically downloads the tsv files into the same directory as the script. The result was:

$ ./chess_openings.sh
Downloading e.tsv...
Downloading d.tsv...
Downloading c.tsv...
Downloading b.tsv...
Downloading a.tsv...
Analyzing chess openings...
11: Nimzo-Indian Defense: St. Petersburg Variation
10: Ruy Lopez: Morphy Defense, Modern Steinitz Defense
10: Catalan Opening: Closed
8: Sicilian Defense: Closed
8: Bogo-Indian Defense: Retreat Variation
7: French Defense: Winawer Variation, Advance Variation
7: Caro-Kann Defense
6: Ruy Lopez: Closed, Chigorin Defense
6: Dutch Defense: Classical Variation
5: Sicilian Defense: Najdorf Variation
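
The script itself is not shown in this post, but the numbers in its output look like a count of how many rows in the tsv files share each opening name. Here is a minimal Python sketch of that kind of count. It assumes each tsv file has a header row with eco, name and pgn columns; the sample rows and the function name count_opening_names are made up for illustration and are not real data.

```python
import csv
import io
from collections import Counter

def count_opening_names(tsv_text):
    """Count how many rows share each opening name in one tsv file's text."""
    # Assumption: tab-separated columns with a header row "eco, name, pgn".
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return Counter(row["name"] for row in reader)

# Made-up sample rows in the assumed layout (not taken from the real files).
sample = (
    "eco\tname\tpgn\n"
    "X00\tExample Opening\t1. a3\n"
    "X01\tExample Opening\t1. a3 a6\n"
    "X02\tOther Opening\t1. h3\n"
)

# Print the most frequent names first, in the same "count: name" style.
for name, count in count_opening_names(sample).most_common(10):
    print(f"{count}: {name}")
```

Note that a count like this measures how many named lines the repository lists under one name, not how often the opening is actually played.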

When I started the script a second time, it did not download the tsv files because mine were up to date.
$ ./chess_openings.sh
e.tsv is up to date.
d.tsv is up to date.
c.tsv is up to date.
b.tsv is up to date.
a.tsv is up to date.

I then manually paste the opening name given in this link ...
lichess.org/opening/tree

I'm trying to get the script to link it automatically.
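
One way the linking could be automated is to turn an opening name into the path used by links such as lichess.org/opening/Rapport-Jobava_System. The sketch below is in Python rather than bash, and the rule it applies is only a guess inferred from the links quoted in this thread (accents and punctuation dropped, spaces turned into underscores, hyphens kept). The function names opening_slug and opening_url are hypothetical, and the real site may treat some names differently.

```python
import re
import unicodedata

LICHESS_OPENING_BASE = "https://lichess.org/opening/"

def opening_slug(name):
    """Guess the link path for an opening name (inferred rule, not official)."""
    # Strip accents by decomposing characters and dropping non-ASCII marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep only letters, digits, hyphens and spaces (assumption).
    cleaned = re.sub(r"[^A-Za-z0-9\- ]", "", ascii_name)
    # Join the remaining words with underscores.
    return "_".join(cleaned.split())

def opening_url(name):
    """Build a candidate link for an opening name."""
    return LICHESS_OPENING_BASE + opening_slug(name)

print(opening_url("Rapport-Jobava System"))
```

Any link produced this way should be checked by hand, since the rule was guessed from a few examples only.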

Most openings will give you a slight edge to White, no more. If it was otherwise, they would become extinct by Darwinian selection!
However, certain openings are more popular at a certain rating level. If you search databases, you should set the rating range as you will get different statistics depending on whether you search GMs only, amateurs, or a mix. The Ruy Lopez Berlin and classical QGD, for example, are popular at the top whereas at the club you will see Sicilian and KID devotees.

@lizani said in #3:
> Most openings will give you a slight edge to White, no more. If it was otherwise, they would become extinct by Darwinian selection!

Most but not all. I would say that with both the Bowdler Attack and the Trompowsky, White gives away his/her opening advantage with the 2nd move.

> However, certain openings are more popular at a certain rating level. If you search databases, you should set the rating range as you will get different statistics depending on whether you search GMs only, amateurs, or a mix. The Ruy Lopez Berlin and classical QGD, for example, are popular at the top whereas at the club you will see Sicilian and KID devotees.

Not so sure about that. Sicilian is very popular at the top, KID also quite popular. Many variations of Ruy are popular also.

In general, I would guess that the higher up the rankings you go, the most popular openings tend to be the ones that lead to the most strategic richness in the middlegame, therefore more ways for a player to find an edge. I would expect it is not so easy to find opening surprises these days.

Should the chess opening link cover all the legal moves and their NNUE evals? I think it would be a plus.
lichess.org/opening
I counted only 12 popular chessboards from the above link, so there were still 8 other moves not mentioned.
I also discovered that the Grob Opening was not part of the top 12.

-------------------------------------------------------------------------------
In my Linux terminal, I started by using the following commands:
./stockfish
go perft 1
position startpos moves a2a3
eval
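
Typing these two commands for every first move is repetitive, so here is a minimal Python sketch that generates them for all 20 legal first moves of White (16 pawn moves and 4 knight moves). The function names first_moves_uci and eval_commands are hypothetical; the command text itself is the same as typed above.

```python
def first_moves_uci():
    """Return White's 20 legal first moves in UCI notation."""
    moves = []
    for file_letter in "abcdefgh":
        # Each pawn can advance one or two squares from its start rank.
        moves.append(f"{file_letter}2{file_letter}3")
        moves.append(f"{file_letter}2{file_letter}4")
    # Each knight has two squares available from the starting position.
    moves += ["b1a3", "b1c3", "g1f3", "g1h3"]
    return moves

def eval_commands(move):
    """Return the two engine commands used in this post for one move."""
    return [f"position startpos moves {move}", "eval"]

# Print the full command list, one command per line.
for move in first_moves_uci():
    print("\n".join(eval_commands(move)))
```

The printed lines could then be pasted into the engine, or saved to a file and fed to its standard input.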

The UCI moves shown with "/pgn/" are assumed to be less popular because they were not among the 12 chessboards.
(position startpos moves .... {eval} lichess.org/opening/ ....)

UCI {FINAL EVAL} /Name/move ... {Popularity % in May 2024}
d2d4 {+0.16} /Queens_Pawn_Game/d4 ... {25.314}
e2e4 {+0.15} /Kings_Pawn_Game/e4 ... {58.627}
g1f3 {+0.10} /Zukertort_Opening/Nf3 ... {3.42}
e2e3 {+0.05} /Vant_Kruijs_Opening/e3 ... {2.005}
c2c4 {+0.02} /English_Opening/c4 ... {3.191}
g2g3 {+0.00} /Hungarian_Opening/g3 ... {1.664}
d2d3 {-0.02} /Mieses_Opening/d3 ... {0.916}
h2h3 {-0.02} /pgn/h3 ... /Clemenz_Opening/h3 ... {0.073}
a2a3 {-0.03} /pgn/a3 ... /Anderssens_Opening/a3 ... {0.099}
b2b3 {-0.03} /Nimzo-Larsen_Attack/b3 ... {1.661}
b1c3 {-0.03} /Van_Geet_Opening/Nc3 ... {0.55}
a2a4 {-0.04} /pgn/a4 ... /Ware_Opening/a4 ... {0.098}
c2c3 {-0.06} /Saragossa_Opening/c3 ... {0.307}
b1a3 {-0.06} /pgn/Na3 ... /Sodium_Attack/Na3 ... {0.01}
h2h4 {-0.07} /pgn/h4 ... /Kadas_Opening/h4 ... {0.102}
f2f4 {-0.16} /Bird_Opening/f4 ... {0.926}
b2b4 {-0.19} /Polish_Opening/b4 ... {0.42}
g1h3 {-0.25} /pgn/Nh3 ... /Amar_Opening/Nh3 ... {0.017}
f2f3 {-0.32} /pgn/f3 ... /Barnes_Opening/f3 ... {0.135}
g2g4 {-0.40} /pgn/g4 ... /Grob_Opening/g4 ... {0.31}

To get the opening names of the 8 that were not displayed, I manually typed opening/pgn/ followed by the move, and Lichess redirected me to the correctly named link. I was then able to see the sloping popularity graphs and their values. Those 8 other links were much more time-consuming to get. If only all legal moves were listed, it would have saved hours of work posting this info.

If I sort by popularity percentage, white's popular initial moves are ideally above one percent:
UCI  {eval}  Popularity %  /Opening_name/move
e2e4 {+0.15} 58.627 /Kings_Pawn_Game/e4
d2d4 {+0.16} 25.314 /Queens_Pawn_Game/d4
g1f3 {+0.10}  3.420 /Zukertort_Opening/Nf3
c2c4 {+0.02}  3.191 /English_Opening/c4

e2e3 {+0.05}  2.005 /Vant_Kruijs_Opening/e3
g2g3 {+0.00}  1.664 /Hungarian_Opening/g3
b2b3 {-0.03}  1.661 /Nimzo-Larsen_Attack/b3
f2f4 {-0.16}  0.926 /Bird_Opening/f4

d2d3 {-0.02}  0.916 /Mieses_Opening/d3
b1c3 {-0.03}  0.550 /Van_Geet_Opening/Nc3
b2b4 {-0.19}  0.420 /Polish_Opening/b4
g2g4 {-0.40}  0.310 /Grob_Opening/g4 (This one was not found in the top 12 openings)
c2c3 {-0.06}  0.307 /Saragossa_Opening/c3 (Instead this one was in the top 12 openings)

The rest were not popular enough to be in the top 12 openings.

f2f3 {-0.32}  0.135 /Barnes_Opening/f3
h2h4 {-0.07}  0.102 /Kadas_Opening/h4
a2a3 {-0.03}  0.099 /Anderssens_Opening/a3
a2a4 {-0.04}  0.098 /Ware_Opening/a4
h2h3 {-0.02}  0.073 /Clemenz_Opening/h3
g1h3 {-0.25}  0.017 /Amar_Opening/Nh3
b1a3 {-0.06}  0.010 /Sodium_Attack/Na3
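
The sorting was done by hand here. As a rough Python sketch, rows in the "UCI {eval} /Name/move ... {popularity}" layout of the unsorted list could be parsed and ranked as shown below. The four sample rows are copied from that list; the function name parse_row and the dictionary keys are hypothetical.

```python
def parse_row(line):
    """Split one 'UCI {eval} /Name/move ... {popularity}' row into fields."""
    tokens = line.split()
    # A row may hold a /pgn/ path and a named path; keep the last one.
    paths = [t for t in tokens if t.startswith("/")]
    return {
        "uci": tokens[0],
        "eval": float(tokens[1].strip("{}")),
        "path": paths[-1],
        "popularity": float(tokens[-1].strip("{}")),
    }

rows = [
    "d2d4 {+0.16} /Queens_Pawn_Game/d4 ... {25.314}",
    "e2e4 {+0.15} /Kings_Pawn_Game/e4 ... {58.627}",
    "h2h3 {-0.02} /pgn/h3 ... /Clemenz_Opening/h3 ... {0.073}",
    "g2g4 {-0.40} /pgn/g4 ... /Grob_Opening/g4 ... {0.31}",
]

# Rank from most to least popular.
ranked = sorted(
    (parse_row(r) for r in rows),
    key=lambda r: r["popularity"],
    reverse=True,
)

for r in ranked:
    print(f'{r["uci"]} {{{r["eval"]:+.2f}}} {r["popularity"]:6.3f} {r["path"]}')
```

Sorting on the "eval" key instead would reproduce the engine-ordered list.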

I'm still building my bash scripts. In the meantime, enjoy the results I have so far.

I thought you meant the slope over time (on the human scale of many games and many events): a new wave of swarm testing at population level examines a novelty (de facto new in the population and events, even if it resurrects a line previously "refuted" in history), then explores that bush** as "all the new rage" until the dust settles and people find new "refutations", or evidence that it might still be just playable, and then it is on to the next wave of promised land.

So, what do you mean by slope? It seems one post interpreted it as a side imbalance of odds (White vs Black).

I thought you wanted to look at the evolution of popularity over human time, say at a resolution of years. I think my own thoughts must have spilled over my reading of the title.

However, the question of the post seems to be more about how to process the Lichess repository of opening sequences with associated names. One has to remember that the purpose of that repository, although disentangled from the opening explorer repository, is to create a structure for unique naming as each individual user explores that decision tree, with knowledge of which decisions were taken from the standard initial position (although some input positions do seem to output names without that knowledge).

As for the technical aspects, I do not know how to relate that exercise to the title question. It is interesting to share: if we were to try the same kind of thing, it is good that someone has already gone through that trouble. But maybe you could explain the title better, and its relation to the last statement of the OP.

> Popularity slope of chess openings

> Chess openings gain traction
> Then I looked for other sources to see how a opening trend in popularity.
> surge in popularity

Those 3 phrases led me to believe it was the human popularity time series you were interested in.

But then you start talking about SF and the named opening sequence repository, which is not at all about listing only the opening sequences currently popular in the population, but about providing a complete covering of unique names for all Lichess games, bad or elite, that have some name in other dispersed sources of opening knowledge.

The opening explorer per game position instance (as input) is where the popularity is.

Maybe I thought it was not yet done, this question of data analysis of dynamic trends, or just a time series with popularity on the Y axis and some time on the X axis, for a given "all the new rage" or novelty position under some group scrutiny at high level, perhaps propagating into lower tiers or bands of ratings.

But maybe that is already researched, and you are seeking alternate factors or reasons that SF analysis might suggest as hypotheses, from its model of what matters in the chessboard position information, for any named opening-phase prefix sequence from that repository.

Would you like to use SF to scan for potential past waves or future waves? Or swarms, or surges?

Maybe compare the time series of popularities with the hypotheses you might pull out of SF per-position analyses?

But do we have some clarity at the data analysis level, some confirmation that we could actually discern such a convergence of attention across many player pairings, over some human event time window: a rise, a plateau or peak, and then a drop to a background "playable" frequency of visits? Where is such prior work? And then maybe there is the question of many initial sequences arriving at the same new, or reassessed as new, position of interest.

So, did I fill in the gaps between the SF stuff, the .tsv files of named sequences, and the excerpts of your experimental computer output traces? Sorry for the wording. I am not sure what those are, whether raw or not, or whether you are sharing new candidates from SF for future trends. I know I am missing something. So you can disregard my attempts to make a story of what I saw, and perhaps present your own design of experiment: the questions, what is known already that I might be oblivious to, and perhaps how the posts above fit into that research idea of yours. It might just be a part of something, and you are tackling that end of the question.

> Is there a way to sort the lichess openings by popularity?

So now that is the other end. We do not have that at dataset level; as I mentioned, we only have an instance query tool.
Also, I am not sure that the question is well posed. What do you mean by openings? The sequences themselves, the paths up to the last position that would have the "opening" ID or name you have in mind? Or is the "opening" that last position itself, which might be reached by more than one path from the standard initial position?
I assume a "novelty" might be a new position from a new move, being explored or rediscovered at high level.

So there might be more than one question in your question, one for each possible meaning of "opening".

But that still leaves the title question of slope?

Did you mean: if we had popularity data for all the openings (however defined) at some given band of ratings or tier, on the time scale of event sequences, perhaps month or year? There might be fluctuations, and maybe the demographics and human diversity of each pairing event or player pair would not show such a grouping of opening steering preferences. Then what kind of statistical signal would be worthy of the word trend, or of a positive "slope" signaling that a trend is surging?

And then there is the relation to the other side of your research ideas, that of using SF as a predictor of which positions in the set of "openings", as your statement might allow, are likely to be trend candidates. I don't think it is a bad idea, although it is only a guess or question, that SF might provide some bias to filter among the many fluctuating up-slope candidates: dismiss more often those that are low on SF's radar, and consider more likely those apparent upsurges that SF would have favored.

But I think I am missing the basic time series data analysis of that phenomenon, which you might think is well known or characterized. Can anyone help the OP in that regard, and me also? I am curious about first characterizing the notions of the OP. It would help make well-formed questions of a statistical or mathematical nature that might harness the scripting struggle and the SF-vetted hypotheses above.

sorry for the novel. I can delete. now tired. maybe some "AI" could shrink this and not denature it.

** Bush: always a bush at some chess ply depth mask above turn by turn incompressible trivial mask, another side-hypothesis, maybe needing transversal scope too, not just depth, to say it is a bush at that scope of breadth.

The problem is that popularity is a bad term.
It includes people so weak you would not play them and people so strong they would not play you!

@fwhchess said in #4:
> Most but not all. I would say that with both the Bowdler Attack and the Trompowsky, White gives away his/her opening advantage with the 2nd move.

If this were the case and widely known or shared, now or at some time in the past, possibly during the time windows of up-slopes, would it not then produce a down-slope at the levels that can assess such things, where players might conclude among themselves that enough is enough, this will not do for us as a promising position to work from, given our competitive goals over many games on record?

Or is this about some openings staying popular because a prefix of it is popular, and white does not have that level of steering control to stay out of such territory?

I think people are forgetting the human time scales, the historical discovery process at group level, and evolution at cultural or population level. The database might have lots of dynamic information, but we are stuck at the query level of single position, single game, single player.

Have there been serious population data studies of chess databases beyond rating statistics? Using the rating as a dependent variable, perhaps, but not necessarily as the target question. Not which openings are the best, but how swarming populations behave in various rating strata/event tiers (or other things not in my scope of experience, having come to chess 5 years ago by the sparse hobby trajectory, without much education about its culture, its past, etc.).

I have accumulated so far an impression of a lack of curiosity about the discovery processes at many scales. Everyone is so impatient to win or to get their rating into another zone that chess seems reduced to what the lobby hammers in over the long run of going there every day, without some extension to block all the modules that contribute to that impatient obsession. Oops... I am over it for myself; I am using such an extension. But that does not mean my questions are not still obscure to figure out. Perhaps the database traditions are themselves a bit obscure. I may have had a non-chess exposure to an analogous profusion of stored data, but there, even when researchers were still developing curating practices, it was all above board. A scientific illusion that I might have assumed would transpose to chess.

A Relative Strength Index (RSI) is typically used for price movements in financial markets. But what if it were adapted for chess openings? Could it provide insights into opening trends?

RSI for Chess Openings = 100 - [100 / (1 + RS)]
Where RS = (Average frequency of the move in recent games) / (Average frequency of the move in older games)
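
As a worked illustration of the two lines above, here is a minimal Python sketch. The function name opening_rsi, the handling of a zero older average, and the sample frequencies are all assumptions made for the example; they are not measured values.

```python
def opening_rsi(recent_freqs, older_freqs):
    """Adapted RSI: 100 - 100 / (1 + RS), with RS = avg recent / avg older."""
    avg_recent = sum(recent_freqs) / len(recent_freqs)
    avg_older = sum(older_freqs) / len(older_freqs)
    if avg_older == 0:
        # Assumption: RS is unbounded here, so report the ceiling of 100,
        # or the neutral 50 when the move was never played in either window.
        return 100.0 if avg_recent > 0 else 50.0
    rs = avg_recent / avg_older
    return 100 - 100 / (1 + rs)

# Made-up frequencies (percent of games), for illustration only.
print(opening_rsi([2.0, 2.0], [2.0, 2.0]))  # unchanged popularity
print(opening_rsi([3.0], [1.0]))            # three times as popular as before
```

With this definition a value of 50 means the move is played as often as before, values above 50 mean it is rising, and values below 50 mean it is falling.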

The trend for the following openings is still on the rise. This is what I meant by a slope: like a hill slope.
I just did not have the correct terminology to describe what I saw on the graph.
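
One common way to put a number on that kind of hill slope is the slope of a least-squares line fitted through the popularity values over equally spaced time steps. This is only a sketch of one possible measure, not what the Lichess graphs compute; the function name trend_slope and the sample values are made up.

```python
def trend_slope(values):
    """Least-squares slope of values taken at time steps 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Covariance of time and value, divided by the variance of time.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator

# Made-up yearly popularity percentages, for illustration only.
print(trend_slope([1.0, 2.0, 3.0]))  # rising: positive slope
print(trend_slope([4.0, 2.0]))       # falling: negative slope
```

A positive result means the opening is climbing the hill, a negative one means it is coming back down, and the size of the number says how steep the hill is per time step.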

Scandinavian_Defense/e4_d5
Caro-Kann_Defense/e4_c6
Pirc_Defense/e4_d6
Modern_Defense/e4_g6

Englund_Gambit/d4_e5
Queens_Pawn_Game_Modern_Defense/d4_g6
Benoni_Defense_Old_Benoni/d4_c5
Queens_Pawn_Game/d4_d6

Englund_Gambit/d4_e5
Queens_Pawn_Game/d4_c6
Zukertort_Opening_Ross_Gambit/Nf3_e5
Zukertort_Opening_Slav_Invitation/Nf3_c6

Queens_Pawn_Game/d4_d6
Queens_Pawn_Game/d4_c6

English_Opening_Anglo-Scandinavian_Defense/c4_d5
English_Opening_Caro-Kann_Defensive_System/c4_c6

Vant_Kruijs_Opening/e3_c6

Nimzo-Larsen_Attack/b3
Nimzo-Larsen_Attack_Classical_Variation/b3_d5
Nimzo-Larsen_Attack_Indian_Variation/b3_Nf6
Nimzo-Larsen_Attack/b3_e6
Nimzo-Larsen_Attack_English_Variation/b3_c5
Nimzo-Larsen_Attack/b3_g6
Nimzo-Larsen_Attack_Modern_Variation/b3_e5
Nimzo-Larsen_Attack/b3_c6
Nimzo-Larsen_Attack/b3_d6
Nimzo-Larsen_Attack_Symmetrical_Variation/b3_b6
Nimzo-Larsen_Attack_Dutch_Variation/b3_f5
Nimzo-Larsen_Attack/b3_a5

Move 1 is complete and the above openings are the only ones I saw that have risen in popularity since 2017.
Adding a cP value to each of them would be like giving them a second opinion.

This topic has been archived and can no longer be replied to.