# Which endgames should you practice? A statistical analysis

tldr: Given that relatively few amateur games reach technical endgames, your study time might be better invested elsewhere. But if you're going to study endgames, focus on rook endgames above all other types.

Background
----------

Some endgame books (e.g., Jesus de la Villa) recommend focusing on rook endgames, because they show up often in top-level games. This is interesting, but could be of limited interest to amateur players, whose games often play out very differently from GM games. Other endgame books (e.g., Jeremy Silman) present the Philidor and Lucena positions, but label most other rook endgame material as "expert-level content". Unfortunately, Silman doesn't really offer a justification for that choice, and I'm not sure the endgames he presents first are really the most "useful" or frequent.

So what should amateurs like us study? To answer this question, it seems reasonable to ask which types of endgames show up in amateur games most often.

Data
----

database.lichess.org publishes all the games played on lichess as PGN files. The September 2017 file includes:

12,564,109 games in total
2,735,642 classical games

Protocol
--------

The criteria I used to identify and extract endgame positions:

* Classical games only
* The game needs to include at least 40 moves by each player
* The position needs to have stayed on the board for at least 4 half-moves
* Maximum amount of material per player: 13 (Q=10,R=5,B=3,N=3,P=1)
* Maximum of 3 pawns per side
* No player has more than 2 pieces on the board (excluding king and pawns)
* No overwhelming material advantage (max difference: 4)
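For illustration, the material-based criteria above could be checked on a single position roughly like this. This is a sketch, not the original parsing script: the function names (`side_stats`, `is_technical_endgame`) are made up for this example, the input is assumed to be the piece-placement field of a FEN string, and the game-level criteria (classical time control, minimum game length, position staying on the board for 4 half-moves) are not covered.

```python
# Point values from the criteria above; kings are not counted.
VALUES = {"q": 10, "r": 5, "b": 3, "n": 3, "p": 1}


def side_stats(fen_board, white):
    """Return (material, pawns, pieces) for one side of a FEN board field."""
    material = pawns = pieces = 0
    for ch in fen_board:
        # Skip digits and slashes, and men belonging to the other side.
        if not ch.isalpha() or ch.isupper() != white:
            continue
        kind = ch.lower()
        if kind == "k":
            continue
        material += VALUES[kind]
        if kind == "p":
            pawns += 1
        else:
            pieces += 1
    return material, pawns, pieces


def is_technical_endgame(fen_board):
    """Apply only the material-based criteria to one position."""
    w_mat, w_pawns, w_pieces = side_stats(fen_board, white=True)
    b_mat, b_pawns, b_pieces = side_stats(fen_board, white=False)
    return (
        max(w_mat, b_mat) <= 13            # max material per player
        and max(w_pawns, b_pawns) <= 3     # max pawns per side
        and max(w_pieces, b_pieces) <= 2   # max pieces per side
        and abs(w_mat - b_mat) <= 4        # no overwhelming advantage
    )


# Rook and two pawns each: qualifies.
print(is_technical_endgame("8/5pk1/6p1/8/8/1R4P1/r4PK1/8"))
# Queen vs. rook: material difference is 5, so it is excluded.
print(is_technical_endgame("8/8/4k3/8/8/3r4/8/Q3K3"))
```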

Results
-------

Two main results: (1) share of games that reach an endgame, (2) types of endgame reached.

1)

389,835 classical games reached at least one technical endgame position (14% of classical games).
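As a quick sanity check, the shares can be recomputed from the counts given above. The variable names are only for illustration.

```python
# Counts quoted in the Data and Results sections.
total_games = 12_564_109
classical_games = 2_735_642
endgame_games = 389_835

# Share of all games that are classical, and share of classical
# games that reach at least one technical endgame position.
classical_share = classical_games / total_games
endgame_share = endgame_games / classical_games

print(f"{classical_share:.0%} of all games are classical")
print(f"{endgame_share:.0%} of classical games reach a technical endgame")
```

This gives roughly 22% and 14%, matching the figure quoted above.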

To me, this number seems low, especially since many of those endgames are "won" endgames, with one side up by as much as 4 points (e.g., a bishop and a pawn). The share of games that require technical endgame skills to be won is pretty small.

2)

In amateur games, rook endgames are absolutely dominant! If my statistical analysis is correct, amateurs should spend most of their endgame study "budget" looking at rooks, and it's not even close.

In the subset of games that reach a technical endgame,

67% of games include an endgame position with rook(s)
38% of games include an endgame position with bishop(s) (with or without rook(s))
18% of games include an endgame position with bishop(s) (without rooks)
31% of games include an endgame position with knight(s) (with or without rook(s))
15% of games include an endgame position with knight(s) (without rooks)
Only 37% of all endgame positions in my database do not include a rook.

Here are the 25 most common endgame positions, with the % of games in which they are found (p+ means 2 or more pawns).

rp+ vs. rp+ -- 14.8%
rp vs. rp+ -- 14.1%
p vs. p+ -- 11.3%
p+ vs. p+ -- 10.9%
r vs. rp+ -- 7.1%
r vs. rp -- 6.1%
brp+ vs. rp+ -- 5.4%
bp+ vs. p+ -- 5.1%
p vs. p -- 4.9%
rp vs. rp -- 4.7%
p+ vs. rp+ -- 4.4%
nrp+ vs. rp+ -- 4.3%
np+ vs. p+ -- 4%
bp+ vs. rp+ -- 3.7%
p+ vs. qp+ -- 3.2%
np+ vs. rp+ -- 3%
p vs. rp+ -- 3%
bp+ vs. bp+ -- 2.8%
bp+ vs. np+ -- 2.7%
p+ vs. rp -- 2.5%
r vs. r -- 2.4%
p+ vs. r -- 2.3%
qp+ vs. qp+ -- 2.3%
p vs. qp+ -- 2.2%
bp vs. p+ -- 2.1%
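The labels in this list can be reproduced from a position with a small helper. The sketch below is an assumption about how such labels might be built, inferred only from the list itself: pieces are written in alphabetical order, pawns are written as `p` (exactly one) or `p+` (two or more), and the two sides are sorted alphabetically regardless of colour. The function names and the `k` placeholder for a bare king are hypothetical.

```python
def side_signature(fen_board, white):
    """Build a label such as 'rp+' for one side of a FEN board field."""
    # Non-pawn, non-king pieces in alphabetical order (b, n, q, r).
    pieces = sorted(
        ch.lower()
        for ch in fen_board
        if ch.isalpha() and ch.isupper() == white and ch.lower() not in "kp"
    )
    pawns = fen_board.count("P" if white else "p")
    if pawns == 0:
        pawn_part = ""
    elif pawns == 1:
        pawn_part = "p"
    else:
        pawn_part = "p+"
    # 'k' for a bare king is a made-up placeholder, not taken from the list.
    return "".join(pieces) + pawn_part or "k"


def endgame_label(fen_board):
    """Combine both sides into a colour-independent label."""
    sides = sorted(
        [side_signature(fen_board, True), side_signature(fen_board, False)]
    )
    return " vs. ".join(sides)


print(endgame_label("8/5pk1/6p1/8/8/1R4P1/r4PK1/8"))   # rp+ vs. rp+
print(endgame_label("8/5pk1/6p1/8/8/1R4P1/r3BPK1/8"))  # brp+ vs. rp+
print(endgame_label("8/8/4k3/8/8/3r4/4P3/R3K3"))       # r vs. rp
```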

hat tip to @BigGreenShrek who suggested I do this and gave me a lot of the ideas in this post.

Capablanca already said: do not study openings, study rook endings.

I find your criteria a bit limiting.
"The game needs to include at least 40 moves by each player" Some openings like the Spanish Ruy Lopez exchange variation or the Spanish Ruy Lopez Berlin Defence reach an endgame quickly and thus are excluded.
"The position needs to have stayed on the board for at least 4 half-moves" This removes all endgames where players abandon or agree on a draw because they know the outcome.
"Maximum amount of material per player: 13 (Q=10,R=5,B=3,N=3,P=1)" This excludes endgames like RB+6p vs RN+5p.
"Maximum of 3 pawns per side" This excludes the important endgames R+5p vs R+4p (won) and R+4p vs R+3p (draw) and also endgames like B+4p vs N+3p etc.
"No overwhelming material advantage (max difference: 4)" This excludes endgames like Q vs R.

"389,835 classical games reached at least one technical endgame position (14% of classical games)."
If you relax the above criteria then this goes up.

What I would like to see is a breakdown in 30 subclasses.
All games start as 32 piece games.
After the first exchange 31, then 30, then 29, then 28.
When 7 is reached, it is in the Lomonosov table bases.
When 6 is reached, it is in the table bases here on lichess.
When 2 is reached it is a draw.
I would call all 9 or less endgame.

Btw, if you stumble over "100 Endgames You Must Know", you can compare your results with those gathered by de la Villa. He did something similar. Thx.

It's worth keeping in mind that to understand more complex endings you must first understand the basics. So to truly understand Rp vs. Rp you should study Rp vs. R as well.

@Sarg0n I did see the de la Villa table. His results are pretty similar to mine! Rooks are dominant there as well.

Shouldn't the (detailed) percentages add up to 100%?

Similarly,
"67% of games include an endgame position with rook(s)"
and
"Only 37% of all endgame positions in my database do not include a rook."
(5 lines later)
contradict each other for sure (typo and it's either 67/33 or 63/37?).

Good points @tpr

I can't change the 40 half-moves rule (it was hard coded in the parsing script, and it took my computer 2 days to run it).

But when I define endgames as those with a maximum of 9 pieces per side and no more than 5 pts of material difference, the statistics are:

R: 77% of games
Q: 43%
B: 48%
N: 49%
B (no R): 18%
N (no R): 21%
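
A rough sketch of what this relaxed definition might look like in code. It assumes that "pieces" counts every man on the board, king and pawns included, and that the point values are the same as in the original protocol; the function name and parameters are hypothetical, and the input is the piece-placement field of a FEN string.

```python
# Same point values as in the original protocol; kings count as 0.
VALUES = {"q": 10, "r": 5, "b": 3, "n": 3, "p": 1}


def is_relaxed_endgame(fen_board, max_men=9, max_diff=5):
    """At most max_men men per side and at most max_diff points apart."""
    white = [ch for ch in fen_board if ch.isalpha() and ch.isupper()]
    black = [ch for ch in fen_board if ch.isalpha() and ch.islower()]

    def points(men):
        return sum(VALUES.get(ch.lower(), 0) for ch in men)

    return (
        max(len(white), len(black)) <= max_men
        and abs(points(white) - points(black)) <= max_diff
    )


# RB+6p vs. RN+5p: 9 and 8 men, 1 point apart, so it now qualifies.
print(is_relaxed_endgame("r5k1/5ppp/4pn2/3p4/8/4PB2/PP3PPP/R5K1"))
# Q vs. R: 5 points apart, which is within the relaxed limit.
print(is_relaxed_endgame("8/8/4k3/8/8/3r4/8/Q3K3"))
```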

