Most low-intermediate and intermediate chess players try to learn as many openings as possible to build their opening repertoire. At this stage it is very important to know which openings are popular and widely used by experienced players. One approach is to start with individual moves and build the whole opening tree. Another is to memorize hundreds of individual openings. I wanted to create a classification somewhere in between.
WITH THE CLASSIFICATION I PROPOSE, 96% OF THE FIRST 3 WHITE MOVES FALL INTO ONE OF 16 CATEGORIES
First, I decided to classify positions rather than moves. This reduces the number of combinations and lets us ignore the order of moves. Second, I decided to look at the first three moves, because in these moves players have some freedom to choose their preferred opening; later on they must respond to the opponent's play. Third, I decided to study the positions of white and black separately; in other words, to classify the positions and moves of white while discarding the positions and moves of black. This is, of course, a very big simplification; however, it reduces the number of categories significantly and, as I said, in the first moves the players play more or less independently. It's up to you to decide whether the classification is useful.
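The idea of ignoring move order and black's replies can be sketched in a few lines of Python. This is only an illustration, not the script used for the study: the function name white_first_three is made up here, and the input is assumed to be ordinary PGN movetext in standard algebraic notation, possibly with {...} comments.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def white_first_three(movetext):
    """Return white's first three moves as an order-independent frozenset."""
    # Drop {...} comments (for example clock or evaluation annotations).
    text = re.sub(r"\{[^}]*\}", " ", movetext)
    # Drop move numbers such as "1." or "3...".
    text = re.sub(r"\d+\.+", " ", text)
    # Drop a game result token, if present.
    tokens = [t for t in text.split() if t not in RESULTS]
    # Strip check/mate marks and annotation glyphs so "Bb5+" equals "Bb5".
    moves = [t.rstrip("+#!?") for t in tokens]
    # Plies alternate white, black, white, ...; keep white's first three.
    return frozenset(moves[0:6:2])
```

With this key, 1. e4 e5 2. Nf3 Nc6 3. Bb5 and 1. Nf3 Nc6 2. e4 e5 3. Bb5 map to the same value, which is exactly the simplification described above.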
DEFINITIONS. STARTER MOVES are the 20 moves available from the starting position: a3, a4, b3, b4, c3, c4, d3, d4, e3, e4, f3, f4, g3, g4, h3, h4, Na3, Nc3, Nf3, Nh3. I will call CENTRAL MOVES a subset of 10 of them, namely c3, c4, d3, d4, e3, e4, f3, f4, Nc3, Nf3, based on their importance and frequency. By the term FEATURE I mean one non-starter move preceded by one starter move, or a specific position requiring two starter moves in a specific order. In most cases, the first three moves either have no features (they consist of three starter moves) or contain one feature and one starter move. For example, Bb5 is a feature. If white plays Bb5 within the first three moves, he or she has to spend one of the three moves on e3 or e4, one move on Bb5 and (most probably) one more starter move. This turns the feature Bb5 into a category. (As I mentioned, I discard the order of moves, so it can be 1.e4 2.Bb5 or 1.e4 3.Bb5 or 2.e4 3.Bb5.) Games in which two features are developed within the first three moves are possible (say, 1.e4 2.Bb5 3.e5, giving the Bb5 feature and the e5 feature), but either they are left unclassified because they are infrequent, or the second feature is explicitly included in the list of allowed moves for a given category. I also treat some combinations of two starter moves as features, say c4 Nc3, because they must be played in a specific order and not independently.
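The definitions translate directly into two sets and a small helper. This is a minimal sketch with names of my own choosing (STARTER_MOVES, CENTRAL_MOVES, features); it only detects features that are non-starter moves, not the two-starter-move combinations such as c4 followed by Nc3.

```python
FILES = "abcdefgh"

# 16 pawn moves (one or two squares) plus 4 knight moves.
STARTER_MOVES = frozenset(
    [f + r for f in FILES for r in "34"] + ["Na3", "Nc3", "Nf3", "Nh3"]
)

CENTRAL_MOVES = frozenset(
    ["c3", "c4", "d3", "d4", "e3", "e4", "f3", "f4", "Nc3", "Nf3"]
)

def features(white_moves):
    """Return the non-starter moves among white's first three moves."""
    return [m for m in white_moves if m not in STARTER_MOVES]
```

For example, features(["e4", "Nf3", "Bb5"]) yields the single feature Bb5, while three starter moves yield an empty list.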
METHODS. I wrote a Python script for the classification. I used the lichess game database for March 2017 and selected the games in which both players are rated at least 2000, the main time is at least 300 seconds, the game contains at least 9 plies, and game scores are given. This selection gave me 125445 games, which is enough for a first test of the approach.
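The selection step could look roughly like the sketch below. It is not the original script: keep_game is a hypothetical helper, the PGN tags are assumed to be already parsed into a dict, and the tag names WhiteElo, BlackElo and TimeControl (of the form "300+0") are the ones found in lichess PGN exports. The "game scores are given" criterion is left out because its exact meaning is not spelled out above.

```python
def keep_game(headers, ply_count, min_rating=2000, min_seconds=300, min_plies=9):
    """Apply the rating, time and length criteria to one game's PGN tags."""
    try:
        white = int(headers["WhiteElo"])
        black = int(headers["BlackElo"])
        # TimeControl looks like "300+0": main time, then increment.
        base = int(headers["TimeControl"].split("+")[0])
    except (KeyError, ValueError):
        # Missing or non-numeric tags (such as "?" or "-"): skip the game.
        return False
    return (
        min(white, black) >= min_rating
        and base >= min_seconds
        and ply_count >= min_plies
    )
```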
Enough introduction; let us see what I got.
CATEGORY 1. CENTRAL QUIET GAMES (22.35%). Formal definition: all of white's first three moves must be central moves (see definitions), no white piece or pawn is captured by black within the first three moves, and white does not develop the Nc3+c4 feature or the Nf3+f4 feature. This is a very wide and varied category; however, it contains some very frequent positions. Position 1a, d4+Nf3+c4 in some order, gives 5.6% of all games. Position 1b, d4+Nc3+e4, gives 4.56% of all games. Position 1c, e4+Nf3+Nc3, gives 2.6% of all games. The other 55 positions of this category found in the data give 9.58% of all games.
CATEGORY 2. CENTRAL SHARP (14.03%). Formal definition: all of white's first three moves must be central moves, at least one white piece or pawn is captured by black within the first three moves, and white does not develop the Nc3+c4 or the Nf3+f4 feature. This category has a leading position 2a: white plays e4+Nf3+d4 in some order, while black takes the d4 pawn (the position of the other black pieces is not counted, as usual). This position alone gives as much as 9.36% of all games. The other 73 positions of white pieces in this category found in the database give 4.66% of all games.
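The split between the two central categories can be sketched as follows. The function central_category is illustrative only; it assumes moves in standard algebraic notation, where any black move containing "x" is a capture and therefore takes a white piece or pawn.

```python
CENTRAL_MOVES = frozenset(
    ["c3", "c4", "d3", "d4", "e3", "e4", "f3", "f4", "Nc3", "Nf3"]
)

def central_category(white_moves, black_moves):
    """Return 'central quiet', 'central sharp', or None for other games."""
    white = set(white_moves)
    if not white <= CENTRAL_MOVES:
        return None
    # The two-move features c4+Nc3 and f4+Nf3 belong to the rider categories.
    if {"c4", "Nc3"} <= white or {"f4", "Nf3"} <= white:
        return None
    # A black capture within the first three moves makes the game "sharp".
    black_captured = any("x" in m for m in black_moves)
    return "central sharp" if black_captured else "central quiet"
```

For instance, e4+Nf3+d4 answered by c5, d6, cxd4 comes out as central sharp (position 2a), while d4+Nf3+c4 with no black capture comes out as central quiet (position 1a).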
CATEGORY 3. QUEEN RIDER, c4+Nc3 feature (11.90%). Formal definition: the first three moves must contain c4 and Nc3, and all of white's first three moves must belong to the set [starter moves]+[c4, Nc3, cxd5]. This category also has a leader: position 3a, c4+Nc3+d4 in some order with no captures, gives 8.42% of all games. The other 29 positions of this category found in the database give 3.47%.
CATEGORY 4. KINGSIDE BISHOP, features Bc4 or Bb5 or Bd3 (10.70%). Formal definition: white's first three moves contain Bc4, Bb5 or Bd3, and all three moves must belong to the set [starter moves]+[Bc4, Bb5, Bd3]. This category has two sharp leading positions. Position 4a, e4+Nf3+Bb5 in some order, gives 4.27%. Position 4b, e4+Nf3+Bc4 in some order, gives 3.77%. The other 60 positions give 2.66%.
CATEGORY 5. SCANDINAVIAN-LIKE, feature exd5 (6.67%). Strict definition: white's first three moves must include exd5, and all three moves must belong to the set [exd5, Bb5, dxc6, Bc4]+[starter moves]. This category is not limited to the Scandinavian proper; it only says that white takes the d5 pawn within the first three moves. Say, the game 1. e4 d6 2. d4 d5 3. exd5 falls into this category. This category does not have very sharp leading positions. In total, 35 positions were found in the database.
CATEGORY 6. QUEENSIDE BISHOP, features Bg5 or Bf4 (6.24%). Strict definition: the first three moves must include Bg5 or Bf4, and all three moves must belong to the set [Bg5, Bf4, Bh4, Bxf6]+[starter moves]. This category has two sharp leading positions. Position 6a, d4+Nf3+Bf4, gives 2.01%. Position 6b, d4+Bf4+e3, gives 1.75%. The other 53 positions give 2.48%.
CATEGORY 7. FIANCHETTO (5.57%). Strict definition: white must play either the combination g3+Nf3 or the combination g3+Bg2 within the first three moves, and all three moves must belong to the set [g3, Nf3, Bg2]+[starter moves]. This category has only moderate leading positions; in total, 41 positions were found in the database.
CATEGORY 8. KINGSIDE PROMOTION, e5 feature (5.00%). Strict definition: the first three moves must include e5, and all three moves must belong to the set [e5]+[starter moves]. This category was a moderate surprise for me. Position 8a, e4+e5+d4, gives 3.08%; the other 25 positions in this category give 1.91%.
=== The categories above give 82.45% accumulated ===
CATEGORY 9. SIDE GAMES (2.96%). Strict definition: all three moves must be starter moves, but the game must not fall into any of the categories central quiet, central sharp, fianchetto, queen rider or king rider. Obviously, at least one side move must be played (otherwise the game would fall into the central games, queen rider or king rider category), but not the combination g3+Nf3 (it would fall into fianchetto). All three moves are starter moves, so no non-starter feature is developed. It is a very broad category with 259 positions and no sharp leading positions; moves like a3, b3, h3 and b4 are common.
CATEGORY 10. KING RIDER (2.42%). Strict definition: the combination f4+Nf3 is developed within the first three moves, and all three moves must be starter moves.
CATEGORY 11. THREE-QUARTER-INDIAN, Nbd2 feature (2.34%). Strict definition: Nbd2 must be played within the first three moves, and all of white's first three moves must belong to the set [Nbd2]+[starter moves]. Another surprise for me.
CATEGORY 12. QUEENSIDE PROMOTION, feature d5 (2.12%). Strict definition: d5 must be played within white's first three moves, and all three moves must belong to the set [d5]+[starter moves].
CATEGORY 13. QUEENSIDE FIANCHETTO, feature Bb2 (1.54%). Strict definition: Bb2 must be played within the first three moves, and all three moves must belong to the set [Bb2, Bxe5, Bxf6]+[starter moves].
CATEGORY 14. QUEENSIDE ATTACKED, feature cxd5 without knight (1.06%). Strict definition: cxd5 must be played within the first three moves, all three moves must belong to the set [cxd5]+[starter moves], and the game must not fall into the queen rider category. The latter condition excludes the move Nc3.
CATEGORY 15. KINGSIDE ATTACKED, feature dxe5 (1.05%). Strict definition: dxe5 must be played within the first three moves, and all three moves must belong to the set [dxe5]+[starter moves].
CATEGORY 16. KNIGHT ATTACK, feature Nxe5 (0.68%). Strict definition: Nxe5 must be played within the first three moves, and all three moves must belong to the set [Nxe5]+[starter moves].
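Several of the categories above share the same shape: one required feature move, with everything else restricted to starter moves. Under that reading, categories 8, 11, 12, 15 and 16 can be handled by a single lookup table. The sketch below is an illustration with made-up names (SINGLE_FEATURE, single_feature_category), not the original script, and it deliberately ignores the categories with larger allowed sets or extra exclusions.

```python
STARTER_MOVES = frozenset(
    [f + r for f in "abcdefgh" for r in "34"] + ["Na3", "Nc3", "Nf3", "Nh3"]
)

# Feature move -> category name, for categories whose allowed set is
# exactly [feature] + [starter moves].
SINGLE_FEATURE = {
    "e5": "KINGSIDE PROMOTION",
    "Nbd2": "THREE-QUARTER-INDIAN",
    "d5": "QUEENSIDE PROMOTION",
    "dxe5": "KINGSIDE ATTACKED",
    "Nxe5": "KNIGHT ATTACK",
}

def single_feature_category(white_moves):
    """Return the category name, or None if no single-feature rule applies."""
    non_starter = [m for m in white_moves if m not in STARTER_MOVES]
    # Exactly one non-starter move is allowed, and it must be a listed feature.
    if len(non_starter) == 1:
        return SINGLE_FEATURE.get(non_starter[0])
    return None
```

A game such as e4+e5+Bc4 has two non-starter moves and therefore returns None here, which matches its unclassified status in this scheme.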
UNCLASSIFIED (3.38%). The classification could be continued either by allowing combinations of the above non-starter moves, or by adding more non-starter moves. The former approach would allow classifying games like e4+e5+Bc4. Grob's Attack with g4+Bg2 is not uncommon among the unclassified games: it does not fall into the side games category because Bg2 is not a starter move, and it does not fall into the fianchetto category because that category requires g3. The next most frequent non-starter move is dxc5, which accounts for no more than 0.2%. So the classification stops here.
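As a quick consistency check, the shares quoted for the 16 categories and the unclassified remainder can be added up. The numbers below are copied from the category headings; the variable names are my own.

```python
# Share of all games per category, in percent, as quoted in the headings.
SHARES = {
    1: 22.35, 2: 14.03, 3: 11.90, 4: 10.70,
    5: 6.67, 6: 6.24, 7: 5.57, 8: 5.00,
    9: 2.96, 10: 2.42, 11: 2.34, 12: 2.12,
    13: 1.54, 14: 1.06, 15: 1.05, 16: 0.68,
}
UNCLASSIFIED = 3.38

# Coverage of the eight largest categories, of all sixteen, and the grand total.
top8 = round(sum(SHARES[k] for k in range(1, 9)), 2)
classified = round(sum(SHARES.values()), 2)
total = round(classified + UNCLASSIFIED, 2)

print(f"top 8: {top8}%  classified: {classified}%  total: {total}%")
```

The classified share comes out at roughly 96.6%, in line with the 96% in the headline, and the grand total differs from 100% only by rounding of the individual shares.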