Subject: AlphaZero's Human-like Play Against Stockfish

2021-07-01 05:19:32

For those unaware, Stockfish is currently one of the most popular chess engines for analysis. Chances are that the “chess computer” on your phone or computer is Stockfish, unless you purchased a different chess engine. Chess computers have come a long way since Deep Blue in the 1990s and the chess programs of the decades before it, in large part because technology has improved at a shocking rate. It is mind-boggling to realize just how “deep” engines like Stockfish can “calculate.” The very same Stockfish that played against AlphaZero in the 2017 match was calculating approximately 120 million chess positions per second!

How can Stockfish calculate 120 million chess positions per second, spend an average of two entire minutes per move and still not “solve” chess? The answer is simple, yet just as startling: chess has that many possibilities! 120 million positions per second does sound daunting, but it is a drop in the bucket compared to how vast chess is.

Claude Shannon was an American mathematician born just outside of Petoskey, Michigan (his childhood home was in Gaylord, Michigan). Most know Shannon for his cryptography work during WWII, or for his work on information theory (beginning with his 1948 article “A Mathematical Theory of Communication,” published in two parts). However, chess enthusiasts perhaps know Shannon best for his mathematical work on chess.

In 1950, Shannon published a paper which illustrated the “game-tree complexity of chess.” One way to demonstrate how many possible chess “routes” there are is to count all of the possibilities after each ply (half a move) or after each move. A ply is one side moving, and a “move” is both sides completing the “turn.” 1. e4 would be one ply, but 1. e4 c5 (Sicilian Defense) would be two plies, or one move.
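The ply/move bookkeeping above can be sketched in a few lines of Python. This is only an illustration; the function names `ply_to_move` and `label` are made up for this example and are not from any chess library.

```python
from typing import Tuple


def ply_to_move(ply: int) -> Tuple[int, str]:
    """Map a 1-based ply count to (full-move number, side that just moved)."""
    if ply < 1:
        raise ValueError("ply counts start at 1")
    move_number = (ply + 1) // 2          # plies 1 and 2 both belong to move 1
    side = "White" if ply % 2 == 1 else "Black"
    return move_number, side


def label(ply: int, san: str) -> str:
    """Format a single ply the way a game score would show it."""
    move_number, side = ply_to_move(ply)
    dots = "." if side == "White" else "..."
    return f"{move_number}{dots} {san}"


print(label(1, "e4"), label(2, "c5"))  # prints "1. e4 1... c5"
```

With this convention, ply 80 is Black's 40th move, which is where the "40 moves (80 plies)" figure for an average game comes from.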

In chess, 1 ply gives 20 possibilities: 1. Na3, 1. Nc3, 1. Nf3, 1. Nh3, every pawn moving one square forward (like 1. e3) and every pawn moving two squares forward (like 1. e4). However, 2 plies deep (one move) has 400 possibilities, since Black has the same 20 options in reply to each of White’s 20. We can already see how quickly the branches multiply.
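The figure of 20 can be checked by simply listing the moves. The snippet below is a small Python sketch written for this article; the variable names are illustrative, and it hard-codes the starting position rather than using a real move generator.

```python
FILES = "abcdefgh"

# Each of the eight pawns may advance one square (rank 3) or two squares (rank 4).
pawn_moves = [f"{file}{rank}" for file in FILES for rank in (3, 4)]

# Each knight has two squares available from the starting position.
knight_moves = ["Na3", "Nc3", "Nf3", "Nh3"]

first_moves = pawn_moves + knight_moves

# No first move by White restricts Black's own 20 options,
# so two plies give 20 * 20 distinct sequences.
sequences_after_two_plies = len(first_moves) ** 2

print(len(first_moves), sequences_after_two_plies)  # prints "20 400"
```

From ply 3 onward this shortcut no longer works, because earlier moves open lines, block squares and allow captures, so the counts have to come from a full move generator.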

3 plies gives 8,902 possibilities, 4 plies gives 197,281 possibilities, 5 plies gives 4,865,609 and 6 plies gives 119,060,324 possibilities. 6 plies is 3 moves into the chess game, and this is almost 120 million possible games already! This means that the super-calculating Stockfish can “see” every option through move 3 in a single second, but the ever-increasing branching becomes even more overwhelming as the game proceeds.
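To make the growth concrete, here is a short Python sketch that takes the counts quoted above and works out how much the tree grows with each extra ply, and how long each depth would take at the 120 million positions per second quoted earlier. It treats one "possibility" as one position searched, which is a simplifying assumption; a real engine prunes heavily and does not enumerate the tree this way.

```python
POSITIONS_PER_SECOND = 120_000_000  # speed quoted in the article

# Possibilities after 1..6 plies, as listed in the article.
COUNTS = [20, 400, 8_902, 197_281, 4_865_609, 119_060_324]


def growth_factors(counts):
    """Ratio between each depth and the one before it."""
    return [deeper / shallower for shallower, deeper in zip(counts, counts[1:])]


def seconds_to_enumerate(count, rate=POSITIONS_PER_SECOND):
    """Time to visit `count` positions at `rate` positions per second."""
    return count / rate


for ply, count in enumerate(COUNTS, start=1):
    print(f"ply {ply}: {count:>11,} possibilities, "
          f"{seconds_to_enumerate(count):.6f} s")

print([round(factor, 1) for factor in growth_factors(COUNTS)])
```

Each extra ply multiplies the count by a factor of 20 or more, which is why one second covers ply 6 but does not come close to covering the plies that follow.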

By ply 10 (5 moves deep into the chess game), there are 69,352,859,712,417 possible games! More staggering is the thought that the average chess game lasts 40 moves (80 plies). Even if a computer could calculate every possibility (nowhere near practical with the technology currently available), the sheer amount of storage required would be absurd; there would be no room to store all of it!
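A back-of-the-envelope calculation in Python shows the scale. The speed and the ply-10 count are the figures quoted in this article; the storage cost of 10 bytes per game (one byte per ply) is purely an assumption for illustration. The last lines reproduce the style of estimate usually attributed to Shannon's 1950 paper, which assumes roughly 10^3 continuations per full move over a 40-move game.

```python
POSITIONS_PER_SECOND = 120_000_000      # speed quoted in the article
GAMES_AT_PLY_10 = 69_352_859_712_417    # count quoted in the article
BYTES_PER_GAME = 10                     # ASSUMPTION: one byte per ply, illustrative only

# Time to merely visit every 10-ply game once at the quoted speed.
seconds = GAMES_AT_PLY_10 / POSITIONS_PER_SECOND
days = seconds / 86_400

# Space to store every 10-ply game under the assumed encoding.
terabytes = GAMES_AT_PLY_10 * BYTES_PER_GAME / 1e12

# Shannon-style estimate: about 10**3 continuations per full move, 40 moves.
shannon_estimate = (10 ** 3) ** 40

print(f"{days:.1f} days, {terabytes:.0f} TB")   # prints "6.7 days, 694 TB"
print(len(str(shannon_estimate)) - 1)           # prints "120" (the power of ten)
```

So five moves alone would take nearly a week to enumerate and hundreds of terabytes to store under these assumptions, and a full 40-move game sits around 110 orders of magnitude beyond that.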

AlphaZero has been producing exciting results because it doesn’t calculate the same way Stockfish does. AlphaZero was given little more than the rules of chess and then “taught itself” through a process of “machine learning” in which it plays against itself many times (likely millions of games!) and adapts its play-style via trial and error. One fascinating element of AlphaZero’s play is how “human-like” it is compared to previous chess engines. AlphaZero seems to recognize long-term positional advantages that previous engines did not find, and it appears more inclined to play exchange sacrifices (and similar ideas) for long-term compensation rather than for immediate tactics. This play-style is far more “human-like” than anything we’ve seen previously. Here is one such game, which resonated with me upon first seeing it.

This game is “Game 3” of the 2017 event AlphaZero versus Stockfish. For this game, AlphaZero is playing as the White pieces and Stockfish is playing as the Black pieces.

1. Nf3 Nf6 2. c4 b6 3. d4 e6 4. g3 Ba6 Chess computers and chess Grandmasters don’t always agree on opening theory, but AlphaZero’s opening choices came across as surprisingly “human-like”; here we have a Queen’s Indian position that has been played by titled (human) players many times before. 4…Ba6 is the Nimzowitsch Variation and is a solid way for Stockfish to play against White’s setup. In other games (perhaps in part due to this loss), Stockfish tried other lines, including 4…Bb7. Nevertheless, AlphaZero seemed to be the stronger chess player.

5. Qc2 c5 6. d5 exd5 This is a positional pawn sacrifice that is thematic in the Queen’s Indian Defense. White seems to have enough positional compensation for the material with exact play. I used an opening database and filtered for games between players rated over 2000; this exact position came up many times and White scores slightly better percentage-wise.

In compensation for the d-pawn, White may get d-file pressure, and Black will need to spend time to safely take the d5 pawn (such as relocating the Bishop back to the long diagonal, on which it did not originally develop).

A key tactic to note here is that 7. cxd5 Nxd5?? loses right away to 8. Qe4+ and White wins material (either wins the Knight on d5 with the fork or 8…Ne7?? 9. Qxa8 snags the trapped Rook in the corner).

7. cxd5 Bb7 8. Bg2 Nxd5 Capturing the pawn on d5 is now safe.

9. O-O Nc6 10. Rd1 Be7 This position is still all opening theory and has been played many times. Even though White scores slightly better than Black, statistically speaking, this opening has a high draw rate according to my database.

It is valuable to highlight that 10…Be7 does not blunder the d5 Knight. If 11. Rxd5?? then 11…Nb4 tactically holds everything together, hitting both the c2 Queen and the d5 Rook.

11. Qf5 Nf6 12. e4 g6 13. Qf4 O-O 14. e5 Nh5 15. Qg4 Re8 Considering that AlphaZero was not programmed to play any specific openings, it is remarkable how closely its opening choices align with Grandmaster-level opening theory. 15. Qg4 is still all “book” and it isn’t until Stockfish plays 15…Re8 that we leave theory (15…Qb8 is strange-looking, but perhaps better for keeping watch on the e5 pawn).

I have one such game from 2009 (October 23rd) where the position at 15. Qg4 was reached by two well known chess players. GM Anish Giri played the White pieces and GM Judit Polgar played the Black pieces in this casual game online. Instead of the lines given, Polgar continued with 15…Ng7 16. Nc3 Ne6 and the game was drawn by agreement after 36 moves.

IM Daniel Rensch found it interesting that Stockfish didn’t go for the popular 15…d5 (as played by GM Ivanchuk and many other Grandmasters).

16. Nc3 Qb8 17. Nd5 Bf8 17. Nd5 kicks the Bishop back, unless Black wants to give up the Bishop Pair. Here 17. Nd5 works because 17…Nxe5?? loses to 18. Nxe5 Qxe5 19. Nxe7+ Rxe7 (or 19…Qxe7) 20. Bxb7, which wins material for White.

18. Bf4! Qc8 18. Bf4! highlights the dark-square weaknesses that Black (Stockfish) has around its King. The tempting 18…Nxf4 fails to 19. Nf6+, while ignoring the f4 Bishop allows discovered-attack ideas with the pawn push to e6 (uncovering the f4 Bishop’s attack on the Black Queen on b8). Due to this, Stockfish played 18…Qc8 to sidestep the discovered attack before it happens.

The next several moves demonstrate the positional technique AlphaZero has as it develops and makes use of the weak dark-square color-complex.

19. h3 Ne7 20. Ne3 Bc6 21. Rd6! Ng7 22. Rf6! I love this maneuver because it looks impossible to pull off successfully, but the idea is actually pretty simple. The Rook going to d6 and then f6 appears clumsy at first sight because it lands right in the heart of Black’s position, but it makes successful use of the weakened dark squares (such as d6 and f6), and the Rook on f6 is paradoxically difficult to dislodge.

22…Qb7 23. Bh6 Nd5 24. Nxd5 Bxd5 25. Rd1 Ne6 Only now does Stockfish slowly begin to realize how deep the problems are (primarily due to the dark squares) and admit that this position is roughly +1.00 in favor of AlphaZero (White).

26. Bxf8 Rxf8 Exchanging off the dark-squared Bishops actually doesn’t solve Black’s dark-square problems permanently. White still has ways to make progress and the dark squares near the King are still positionally weak.

White can make progress in a few ways: one of them is simply preparing h4 and h5 to open the h-file; meanwhile, the f6 Rook is the star at keeping Black’s forces cut off from the action!

27. Qh4 Bc6 28. Qh6 Rae8 Before putting h4-h5 into action, White instructively shows an “understanding” of the element of time in the position. There is no immediate rush to play h4-h5 while Black’s forces are as far away from the fray as they are. In Karpov-fashion, AlphaZero first prepares with a “positional squeeze” of sorts: it takes a grip on the dark squares (even more so than now!) and only then pushes the h-pawn when ready.

29. Rd6! Bxf3 30. Bxf3 Qa6 The attack, or some progress, must be found. If not, then Black might be able to gobble up material with moves like …Qxa2 and eventually win. Kasparov once quoted (in general, not about this position) an old Russian chess saying that goes something like this: “When your King is under attack, you don’t worry about dropping a pawn on the Queenside.” The funny saying is essentially about putting things into perspective and aligning priorities. In chess, checkmate is valued above all else (including material) as it wins the game. In this position, giving up material (such as the a2 pawn, which means little to the position) only matters if the attack on the King is unsuccessful.

31. h4 Qa5 It is valuable to understand (what Stockfish is not necessarily understanding by recommending 31. Rxd7? over 31. h4) that 31. Rxd7? might be a positional mistake. White’s attack is the key element here and the goal is opening lines of attack on the Black monarch. Capturing a pawn on d7 may just allow potential counterplay as Black could then free their position via the opening of the d-file.

If 31. Rxd7? then …Rd8 is in the air to free up Black’s position. What IM Daniel Rensch avoids mentioning is how 31…Rd8?? would be premature; it needs to be prepared in some way. If 31…Rd8?? 32. Rdxf7 Rxf7 33. Rxf7 Kxf7 34. Qxh7+ Ng7 35. e6+ and White’s position is overwhelming as the Black defenses are stretched and overloaded.

32. Rd1 c4 The Rook retreat back to d1 was to prevent the a5 Queen from playing …Qe1+, forking the King and e5 pawn.

33. Rd5 Qe1+ The check is not as damaging now that the Rook on d5 protects the e5 pawn. After the Queen check, White’s King is simply safer on g2.

34. Kg2 c3 The human perspective would likely recognize White’s control of the game and realize that White is on the verge of crushing Black’s position. However, Stockfish doesn’t really seem to sense the danger; it claims the game is roughly equal when, in fact, Black’s position might be on the edge of collapse.

Stockfish is likely overestimating the value of its extra pawn (material advantage) and not fully understanding how its pieces are all tied down and “frozen,” as IM Daniel Rensch describes it.

35. bxc3 Qxc3 36. h5! Re7 AlphaZero now relocates the Bishop to d1 and c2 or b3 for pressure on the Black Kingside; it is a teaching moment to signify the importance of getting all of your available forces into the attack, or to coordinate together for any other chess goals.

37. Bd1 Qe1 38. Bb3 Rd8 Stockfish is beginning to collapse, but even now doesn’t fully realize just how bad Black’s chess position truly is.

39. Rf3 Qe4 40. Qd2 Qg4 White has a myriad of threats and Black can only play reactionary chess to helplessly try to hang on. Previously, White had the main idea of opening the h-file, but recently even h6 was a threat (in part why …Qg4 was played to try and defend) since the dark squared situation is critical and Queen finagling to g7 would be checkmate.

The Black Rooks are tied down to the defense of the d7 pawn, and Rensch also illustrates why the e6 Knight can’t really move anywhere either. One sample continuation there was: 40…Nc5?! 41. hxg6 hxg6 42. Qg5 Rde8 (42…Nxb3? is too slow because of 43. Qxe7 Qxd5?? 44. Qxd8+ Kh7 45. axb3 and White is up a ton of material in the endgame) 43. Rd4! and the plan is to transfer the Rook from d4 to h4 and eventually h8 with a mating net. 43…Qxe5 tries for an endgame but also comes up short, as 44. Qxe5 Rxe5 45. Bxf7+ wins the exchange and should, in theory, convert the ending into a win with precise play.

We digress back to the game where it is clear Black is effectively in zugzwang, or en route to it at the very least:

41. Bd1 Qe4 42. h6 Nc7 43. Rd6 Ne6 44. Bb3 Qxe5? 43…Rxe5? would be an error because of the backrank motifs possible with something such as 44. Rxd7 Rxd7 45. Qxd7 Re7 46. Qd8+! Ne8 47. Ba4 g5 48. Bc6 Qe6 49. Bd5 and the pressure is indefensible with backrank checkmate threats, the pinned e8 Knight and x-ray threats on the f7 square. There is simply no satisfactory way to keep everything from falling.

43…Qxe5? is similarly bad with one sample continuation being: 44. Re3 Qf5 45. f4! and the Queen is removed from the defense of the e7 Rook.

With the game move of 44…Qxe5?, Stockfish is really just not aware of the danger; taking this pawn just accelerates Black’s demise. Perhaps the “horizon effect” (the computer not seeing deep enough to realize the danger) contributed to Stockfish grabbing the e5 pawn. Conversely, AlphaZero appears to better evaluate long-term compensation and its impact on the position.

45. Rd5 Qh8 Black’s forces are slowly being pushed further and further into passivity and it is only a matter of “when” for zugzwang to completely take effect. In practical language, Black is running out of good options and will eventually be forced into a move choice which crumbles their own defense.

46. Qb4 Nc5 Interference to try and save the e7 Rook; the Rook doesn’t want to abandon the 7th rank and leave the defense of the pawns on d7 and f7.

IM Daniel Rensch also points out another reason 46…R7e8 loses and that is the basic idea of 47. Qa4 and planning to capture the a7 pawn and eventually the b6 pawn. In similar fashion to what Black tried earlier in the game (with 30…Qa6), White now would threaten to simply capture material. If White can clear the Queenside pawns, then Black will inevitably lose due to the “principle of two weaknesses.” Black would have to worry about the King threats (especially via the weak dark squares around their own King) as well as the now passed pawn on the a-file.

47. Rxc5! bxc5 48. Qh4 Rde8 The exchange sacrifice on c5 is incredibly strong and it is nice to see that AlphaZero plays this material exchange purely for its positional merit. It continues with 48. Qh4 instead of taking material of little importance (c5 pawn). It is more important to put pressure on the e7 Rook. Black will either have to move the Rook (removing a defender of d7 and f7) or to reinforce the protection of that Rook (as in the game) but go further into passivity as a result.

49. Rf6! Ouch! It is tough to watch. The Black Rooks are inadequately trying their best and now the h8 Queen is shut out of the game. Again, realize that f6 is a dark-square that the Rook was able to land on and plug everything.

49…Rf8 50. Qf4 a5 51. g4 d5 If this isn’t a lesson in zugzwang, then I don’t know what is! Black has the compulsion to move and they simply have no appealing moves to select from!

52. Bxd5 Rd7 53. Bc4 a4 53…Rd4?? is one example where Black will get checkmated quickly. Here is Rensch’s favorite of these variations:

53…Rd4?? 54. Rxf7! (sacrificing the Queen) Rxf4 55. Rg7# because of the double check by the g7 Rook and the c4 Bishop.

54. g5 a3 By this point, it is even more clear that White is in control. If Black’s a-pawn falls, then White’s a2 pawn becomes a passed pawn and it is just a matter of time before Black’s downfall because everything else seems restricted and won’t be able to prevent the passed pawn from promotion.

55. Qf3 Rc7 56. Qxa3 Qxf6 Stockfish sacrifices the Queen out of desperation and will resign within several moves from now.

57. gxf6 Rfc8 58. Qd3 Rf8 59. Qd6 Rfc8 60. a4 1-0 Stockfish resigns with the Black pieces. It is impressive enough for AlphaZero to win, but the positional and long-term “thinking” is what was salient and almost human-like; this is something chess engines have long strived for, yet formerly lacked the ability to achieve.
