
Superhuman learning

Review of 'Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI'

Devangshu Datta
Last Updated: Mar 06 2019 | 12:22 PM IST
The ability to learn is a sign of intelligence. So is the ability to teach. What about the ability to teach oneself? Children learn their first language by decoding conversations, even without formal help. Really bright people teach themselves, stretching beyond the known. A lot of artificial intelligence (AI) involves setting a few rules and then crunching tons of data to look for patterns and insights. This is akin to a child learning language.

What can AI most effectively learn via this autodidacticism? The AlphaZero algorithms provide some answers, and proof of concept. The algorithms were created by DeepMind, a British AI company, now a subsidiary of Alphabet, founded by the neuroscientist and game-playing prodigy Demis Hassabis. AlphaZero is, at the moment, the best player of the ancient game of Go, of Shogi (a Japanese relative of chess) and of chess.

Game Changer examines AlphaZero from the chess player’s perspective. Ms Regan and Mr Sadler are strong chess players with mathematics and IT backgrounds. It would have been great to have a take from Go and Shogi professionals too, but that is lacking, in English at least.

Of the three, chess is the easiest to play or program, and it is by far the most popular. Computers have long since outstripped humans at chess, which has “only” about 10^50 legal positions (a 1 followed by 50 zeros). For reference, the best guess is that there are about 10^78 atoms in the universe. Shogi has about 10^71 legal positions. Go has about 2x10^170 legal positions (a 2 followed by 170 zeros), making it by far the most complex. The strongest Shogi programs are as good as the best humans. Until 2016, no Go program had beaten a professional Go player. DeepMind’s first algorithm, AlphaGo, learnt by being fed a database of Go games, from which it derived strategic principles. It then played against itself on a very fast network to refine its understanding.

The algorithms use a “Monte Carlo Tree Search” (MCTS) to choose moves. In MCTS, the program plays out many thousands of games per second against itself, starting from a given position. It selects moves at random, assigns each move a probability of success depending on the results, and uses those statistics to train the neural network to find strategic patterns. In March 2016, AlphaGo beat Lee Sedol, one of the world’s strongest Go players. It had derived strategic principles that, Go professionals concede, go beyond anything humans had discovered.
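
To make the mechanics concrete, here is a minimal MCTS sketch in Python, applied to tic-tac-toe purely to keep it self-contained. It uses the classic UCB1 selection rule with random rollouts; AlphaZero’s MCTS replaces the random playouts with a neural-network evaluation and adds learned move priors, so treat this as the skeleton of the technique, not DeepMind’s implementation.

```python
# Minimal MCTS with UCB1 and random rollouts (illustrative only).
import math
import random

WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, v in enumerate(board) if v is None]

def play(board, move, player):
    nb = list(board)
    nb[move] = player
    return nb

class Node:
    def __init__(self, board, player, parent=None, move=None):
        self.board, self.player = board, player   # player: side to move here
        self.parent, self.move = parent, move
        self.children = []
        self.untried = [] if winner(board) else legal_moves(board)
        self.wins, self.visits = 0.0, 0

    def ucb_child(self, c=1.4):
        # UCB1: trade off win rate (exploitation) against uncertainty (exploration)
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(board, player):
    # Simulation: random moves to the end; returns 'X', 'O' or None (draw)
    while True:
        w, moves = winner(board), legal_moves(board)
        if w or not moves:
            return w
        board = play(board, random.choice(moves), player)
        player = 'O' if player == 'X' else 'X'

def mcts(root_board, root_player, iters=5000):
    root = Node(root_board, root_player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend via UCB1 while fully expanded
        while not node.untried and node.children:
            node = node.ucb_child()
        # 2. Expansion: try one new move from this node
        if node.untried:
            m = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(play(node.board, m, node.player),
                         'O' if node.player == 'X' else 'X', node, m)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position
        w = rollout(node.board, node.player)
        # 4. Backpropagation: credit the result to whoever moved into each node
        while node:
            node.visits += 1
            mover = 'O' if node.player == 'X' else 'X'
            node.wins += 1.0 if w == mover else 0.5 if w is None else 0.0
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print("MCTS opening move for X:", mcts([None] * 9, 'X'))
```

The four numbered steps (selection, expansion, simulation, backpropagation) are the loop every MCTS variant shares.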

The next iteration, AlphaGo Zero, was self-taught. “Zero” was given just the rules of Go. It played against itself, without a database, and it beat AlphaGo. The third generation, AlphaZero, is a “generic reinforcement learning algorithm”, which taught itself to play Shogi and chess by playing against itself. It had only the basic rules of these games: no databases, no opening manuals, no endgame tablebases. Training its neural network took just four hours on a very fast system with over 5,000 specialised chips. It was playing millions of games every minute, so that isn’t as crazy as it sounds.
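
As a toy illustration of learning from “just the rules”, here is a tabular self-play sketch for single-pile Nim (take one to three stones; taking the last stone wins). Nothing here is DeepMind’s method: the value table stands in for AlphaZero’s deep neural network, and the game is chosen only because it is small enough to solve. The point is the shape of the loop: play yourself, then nudge your evaluations toward the observed outcomes.

```python
# Toy self-play learning for single-pile Nim (assumption-laden sketch,
# not AlphaZero's training pipeline).
import random

N, MOVES, EPS, LR = 15, (1, 2, 3), 0.1, 0.05
value = {s: 0.5 for s in range(N + 1)}  # value[s]: est. win chance for side to move
value[0] = 0.0                          # no stones left: side to move already lost

def best_move(stones):
    # Greedy: leave the opponent the position that is worst for *them*
    return min((m for m in MOVES if m <= stones), key=lambda m: value[stones - m])

for _ in range(20000):
    stones, mover, history = N, 0, []
    while stones > 0:
        moves = [m for m in MOVES if m <= stones]
        # Epsilon-greedy: mostly play the learned move, sometimes explore
        m = random.choice(moves) if random.random() < EPS else best_move(stones)
        history.append((stones, mover))
        stones -= m
        mover ^= 1
    winner = mover ^ 1                  # the player who took the last stone
    for state, who in history:          # update values toward the observed outcome
        target = 1.0 if who == winner else 0.0
        value[state] += LR * (target - value[state])

# Nim theory: pile sizes divisible by 4 are lost for the side to move,
# so value[4] and value[8] should end up clearly low.
print({s: round(value[s], 2) for s in range(1, 9)})
```

After a few thousand games the table rediscovers Nim’s known theory, a miniature of AlphaZero rediscovering five centuries of chess strategy.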

It reinvented and surpassed human understanding, rediscovering every strategic concept humans have learnt in five centuries and adding its own secret sauce to play in ways humans never thought possible. It thrashed one of the best conventional chess engines, Stockfish, first under conditions widely criticised as restrictive for Stockfish, then, more convincingly, under equal conditions. It also beat one of the best Shogi programs, Elmo.

AlphaZero has a “superhuman” playing style that experts describe as “intuitive”. It has changed the way humans play and has inspired a new approach to engine development using MCTS. AlphaZero runs on specialised chips, but crowd-sourced projects like LeelaChessZero implement similar principles on commercial hardware.
Conventional engines are programmed with strategic rules fed to them by human “teachers”. Instead of MCTS, they calculate via an “alpha-beta” algorithm. In the second AlphaZero-Stockfish match, Stockfish was crunching 70 million positions per second, analysing to great depths to select the “best” moves. It calculates about 900 times as much as AlphaZero (which sees about 80,000 positions a second). But AlphaZero “understood” chess better.
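
For contrast with the MCTS sketch above, here is a minimal negamax-with-alpha-beta sketch over a hand-built toy tree. A real engine generates chess moves, calls a hand-tuned evaluation at the leaves and layers on refinements such as move ordering and transposition tables; none of that is shown here.

```python
# Minimal alpha-beta pruning in negamax form (toy tree, not a chess engine).
def alphabeta(node, depth, alpha, beta):
    """Best score for the side to move, skipping branches that
    provably cannot change the answer."""
    children = node.get("children")
    if depth == 0 or not children:
        return node["score"]              # leaf: static evaluation
    best = -float("inf")
    for child in children:
        # Negamax: the child's score from our view is minus its own score
        score = -alphabeta(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                 # cutoff: the opponent avoids this line
            break
    return best

# Toy tree; leaf scores are from the perspective of the side to move there.
leaf = lambda s: {"score": s}
tree = {"score": 0, "children": [
    {"score": 0, "children": [leaf(3), leaf(5)]},
    {"score": 0, "children": [leaf(2), leaf(10)]},  # leaf(10) is never examined
]}
print(alphabeta(tree, 2, -float("inf"), float("inf")))  # -> 3
```

The cutoff is the whole trick: whole subtrees are discarded once one reply refutes a line, which is how engines like Stockfish search so deep at such speed.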

Stockfish evaluates a given position as better for one side, or equal, by assigning it a numeric value with a pawn as the basic unit. This can mislead humans, since the number does not distinguish between a dynamic position, where there is only one good move, and a stable position with many equivalent moves.

AlphaZero estimates probabilistically. After playing out the position millions of times internally using MCTS, it says white (or black) will score, say, 55 per cent or 75 per cent. This is more helpful because the probability reflects how hard the position is to play in practice: a dynamic position may score lower than a nominally equal stable one.
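
The two scales can be loosely related. One heuristic popular among chess analysts (not anything from DeepMind or the book, just a conventional rule of thumb) maps a centipawn evaluation to an expected score using a logistic curve borrowed from the Elo rating formula:

```python
# Rough rule-of-thumb conversion between the two evaluation scales.
def centipawns_to_expected_score(cp):
    """Approximate expected score (0..1) for the side ahead by `cp`
    centipawns; the 400 scaling constant is a conventional choice."""
    return 1.0 / (1.0 + 10.0 ** (-cp / 400.0))

for cp in (0, 100, 300):
    print(f"{cp:+5d} cp -> {centipawns_to_expected_score(cp):.0%} expected score")
```

Under this mapping a one-pawn (+100) advantage is worth roughly a 64 per cent expected score; AlphaZero, in effect, reports the percentage directly and skips the pawn-counting.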

Moreover, AlphaZero is not afraid to sacrifice material for mobility or other long-term gains. The authors did a great deal of analysis to illustrate these stylistic quirks. Intriguingly, DeepMind is trying to open the black box of these self-taught heuristics to get a sense of how the neural network “thinks”.

So now we know that an autodidactic algorithm can discover new things. But chess, Shogi and Go are closed systems with complete information: in theory, these games can be “solved”, with every position judged a win for one side or a draw. Whether such self-taught methods will transfer to messier, open-ended problems remains an open question.

This is a fascinating book for game players, and chess players in particular. It also offers insights about AI development. Well worth a deep dive.
Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI 
Natasha Regan, Matthew Sadler
New In Chess, 416 pages, Rs 1,544