Visualisation

AlphaGo's Monte Carlo tree search

Last reviewed 5 May 2026

MCTS expands promising moves, simulates rollouts, and backs up scores.

From Chapter 17: Applications

Glossary: Monte Carlo tree search, AlphaGo

Transcript

The game of Go. A nineteen-by-nineteen board, roughly 250 legal moves per turn, games lasting hundreds of moves. Far beyond what brute-force search can handle.

Monte Carlo tree search. AlphaGo's planning algorithm. Build a search tree adaptively, focusing compute on promising branches.

Selection. Start at the root, the current position. Walk down the tree. At each node, choose the child with the highest upper-confidence-bound score. This balances exploitation, picking moves that look good, with exploration, trying moves that are uncertain.
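The selection score can be sketched with the classic UCB1 formula. AlphaGo itself uses a PUCT variant that also weights a policy prior; the `Node` fields below are assumptions for illustration, not the actual implementation:

```python
import math

class Node:
    """Hypothetical tree node: visit count, accumulated value, children."""
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # move -> Node

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def ucb_score(parent, child, c=1.4):
    # Exploitation: the child's average value so far.
    # Exploration: grows when the child has few visits relative to its parent.
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    return child.mean_value() + c * math.sqrt(math.log(parent.visits) / child.visits)

def select_child(node):
    """Return the (move, child) pair with the highest UCB score."""
    return max(node.children.items(), key=lambda mc: ucb_score(node, mc[1]))
```

The constant `c` tunes the exploration/exploitation trade-off: larger values spread visits more evenly, smaller values commit faster to the current best move.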

Expansion. When we reach a leaf node not yet expanded, add its children to the tree.
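Expansion is the simplest step: create fresh child nodes, each starting with zero visits and zero value. A minimal sketch, using plain dicts as nodes and assuming `legal_moves` is supplied by the game rules:

```python
def expand(node, legal_moves):
    """Add children for every legal move from this leaf position.

    node is a dict-based tree node; legal_moves is assumed to come
    from the game's rules engine.
    """
    for move in legal_moves:
        node["children"][move] = {"visits": 0, "value_sum": 0.0, "children": {}}
```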

Evaluation. Estimate the value of the new leaf. AlphaGo's first version used two estimators: the prediction of a trained value network, and a Monte Carlo rollout played to the end of the game with a fast, lightweight policy.
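The original AlphaGo blended these two estimates with a mixing weight, reported as 0.5. A minimal sketch, with both evaluators passed in as callables (their internals are assumptions here):

```python
def evaluate_leaf(position, value_net, rollout, lam=0.5):
    """Blend a value-network estimate with a rollout result.

    value_net and rollout are assumed callables returning a score in [-1, 1];
    lam is the mixing weight (0.5 matches the value reported for AlphaGo).
    """
    return (1 - lam) * value_net(position) + lam * rollout(position)
```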

Backup. Walk back up the tree. Update the visit count and average value of every node we passed through. Better-rated leaves pull their ancestors' averages up.
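The backup step can be sketched as a single pass over the nodes visited during selection, again with dict-based nodes as an illustrative assumption:

```python
def backup(path, leaf_value):
    """Propagate a leaf evaluation along the root-to-leaf path.

    Each node carries 'visits' and 'value_sum'; the running average
    value_sum / visits shifts toward better-rated leaves.
    (In a two-player game the sign of leaf_value would alternate
    between levels; that detail is omitted here.)
    """
    for node in path:
        node["visits"] += 1
        node["value_sum"] += leaf_value
```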

Repeat thousands of times per move. Each iteration grows the tree slightly, biasing future iterations toward promising lines.

After the search budget is exhausted, the move chosen is the most-visited child of the root.
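The whole loop, including the final most-visited choice, can be sketched on a toy one-step game where each move pays a fixed reward plus noise (the rewards stand in for full rollouts; this is an illustration, not AlphaGo's code):

```python
import math
import random

def mcts_best_move(rewards, iterations=2000, c=1.4):
    """Toy MCTS over a one-step game: each move yields a noisy reward.

    rewards maps moves to mean payoffs (a stand-in for rollouts from a
    real position). After the budget is spent, return the most-visited
    move, as the transcript describes.
    """
    visits = {m: 0 for m in rewards}
    value_sum = {m: 0.0 for m in rewards}
    total = 0
    for _ in range(iterations):
        # Selection: UCB over the root's children.
        def ucb(m):
            if visits[m] == 0:
                return float("inf")
            return (value_sum[m] / visits[m]
                    + c * math.sqrt(math.log(total) / visits[m]))
        move = max(rewards, key=ucb)
        # Evaluation: a noisy rollout result.
        value = rewards[move] + random.gauss(0, 0.1)
        # Backup: update the statistics along the (one-node) path.
        visits[move] += 1
        value_sum[move] += value
        total += 1
    # Final choice: the most-visited child of the root.
    return max(visits, key=visits.get)
```

Note that the final pick is by visit count, not average value: visit counts are a more robust statistic, since a child only accumulates visits by repeatedly surviving the UCB comparison.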

AlphaGo combined this MCTS with a policy network for prior probabilities and a value network for leaf evaluation. The trio defeated Lee Sedol in 2016. Its successors, AlphaGo Zero in 2017 and then AlphaZero, removed the rollouts and learned from self-play alone.

The same MCTS framework now powers AlphaProof for mathematics and AlphaTensor for matrix multiplication discovery.
