Gerald Tesauro

1958–, AI researcher

Gerald Tesauro is an American AI researcher who, while at IBM Research, developed TD-Gammon (1992–1995), a backgammon program that learned by playing approximately one million games against itself. It used temporal-difference learning (Sutton, 1988) to train a feed-forward neural network as its value function. TD-Gammon reached the level of the strongest human players, and its unconventional handling of some opening moves proved better than established practice and eventually changed world-class human play.
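The temporal-difference idea can be illustrated with a minimal sketch. This is not TD-Gammon itself: it assumes a lookup table in place of a neural network, a toy five-state random walk in place of backgammon, no discounting, and the simplest TD(0) update. The function name `td0_random_walk` and all parameter values are hypothetical, chosen only for illustration.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.05, n_states=5, seed=0):
    """Estimate state values of a toy random walk with tabular TD(0).

    Illustrative assumption: states 1..n_states lie on a line, and the walk
    ends when it steps off either end. Only the right end pays a reward of 1.
    """
    rng = random.Random(seed)
    # Index 0 and index n_states + 1 are terminal states with value 0.
    values = [0.0] + [0.5] * n_states + [0.0]

    for _ in range(episodes):
        state = (n_states + 1) // 2  # start each episode in the middle
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # TD(0): nudge the current estimate toward the reward plus the
            # estimated value of the state that actually followed.
            td_error = reward + values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state

    return values[1:-1]


if __name__ == "__main__":
    for i, v in enumerate(td0_random_walk(), start=1):
        print(f"state {i}: estimated value {v:.2f}")
```

In this toy problem the true value of state i is i/6, so the estimates should settle near those figures. The point of the sketch is that each estimate is corrected using the next estimate rather than the final outcome alone, which is the mechanism the paragraph above attributes to TD-Gammon's training.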

TD-Gammon became the canonical demonstration that reinforcement learning through self-play could reach world-class performance in a complex domain. It is widely regarded as a precursor of the self-play methodology behind AlphaGo, which defeated Lee Sedol in 2016, more than twenty years later. Tesauro has spent his career at IBM Research, contributing to reinforcement learning, multi-agent systems and the IBM Watson team.

Related people: Richard Sutton, Arthur Samuel

This site is currently in Beta. Contact: Chris Paton

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).