References

Human Compatible: Artificial Intelligence and the Problem of Control

Stuart Russell (2019)

Viking.

URL: https://www.penguinrandomhouse.com/books/566677/human-compatible-by-stuart-russell/

Abstract. Russell's book-length argument that AI, as the field has historically practised it, follows a standard model: build machines that optimise a fixed, fully specified objective. This standard model, he argues, is the source of the alignment problem. His proposal is that the next generation of systems should be uncertain about their objective and learn it from human behaviour. The mathematical instantiation is cooperative inverse reinforcement learning (CIRL), a game between a human and a robot in which only the human knows the reward function, and the robot must learn it from, and defer to, the human. Human Compatible gave alignment a respectable academic home in machine-learning theory and is one of the most influential books in the modern AI-safety canon.
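The CIRL setup described in the abstract can be sketched, very loosely, as a Bayesian game. The following is a hypothetical, minimal illustration, not Russell's formal model: the objectives, actions, and rationality parameter (`OBJECTIVES`, `BETA`, `human_act`) are invented for this sketch. Only the human knows the true objective; the robot holds a posterior over candidate objectives and updates it from the human's noisily rational choices.

```python
import math
import random

# Illustrative CIRL-style sketch (assumptions, not Russell's formal model):
# two candidate objectives, of which only the human knows the true one.
random.seed(0)

OBJECTIVES = {"A": {"left": 1.0, "right": 0.0},
              "B": {"left": 0.0, "right": 1.0}}
ACTIONS = ["left", "right"]
TRUE_OBJECTIVE = "A"   # known only to the human
BETA = 3.0             # human rationality: higher means closer to optimal

def human_act(objective):
    """Human chooses softmax-rationally with respect to the true rewards."""
    rewards = OBJECTIVES[objective]
    weights = [math.exp(BETA * rewards[a]) for a in ACTIONS]
    r = random.random() * sum(weights)
    for action, w in zip(ACTIONS, weights):
        r -= w
        if r <= 0:
            return action
    return ACTIONS[-1]

belief = {"A": 0.5, "B": 0.5}  # robot's uniform prior over the objective

for _ in range(20):
    observed = human_act(TRUE_OBJECTIVE)
    # Bayes: P(objective | action) is proportional to
    # P(action | objective) * P(objective)
    for obj in belief:
        rewards = OBJECTIVES[obj]
        z = sum(math.exp(BETA * rewards[a]) for a in ACTIONS)
        belief[obj] *= math.exp(BETA * rewards[observed]) / z
    total = sum(belief.values())
    belief = {k: v / total for k, v in belief.items()}

print(belief)  # posterior should concentrate on the true objective "A"
```

The point of the sketch is the direction of information flow: reward knowledge sits with the human, and the robot's uncertainty shrinks only through observation of human behaviour, which is what gives it an incentive to defer.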

Tags: safety alignment history
