People

Paul Christiano

1992–, Computer scientist

Paul Christiano is an American AI alignment researcher. His 2017 OpenAI paper "Deep Reinforcement Learning from Human Preferences" (with Jan Leike, Tom Brown, Miljan Martic, Shane Legg and Dario Amodei) introduced the basic RLHF (Reinforcement Learning from Human Feedback) technique that, five years later, became the basis of InstructGPT and ChatGPT.
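The core idea of the 2017 paper is to learn a reward function from human comparisons of pairs of short behaviour clips, then train a policy against that learned reward with ordinary reinforcement learning. The sketch below shows the preference-modelling step only, in PyTorch; the class and function names are illustrative assumptions, and the architecture is much simplified from the paper's.

    # Illustrative sketch of the preference-based reward learning at the heart
    # of "Deep Reinforcement Learning from Human Preferences" (2017).
    # Names and architecture are simplified assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Predicts a scalar reward for each observation in a behaviour segment."""
        def __init__(self, obs_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, segment: torch.Tensor) -> torch.Tensor:
            # segment: (timesteps, obs_dim) -> summed predicted reward (scalar)
            return self.net(segment).sum()

    def preference_loss(model: RewardModel,
                        seg_a: torch.Tensor,
                        seg_b: torch.Tensor,
                        human_prefers_a: float) -> torch.Tensor:
        # Bradley-Terry model: the probability that the human prefers segment A
        # is a softmax over the two segments' summed predicted rewards; training
        # minimises cross-entropy against the recorded human judgement.
        returns = torch.stack([model(seg_a), model(seg_b)])
        log_probs = torch.log_softmax(returns, dim=0)
        target = torch.tensor(human_prefers_a)
        return -(target * log_probs[0] + (1.0 - target) * log_probs[1])

Once the reward model fits the human comparisons, the policy is optimised against its predicted rewards with a standard deep RL algorithm, with further comparisons collected as the policy improves.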

Christiano left OpenAI in 2021 to found the Alignment Research Center (ARC), which focuses on theoretical and empirical AI alignment research. ARC has produced influential work on dangerous-capability evaluations (its evaluations team assessed GPT-4 before release and was later spun out as the independent organisation METR) and on eliciting latent knowledge (a theoretical research agenda aimed at getting AI systems to report what they actually know, even when deception would be rewarded). This agenda builds on Christiano's earlier OpenAI work on iterated amplification, a proposed scheme for scaling human oversight of increasingly capable systems.

In 2024 Christiano joined the US AI Safety Institute at NIST as Head of AI Safety. He remains one of the most prominent voices in the AI alignment community, known both for his technical work and for his sober public commentary on the long-term risks of advanced AI.

Related people: Dario Amodei, Geoffrey Irving

