People

Dario Amodei

1983–, Computer scientist; CEO of Anthropic

Dario Amodei is an American physicist and computer scientist who, after a PhD in biophysics at Princeton and research roles at Baidu and Google Brain, joined OpenAI in 2016 and rose to become its Vice President of Research. He led the development of GPT-2 and GPT-3 and the early RLHF research that became InstructGPT. In late 2020 he and several colleagues left OpenAI over strategic disagreements, and in 2021 he and his sister Daniela Amodei co-founded Anthropic in San Francisco with the explicit mission of developing AI systems that are safe, beneficial and understandable.

Anthropic has released the Claude family of large language models since March 2023: Claude 1 (March 2023), Claude 2, the Claude 3 family (March 2024), Claude 3.5 Sonnet (June 2024), Claude 3.7 (2025), the Claude 4 family (2025–26) and Claude Opus 4.7 (2026). Anthropic has positioned itself as the safety-conscious frontier AI lab, with substantial investments in interpretability research (mechanistic interpretability, the Transformer Circuits programme), in the Constitutional AI training methodology, and in formal alignment research.

Amodei has been one of the most prominent public voices on advanced AI risk, with widely discussed essays including Machines of Loving Grace (2024) on the positive scenarios he hopes AI will enable. As of 2026 Anthropic is one of the three or four organisations at the AI frontier, with valuations in the tens of billions of dollars and ongoing partnerships with Amazon and Google.

Related people: Daniela Amodei, Sam Altman, Christopher Olah

Works cited in this book:

  • Concrete Problems in AI Safety (2016) (with Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané)
  • Deep reinforcement learning from human preferences (2017) (with Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg)
  • AI safety via debate (2018) (with Geoffrey Irving, Paul Christiano)
  • Language Models are Unsupervised Multitask Learners (2019) (with Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Ilya Sutskever)
  • Language Models are Few-Shot Learners (2020) (with Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever)
  • Scaling Laws for Neural Language Models (2020) (with Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu)
  • Constitutional AI: Harmlessness from AI Feedback (2022) (with Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosiute, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Nicholas Joseph, Sam McCandlish, Tom Brown, Jared Kaplan)
  • Predictability and Surprise in Large Generative Models (2022) (with Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Scott Johnston, Shauna Kravec, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Tom Brown, Jared Kaplan, Sam McCandlish, Chris Olah, Jack Clark)
  • Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned (2022) (with Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark)
  • In-context Learning and Induction Heads (2022) (with Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, Chris Olah)
  • Discovering Language Model Behaviors with Model-Written Evaluations (2022) (with Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan)

Discussed in:
