People

John Schulman

1985–, Computer scientist

John Schulman is an American computer scientist. His PhD work at Berkeley under Pieter Abbeel introduced Trust Region Policy Optimization (TRPO, 2015), and at OpenAI he developed its simpler successor, Proximal Policy Optimization (PPO, 2017). PPO has become the dominant policy-gradient algorithm in deep reinforcement learning and the workhorse of RLHF training for large language models: InstructGPT, ChatGPT, Claude, Gemini, and the post-training stages of nearly every modern LLM use PPO or close variants.
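The key idea of PPO is its clipped surrogate objective, which limits how far each policy update can move from the old policy. A minimal sketch (an illustration only, not OpenAI's implementation; the function name and list-based inputs are this sketch's own):

```python
def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Clipped surrogate loss from the PPO paper (Schulman et al., 2017).

    ratios: pi_new(a|s) / pi_old(a|s) for each sampled timestep
    advantages: estimated advantage A_t for each timestep
    eps: clip range (0.2 is the value suggested in the paper)
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1 - eps, min(r, 1 + eps))  # clip(r, 1-eps, 1+eps)
        # take the pessimistic minimum of the clipped and unclipped objectives,
        # which removes the incentive to push the ratio outside [1-eps, 1+eps]
        total += min(r * a, clipped_r * a)
    # PPO maximizes the surrogate; return its negative as a loss to minimize
    return -total / len(ratios)
```

Because the minimum is taken per timestep, large policy changes are only penalized when they would otherwise increase the objective, which is what makes PPO stable enough to train with simple first-order optimizers.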

Schulman co-founded OpenAI in 2015 and led its reinforcement-learning and post-training research, contributing substantially to ChatGPT and GPT-4. In August 2024 he left OpenAI for Anthropic, citing AI safety as his motivation. He left Anthropic in early 2025 to pursue a startup.

Related people: Sam Altman, Dario Amodei
