1990–, Computer scientist
Long Ouyang is an American computer scientist whose 2022 OpenAI paper, "Training Language Models to Follow Instructions with Human Feedback", introduced InstructGPT, a fine-tuning pipeline that aligns base language models with user intent in three stages: supervised fine-tuning on human demonstrations, reward modelling from human preference comparisons, and reinforcement learning with proximal policy optimisation (PPO) against the reward model. The pipeline became the basis of ChatGPT's training and the dominant paradigm for aligning LLMs.
The paper showed that outputs from a 1.3B-parameter InstructGPT model were preferred by human evaluators over those of the 175B-parameter base GPT-3. This result established that alignment, not raw scale, was the path to user-facing utility, a finding that drove the next several years of LLM productisation.
Related people: Paul Christiano, Sam Altman
Works cited in this book:
- Recursively Summarizing Books with Human Feedback (2021) (with Jeff Wu, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano)
- Training language models to follow instructions with human feedback (2022) (with Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe)
Discussed in:
- Chapter 15: Modern AI