Long Ouyang

1990–, Computer scientist

Long Ouyang is an American computer scientist whose 2022 OpenAI paper, "Training Language Models to Follow Instructions with Human Feedback", introduced InstructGPT. InstructGPT is a fine-tuning pipeline that aligns base language models with user intent in three stages: supervised fine-tuning on human-written demonstrations, reward modelling from human preference rankings, and reinforcement learning with Proximal Policy Optimization (PPO) against the learned reward model. The pipeline became the basis of ChatGPT's training and the dominant paradigm for aligning LLMs.
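
The two quantities that link the second and third stages can be sketched in a few lines. The block below is a minimal illustration, not OpenAI's implementation. It assumes the reward model has already produced scalar scores and that per-token log-probabilities are available from the policy and from the supervised (SFT) model. The function names `ranking_loss` and `kl_penalised_reward` and the default `beta` value are hypothetical. The reward-model loss averages −log σ(r_better − r_worse) over every pair drawn from a labeller's ranking. The reward that PPO maximises subtracts a KL-style penalty, β · log(π_RL(y|x) / π_SFT(y|x)), which keeps the policy close to the SFT model. The PPO update itself and the paper's optional pretraining-mix term are omitted.

```python
import math
from itertools import combinations
from typing import Sequence


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranking_loss(rewards_best_to_worst: Sequence[float]) -> float:
    """Pairwise reward-model loss for one prompt (illustrative sketch).

    `rewards_best_to_worst` holds the reward model's scalar scores for K
    responses, ordered by the labeller's ranking (best first). Each of the
    K-choose-2 pairs contributes -log sigmoid(r_better - r_worse), and the
    result is averaged over pairs.
    """
    # combinations() keeps input order, so `better` always precedes `worse`.
    pairs = list(combinations(rewards_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked responses")
    # -log sigmoid(d) == softplus(-d)
    return sum(softplus(-(better - worse)) for better, worse in pairs) / len(pairs)


def kl_penalised_reward(rm_score: float,
                        policy_token_logprobs: Sequence[float],
                        sft_token_logprobs: Sequence[float],
                        beta: float = 0.02) -> float:
    """Reward given to the RL policy for one sampled response (sketch).

    Sequence log-probabilities are sums of per-token log-probs. The penalty
    beta * (log pi_RL(y|x) - log pi_SFT(y|x)) discourages drifting far from
    the SFT model. The default `beta` is a hypothetical placeholder.
    """
    log_ratio = sum(policy_token_logprobs) - sum(sft_token_logprobs)
    return rm_score - beta * log_ratio


if __name__ == "__main__":
    # A ranking the reward model agrees with yields a lower loss than a reversed one.
    print(ranking_loss([2.0, 0.5, -1.0]), ranking_loss([-1.0, 0.5, 2.0]))
    print(kl_penalised_reward(1.0, [-1.0, -2.0], [-1.5, -2.5], beta=0.1))
```

Using every pair from a single ranking, rather than one comparison per prompt, extracts more training signal from each labelling task. The log-ratio penalty stops the policy from exploiting weaknesses in the learned reward model.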

The paper showed that human labellers preferred outputs from the 1.3B-parameter InstructGPT model to those of the 175B-parameter base GPT-3, despite GPT-3 having more than 100 times as many parameters. This established that alignment, not raw scale alone, was the path to user-facing utility, a finding that drove the next several years of LLM productisation.

Related people: Paul Christiano, Sam Altman

This site is currently in Beta. Contact: Chris Paton

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).