Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, & Ryan Lowe (2022), References, Textbook of AI

arXiv.

DOI: https://doi.org/10.48550/arxiv.2203.02155

Abstract. The InstructGPT paper introduces the pipeline of supervised fine-tuning followed by reinforcement learning from human feedback (RLHF), used to align a base language model into a helpful, honest assistant. This recipe underpins ChatGPT and subsequent assistant-style models.

Tags: rlhf alignment language-models

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).

Training language models to follow instructions with human feedback