Glossary

GPT-3

GPT-3 is the 175-billion-parameter autoregressive Transformer language model introduced by Brown et al. at OpenAI in May 2020. The model was trained on approximately 300 billion tokens of text drawn from Common Crawl, WebText2, two books corpora and English Wikipedia, using several thousand petaflop/s-days of compute. It was the first large language model with strong few-shot in-context learning capability: given a few examples of a task in its prompt, GPT-3 could perform the task without any gradient updates.
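
To make in-context learning concrete, here is a minimal sketch of few-shot prompt construction in the style of the paper's English-to-French illustration; the helper function and example pairs are illustrative, not OpenAI code. The "training data" lives entirely in the prompt, and the model simply continues the pattern:

    # Minimal sketch of a few-shot prompt, GPT-3 style. The helper and the
    # example pairs are illustrative; no OpenAI code or API is assumed.
    def build_few_shot_prompt(examples, query):
        """Concatenate task demonstrations ahead of the query."""
        blocks = [f"English: {en}\nFrench: {fr}" for en, fr in examples]
        blocks.append(f"English: {query}\nFrench:")  # left open for the model
        return "\n\n".join(blocks)

    examples = [
        ("sea otter", "loutre de mer"),
        ("peppermint", "menthe poivrée"),
        ("cheese", "fromage"),
    ]
    prompt = build_few_shot_prompt(examples, "plush giraffe")
    print(prompt)  # fed to the model as-is; no gradient updates occur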

This capability transformed the field. Where previous transfer-learning paradigms required fine-tuning on task-specific data, in-context learning made it possible to deploy a single model across thousands of tasks via prompting alone. The approach underpins essentially every subsequent commercial LLM product, and the 2020 paper is generally taken as marking the start of the modern LLM era.

GPT-3 was offered through an API rather than as open weights, marking OpenAI's first major commercial deployment. Successive iterations followed: GPT-3.5 (2022), fine-tuned with RLHF following the InstructGPT recipe and used as the basis for the public launch of ChatGPT; GPT-4 (2023), multimodal, with substantial reasoning improvements but technical details largely undisclosed; GPT-4-turbo and GPT-4o (2023–2024), with efficiency improvements and native multimodality; and o1 / o3 (2024–), reasoning models trained with reinforcement learning to use extended chain-of-thought.

GPT-3's specific architectural details (a decoder-only Transformer with learned absolute positional embeddings and alternating dense and locally-banded sparse attention) defined the template for what followed; refinements such as RoPE, RMSNorm and SwiGLU arrived only with later models.
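
As a rough sketch of what "alternating dense and locally-banded sparse attention" means (the band width and layer pattern below are assumptions for illustration; the paper does not fully specify them), the two causal mask types can be built as follows:

    import numpy as np

    def dense_causal_mask(seq_len):
        """Dense causal attention: each position sees itself and all earlier ones."""
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))

    def banded_causal_mask(seq_len, band=4):
        """Locally-banded causal attention: each position sees only the most
        recent `band` positions (itself included). The band width here is an
        illustrative assumption, not a published GPT-3 hyperparameter."""
        offsets = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
        return (offsets >= 0) & (offsets < band)

    # Alternate the two patterns across layers, as the paper describes.
    n_layers, seq_len = 4, 8
    masks = [dense_causal_mask(seq_len) if i % 2 == 0 else banded_causal_mask(seq_len)
             for i in range(n_layers)]
    print(masks[0].astype(int))  # full lower triangle
    print(masks[1].astype(int))  # lower triangle clipped to the local band

The sparse layers cut per-layer attention cost from O(n²) toward O(n·band), while the interleaved dense layers preserve global information flow.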

Related terms: Tom Brown, GPT, In-Context Learning, ChatGPT
