Glossary

GPT

GPT (Generative Pre-trained Transformer) is OpenAI's family of decoder-only autoregressive language models. Unlike BERT, which uses bidirectional attention for encoding, GPT uses causal (left-to-right) self-attention: each position can attend only to itself and earlier positions. GPT is pretrained on large text corpora with the next-token prediction objective, then either fine-tuned on labelled data or prompted directly for downstream tasks.
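The causal masking described above can be sketched in a few lines. This is a minimal NumPy illustration of the masking step only, not any particular GPT implementation; the uniform attention scores are an assumption chosen to make the effect of the mask visible.

```python
import numpy as np

def causal_attention_weights(scores: np.ndarray) -> np.ndarray:
    """Mask out future positions, then softmax each row.

    scores: (seq_len, seq_len) matrix of raw attention scores,
    where row i holds position i's scores over all positions.
    """
    seq_len = scores.shape[-1]
    # Entries with j > i are set to -inf so position i cannot attend ahead.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    # Row-wise softmax; -inf entries become exactly zero weight.
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Uniform scores over a length-4 sequence: each row i spreads its
# weight evenly over positions 0..i and gives zero weight to the future.
weights = causal_attention_weights(np.zeros((4, 4)))
```

With uniform scores, row 0 attends entirely to itself and row 3 attends equally to all four positions; the strict upper triangle of `weights` is zero, which is exactly the left-to-right constraint.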

The GPT series traced the benefits of scale in real time. GPT-1 (2018, 117M parameters) demonstrated that generative pretraining of transformers transfers well to downstream language tasks. GPT-2 (2019, 1.5B parameters) showed surprisingly strong zero-shot performance on many tasks. GPT-3 (2020, 175B parameters) introduced in-context learning, the ability to perform new tasks from examples in the prompt without any fine-tuning, and made autoregressive language modelling the central paradigm of modern AI. GPT-4 (2023) added multimodal input and further scaled capabilities. Each iteration showed that raw scaling of parameters, data, and compute yields smoothly improving performance across a broad range of benchmarks.
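In-context learning amounts to specifying a task entirely inside the prompt. The sketch below shows one common way a few-shot prompt might be assembled; the `build_few_shot_prompt` helper, the Input/Output formatting, and the translation demonstrations are illustrative assumptions, not part of any official API.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format demonstration pairs plus a final query as a single prompt.

    The model, asked to continue this text, is expected to infer the
    task from the demonstrations and complete the last Output line.
    """
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

# Hypothetical English-to-French demonstrations; no weights are updated,
# the task is conveyed purely through the prompt.
prompt = build_few_shot_prompt(
    [("cheese", "fromage"), ("house", "maison")],
    "cat",
)
```

The resulting string ends with an open `Output:` line, so a next-token predictor naturally continues it with its guess at the answer; that continuation, not any gradient update, is what performs the task.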

The decoder-only GPT architecture has proven remarkably versatile. Virtually any task can be cast as text generation with appropriate prompting: summarisation, translation, question answering, code generation, reasoning, even image captioning (with multimodal variants). Instruction tuning and RLHF transform raw GPT models into helpful assistants like ChatGPT. The GPT family's success has made decoder-only autoregressive models the dominant architecture for modern LLMs, with LLaMA, PaLM, Claude, Mistral, and Gemini all following the same basic template.

Related terms: Transformer, BERT, Large Language Model, Language Model, In-Context Learning

Also defined in: Textbook of AI