Glossary

Large Language Model

Also known as: LLM

A Large Language Model (LLM) is, at its core, an autoregressive Transformer trained on a massive corpus of text to predict the next token. What distinguishes LLMs from earlier language models is not a qualitative change in architecture but a quantitative change in scale—in parameters, training data, and compute. GPT-3 (2020) had 175 billion parameters trained on 300 billion tokens. Modern LLMs range from small (1B parameters) to huge (trillion-parameter mixture-of-experts models).
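The autoregressive next-token loop can be sketched with a toy stand-in for the model. The bigram table below is a hypothetical illustration, not a real Transformer; the point is only the generation loop, where each predicted token is appended to the context and fed back in.

```python
# Toy "model": given the last token, score each candidate next token.
# A real LLM conditions on the full context with a Transformer; this
# bigram table is a placeholder to keep the loop self-contained.
BIGRAM_SCORES = {
    "the": {"cat": 0.6, "dog": 0.3, "the": 0.1},
    "cat": {"sat": 0.7, "ran": 0.2, "the": 0.1},
    "sat": {"down": 0.8, "the": 0.1, "cat": 0.1},
    "dog": {"ran": 0.9, "sat": 0.05, "the": 0.05},
}

def next_token(context):
    """Greedy decoding: pick the highest-scoring next token."""
    scores = BIGRAM_SCORES.get(context[-1], {"the": 1.0})
    return max(scores, key=scores.get)

def generate(prompt, max_new_tokens=3):
    """Autoregressive loop: each new token is appended and re-fed."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```

Real systems sample from the model's probability distribution (with temperature, top-p, etc.) rather than always taking the argmax, but the feed-the-output-back-in structure is the same.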

The Chinchilla scaling laws (Hoffmann et al., 2022) refined earlier analyses by showing that, for a fixed compute budget, training data should scale roughly proportionally with model size (about 20 tokens per parameter); by this measure GPT-3 was badly undertrained. This insight shifted the field toward training moderately sized models on far more data, exemplified by LLaMA (65B parameters, 1.4T tokens) matching GPT-3 performance at a fraction of the inference cost. Modern training pipelines comprise three stages: pre-training on diverse text with next-token prediction; supervised fine-tuning on instruction-response pairs; and alignment training via RLHF or DPO to match human preferences.
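A back-of-envelope check makes the undertraining claim concrete. The roughly 20-tokens-per-parameter ratio is an empirical fit from the Chinchilla paper, so treat it as an approximation rather than an exact constant:

```python
# Approximate Chinchilla-optimal ratio: ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20

def optimal_tokens(n_params):
    """Compute-optimal training tokens for a model of n_params parameters."""
    return TOKENS_PER_PARAM * n_params

gpt3_params = 175e9   # GPT-3: 175B parameters
gpt3_tokens = 300e9   # ...trained on 300B tokens

print(f"Chinchilla-optimal tokens for 175B params: {optimal_tokens(gpt3_params):.1e}")
print(f"GPT-3 actually trained on: {gpt3_tokens:.1e}")
# 3.5e+12 vs 3.0e+11: GPT-3 saw less than a tenth of the optimal data.
```

The same arithmetic explains LLaMA's design point: 65B parameters times 20 gives about 1.3T tokens, close to the 1.4T it was actually trained on.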

LLMs exhibit emergent capabilities that appear suddenly at certain scales: in-context learning (performing new tasks from examples given in the prompt, also called few-shot learning) and chain-of-thought reasoning (solving problems by reasoning step by step). They are prone to hallucination (generating plausible-sounding falsehoods) and reflect biases in training data. Retrieval-augmented generation grounds them in external knowledge; tool use lets them call functions and APIs; and agentic frameworks enable multi-step task execution. LLMs have transformed software development, education, creative writing, and scientific research, while raising urgent questions about misinformation, job displacement, and power concentration.
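In-context learning can be illustrated by the prompt alone: the "training" is a handful of worked examples placed before the query. The sketch below only builds the string an LLM would be asked to complete; the sentiment task and formatting are illustrative assumptions, and no model is called.

```python
# Few-shot prompt construction: labeled examples followed by an
# unlabeled query, leaving the final "Sentiment:" for the model to fill.
def few_shot_prompt(examples, query):
    """Format (text, label) pairs plus a query for LLM completion."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("Loved every minute of it.", "positive"),
    ("A dull, plodding mess.", "negative"),
]
prompt = few_shot_prompt(examples, "Surprisingly fun from start to finish.")
print(prompt)
```

A model that has never been fine-tuned on sentiment classification will typically continue this prompt with the correct label, which is what makes in-context learning notable: the task is specified entirely at inference time.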

Related terms: GPT, Transformer, Scaling Laws, RLHF, Retrieval-Augmented Generation

Also defined in: Textbook of AI, Textbook of Medical AI