A foundation model (term coined in Bommasani et al. 2021, On the Opportunities and Risks of Foundation Models, Stanford CRFM) is a large model pre-trained on broad data at scale, adaptable to a wide range of downstream tasks. The defining example is the modern large language model (GPT, Claude, Gemini, Llama). The term emphasises a paradigm shift: rather than training task-specific models from scratch, train one general-purpose model and adapt it.
Properties:
- Scale: typically billions to trillions of parameters, trained on trillions of tokens.
- Generality: handles many downstream tasks via fine-tuning, prompting, or in-context learning.
- Emergence: capabilities not present in smaller models appear at scale (chain-of-thought reasoning, in-context learning, instruction following).
- Homogenisation: many applications now share the same underlying model, with adaptation rather than redesign.
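The adaptation-without-retraining idea can be made concrete with few-shot prompting: the model's weights stay fixed and the "adaptation" is just a prompt containing labelled examples. A minimal sketch; `build_few_shot_prompt` is an illustrative helper, not any particular API, and the model call itself is omitted.

```python
# Sketch of in-context learning via few-shot prompting: a general-purpose
# model is steered toward a sentiment task purely by prompt construction,
# with no weight updates. Sending the prompt to a model is left out.

def build_few_shot_prompt(examples, query):
    """Assemble labelled examples and a new query into one prompt string."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

examples = [
    ("Great acting and a gripping plot.", "positive"),
    ("Dull, predictable, and far too long.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A warm, funny, memorable film.")
print(prompt)
```

The same pattern underlies prompting-based adaptation generally: the downstream task is specified in the input rather than in the parameters.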
Risks highlighted by the foundation-models report:
- Single point of failure: errors and biases in the foundation model propagate to all downstream uses.
- Concentration of power: only well-resourced organisations can train them.
- Opacity: hard to audit a 70B-parameter model's reasoning.
- Misuse: the same general capability can be applied to harmful tasks.
Multimodal foundation models: CLIP, GPT-4V, Claude, Gemini. These combine text with image, audio, and/or video modalities, and are increasingly the standard rather than the exception.
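CLIP illustrates the shared-embedding-space idea behind many multimodal models: a text encoder and an image encoder map inputs into one vector space, and cosine similarity ranks cross-modal matches. A toy sketch with hand-made vectors standing in for real encoder outputs:

```python
# Toy sketch of CLIP-style text-to-image retrieval. The embeddings below
# are invented toy values, not real encoder outputs; in CLIP they would
# come from a trained text encoder and image encoder.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Pretend outputs of the shared embedding space (toy values).
image_embeddings = {
    "photo_of_dog.jpg": [0.9, 0.1, 0.0],
    "photo_of_car.jpg": [0.1, 0.9, 0.2],
}
text_embedding = [0.85, 0.15, 0.05]  # toy embedding for the caption "a dog"

best = max(image_embeddings, key=lambda k: cosine(text_embedding, image_embeddings[k]))
print(best)  # -> photo_of_dog.jpg
```

In the real model the two encoders are trained contrastively so that matching text-image pairs score high and mismatched pairs score low.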
Vertical foundation models: Med-PaLM (medicine), ESM-2 (proteins), GraphCast (weather), GNoME (materials). Domain-specific foundation models trained on domain-specific data.
Related terms: Language Model, GPT, Claude, CLIP, In-Context Learning
Discussed in:
- Chapter 15: Modern AI