Glossary

Frontier Lab Compute Consumption

Frontier training compute has grown by roughly 10× per model generation since 2018, a trajectory driven by scaling laws (Kaplan et al.; Hoffmann et al./Chinchilla) showing that loss falls predictably as a power law in compute.
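
A sketch of the functional form (constants quoted approximately from the Chinchilla paper, not from this entry): the parametric fit is $L(N, D) \approx E + A N^{-\alpha} + B D^{-\beta}$ with $E \approx 1.69$, $\alpha \approx 0.34$, $\beta \approx 0.28$; combined with $C \approx 6ND$ and compute-optimal allocation of parameters $N$ and tokens $D$, this implies loss falling as a power law in training compute $C$.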

Compute milestones (training FLOPs, public estimates from Epoch AI):

  • GPT-2 (2019): $1.5 \times 10^{21}$
  • GPT-3 (2020): $3.1 \times 10^{23}$
  • PaLM (2022): $2.5 \times 10^{24}$
  • GPT-4 (2023): ~$2 \times 10^{25}$
  • Gemini Ultra (2023): ~$5 \times 10^{25}$
  • Llama 3.1 405B (2024): $3.8 \times 10^{25}$
  • GPT-4.5 / Grok 3 class (2024–25): estimated ~$10^{26}$
  • Next generation (2025–26): aiming at $10^{26}$–$10^{27}$
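
As a rough illustration, the implied growth rate can be computed directly from two of the estimates above (a sketch; the dates and FLOP figures are approximate):

```python
# Back-of-envelope: growth rate implied by the public training-compute
# estimates listed above (illustrative; figures are approximate).
import math

start_year, start_flops = 2020.4, 3.1e23   # GPT-3
end_year, end_flops = 2024.5, 3.8e25       # Llama 3.1 405B

span_years = end_year - start_year
per_year = (end_flops / start_flops) ** (1 / span_years)
doubling_months = 12 * math.log(2) / math.log(per_year)

print(f"{per_year:.1f}x per year -> doubling every {doubling_months:.1f} months")
# ~3.2x per year, doubling every ~7 months over this window
```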

A single H100 delivers $\sim 10^{15}$ dense BF16 FLOP/s at peak; $10^{26}$ FLOPs therefore needs $\sim 10^{11}$ GPU-seconds at full utilisation, or about 28 million H100-hours. At a more realistic 30 % MFU, that becomes roughly 90 million wall-clock H100-hours, i.e. about 5–6 weeks on a 100,000-GPU cluster.
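
The same arithmetic as a back-of-envelope sketch (peak throughput, MFU and cluster size are the round-number assumptions above):

```python
# GPU-hours for a 1e26-FLOP training run on H100s (round-number assumptions).
total_flops = 1e26
peak_flops_per_gpu = 1e15      # ~1 PFLOP/s dense BF16 per H100 at peak
mfu = 0.30                     # model FLOPs utilisation

gpu_seconds = total_flops / (peak_flops_per_gpu * mfu)
gpu_hours = gpu_seconds / 3600
days_on_100k = gpu_hours / 100_000 / 24

print(f"{gpu_hours:.2e} H100-hours, ~{days_on_100k:.0f} days on 100,000 GPUs")
# ~9.3e+07 H100-hours, ~39 days on 100,000 GPUs
```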

Cluster sizes:

  • Llama 3 (2024): Meta built two co-located 24,576-GPU H100 clusters for training; the 405B model ran on up to 16,384 GPUs at a time.
  • xAI Colossus (2024): 100,000 H100s in a single data centre in Memphis, later expanded to 200,000 GPUs with a mix of H100s and H200s, with a stated target of 1 million.
  • OpenAI Stargate (announced 2025 with SoftBank and Oracle, Microsoft as technology partner): multi-gigawatt sites, $100B+ initial capex, target operational 2028.
  • Anthropic Project Rainier (2025): ~400k Trainium2 chips on AWS for Claude training.

Power: an H100 server (8 GPUs + CPU + NICs + cooling) draws 10–14 kW; 100,000 H100s draw ~150 MW of IT power, or a PUE-adjusted facility load of ~200 MW. The next generation (B200, 1 kW per GPU; GB200 NVL72, 120 kW per rack) drives multi-gigawatt sites for the largest training runs. For comparison, a typical nuclear reactor produces ~1 GW.
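
The power arithmetic as a sketch (per-server draw and PUE are assumed values within the ranges quoted above):

```python
# Facility power for a 100,000-H100 cluster (assumed per-server draw and PUE).
gpus = 100_000
gpus_per_server = 8
kw_per_server = 12          # mid-range of the 10-14 kW quoted above
pue = 1.3                   # assumed data-centre overhead factor

it_power_mw = gpus / gpus_per_server * kw_per_server / 1000
facility_mw = it_power_mw * pue

print(f"IT load ~{it_power_mw:.0f} MW, facility load ~{facility_mw:.0f} MW")
# IT load ~150 MW, facility load ~195 MW
```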

The order-of-magnitude jump per generation is consistent with the combination of Moore-style cost-performance gains (roughly 3× per process node) and algorithmic efficiency gains (roughly 3× per year, per Epoch AI). But effective compute doubling every ~6–10 months (see the sketch after this list) cannot continue indefinitely without at least one of:

  1. New power infrastructure at the 100+ GW scale (gas turbines and nuclear PPAs; both Microsoft and Amazon signed nuclear power deals in 2024).
  2. Algorithmic breakthroughs reducing FLOP demand per capability unit.
  3. Hardware breakthroughs beyond CMOS (silicon photonics, optical compute).
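
A quick sketch of how an annual effective-compute multiplier maps to a doubling time (the multipliers are illustrative values bracketing the ~3× per year figure above):

```python
# Convert an annual effective-compute growth factor into a doubling time.
import math

def doubling_months(annual_multiplier: float) -> float:
    """Months for effective compute to double at a given annual growth factor."""
    return 12 * math.log(2) / math.log(annual_multiplier)

for m in (2.5, 3.0, 4.0):
    print(f"{m}x per year -> doubling every {doubling_months(m):.1f} months")
# 2.5x -> ~9.1 months, 3.0x -> ~7.6 months, 4.0x -> ~6.0 months
```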

Why it matters for the field: the gap between frontier-lab compute (~$10^{26}$ FLOPs) and academic compute (typically $10^{20}$–$10^{22}$) has grown to 4–6 orders of magnitude, locking academia out of frontier pre-training and concentrating capability research in a handful of labs (OpenAI, Anthropic, Google DeepMind, Meta, xAI, plus Chinese counterparts).

Related terms: Training-Cluster Economics, Power and Cooling, InfiniBand and RoCE, Inference Cost Economics
