The thermal story of AI hardware tracks the compute story. Each generation delivers roughly the FLOPs/W improvement expected from Moore's-law scaling, but packs so many more transistors into each package that per-chip power has risen sharply.
Per-GPU TDP:
- V100 (2017): 300 W
- A100 (2020): 400 W
- H100 SXM (2022): 700 W
- H200 (2024): 700 W
- B200 (2024): 1000 W
- GB200 (Grace + 2× B200): 2700 W per superchip
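The per-GPU trend above can be summarised as a compound annual growth rate. A minimal sketch, using only the years and wattages from the list (V100 through B200):

```python
# Compound annual growth of per-GPU TDP, 2017-2024.
# Years and watts are the figures listed above.
tdp_w = {2017: 300, 2020: 400, 2022: 700, 2024: 1000}  # V100 .. B200
span_years = 2024 - 2017
cagr = (tdp_w[2024] / tdp_w[2017]) ** (1 / span_years) - 1
print(f"Per-GPU TDP growth 2017-2024: {cagr:.1%} per year")  # ~19%/yr
```

At roughly 19 % per year, per-chip power more than triples per seven-year span, which is what drives the rack-level figures that follow.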
Per-rack power:
- Traditional data-centre rack: 5–15 kW.
- HGX H100 rack (4 servers × 10 kW): ~40 kW.
- GB200 NVL72: 120 kW per rack, with 72 GPUs and 36 Grace CPUs in a single liquid-cooled enclosure.
- Stargate-class designs (2025–28): 250+ kW racks rumoured, requiring novel busbar and coolant distribution.
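The NVL72 rack figure can be sanity-checked from the superchip TDP given above. A rough composition sketch; the overhead estimate (NVSwitch trays, NICs, pumps, power-conversion loss) is an assumption, not a published number:

```python
# Composition check for the 120 kW NVL72 rack figure.
superchips = 36           # each GB200 superchip: 1 Grace CPU + 2 B200 GPUs
superchip_tdp_kw = 2.7    # 2700 W per superchip, from the TDP list above
overhead_kw = 23          # ASSUMED: switches, networking, pumps, conversion
rack_kw = superchips * superchip_tdp_kw + overhead_kw
print(f"Estimated rack load: {rack_kw:.0f} kW")  # ~120 kW
```

The compute trays alone account for ~97 kW, so the quoted 120 kW leaves only ~20 kW for everything else in the enclosure.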
Cooling regimes:
- Air cooling ceases to be viable above ~40 kW/rack; fans cannot move enough mass flow through the rack.
- Direct-to-chip liquid cooling: cold-plate over GPU and CPU, secondary loop to facility. Standard on H100 SXM, mandatory on B200 and GB200.
- Immersion cooling: server submerged in dielectric fluid. Higher density, less mature ecosystem; deployed by some hyperscalers and crypto-derived operators.
- Two-phase cooling: dielectric boils on-chip and condenses, removing heat at constant temperature. Experimental at scale.
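The air-cooling limit above follows directly from the heat balance Q = ṁ·c_p·ΔT. A worked sketch, using textbook fluid properties; the 15 K air-side and 10 K water-side temperature rises are assumed operating points:

```python
# Why air runs out of headroom: required mass flow for a 40 kW rack.
q_w = 40_000                         # rack heat load, W
cp_air, cp_water = 1005.0, 4186.0    # specific heats, J/(kg*K)
m_air = q_w / (cp_air * 15)          # kg/s of air at dT = 15 K (assumed)
m_water = q_w / (cp_water * 10)      # kg/s of water at dT = 10 K (assumed)
vol_air = m_air / 1.2                # m^3/s at air density ~1.2 kg/m^3
print(f"air: {vol_air:.1f} m^3/s vs water: {m_water:.2f} kg/s (~1 L/s)")
```

Moving over 2 m³/s of air through a single rack is at the edge of what server fans can do; the same heat leaves in about a litre of water per second, which is why direct-to-chip liquid becomes mandatory at B200-class densities.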
PUE (Power Usage Effectiveness) = total facility power / IT power. A modern hyperscale data centre runs PUE 1.10–1.20; older enterprise rooms run 1.5–2.0. AI clusters with liquid cooling push PUE toward 1.06–1.08 because pumps are far more efficient than chillers and CRAH fans.
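The PUE definition above makes the overhead comparison concrete. A small sketch, taking the 120 kW NVL72 rack as the IT load and bracketing the liquid-cooled and legacy PUE ranges quoted:

```python
# Facility overhead implied by PUE = total facility power / IT power.
it_kw = 120  # one NVL72 rack of IT load, from the figures above
for pue in (1.08, 1.5):
    total_kw = it_kw * pue
    overhead = total_kw - it_kw
    print(f"PUE {pue}: {overhead:.1f} kW of cooling/distribution overhead")
```

Per rack, the difference between a modern liquid-cooled plant and an older enterprise room is roughly 10 kW versus 60 kW of overhead for the same compute.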
Site selection is now grid-limited rather than fibre- or land-limited:
- Northern Virginia (Ashburn): the historical centre, now grid-constrained; Dominion Energy has multi-year interconnect queues.
- Texas (ERCOT): deregulated grid, fast interconnect, abundant gas; xAI Colossus in Memphis (TVA), Stargate in Abilene.
- Phoenix, Reno, Iowa: cheap power and land, established hyperscaler campuses.
- Nordic countries: free cooling from cold air, hydroelectric power; Meta, Google, Microsoft sites.
- Nuclear PPAs: Microsoft signed Three Mile Island restart (Constellation, 2024); Amazon bought Talen's Susquehanna campus; Google signed SMR offtake with Kairos (2024). Nuclear is the only carbon-free baseload that scales to gigawatts on AI timeframes.
Carbon and water: a 100 MW cluster running at 80 % capacity uses ~700 GWh/year, comparable to a town of 60k people. Water use for evaporative cooling can reach 1.8 L per kWh in dry climates; this is increasingly a social-licence issue (Arizona, Spain).
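The energy and water figures above are simple arithmetic; a sketch reproducing them from the capacity, utilisation, and litres-per-kWh numbers stated in the text:

```python
# Annual energy and water use for a 100 MW cluster at 80% utilisation.
capacity_mw, utilisation, hours_per_year = 100, 0.80, 8760
energy_gwh = capacity_mw * utilisation * hours_per_year / 1000
# Evaporative-cooling water at 1.8 L/kWh (dry-climate figure from the text):
water_gl = energy_gwh * 1e6 * 1.8 / 1e9   # GWh -> kWh, then L -> gigalitres
print(f"{energy_gwh:.0f} GWh/yr, {water_gl:.2f} GL of water/yr")
```

At the dry-climate rate, a single such cluster evaporates on the order of a billion litres of water per year, which is what draws regulatory attention.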
Thermal design margin is now part of the model design. The B200's 1 kW envelope exceeds Hopper's 700 W in part because the NVL72 rack's liquid cooling can dissipate that heat at the silicon; the binding constraint has become system-level rather than chip-level.
Discussed in:
- Chapter 15: Modern AI