The thermal story of AI hardware tracks the compute story. Each generation delivers roughly the FLOPs/W improvement expected from Moore's-law scaling, but packs so many more transistors into each package that per-chip power has risen sharply.
Per-GPU TDP:
- V100 (2017): 300 W
- A100 (2020): 400 W
- H100 SXM (2022): 700 W
- H200 (2024): 700 W
- B200 (2024): 1000 W
- GB200 (Grace + 2× B200): 2700 W per superchip
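The per-GPU trend above can be summarised as a compound annual growth rate. A minimal sketch, using only the years and wattages from the list (V100 through B200):

```python
# Compound annual growth of per-GPU TDP, 2017-2024.
# Years and watts are the figures listed above.
tdp_w = {2017: 300, 2020: 400, 2022: 700, 2024: 1000}  # V100 .. B200
span_years = 2024 - 2017
cagr = (tdp_w[2024] / tdp_w[2017]) ** (1 / span_years) - 1
print(f"Per-GPU TDP growth 2017-2024: {cagr:.1%} per year")  # ~19%/yr
```

At roughly 19 % per year, per-chip power more than triples per seven-year span, which is what drives the rack-level figures that follow.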
Per-rack power:
- Traditional data-centre rack: 5–15 kW.
- HGX H100 rack (4 servers × 10 kW): ~40 kW.
- GB200 NVL72: 120 kW per rack, with 72 GPUs and 36 Grace CPUs in a single liquid-cooled enclosure.
- Stargate-class designs (2025–28): 250+ kW racks rumoured, requiring novel busbar and coolant distribution.
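The NVL72 rack figure can be sanity-checked from the superchip TDP given above. A rough composition sketch; the overhead estimate (NVSwitch trays, NICs, pumps, power-conversion loss) is an assumption, not a published number:

```python
# Composition check for the 120 kW NVL72 rack figure.
superchips = 36           # each GB200 superchip: 1 Grace CPU + 2 B200 GPUs
superchip_tdp_kw = 2.7    # 2700 W per superchip, from the TDP list above
overhead_kw = 23          # ASSUMED: switches, networking, pumps, conversion
rack_kw = superchips * superchip_tdp_kw + overhead_kw
print(f"Estimated rack load: {rack_kw:.0f} kW")  # ~120 kW
```

The compute trays alone account for ~97 kW, so the quoted 120 kW leaves only ~20 kW for everything else in the enclosure.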
Cooling regimes:
- Air cooling ceases to be viable above ~40 kW/rack; fans cannot move enough mass flow through the rack.
- Direct-to-chip liquid cooling: cold-plate over GPU and CPU, secondary loop to facility. Standard on H100 SXM, mandatory on B200 and GB200.
- Immersion cooling: server submerged in dielectric fluid. Higher density, less mature ecosystem; deployed by some hyperscalers and crypto-derived operators.
- Two-phase cooling: dielectric boils on-chip and condenses, removing heat at constant temperature. Experimental at scale.
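The air-cooling limit above follows directly from the heat balance Q = ṁ·c_p·ΔT. A worked sketch, using textbook fluid properties; the 15 K air-side and 10 K water-side temperature rises are assumed operating points:

```python
# Why air runs out of headroom: required mass flow for a 40 kW rack.
q_w = 40_000                         # rack heat load, W
cp_air, cp_water = 1005.0, 4186.0    # specific heats, J/(kg*K)
m_air = q_w / (cp_air * 15)          # kg/s of air at dT = 15 K (assumed)
m_water = q_w / (cp_water * 10)      # kg/s of water at dT = 10 K (assumed)
vol_air = m_air / 1.2                # m^3/s at air density ~1.2 kg/m^3
print(f"air: {vol_air:.1f} m^3/s vs water: {m_water:.2f} kg/s (~1 L/s)")
```

Moving over 2 m³/s of air through a single rack is at the edge of what server fans can do; the same heat leaves in about a litre of water per second, which is why direct-to-chip liquid becomes mandatory at B200-class densities.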
PUE (Power Usage Effectiveness) = total facility power / IT power. A modern hyperscale data centre runs PUE 1.10–1.20; older enterprise rooms run 1.5–2.0. AI clusters with liquid cooling push PUE toward 1.06–1.08 because pumps are far more efficient than chillers and CRAH fans.
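The PUE definition above makes the overhead comparison concrete. A small sketch, taking the 120 kW NVL72 rack as the IT load and bracketing the liquid-cooled and legacy PUE ranges quoted:

```python
# Facility overhead implied by PUE = total facility power / IT power.
it_kw = 120  # one NVL72 rack of IT load, from the figures above
for pue in (1.08, 1.5):
    total_kw = it_kw * pue
    overhead = total_kw - it_kw
    print(f"PUE {pue}: {overhead:.1f} kW of cooling/distribution overhead")
```

Per rack, the difference between a modern liquid-cooled plant and an older enterprise room is roughly 10 kW versus 60 kW of overhead for the same compute.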
Site selection is now grid-limited rather than fibre- or land-limited:
- Northern Virginia (Ashburn): the historical centre, now grid-constrained; Dominion Energy has multi-year interconnect queues.
- Texas (ERCOT): deregulated grid, fast interconnect, abundant gas; xAI Colossus in Memphis (TVA), Stargate in Abilene.
- Phoenix, Reno, Iowa: cheap power and land, established hyperscaler campuses.
- Nordic countries: free cooling from cold air, hydroelectric power; Meta, Google, Microsoft sites.
- Nuclear PPAs: Microsoft signed Three Mile Island restart (Constellation, 2024); Amazon bought Talen's Susquehanna campus; Google signed SMR offtake with Kairos (2024). Nuclear is the only carbon-free baseload that scales to gigawatts on AI timeframes.
Carbon and water: a 100 MW cluster running at 80 % capacity uses ~700 GWh/year, comparable to a town of 60k people. Water use for evaporative cooling can reach 1.8 L per kWh in dry climates; this is increasingly a social-licence issue (Arizona, Spain).
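The energy and water figures above are simple arithmetic; a sketch reproducing them from the capacity, utilisation, and litres-per-kWh numbers stated in the text:

```python
# Annual energy and water use for a 100 MW cluster at 80% utilisation.
capacity_mw, utilisation, hours_per_year = 100, 0.80, 8760
energy_gwh = capacity_mw * utilisation * hours_per_year / 1000
# Evaporative-cooling water at 1.8 L/kWh (dry-climate figure from the text):
water_gl = energy_gwh * 1e6 * 1.8 / 1e9   # GWh -> kWh, then L -> gigalitres
print(f"{energy_gwh:.0f} GWh/yr, {water_gl:.2f} GL of water/yr")
```

At the dry-climate rate, a single such cluster evaporates on the order of a billion litres of water per year, which is what draws regulatory attention.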
Thermal design margin is now part of the model design. The B200's 1 kW envelope exceeds Hopper's 700 W in part because the NVL72 rack's liquid cooling can dissipate that heat at the silicon; the binding constraint has become system-level rather than chip-level.
Discussed in:
- Chapter 15: Modern AI