Glossary

Pangu-Weather

Pangu-Weather, published by Bi, Xie, Zhang, Chen, Gu and Tian (Huawei Cloud, Nature 2023), is a Transformer-based global weather forecasting model that operates on a true 3D representation of the atmosphere and matches or exceeds ECMWF HRES while running ~10 000× faster. It was the first published deep-learning model to outperform ECMWF's operational IFS on all evaluated headline variables and has become, with GraphCast, one of the two reference learned NWP systems.

The state is the same kind of cube as GraphCast's: a $721 \times 1440$ lat–lon grid at 13 pressure levels for five upper-air variables plus four surface variables, but Pangu-Weather treats it as a single 3D volume rather than a 2D grid with per-level features. The architecture is a 3D Earth-Specific Transformer (3DEST): a Swin-Transformer-style hierarchical encoder–decoder with windowed self-attention extended to three dimensions, plus an Earth-Specific Positional Bias per attention head that encodes the (latitude, longitude, pressure-level) position of each token. This bias is the crucial inductive prior: atmospheric dynamics are not uniform across the sphere (the Coriolis force varies with latitude, as do land/sea contrasts and jet streams), so the positional encoding cannot be translation-invariant.
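A minimal sketch of the idea in PyTorch, assuming a toy attention layer over one 3D window (the module name, shapes and the `window_idx` interface are illustrative, not the paper's implementation): tokens attend within their window, and the attention logits receive a learned bias selected by the window's absolute position, so windows at different latitudes and pressure levels learn different priors.

```python
import torch
import torch.nn as nn

class EarthWindowAttention(nn.Module):
    """Windowed 3D self-attention with an absolute, per-window positional bias.

    A sketch of the Earth-Specific Positional Bias: unlike Swin's relative
    bias, the table is indexed by the window's absolute position on the globe.
    """
    def __init__(self, dim, heads, window, n_windows):
        super().__init__()
        self.heads = heads
        n_tok = window[0] * window[1] * window[2]  # (levels, lat, lon) tokens
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learned bias per window position: absolute, not translation-invariant.
        self.bias = nn.Parameter(torch.zeros(n_windows, heads, n_tok, n_tok))

    def forward(self, x, window_idx):
        # x: (batch, n_tok, dim) tokens of one 3D window; window_idx: its position.
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, C // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
        attn = (attn + self.bias[window_idx]).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

layer = EarthWindowAttention(dim=64, heads=4, window=(2, 6, 12), n_windows=10)
tokens = torch.randn(1, 2 * 6 * 12, 64)
y = layer(tokens, window_idx=3)  # bias differs per (lat, lon, level) window
```

The defining choice is the absolute indexing: the same token pattern at different latitudes receives different attention behaviour, which a standard translation-invariant relative bias cannot express.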

The headline trick is hierarchical temporal aggregation. Rather than train a single model to produce 6-hour updates and roll it out 40 times for a 10-day forecast (GraphCast's strategy), Pangu-Weather trains four separate models with forecast horizons of 1, 3, 6 and 24 hours. At inference these are composed greedily: a 56-hour forecast is built as $24+24+6+1+1$ hours, requiring only five forward passes rather than 56 one-hour rollouts. This minimises iterative error accumulation: each rollout step adds noise that compounds, and a composition of larger steps suffers less because there are fewer steps. On metrics like 5-day Z500 RMSE, hierarchical aggregation alone improves skill by 10–20% over a single-model rollout.
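A minimal sketch of the greedy composition, assuming hypothetical per-horizon models exposed as callables keyed by lead time (the `plan` and `forecast` names are illustrative, not Pangu-Weather's API):

```python
HORIZONS = [24, 6, 3, 1]  # available forecast models, largest horizon first

def plan(lead_hours: int) -> list[int]:
    """Greedily decompose a lead time into per-horizon model steps."""
    steps, remaining = [], lead_hours
    for h in HORIZONS:
        while remaining >= h:
            steps.append(h)
            remaining -= h
    return steps

def forecast(state, lead_hours: int, models: dict):
    """Compose per-horizon models: each call advances `state` by its horizon."""
    for h in plan(lead_hours):
        state = models[h](state)  # one forward pass per step
    return state

models = {h: (lambda s: s) for h in HORIZONS}  # stand-in identity models
print(plan(56))  # -> [24, 24, 6, 1, 1]: five forward passes, as in the text
```

Because each horizon divides the next (1 | 3, 3 | 6, 6 | 24), the greedy choice always uses the fewest possible forward passes for any integer lead time.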

Training data is again ERA5, from 1979 to 2017, with each of the four horizon models trained independently on its respective lead time using mean absolute error rather than MSE: $\mathcal{L} = \sum_v \alpha_v\, \|\hat{x}_v - x_v\|_1$, where the per-variable weight $\alpha_v$ is hand-tuned to balance pressure-level and surface variables. Each horizon model takes ~16 days to train on 192 V100 GPUs.
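A minimal sketch of that objective in PyTorch, assuming fields stacked along a variable axis as (batch, variable, level, lat, lon); the toy shapes and uniform weights are placeholders, not the paper's values, and the error is averaged (rather than summed) over grid points for scale stability:

```python
import torch

def weighted_mae(pred: torch.Tensor, target: torch.Tensor,
                 alpha: torch.Tensor) -> torch.Tensor:
    """Per-variable weighted L1 loss: sum_v alpha_v * |x_hat_v - x_v|."""
    # Mean absolute error per variable, averaged over all other dimensions.
    per_var = (pred - target).abs().mean(dim=(0, 2, 3, 4))  # shape: (V,)
    return (alpha * per_var).sum()

pred   = torch.randn(2, 5, 13, 32, 64)  # toy grid, 5 upper-air variables
target = torch.randn(2, 5, 13, 32, 64)
alpha  = torch.ones(5)                  # placeholder hand-tuned weights
loss = weighted_mae(pred, target, alpha)
```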

On the standard ECMWF verification protocol Pangu-Weather beats HRES on Z500, T850, Q700, U10 and 2m temperature at all lead times from 1 to 7 days, and matches HRES on cyclone tracking with substantially better extreme-event skill. It produces a 24-hour global forecast in about 1.4 seconds on a single V100 GPU. The published comparisons against GraphCast are roughly even: Pangu-Weather has slight edges on tropical cyclone tracking and on extreme heat events; GraphCast has slight edges on multi-day precipitation and on high-altitude variables. Both are well ahead of operational physics-based models, settling a question that was still open as recently as 2022.

Pangu-Weather has been integrated into ECMWF's experimental ML ensemble and has been extended in Pangu-Ocean (oceanographic forecasting); it also has a direct competitor in FengWu, a hierarchical Transformer from Shanghai AI Laboratory that extends skillful lead time beyond 10 days.

Related terms: Transformer, GraphCast, Vision Transformer
