17.5 Climate and weather
Weather forecasting and climate modelling sit among the largest computational tasks humans routinely run. The European Centre for Medium-Range Weather Forecasts (ECMWF) produces a global ten-day forecast every six hours by integrating discretised fluid-dynamical equations on a 9 km grid; the calculation occupies a supercomputer delivering several tens of petaflops for hours each cycle. National meteorological agencies in the United States, the United Kingdom, Japan, China, India, Australia and a dozen smaller states each maintain comparable infrastructure. The economics are formidable: a single tier-one weather centre spends tens of millions of pounds a year on hardware, plus the staff to operate, maintain and develop the model code.
In the last three years, the calculus has been quietly upended. Deep-learning models trained on decades of reanalysis data now match or beat the best traditional numerical weather prediction (NWP) systems on a wide range of forecast targets, while running roughly four orders of magnitude faster on commodity GPUs. The same shift is starting to appear in climate downscaling, in renewable-energy forecasting and in severe-weather prediction. None of these is a curiosity; together they are reshaping how a substantial slice of the planet's geophysical infrastructure operates.
§17.4 considered robotics, where the binding constraint on AI is contact with the physical world through actuators and sensors. Climate and weather sit at the opposite extreme: the physical world is observed at vast scale through satellites, balloons, radars and ground stations, and the AI's task is purely to predict, not to act. That removes the safety questions that dominate robotics and replaces them with a different concern: getting the rare, catastrophic events right when most of the training data describes ordinary weather.
Weather forecasting
The breakthrough year was 2023. Three groups, working largely independently, demonstrated that a deep network trained on the ECMWF ERA5 reanalysis archive could produce ten-day global forecasts competitive with, and often better than, the operational dynamical models. The lever in each case was the same: 39 years of hourly ERA5 data, comprising surface and atmospheric variables on a 0.25° grid, gives roughly a third of a million training snapshots. That is enough for a sufficiently expressive neural network to learn the conditional distribution of tomorrow's atmosphere given today's, without ever encoding the Navier–Stokes equations explicitly.
GraphCast (Lam and colleagues, Science 2023, DeepMind) is a graph neural network operating on a multi-mesh icosahedral grid that wraps the sphere with progressively finer resolution. Forecasts run autoregressively in six-hour steps to a ten-day horizon at 0.25° (about 28 km). On the ECMWF's standard scorecard of 1380 verification targets (different variables at different pressure levels and lead times), GraphCast outperforms the Integrated Forecasting System (IFS) on roughly 90% of them across the full ten-day range. A run that takes IFS several hours on a supercomputer takes GraphCast about a minute on a single TPU. The cost ratio is staggering and is the reason every major operational centre has paid attention.
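The autoregressive rollout can be sketched in a few lines: a learned one-step model maps the atmospheric state at time t to the state at t + 6 h, and the ten-day forecast comes from feeding each output back in as the next input. The `step_model` below is a toy stand-in (a damped relaxation), not the real network; the state variables are illustrative.

```python
STEP_HOURS = 6
HORIZON_HOURS = 240  # ten days

def step_model(state):
    """Toy one-step model: relax each variable toward a climatological
    mean of 0.0. A real system would be a trained neural network."""
    return {var: 0.9 * value for var, value in state.items()}

def rollout(initial_state):
    """Run the one-step model autoregressively to the full horizon,
    returning a dict mapping lead time (hours) to predicted state."""
    forecasts = {}
    state = dict(initial_state)
    for lead in range(STEP_HOURS, HORIZON_HOURS + STEP_HOURS, STEP_HOURS):
        state = step_model(state)  # model output becomes the next input
        forecasts[lead] = dict(state)
    return forecasts

analysis = {"t2m": 1.0, "z500": -2.0}  # today's (toy) analysis state
forecast = rollout(analysis)
print(len(forecast))  # 40 six-hour steps in ten days
```

The loop also shows why error accumulation matters: any bias in the one-step model is applied forty times before the ten-day horizon is reached.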
Pangu-Weather (Bi and colleagues, Nature 2023, Huawei Cloud) takes a different architectural route. It uses a 3D Earth-specific transformer with a hierarchical temporal aggregation that combines forecasts at different lead times to suppress error accumulation. Pangu beats IFS on most fields at one- to seven-day horizons and runs in seconds on a single GPU. Its tropical cyclone track predictions, in particular, were noticeably sharper than the dynamical models in early operational testing.
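The hierarchical temporal aggregation is easy to sketch: rather than iterating a single short-step model, separate models are trained for several lead times (Pangu uses 1, 3, 6 and 24 hours), and a target lead time is reached with a greedy fewest-steps decomposition, so fewer autoregressive iterations accumulate error. The sketch below shows only the scheduling logic, with the models themselves elided.

```python
LEAD_TIMES = [24, 6, 3, 1]  # hours, largest first

def decompose(target_hours):
    """Greedily break a target lead time into the fewest model steps."""
    steps = []
    remaining = target_hours
    for lead in LEAD_TIMES:
        while remaining >= lead:
            steps.append(lead)
            remaining -= lead
    if remaining:
        raise ValueError("target not reachable with these lead times")
    return steps

print(decompose(31))   # [24, 6, 1]: three model calls instead of many
print(decompose(168))  # seven days: seven 24 h calls, vs 28 six-hour steps
```

A 31-hour forecast takes three model applications here, where a single six-hour model could not even land on 31 hours exactly, let alone avoid five rounds of compounding error.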
FourCastNet (Pathak and colleagues, NVIDIA, 2022) and its successor FourCastNet v2 (2023) use adaptive Fourier neural operators, a transformer-like architecture in which mixing happens in the spectral rather than the spatial domain. FourCastNet was the first ML weather system to demonstrate competitive skill at scale and remains the easiest of the family to train and deploy on commodity GPU clusters.
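The core spectral-mixing idea can be shown in miniature: transform the field to the frequency domain, apply a per-mode weight, and transform back. A real adaptive Fourier neural operator does this with 2D FFTs and a learned MLP acting on each mode; the naive 1D DFT below is only meant to expose the mechanism, and the mode weights are placeholders rather than trained values.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]

def spectral_mix(x, mode_weights):
    """Mix a 1D field in the spectral domain: DFT, reweight each
    frequency mode, inverse DFT back to the spatial domain."""
    X = dft(x)
    X = [w * Xk for w, Xk in zip(mode_weights, X)]
    return [y.real for y in idft(X)]

field = [0.0, 1.0, 0.0, -1.0]                  # toy 1D field
identity = spectral_mix(field, [1, 1, 1, 1])   # unit weights: round trip
damped = spectral_mix(field, [1, 0.5, 1, 0.5]) # damp selected modes
print([round(v, 6) for v in identity])
```

Because each mode weight touches every spatial point, one spectral layer mixes information globally, which is why these architectures capture planetary-scale wave dynamics without deep stacks of local convolutions.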
NeuralGCM (Kochkov and colleagues, Nature 2024, Google Research) takes a hybrid approach: a differentiable atmospheric dynamical core supplies the large-scale fluid mechanics while a neural-network parameterisation handles sub-grid processes such as convection, turbulence and cloud microphysics. The result is competitive with the best pure-ML and pure-physical models on medium-range forecasts and remains stable for multi-decade integrations, addressing a key worry about pure-ML approaches: that they may diverge or wander outside the training distribution when run for years rather than days.
ECMWF released its own AI-based system, AIFS, in 2024 and now delivers it operationally alongside IFS. Running ML forecasts on a few GPUs rather than a supercomputer enables ensembles of hundreds or thousands of perturbations, which capture forecast uncertainty better than the ~50-member dynamical ensembles used today.
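The ensemble arithmetic is worth making concrete: when one forecast costs seconds rather than supercomputer-hours, uncertainty can be sampled with hundreds of perturbed initial conditions. The sketch below reuses a toy relaxation as the one-step model; a real system would run the trained network for each member.

```python
import random

def step_model(state):
    return 0.9 * state  # toy stand-in for a trained one-step model

def ensemble_forecast(analysis, n_members, n_steps, noise=0.1, seed=0):
    """Run n_members forecasts from perturbed initial conditions and
    return the final-state value for each member."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        state = analysis + rng.gauss(0.0, noise)  # perturbed analysis
        for _ in range(n_steps):
            state = step_model(state)
        members.append(state)
    return members

members = ensemble_forecast(analysis=1.0, n_members=500, n_steps=8)
mean = sum(members) / len(members)
spread = (sum((m - mean) ** 2 for m in members) / len(members)) ** 0.5
print(len(members), round(mean, 3), round(spread, 3))
```

With 500 members the ensemble mean and spread are estimated far more tightly than a 50-member dynamical ensemble allows, and tail probabilities (the chance of an extreme outcome) become directly countable from the members.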
Climate downscaling
Climate models, coupled ocean–atmosphere–land systems integrated for decades or centuries, are run at much coarser resolution than weather models. Even the latest CMIP6 ensemble runs at 50–100 km horizontally; that is fine for assessing global mean temperature change but useless for anyone who needs to know whether their valley will flood, whether their vineyard will tolerate the projected 2050 climate, or how to size urban drainage. Bridging that gap is the job of downscaling.
Traditional dynamical downscaling embeds a regional model inside the global one and re-runs the regional patch at finer resolution. It is faithful to the physics but expensive: a regional climate run can cost more than the global run that drives it. Statistical downscaling, fitting empirical relationships between coarse and fine fields from historical data, is cheap but brittle when the climate moves outside the training range.
AI downscaling sits between the two. Super-resolution diffusion models, GAN-based approaches and conditional normalising flows take coarse climate-model output as input and produce fine-grained fields (1–5 km) that respect both the large-scale structure of the input and the fine-scale statistics of the historical observations. The CorrDiff system from NVIDIA, deployed for Taiwan typhoon forecasting in 2024, downscales kilometre-scale precipitation and wind from coarser inputs in seconds. Google's MetNet-3 produces 1 km, twelve-hour US precipitation forecasts directly from radar and satellite. The IPCC AR7 cycle, in preparation through 2028, is the first IPCC assessment in which ML-based downscaling is being used at scale to translate global-model output into regional impact assessments for agriculture, water resources and public health.
Renewable energy
Wind and solar power are intermittent on every timescale that matters: minutes (cloud cover, gust fronts), hours (diurnal cycle), days (synoptic weather) and seasons. Grid operators need probabilistic forecasts at all those horizons to balance supply and demand, schedule reserves and price electricity. AI is now the default tool for the short-horizon end.
Solar forecasting from geostationary satellite imagery, predicting the next hour of irradiance at a panel array, uses convolutional architectures trained on years of paired sky-cam, satellite and ground-truth power data; errors are roughly half those of persistence baselines and a third lower than physical-model nowcasts. Wind forecasting integrates ML weather predictions (often a downscaled GraphCast or Pangu output) with site-specific power curves and turbine wake models. Battery-management systems use reinforcement learning to schedule charging and discharging given price forecasts, weather forecasts and battery degradation models.
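The persistence baseline mentioned above is the yardstick any nowcast must clear: predict that the next hour's irradiance equals the current hour's. A minimal sketch, using illustrative toy values and the standard mean-absolute-error metric:

```python
def persistence_forecast(series):
    """Forecast each value as the previous observation."""
    return series[:-1]  # prediction for series[1:]

def mae(pred, truth):
    """Mean absolute error, the usual headline metric."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

irradiance = [300, 420, 510, 480, 250, 90]  # W/m^2, toy hourly values
pred = persistence_forecast(irradiance)
truth = irradiance[1:]
print(round(mae(pred, truth), 1))  # the error a learned model must beat
```

Persistence is trivially cheap and surprisingly hard to beat under stable skies; the claimed halving of error by convolutional nowcasts comes almost entirely from the broken-cloud regimes where persistence fails worst.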
On the demand side, utility load forecasting has been a regression problem since the 1980s; deep learning has cut errors by 10–20% in most published comparisons since 2020 and is now operational at most major grid operators. The real contribution, though, is at the household scale: smart-meter data, processed with sequence models, enables individualised demand-response pricing that shifts load away from peaks without consumer attention. National Grid in the UK and CAISO in California both run ML-based balancing tools as standard.
Severe weather
Severe-weather prediction has the highest direct human stakes. The 2024 Atlantic hurricane season, featuring Beryl, Helene and Milton, all of which caused major US damage, was the first in which ML models including GraphCast were operationally used for track prediction alongside dynamical models. Track-prediction errors at five-day lead time have shrunk by roughly 25% over 2020–2025, with ML contributing meaningfully. Intensity prediction, historically the harder problem, has improved more slowly; a NOAA-led HURDAT-AI effort and several university groups are pushing on it now.
Wildfire risk prediction combines satellite vegetation indices, weather forecasts and ignition data into spatial probability fields. The Pyrios system (Ladrigau and colleagues), the NOAA-NCEI Fire Connector and a clutch of Cal Fire integrations have been central to fire-season planning in the US West since the 2020 and 2023 fire seasons. Google's flood-forecasting service, deployed in over 80 countries by 2024 in partnership with national meteorological agencies, uses a hydrological model with ML components and has provided flood alerts to over 460 million people.
Limits
The catch in all of this is the training distribution. ML weather models inherit the biases of ERA5, including its known weaknesses over the polar regions, the Southern Ocean and the tropical convergence zones. Extreme events are rare by definition, and a model trained predominantly on ordinary weather can underpredict the tails. GraphCast, in its initial Science paper, was demonstrably weaker on extreme precipitation than IFS; subsequent variants have narrowed the gap but not eliminated it. Climate change introduces a further problem: the future will contain weather patterns and combinations of conditions that did not occur in the training period, and an ML model has no first-principles way to extrapolate.
The pragmatic answer is the hybrid. NeuralGCM, AIFS and the CorrDiff family all combine physics-based components with ML components, getting the data efficiency and speed of the latter while keeping the conservation laws and extrapolation behaviour of the former. That hybrid frontier is where the operational systems of the late 2020s will live.
What you should take away
- ML weather models (GraphCast, Pangu-Weather, FourCastNet, NeuralGCM) match or beat operational NWP at medium-range forecasts and run roughly four orders of magnitude faster on commodity GPUs.
- Big operational centres (ECMWF, NOAA, the UK Met Office) have integrated ML systems into their delivered products since 2024; the cost reduction enables much larger ensembles.
- Climate downscaling with diffusion and GAN models converts coarse global-model output into kilometre-scale regional fields useful for impact assessment; the IPCC AR7 cycle uses these methods at scale.
- Renewable-energy forecasting and grid management now rely on ML at every timescale from minutes (sky-cam nowcasting) to days (downscaled global forecasts) to seasons (load forecasting).
- The weakness of pure-ML approaches is the tail of the distribution (extreme events, novel regimes, climate-change extrapolation); the working frontier is hybrid physics-plus-ML rather than one or the other.