nnU-Net ("no-new-U-Net"), published by Isensee, Jäger, Kohl, Petersen and Maier-Hein in Nature Methods in 2021, is not a new architecture but a self-configuring framework that turns a dataset into a trained, tuned U-Net with no manual hyperparameter selection. Its central thesis is provocative: most published gains on medical-imaging benchmarks come from careful pipeline engineering rather than architectural novelty, and a properly configured plain U-Net beats almost everything else.
Given a new dataset, nnU-Net extracts a compact "dataset fingerprint" (image dimensions, voxel spacings, modalities, intensity distributions, class frequencies) and applies hand-crafted heuristic rules to make three categories of decisions. Fixed parameters (training schedule, loss function, optimiser) are inherited from a curated default. Rule-based parameters are derived from the fingerprint: the patch size is set as large as GPU memory allows; the batch size follows from the patch size; the pooling depth is chosen so that the smallest feature map is at least $4^3$ voxels; images are resampled to the dataset-median spacing; intensities are normalised by percentile clipping plus global mean/std for CT, or per-image z-scoring for MR. Empirical parameters (whether to use the 2D, 3D full-resolution, or 3D low-resolution cascade configuration, and whether to postprocess by keeping the largest connected component) are decided by cross-validated comparison of the candidates, with the best-performing combination optionally ensembled.
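The rule-based step can be illustrated with a toy sketch. This is not the framework's actual API (function and parameter names here are invented for illustration); it only mimics two of the heuristics described above: shrinking the median image shape to a patch that fits a fixed voxel budget, and pooling each axis until the bottleneck feature map would fall below 4 voxels.

```python
import math

def configure_3d(median_shape_vox, max_patch_voxels=128 ** 3):
    """Toy sketch of nnU-Net-style rule-based configuration.

    Assumes images were already resampled to the dataset-median spacing.
    `max_patch_voxels` stands in for the real GPU-memory budget.
    """
    # Start from the median image shape and shrink until the patch fits
    # the budget, trimming the largest axis first.
    patch = list(median_shape_vox)
    while patch[0] * patch[1] * patch[2] > max_patch_voxels:
        patch[patch.index(max(patch))] -= 1
    # Pool each axis until the bottleneck feature map would drop below
    # 4 voxels; this fixes the number of downsampling steps per axis.
    depth = [int(math.log2(p // 4)) if p >= 8 else 0 for p in patch]
    # Round the patch so each axis is divisible by its total downsampling.
    patch = [(p // 2 ** d) * 2 ** d for p, d in zip(patch, depth)]
    return patch, depth
```

For a dataset whose median volume is $160^3$ voxels, this sketch settles on a $128^3$ patch with five pooling steps per axis (bottleneck $4^3$), which matches the flavour, though not the exact arithmetic, of the real planner.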
The training recipe is deliberately conservative: SGD with Nesterov momentum 0.99, polynomial learning-rate decay $\eta_t = \eta_0 (1 - t/T)^{0.9}$, and 1000 epochs of 250 minibatch iterations each, with extensive on-the-fly augmentation (rotation, scaling, elastic deformation, gamma correction, mirroring, Gaussian noise and blur). The loss is the sum of cross-entropy and Dice, $\mathcal{L} = \mathcal{L}_{\text{CE}} + \mathcal{L}_{\text{Dice}}$, with the Dice term computed across the whole batch rather than per sample, which stabilises foreground gradients on small or occasionally empty classes.
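A minimal NumPy sketch of the two formulas above, assuming binary segmentation for brevity (the framework itself uses multi-class losses in PyTorch; these helper names are illustrative, not nnU-Net's):

```python
import numpy as np

def poly_lr(t, T, lr0=1e-2):
    # Polynomial decay eta_t = eta_0 * (1 - t/T)**0.9, applied every epoch.
    return lr0 * (1 - t / T) ** 0.9

def soft_dice(pred, target, batch_dice=True, eps=1e-5):
    """Soft Dice on foreground probabilities, shape (batch, *spatial).

    With batch_dice=True the sums run jointly over batch and spatial
    axes, so a sample with a tiny or empty foreground cannot blow up
    the gradient the way a per-sample Dice average can.
    """
    axes = tuple(range(pred.ndim)) if batch_dice else tuple(range(1, pred.ndim))
    inter = (pred * target).sum(axis=axes)
    denom = pred.sum(axis=axes) + target.sum(axis=axes)
    return np.mean(2 * inter / (denom + eps))

def combined_loss(pred, target):
    # L = L_CE + L_Dice; binary cross-entropy stands in for the
    # multi-class cross-entropy of the paper.
    ce = -np.mean(target * np.log(pred + 1e-7)
                  + (1 - target) * np.log(1 - pred + 1e-7))
    return ce + (1.0 - soft_dice(pred, target))
```

The `1 - soft_dice` term is minimised as overlap improves, so a near-perfect prediction drives the combined loss toward zero.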
The architecture itself is a deliberately plain U-Net with instance normalisation and leaky ReLU activations; the contribution is the automated configuration and the reproducibility it buys. Hand the framework a labelled dataset in a standardised folder layout and it produces a state-of-the-art segmentation model with a single command. nnU-Net has won or placed highly in dozens of MICCAI grand challenges (BraTS, KiTS, LiTS, AMOS, AutoPET, the Medical Segmentation Decathlon) without any task-specific tuning. The lesson the authors draw, and which the field has largely accepted, is that a well-configured baseline is the right benchmark: any new method should demonstrate gains on top of nnU-Net, not against weak baselines.
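In practice that "single command" is a short pipeline. A typical invocation looks roughly like the following (command names as of nnU-Net v2; check the project README for the installed version, and note the dataset ID and paths here are placeholders):

```shell
# Point nnU-Net at the standardised folder layout
# (DatasetXXX_Name/imagesTr, labelsTr, dataset.json).
export nnUNet_raw=/data/nnUNet_raw
export nnUNet_preprocessed=/data/nnUNet_preprocessed
export nnUNet_results=/data/nnUNet_results

# Extract the fingerprint and derive the plans, then train one fold
# of the 3D full-resolution configuration.
nnUNetv2_plan_and_preprocess -d 1 --verify_dataset_integrity
nnUNetv2_train 1 3d_fullres 0
```

Everything downstream of the folder layout, from preprocessing to the five-fold cross-validation, is decided by the framework.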
Recent extensions include nnU-Net v2, a rewrite with a restructured codebase and improved data loading; MedNeXt, which swaps in ConvNeXt blocks while keeping the configuration logic; and the nnU-Net ResEnc presets, which use deeper residual encoders. The framework remains the de facto standard against which medical segmentation methods are measured.
Related terms: U-Net, MedSAM, Convolutional Neural Network
Discussed in:
- Chapter 17: Applications, Medical Imaging