Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, & Aleksander Madry (2018)
How Does Batch Normalization Help Optimization? arXiv.
DOI: https://doi.org/10.48550/arxiv.1805.11604
Abstract. Demonstrates empirically that batch normalisation (BN) does not reduce internal covariate shift in any meaningful sense. The authors argue instead that BN smooths the loss landscape, reducing the Lipschitz constants of both the loss and its gradient, and that this smoothing explains the observed training benefits.
Tags: regularisation batch-normalisation theory
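
The normalisation step whose effect the paper analyses can be sketched as follows (a minimal NumPy illustration of a BN forward pass at training time, not the authors' code; `gamma` and `beta` are the standard learned scale and shift parameters):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the batch dimension, then scale and shift."""
    mean = x.mean(axis=0)          # per-feature mean over the batch
    var = x.var(axis=0)            # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardised activations
    return gamma * x_hat + beta    # learned affine transform

# Example: a batch of 8 samples with 3 features.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(8, 3))
out = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# With gamma=1, beta=0, each output feature has ~zero mean and ~unit variance.
```

With identity `gamma`/`beta`, the output is just the standardised activations; the paper's point is that the resulting reparameterisation of the loss surface, rather than any stabilisation of input distributions, is what aids optimisation.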