Günter Klambauer, Thomas Unterthiner, Andreas Mayr, & Sepp Hochreiter (2017)
Self-Normalizing Neural Networks. arXiv preprint arXiv:1706.02515.
DOI: https://doi.org/10.48550/arxiv.1706.02515
Abstract. Introduces the scaled exponential linear unit (SELU) activation and shows that, in fully connected feed-forward networks whose weights are initialised with zero mean and variance 1/fan-in, activations converge toward a fixed point of zero mean and unit variance, removing the need for explicit normalisation layers such as batch normalisation.
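A minimal NumPy sketch of the SELU activation and the self-normalising effect described above; the constants λ ≈ 1.0507 and α ≈ 1.6733 are the fixed-point values reported in the paper, while the layer width, depth, and random inputs are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

# Fixed-point constants for SELU (values reported in the paper)
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    # scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise
    return SCALE * np.where(x > 0, x, ALPHA * np.expm1(x))

# Illustrative check of the self-normalising behaviour: standard-normal
# inputs pushed through several dense layers with LeCun-style weights
# (zero mean, variance 1/fan-in) keep roughly zero mean and unit variance.
rng = np.random.default_rng(0)
x = rng.standard_normal((10_000, 256))
for _ in range(20):
    w = rng.standard_normal((256, 256)) / np.sqrt(256)  # variance 1/fan-in
    x = selu(x @ w)
print(f"mean={x.mean():.3f}, std={x.std():.3f}")  # stays near 0 and 1
```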
Tags: neural-networks activations normalisation