References

Gaussian Error Linear Units (GELUs)

Dan Hendrycks & Kevin Gimpel (2016)

arXiv.

DOI: https://doi.org/10.48550/arXiv.1606.08415

Abstract. Introduces the Gaussian Error Linear Unit (GELU), a smooth activation function that weights its input by the standard normal CDF, GELU(x) = x · Φ(x). GELU has become the default activation in transformer architectures, including BERT and GPT.
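As a minimal sketch of the definition above, the exact GELU and the tanh approximation given in the paper can be computed as follows (a standalone NumPy/SciPy example for illustration, not code from the paper or any particular framework):

```python
import numpy as np
from scipy.stats import norm

def gelu_exact(x):
    # Exact GELU: weight the input by the standard normal CDF, x * Phi(x)
    return x * norm.cdf(x)

def gelu_tanh(x):
    # Tanh approximation from the paper, widely used in transformer implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

if __name__ == "__main__":
    xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    print(gelu_exact(xs))   # exact values
    print(gelu_tanh(xs))    # close approximation to the exact values
```

Unlike ReLU, which gates the input by its sign, GELU gates it by how large it is relative to other inputs under a standard normal, giving a smooth, non-monotonic curve near zero.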

Tags: neural-networks activations gelu

Cited in:
