References

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, & Dario Amodei (2020)

arXiv preprint arXiv:2001.08361.

DOI: https://doi.org/10.48550/arXiv.2001.08361

Abstract. Establishes empirical scaling laws showing that language model loss falls as a smooth power law in model size (parameters), dataset size, and training compute. The paper motivated the training of ever-larger models by demonstrating predictable returns to scale.
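
For reference, the paper's three headline fits take a simple power-law form. Below is a minimal LaTeX rendering of the reported equations and constants (values as given in Kaplan et al., 2020; worth verifying against the source before reuse):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Headline power-law fits reported in Kaplan et al. (2020).
% L: cross-entropy test loss (nats); N: non-embedding parameters;
% D: dataset size (tokens); C_min: compute on the compute-efficient frontier.
% Reported constants: N_c in parameters, D_c in tokens, C_c^min in PF-days.
\begin{align*}
  L(N) &= \left( N_c / N \right)^{\alpha_N},
    & \alpha_N &\approx 0.076, & N_c &\approx 8.8 \times 10^{13} \\
  L(D) &= \left( D_c / D \right)^{\alpha_D},
    & \alpha_D &\approx 0.095, & D_c &\approx 5.4 \times 10^{13} \\
  L(C_{\min}) &= \left( C_c^{\min} / C_{\min} \right)^{\alpha_C^{\min}},
    & \alpha_C^{\min} &\approx 0.050, & C_c^{\min} &\approx 3.1 \times 10^{8}
\end{align*}
\end{document}
```

The small exponents are the point: loss improves predictably but slowly, so each constant-factor reduction in loss requires a large multiplicative increase in parameters, data, or compute.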

Tags: scaling language-models
