Glossary

Epoch

An Epoch is one complete pass through the training dataset. If the training set has $N$ examples and the batch size is $B$, then one epoch consists of $\lceil N/B \rceil$ parameter updates (one per batch). Training typically proceeds for many epochs—tens to hundreds for classical deep learning, though modern large language model pretraining often sees each token only once, making the concept of "epoch" less directly applicable.
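The updates-per-epoch arithmetic can be made concrete with a small sketch (the function name here is illustrative, not from any particular library):

```python
import math

def updates_per_epoch(num_examples: int, batch_size: int) -> int:
    """One epoch performs ceil(N / B) parameter updates: every example is
    seen once, and the final batch may be smaller than batch_size."""
    return math.ceil(num_examples / batch_size)

# e.g. a 50,000-example training set with batch size 128:
print(updates_per_epoch(50_000, 128))  # 391 updates per epoch
```

The ceiling accounts for the last, possibly partial batch; frameworks that drop the remainder instead perform $\lfloor N/B \rfloor$ updates.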

The number of epochs is an important hyperparameter. Too few and the model underfits; too many and it overfits. Early stopping automates this choice by monitoring validation loss and halting training when it stops improving. Learning rate schedules are often specified in terms of epochs: step decay at epochs 30, 60, and 90 for ResNet on ImageNet, or cosine annealing from a maximum to a minimum learning rate over a specified epoch budget.
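A minimal sketch of these epoch-indexed mechanisms, assuming a patience-based early-stopping rule and the schedule parameters mentioned above (function names and defaults are illustrative):

```python
import math

def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch at which patience-based early stopping halts:
    stop once validation loss has failed to improve for `patience`
    consecutive epochs."""
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the end

def step_decay_lr(epoch, base_lr=0.1, milestones=(30, 60, 90), gamma=0.1):
    """Multiply the learning rate by gamma at each milestone epoch."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

def cosine_annealing_lr(epoch, total_epochs, lr_max=0.1, lr_min=0.0):
    """Anneal from lr_max at epoch 0 down to lr_min at total_epochs."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cos)
```

In practice a checkpoint of the best-so-far model is saved alongside the patience counter, so that training can be restored to the epoch with the lowest validation loss rather than the epoch at which it halted.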

For very large datasets, partial epochs are meaningful: one might evaluate on the validation set every $k$ steps rather than once per epoch. When even a single epoch is excessive (modern LLM pretraining corpora contain trillions of tokens), training typically runs for a fixed compute budget or a fixed number of tokens processed. In the era of scaling, the key quantities are parameter count, training tokens, and compute FLOPs, which together predict performance via scaling laws.
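The token-budget view replaces "number of epochs" with a step count derived from how many tokens each optimizer step consumes. A sketch, with hypothetical batch and sequence sizes (the function name is my own):

```python
def steps_for_token_budget(token_budget: int, batch_size: int, seq_len: int) -> int:
    """Optimizer steps needed to process a fixed token budget, assuming
    each step consumes batch_size sequences of seq_len tokens."""
    tokens_per_step = batch_size * seq_len
    return -(-token_budget // tokens_per_step)  # ceiling division

# Hypothetical setup: a 1-trillion-token budget with 1024 sequences
# of 2048 tokens per step (~2.1M tokens per step).
total_steps = steps_for_token_budget(10**12, batch_size=1024, seq_len=2048)
```

A step-based evaluation cadence then follows naturally: run validation whenever `step % eval_every == 0`, independent of dataset passes.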

Related terms: Batch Size, Early Stopping

Also defined in: Textbook of AI