Hyperparameter Tuning is the process of searching for the best values of hyperparameters—settings that govern the training process but are not learned from data. Common hyperparameters include learning rate, batch size, number of layers and units, regularisation strengths, dropout probability, activation function, and data augmentation settings. The space of possible configurations is vast and the interactions between hyperparameters are often nonlinear and poorly understood.
Grid search exhaustively evaluates every combination in a predefined grid. It is simple but scales exponentially with the number of hyperparameters. Random search, shown by Bergstra and Bengio (2012) to be more efficient than grid search, samples configurations uniformly at random and is especially effective when only a few hyperparameters dominate performance. Bayesian optimisation builds a probabilistic surrogate model (typically a Gaussian process) of validation performance as a function of hyperparameters, using an acquisition function to balance exploration and exploitation. Libraries such as Optuna, Hyperopt, and Ax make Bayesian optimisation accessible.
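Random search is simple enough to sketch in a few lines. The snippet below is a minimal illustration, not a real training loop: the objective `validation_score` is a hypothetical stand-in for validation accuracy, with an arbitrary optimum at `lr = 0.01`, `dropout = 0.2`. Note the learning rate is sampled on a log scale, as is conventional for hyperparameters that vary over orders of magnitude.

```python
import random

# Hypothetical stand-in for a model's validation score; in practice this
# would train a model with the given hyperparameters and evaluate it.
# Its (arbitrary) optimum is at lr = 0.01, dropout = 0.2.
def validation_score(lr, dropout):
    return -((lr - 0.01) ** 2) * 1e4 - (dropout - 0.2) ** 2

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {
            # Log-uniform sample of the learning rate over [1e-4, 1e-1].
            "lr": 10 ** rng.uniform(-4, -1),
            "dropout": rng.uniform(0.0, 0.5),
        }
        score = validation_score(**cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search(n_trials=200)
```

Grid search over the same space would need a fixed discretisation of each axis; random search instead covers each individual axis densely, which is why it wins when only one or two hyperparameters matter.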
For computationally expensive searches, multi-fidelity methods like Successive Halving and Hyperband allocate budget adaptively: many configurations train briefly, and only the promising ones continue. Population-based training and neural architecture search further automate the process. In practice, deep learning tuning relies heavily on community wisdom and rules of thumb, starting with well-established defaults and refining from there. Documenting the search procedure is essential for reproducible research.
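The budget-allocation idea behind Successive Halving can be sketched as follows. This is an illustrative toy, not a full implementation: `partial_score` is a hypothetical simulated training curve that improves with budget toward an asymptote determined by the configuration's learning rate, standing in for evaluating a partially trained model.

```python
import random

# Hypothetical training curve: score approaches an asymptote (set by how
# close lr is to an arbitrary optimum of 0.01) as the budget grows.
def partial_score(cfg, budget):
    asymptote = -abs(cfg["lr"] - 0.01)
    return asymptote * (1 + 1.0 / budget)

def successive_halving(configs, min_budget=1, eta=2):
    budget = min_budget
    while len(configs) > 1:
        # "Train" every surviving configuration for the current budget,
        # rank them by their partial score...
        ranked = sorted(configs, key=lambda c: partial_score(c, budget),
                        reverse=True)
        # ...keep the top 1/eta, and give survivors eta times the budget.
        configs = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return configs[0]

rng = random.Random(0)
pool = [{"lr": 10 ** rng.uniform(-4, -1)} for _ in range(16)]
best = successive_halving(pool)
```

With `eta=2`, sixteen configurations are whittled down over four rungs (16, 8, 4, 2, 1), so most of the compute goes to the handful of survivors rather than being spread evenly as in grid or random search. Hyperband adds an outer loop that varies `min_budget`, hedging against curves that cross late.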
Related terms: Learning Rate
Discussed in:
- Chapter 10: Training & Optimisation — Hyperparameter Tuning
Also defined in: Textbook of AI