Glossary

Loss Function

Also known as: cost function, objective function

A Loss Function (also called cost function or objective function) quantifies the discrepancy between a model's predictions and the desired outputs. Training a machine learning model is, almost without exception, framed as minimising a loss function with respect to the model's parameters. The choice of loss function encodes what we mean by "good" predictions and has profound consequences for the learned model.
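The idea of minimising a loss with respect to parameters can be made concrete with a tiny sketch: gradient descent on mean squared error for a one-parameter linear model. The data, learning rate, and iteration count here are illustrative assumptions, not part of the glossary entry.

```python
# Fit y ≈ w * x by gradient descent on MSE.
# Toy data, roughly following y = 2x (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

w = 0.0    # parameter to learn
lr = 0.02  # learning rate (assumed)
for _ in range(500):
    # For L = (1/n) * sum((w*x - y)^2), dL/dw = (2/n) * sum((w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step against the gradient

print(round(w, 2))  # converges near 2.03, the least-squares solution
```

The loop is exactly the "minimise the loss with respect to the parameters" framing: each step moves `w` in the direction that most decreases the MSE.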

For regression tasks, the mean squared error (MSE) $L = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ is standard; it penalises large mistakes heavily and corresponds to maximum likelihood under Gaussian noise. Mean absolute error (MAE) is more robust to outliers. For classification, cross-entropy is the loss of choice: $L = -\sum_i y_i \log \hat{p}_i$, which corresponds to maximum likelihood under a Bernoulli or categorical output distribution. Hinge loss, used by SVMs, encourages large-margin classifiers. KL divergence quantifies the dissimilarity between probability distributions—though it is not a true distance, being asymmetric—and is used in variational methods.
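The losses above are short enough to write directly; a minimal sketch, with illustrative data chosen to show how a single outlier inflates MSE far more than MAE:

```python
import math

def mse(y_true, y_pred):
    # Mean squared error: average of squared residuals
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean absolute error: average of absolute residuals
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred):
    # Cross-entropy for binary labels y in {0, 1}, p = predicted P(y = 1)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

# Three perfect predictions plus one outlier target
y     = [1.0, 1.0, 1.0, 10.0]
y_hat = [1.0, 1.0, 1.0, 1.0]
print(mse(y, y_hat))  # 20.25 — dominated by the squared outlier error (81)
print(mae(y, y_hat))  # 2.25  — the outlier contributes only linearly
```

Squaring makes the single residual of 9 account for all of the MSE, which is exactly why MAE is the more robust choice when outliers are expected.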

A subtle but important distinction separates the loss function the optimiser minimises from the evaluation metric the practitioner cares about. A model might be trained with cross-entropy but evaluated with F1 score, AUC, or accuracy. When these diverge—particularly under class imbalance—the choice of training loss may require adjustment (class weighting, focal loss, or direct metric optimisation) to produce a model that performs well on the metric that matters.
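Class weighting, the simplest of the adjustments mentioned above, can be sketched as a binary cross-entropy with an up-weighted positive class. The weight value and toy batch are assumptions for illustration.

```python
import math

def weighted_bce(y_true, p_pred, w_pos):
    # Binary cross-entropy with weight w_pos on the positive class.
    # Up-weighting the rare class (w_pos > 1) makes misses on it costlier,
    # pushing training toward metrics like F1 that penalise imbalance.
    total = sum(
        -(w_pos * y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(y_true, p_pred)
    )
    return total / len(y_true)

# Imbalanced toy batch: one positive among four negatives,
# with the positive predicted poorly (p = 0.3)
y = [1, 0, 0, 0, 0]
p = [0.3, 0.1, 0.1, 0.1, 0.1]
print(weighted_bce(y, p, w_pos=1.0))  # unweighted loss
print(weighted_bce(y, p, w_pos=5.0))  # the missed positive now dominates
```

With `w_pos=1.0` the four easy negatives dilute the poorly predicted positive; raising the weight restores its influence on the gradient.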

Related terms: Cross-Entropy, Gradient Descent, Maximum Likelihood Estimation

Discussed in:

Also defined in: Textbook of AI, Textbook of Medical AI