K-fold cross-validation

Last reviewed 5 May 2026

Split the training set into k folds, train on k − 1 of them, validate on the remaining fold, and rotate.

From Chapter 6: ML Fundamentals

Transcript

We have a training set, and we need to estimate how well our model will generalise.

A single train-test split gives one estimate. It depends on which examples landed in which split.

K-fold cross-validation does better. Divide the training set into k equal folds. Five folds is common.

Hold out fold one. Train on folds two through five. Evaluate on fold one.

Now hold out fold two. Train on the rest. Evaluate on fold two.

Continue until every fold has been the validation set once. We have k different estimates of test error.

Average them. This is the cross-validated estimate. It has lower variance than a single split, because every example contributes to evaluation exactly once.
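The rotate-and-average procedure above can be sketched directly. This is a minimal illustration, not from the transcript: it assumes a simple least-squares model on toy data, with folds, model, and variable names chosen here for demonstration.

```python
import numpy as np

def cross_val_mse(X, y, k=5, seed=0):
    """Average validation MSE of least squares over k folds."""
    rng = np.random.default_rng(seed)
    # Shuffle the indices, then split them into k (nearly) equal folds.
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val = folds[i]                                        # hold out fold i
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k-1 remaining folds, evaluate on the held-out fold.
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors.append(np.mean((y[val] - X[val] @ w) ** 2))
    # The cross-validated estimate is the average of the k fold errors.
    return float(np.mean(errors))

# Toy data: y is roughly linear in x plus a little noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(100, 1))
X = np.hstack([np.ones((100, 1)), x])   # intercept column + feature
y = 2.0 * x[:, 0] + 1.0 + rng.normal(0, 0.1, size=100)

print(round(cross_val_mse(X, y, k=5), 4))   # cross-validated MSE
```

Each example appears in exactly one validation fold, so the average uses every training point for evaluation once.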

Cross-validation is also how we tune hyperparameters. Try a value of regularisation strength. Compute the average across folds. Try another. Pick the value with the lowest average error.
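The tuning loop described above can be written as a small grid search. A sketch under stated assumptions: closed-form ridge regression stands in for "a model with a regularisation strength", and the candidate values and data are invented for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X'X + lam*I) w = X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_score(X, y, lam, k=5, seed=0):
    """Average validation MSE of ridge with strength lam over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    errs = []
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[tr], y[tr], lam)
        errs.append(np.mean((y[val] - X[val] @ w) ** 2))
    return float(np.mean(errs))

# Toy data: two informative features among ten noisy ones.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=80)

# Try each candidate strength; keep the one with the lowest average error.
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=lambda lam: cv_score(X, y, lam))
print(best)
```

The same folds are reused for every candidate, so the comparison between strengths is fair.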

Then retrain on the full training set with the chosen hyperparameter. Reserve a separate test set, untouched throughout, for the final unbiased estimate.
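The full workflow, select by cross-validation, retrain on everything, then score once on the untouched test set, fits in a few lines. Again a sketch with assumed details (ridge model, split sizes, candidate values are all illustrative, not from the transcript):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_score(X, y, lam, k=5, seed=0):
    """Average validation MSE over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    errs = []
    for i in range(k):
        v = folds[i]
        t = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[t], y[t], lam)
        errs.append(np.mean((y[v] - X[v] @ w) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, size=120)

# Reserve the test set first; it stays untouched throughout tuning.
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

# 1. Choose the hyperparameter by cross-validation on the training set only.
best_lam = min([0.01, 0.1, 1.0, 10.0],
               key=lambda l: cv_score(X_train, y_train, l))

# 2. Retrain on the full training set with the chosen value.
w = ridge_fit(X_train, y_train, best_lam)

# 3. One final, unbiased estimate on the held-out test set.
test_mse = float(np.mean((y_test - X_test @ w) ** 2))
print(best_lam, round(test_mse, 3))
```

Because the test set never influenced any choice, its error is an honest estimate of generalisation.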

The cost is k times more training. The benefit is a vastly more reliable model selection signal.

Contact: Chris Paton


AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).