7.15 Summary
We have built up the workshop of supervised learning from first principles. Linear and logistic regression gave us the language of loss functions, gradients, regularisation, and probabilistic interpretations: the same machinery that underpins every neural network in the remainder of this book. Generalised linear models extended that machinery across the exponential family. KNN gave us a non-parametric baseline; decision trees, an interpretable rule-based alternative; naive Bayes, a strong default when training data are scarce. SVMs introduced the maximum-margin principle, the Lagrangian dual, the KKT conditions, and the kernel trick; these ideas recur in modern attention mechanisms and contrastive learning. Random forests and gradient boosting then showed how aggregating many weak learners yields some of the strongest off-the-shelf classifiers for tabular data. Finally, we placed all of these models on a common evaluation footing (confusion matrices, ROC curves, calibration, and cross-validation) and laid out a practical workflow that begins long before model fitting and ends long after.
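As a closing illustration of that common evaluation footing, the sketch below compares several of the chapter's classifiers with the same cross-validated metric on the same data. It is a minimal example, assuming scikit-learn is available and using a synthetic dataset as a stand-in for a real tabular problem; the point is the uniform protocol, not the particular scores.

```python
# Minimal sketch: several of the chapter's classifiers evaluated with the
# same 5-fold cross-validation and the same ROC AUC metric.
# Assumes scikit-learn; the synthetic data is a placeholder for a real dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (1000 rows, 20 features).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Identical folds and metric for every model keep the comparison fair.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:22s} ROC AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```

In practice the same loop would sit inside the wider workflow described above: it comes after data cleaning, feature engineering, and a held-out test split, and before calibration checks and error analysis.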
The next chapter turns to the other half of classical machine learning: unsupervised learning, where the labels disappear and we ask the algorithm to discover structure on its own.