- State the supervised ML framework in terms of data, hypothesis class, loss function, and optimisation
- Engineer features and representations appropriate to the data modality and model
- Evaluate models using metrics such as accuracy, precision, recall, F1 score, and ROC/AUC
- Apply regularisation (L1, L2, early stopping, dropout) to control overfitting and improve generalisation
- Use train/validation/test splits and k-fold cross-validation to estimate generalisation error honestly
You want to build a spam filter. The traditional approach is to write rules by hand: if the email contains "lottery" and "click here," flag it. But spammers adapt. Your rules go stale within weeks, and each rule you add makes the system more brittle, until the rule book reads like a legal document that no one fully understands.
Machine learning takes a different approach. You show the algorithm thousands of emails, each labelled "spam" or "not spam," and it figures out the rules on its own. When spammers change tactics, you retrain on fresh examples. The filter adapts because the algorithm learns from data, not from your guesses about what spam looks like. The same blueprint, with surface changes, powers credit-scoring systems, medical diagnostic aids, autonomous-vehicle perception stacks, recommendation engines, and the foundation models that the rest of this book is concerned with.
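To make the contrast concrete, here is a minimal sketch that puts the hand-written rule next to a filter that learns from labelled examples. It uses only the Python standard library. The learner is a small naive Bayes word counter, chosen here purely for illustration. The function names and the eight training emails are invented for this sketch, not taken from a real corpus.

```python
import math
from collections import Counter


def rule_based_filter(text):
    """Hand-written rule: flag an email only if it contains both phrases."""
    text = text.lower()
    return "lottery" in text and "click here" in text


def train(emails):
    """Count how often each word appears in spam and in non-spam ("ham")."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    email_counts = Counter()
    for text, label in emails:
        email_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, email_counts


def predict(model, text):
    """Return the label with the higher naive Bayes log-probability."""
    word_counts, email_counts = model
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Log prior: fraction of training emails carrying this label.
        score = math.log(email_counts[label] / sum(email_counts.values()))
        for word in text.lower().split():
            # Add-one smoothing keeps an unseen word from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labelled examples, made up for this illustration.
training_emails = [
    ("win a free prize now", "spam"),
    ("free lottery click here to win", "spam"),
    ("claim your free prize today", "spam"),
    ("urgent click here to claim cash", "spam"),
    ("meeting moved to monday morning", "ham"),
    ("please review the attached report", "ham"),
    ("lunch tomorrow with the project team", "ham"),
    ("here are the notes from the meeting", "ham"),
]

model = train(training_emails)
new_spam = "claim your free cash prize"  # no "lottery", no "click here"

print(rule_based_filter(new_spam))  # False: the hand-written rule misses it
print(predict(model, new_spam))     # spam: the word counts catch it
```

Nothing in `predict` mentions any particular word. The "rules" live entirely in the counts produced by `train`, so adapting to new spam tactics means calling `train` again on fresh labelled emails rather than editing code.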
This chapter covers the ideas that make this work. You will learn the formal framework behind ML algorithms, how to prepare features for a model, how to measure whether a model is actually useful, how to prevent overfitting, and how to estimate performance honestly with cross-validation. We work through examples in scikit-learn and NumPy and close with thirty exercises, fifteen of them with full solutions. For deeper treatment, see Hastie, Tibshirani, and Friedman (2009); Bishop (2006); Murphy (2022); Goodfellow, Bengio, and Courville (2016); and Russell and Norvig (2020).
In this chapter
- 6.1 What is machine learning?
- 6.2 The three classical paradigms (and a fourth)
- 6.3 The supervised learning setup
- 6.4 Generalisation: the central problem
- 6.5 Optimisation in more depth
- 6.6 Capacity, complexity, and regularisation
- 6.7 No free lunch and inductive bias
- 6.8 Model selection
- 6.9 Curse of dimensionality
- 6.10 Pipelines and leakage
- 6.11 Feature engineering vs representation learning
- 6.12 The double-descent phenomenon
- 6.13 Implicit regularisation of SGD
- 6.14 Common evaluation metrics
- 6.15 Imbalanced classes
- 6.16 Honest evaluation: train/val/test discipline
- 6.17 Putting it all together: a worked project
- Summary
- Exercises
- Selected solutions
- Further reading