- State the supervised ML framework in terms of data, hypothesis class, loss function, and optimisation
- Engineer features and representations appropriate to the data modality and model
- Evaluate models using metrics such as accuracy, precision, recall, F1 score, and ROC/AUC
- Apply regularisation (L1, L2, early stopping, dropout) to control overfitting and improve generalisation
- Use train/validation/test splits and k-fold cross-validation to estimate generalisation error honestly
You want to build a spam filter. The traditional approach is to write rules by hand: if the email contains "lottery" and "click here," flag it. But spammers adapt. Your rules go stale within weeks.
Machine learning takes a different approach. You show the algorithm thousands of emails, each labelled "spam" or "not spam," and it figures out the rules on its own. When spammers change tactics, you retrain on fresh examples. The filter adapts because the algorithm learns from data, not from your guesses about what spam looks like.
This chapter covers the ideas that make this work. You will learn the formal framework behind all ML algorithms, how to prepare features for a model, how to measure whether a model is actually useful, how to prevent overfitting, and how to estimate performance honestly with cross-validation. For deeper treatment, see Hastie, Tibshirani, and Friedman (2009), Bishop (2006), Murphy (2022), and Goodfellow, Bengio, and Courville (2016).
6.1 The ML Framework
Three Paradigms
Machine learning divides into three paradigms based on the feedback the algorithm gets:
- Supervised learning: you provide input–output pairs {(x_i, y_i)}. The algorithm learns a function f that maps inputs to outputs. If the output is a category (spam or not spam), it is classification. If the output is a number (house price), it is regression.
- Unsupervised learning: no labels are provided. The algorithm finds structure on its own — clusters of similar data points, or compact representations.
- Reinforcement learning: an agent takes actions in an environment and receives reward signals. It learns a policy that maximises cumulative reward over time.
Many modern systems combine elements of all three.
Hypothesis Space
Within any paradigm, you must choose a hypothesis space H — the set of candidate functions the model can learn. For linear models, H contains all functions f(x) = w^T^x + b. For decision trees, it contains all axis-aligned partitions. For a neural network, it contains all functions achievable by varying the weights.
The choice of hypothesis space is your inductive bias — your assumption about the shape of the true function. Too restrictive and the model cannot capture real patterns (underfitting). Too flexible and it fits noise (overfitting). The art of applied ML is finding the right balance.
Loss Functions
Learning works by minimising a loss function that measures how far the model's predictions are from the truth. Common choices:
- Mean squared error (regression): L = (1/n) Σ_i (y_i − f(x_i))^2^
- Cross-entropy (classification): L = −(1/n) Σ_i [y_i log f(x_i) + (1 − y_i) log(1 − f(x_i))]
The choice of loss encodes what you mean by "good." Squared error penalises large mistakes heavily. Absolute error is more robust to outliers. Hinge loss encourages wide-margin classifiers.
Optimisation
Gradient descent is the main optimisation tool. Starting from initial parameters, it iteratively updates: w ← w − η ∇L(w), where η is the learning rate.
Stochastic gradient descent (SGD) (Robbins, 1951) estimates the gradient from a random mini-batch instead of the full dataset. This cuts the cost per step from O(n) to O(b), where b is the batch size. Adaptive methods like Adam (Kingma, 2014) adjust the learning rate per parameter, speeding convergence on difficult loss surfaces.
For convex problems (like logistic regression), gradient descent finds the global optimum. For non-convex problems (like neural networks), it finds a local minimum or saddle point — but this is usually good enough in practice.
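The update rule above can be sketched in a few lines. This is a toy one-dimensional example (the function names are illustrative, not from any library): we minimise L(w) = (w − 3)², whose gradient is 2(w − 3), by repeated steps of w ← w − η ∇L(w).

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Follow the negative gradient from w0: w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Minimise L(w) = (w - 3)^2; its gradient is 2(w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

With a well-chosen learning rate, `w_star` lands close to the minimiser w = 3; with a learning rate that is too large the iterates diverge, which is why η is a hyperparameter worth tuning.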
Training Error vs Generalisation Error
The goal of ML is to perform well on unseen data, not just on the training set. A model that memorises every training example achieves zero training error but may generalise poorly. That is overfitting.
Statistical learning theory makes this precise. The VC dimension measures the capacity of a hypothesis space. PAC (Probably Approximately Correct) bounds relate generalisation error to training error, model complexity, and sample size. These guarantees say: with enough data and an appropriately constrained model, learning will work.
The Full Pipeline
Model fitting is just one step. A real ML project involves:
- Data collection and cleaning
- Exploratory analysis
- Feature engineering
- Model selection and hyperparameter tuning
- Evaluation on a held-out test set
- Deployment with monitoring
Pitfalls lurk at every stage. Data leakage means accidentally including test-set information during training. Distribution shift means the deployment data looks different from the training data. Feedback loops mean the deployed model influences the data it later trains on. Understanding the full framework is your best defence.
6.2 Features & Representations
Models do not operate on raw observations. They operate on features — numbers extracted from the data. The quality of your features often matters more than the choice of model. Good features can make a simple linear model highly effective. Bad features can doom even the most powerful neural network.
Raw Data Types
Data comes in many forms:
- Tabular: rows of numbers and categories
- Images: pixel arrays
- Text: sequences of characters or tokens
- Audio: waveforms
- Graphs: adjacency matrices
Each needs a different transformation into numbers.
Encoding
For categorical variables, one-hot encoding creates a binary indicator per category. Ordinal encoding assigns integers that respect a natural order.
For text, classical approaches include bag-of-words (counting word frequencies) and TF-IDF (down-weighting common words). Modern transformer models learn their own representations directly from raw text.
For images, early systems used hand-crafted features like histograms of oriented gradients (HOG) and SIFT descriptors. Deep convolutional networks like AlexNet (Krizhevsky, 2012) largely replaced these with learned features.
The goal is always the same: produce a fixed-length numerical vector that captures what matters for the task and discards what does not.
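Two of the encodings above fit in a few lines each. This is a minimal sketch (the helper names are illustrative; real projects would use a library's vectorisers):

```python
def one_hot(value, categories):
    """Binary indicator vector: 1 at the matching category, 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]

one_hot("red", ["red", "green", "blue"])   # [1, 0, 0]
bag_of_words("free lottery click here click",
             ["lottery", "click", "meeting"])  # [1, 2, 0]
```

Both produce the fixed-length numerical vectors that downstream models require; TF-IDF would additionally divide each count by a measure of how common the word is across documents.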
Feature Scaling
Features on different scales (age in years vs income in pounds) cause problems. Gradient-based optimisers converge slowly when the loss surface is elongated.
Common scaling methods:
- Standardisation: subtract the mean, divide by the standard deviation. Result: zero mean, unit variance.
- Min–max normalisation: rescale to [0, 1].
- Robust scaling: use the median and IQR instead of mean and standard deviation. Less sensitive to outliers.
Scaling matters most for distance-based methods (KNN, SVMs) and for neural networks, where batch normalisation has become standard.
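The first two scaling methods can be sketched directly (a minimal illustration using the standard library; the function names are my own):

```python
import statistics

def standardise(values):
    """Zero mean, unit variance: (x - mean) / std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(x - mean) / std for x in values]

def min_max(values):
    """Rescale linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]
```

Note that in practice the mean, standard deviation, minimum, and maximum must be computed on the training set only and then reused on validation and test data, or the scaling itself leaks information.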
Feature Selection
Not all features help. Some are irrelevant. Some are redundant. Removing them improves interpretability, reduces compute, and can improve generalisation by cutting noise.
Three approaches:
- Filter methods: rank features by a statistical criterion (mutual information, chi-squared, correlation with target) and keep the top ones.
- Wrapper methods: try different feature subsets, train a model on each, and measure performance. Forward selection, backward elimination, and genetic algorithms search the space.
- Embedded methods: the model does selection during training. L1 regularisation drives uninformative weights to zero. Tree-based models score features by how much they reduce impurity.
Feature Extraction
Feature extraction creates new features from old ones, often reducing dimensionality:
- PCA (principal component analysis): projects data onto directions of maximal variance. Produces uncorrelated features.
- LDA (linear discriminant analysis): projects data to maximise class separation. Supervised alternative to PCA.
- t-SNE (Maaten, 2008) and UMAP (McInnes, 2018): non-linear methods that preserve local structure. Great for visualising high-dimensional data.
- Autoencoders: neural networks trained to reconstruct their input through a bottleneck. The bottleneck layer gives a compressed representation.
Representation Learning
Deep learning has reduced the need for manual feature engineering. CNNs learn visual features from pixels. Transformers learn word representations from text. Graph neural networks learn node features from adjacency structure.
This shift is called representation learning. The model jointly learns the features and the prediction function, end-to-end. But understanding feature engineering still matters. It guides architecture choices, informs data augmentation strategies, and remains essential in domains where data is scarce.
6.3 Model Evaluation
A model is only useful if it generalises to unseen data. Evaluation measures that ability.
Train/Validation/Test Split
The simplest approach splits data into three parts:
- Training set: used to fit the model.
- Validation set: used to tune hyperparameters and choose between models.
- Test set: used once, at the end, for a final performance estimate.
The test set must be strictly held out during all development. Using it for any decision during training is data leakage — the most common and most damaging mistake in applied ML.
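The three-way split can be sketched as index bookkeeping (a minimal illustration; the function name and 60/20/20 fractions are arbitrary choices for the example):

```python
import random

def three_way_split(n, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle example indices once, then carve out the three partitions.
    The returned test indices must stay untouched until the final estimate."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split reproducible
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Shuffling before splitting matters: if the data file is ordered (by date, by class, by source), an unshuffled split gives training and test sets drawn from different distributions.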
Classification Metrics
Accuracy — the fraction of correct predictions — is intuitive but misleading with imbalanced classes. If 99% of emails are legitimate, a model that always says "not spam" gets 99% accuracy while being useless.
Better metrics:
- Precision: of all positive predictions, how many are correct?
- Recall: of all actual positives, how many did the model catch?
- F1 score: harmonic mean of precision and recall.
- Confusion matrix: tabulates true positives, true negatives, false positives, and false negatives.
For multi-class problems, you can macro-average (compute per-class, then average) or micro-average (pool all predictions first).
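The definitions above reduce to counting the confusion-matrix cells. A minimal binary-classification sketch (function name is my own):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 from paired true/predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, with three actual positives of which the model catches two, plus one false alarm, both precision and recall come out at 2/3, and so does their harmonic mean.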
Threshold-Independent Metrics
The ROC curve plots true-positive rate against false-positive rate as the classification threshold varies. The AUC-ROC (area under the ROC curve) is 0.5 for a random classifier and 1.0 for a perfect one.
Under severe class imbalance, the precision–recall curve and its AUC give a more honest picture.
Calibration curves check whether predicted probabilities match reality. A model that predicts 70% probability of rain should see rain about 70% of the time. Good calibration is essential when probabilities feed into decisions, as in medicine and risk assessment.
Regression Metrics
- MSE and RMSE: average squared deviation. Sensitive to outliers.
- MAE: average absolute deviation. More robust.
- R² = 1 − (SS_res / SS_tot): proportion of variance explained. R² = 1 is perfect; R² = 0 means no better than predicting the mean; negative means worse.
- MAPE: percentage error. Scale-independent but undefined when true values are zero.
Choose the metric that reflects the real-world cost of different error types.
Statistical Rigour
A single train–test split can give misleading results. Cross-validation (Section 6.5) addresses this. When comparing two models, a paired t-test or Wilcoxon signed-rank test on per-fold scores tests whether the difference is statistically significant. Bootstrap confidence intervals provide another route. Always report both a point estimate and an interval.
6.4 Regularisation
Without regularisation, a flexible model fits both the real patterns and the noise in the training data. Training error is low, but test error is high. That is overfitting. Regularisation constrains the model to prefer simpler solutions, pushing the bias–variance trade-off toward lower total error.
L2 Regularisation (Ridge / Weight Decay)
Add a penalty proportional to the squared norm of the weights:
Lreg = L + λ ‖w‖^2^
This discourages large weights, shrinking them toward zero without setting any exactly to zero. The Bayesian interpretation (Chapter 4) is a Gaussian prior on the parameters: the penalty is the negative log-prior. The hyperparameter λ controls the trade-off. Large λ means more bias, less variance. Small λ means less bias, more variance. Choose λ by cross-validation.
L1 Regularisation (Lasso)
Replace the squared norm with the absolute-value norm:
Lreg = L + λ ‖w‖1
The geometry of the L1 ball (a diamond in 2D) means the optimum often sits at a corner where some weights are exactly zero. L1 regularisation performs feature selection, producing sparse models.
The elastic net combines both penalties, blending sparsity with the grouping effect (correlated features are selected or dropped together). Both L1 and L2 keep the objective convex when the original loss is convex.
Deep Learning Regularisation
Neural networks use additional techniques beyond explicit penalties:
- Dropout (Srivastava, 2014): randomly sets each neuron's output to zero with probability p during training. The network must develop redundant representations. At test time, outputs are scaled by (1 − p). This approximates Bayesian inference over an ensemble of sub-networks.
- Batch normalisation (Ioffe, 2015): normalises activations to zero mean and unit variance within each mini-batch. Reduces internal covariate shift and acts as an implicit regulariser.
- Data augmentation: applies label-preserving transformations (crops, flips, rotations, colour jitter) to training images. This effectively increases the training set and discourages reliance on superficial features.
Early Stopping
The simplest regulariser: stop training when validation loss starts rising, even though training loss keeps falling. The number of training iterations acts as a complexity control. Formally, early stopping on a quadratic loss is approximately equivalent to L2 regularisation, with strength depending on the learning rate and iteration count.
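The stopping rule is usually implemented with a patience counter. This is a minimal sketch (the function and the simulated loss curve are illustrative, not from any framework):

```python
def train_with_early_stopping(step, val_loss, patience=3, max_steps=1000):
    """Run training steps; stop once validation loss has failed to
    improve for `patience` consecutive evaluations."""
    best = float("inf")
    since_best = 0
    for _ in range(max_steps):
        step()                    # one unit of training (epoch or mini-batch)
        loss = val_loss()         # evaluate on the held-out validation set
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break             # validation loss has stopped improving
    return best

# Simulated validation-loss curve: improves, then overfits and rises.
losses = iter([0.9, 0.7, 0.5, 0.55, 0.6, 0.65, 0.1])
state = {}
def fake_step(): state["loss"] = next(losses)
best = train_with_early_stopping(fake_step, lambda: state["loss"])
```

In the simulation, training halts after three non-improving evaluations and reports the best loss of 0.5; real implementations would also checkpoint and restore the weights from that best step.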
Modern Techniques
- Label smoothing (Szegedy, 2016): replaces hard one-hot targets with a softened mix, preventing overconfidence.
- Mixup (Zhang, 2017): creates synthetic examples by blending pairs of inputs and labels, encouraging linear boundaries between classes.
- Weight tying: shares parameters between parts of a network (e.g., encoder and decoder embeddings), reducing free parameters.
- Spectral normalisation: constrains the spectral norm of weight matrices, controlling the network's Lipschitz constant and stabilising GAN training.
All of these reflect the same principle: restrict the functions the model can represent, so it favours solutions that generalise.
6.5 Cross-Validation
When data is limited, a single train–test split is unreliable. Cross-validation gives you a more robust estimate by using every data point for both training and testing.
K-Fold Cross-Validation
Split the data into k equal folds. For each fold:
- Hold out that fold as the validation set.
- Train on the remaining k − 1 folds.
- Record the validation performance.
The final estimate is the average across all k folds. Common choices are k = 5 or k = 10. The extreme case k = n is leave-one-out (LOOCV): nearly unbiased but can have high variance and is expensive for large datasets.
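The fold loop above can be sketched as index bookkeeping. This illustration takes a caller-supplied `fit_score(train_idx, val_idx)` callback rather than a real model (all names are my own):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(fit_score, n, k=5):
    """Hold out each fold in turn, train on the rest, average the scores."""
    folds = k_fold_indices(n, k)
    scores = []
    for val_idx in folds:
        held_out = set(val_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        scores.append(fit_score(train_idx, val_idx))
    return sum(scores) / k
```

In practice you would shuffle (or stratify) the indices before forming folds; the contiguous folds here assume the data arrive in random order.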
Specialised Variants
Standard k-fold does not always work:
- Stratified k-fold: preserves the class distribution in each fold. Essential for imbalanced datasets.
- Time-series CV: trains on data up to a cutoff and tests on the next window, advancing the cutoff each round. Respects temporal ordering.
- Group k-fold: keeps all observations from one group (e.g., one patient) in the same fold. Prevents leakage from within-group correlation.
Choosing the wrong CV scheme produces optimistic estimates that do not reflect real-world performance.
Model Selection vs Performance Estimation
Cross-validation serves two purposes. Model selection uses it to pick the best model or hyperparameters. Performance estimation uses it to report how well the chosen model generalises. Using the same CV procedure for both biases the estimate upward.
Nested cross-validation solves this. An inner loop selects the best model. An outer loop evaluates it on held-out data. The outer-fold averages give an approximately unbiased generalisation estimate.
Computational Constraints
Full k-fold CV for every hyperparameter of a deep network may be infeasible. Practical strategies:
- Use a single validation split for initial exploration. Reserve CV for final comparison.
- Apply early stopping within each fold.
- Use smaller models or subsampled data for hyperparameter search, then validate the final choice on full data.
- Bayesian optimisation and Hyperband allocate compute to the most promising candidates.
Limitations
Cross-validation is not perfect. Variance can be substantial with small datasets. Fold-level scores are not independent (they share training data), so confidence intervals are approximate. Methods like the corrected resampled t-test of Nadeau and Bengio adjust for this but are not exact.
Cross-validation estimates the performance of the model-fitting procedure, not the specific model you will train on all the data. Treat CV scores as informative guides. Complement them with held-out test evaluation and, where possible, domain-specific validation — such as prospective clinical trials for medical AI.