Machine learning (ML) is the subfield of artificial intelligence concerned with algorithms that improve their performance at a task through experience, that is, through exposure to data. Rather than writing explicit rules, the programmer provides examples and lets the system discover the patterns for itself. The discipline now powers most of contemporary AI, from web search and recommendation to autonomous vehicles and scientific discovery.
Mitchell's definition
Tom Mitchell's classical 1997 definition states that a program learns from experience $E$ with respect to task $T$ and performance measure $P$, if its performance on $T$, as measured by $P$, improves with $E$. The triple $(T, E, P)$ remains a useful framing for any concrete ML problem: spam filtering ($T$) improves through labelled emails ($E$) measured by classification accuracy ($P$); chess play ($T$) improves through self-play games ($E$) measured by win rate ($P$).
Paradigms
The three classical paradigms of ML are:
- Supervised learning: learning from input–output pairs $(x_i, y_i)$, encompassing regression and classification. The goal is to learn a function $f: \mathcal{X} \to \mathcal{Y}$ that generalises beyond the training set.
- Unsupervised learning: discovering structure in unlabelled data $\{x_i\}$, including clustering, density estimation, dimensionality reduction, and anomaly detection.
- Reinforcement learning: an agent learns a policy $\pi(a \mid s)$, a distribution over actions $a$ given state $s$, to maximise the expected cumulative discounted reward $\mathbb{E}[\sum_t \gamma^t r_t]$ through interaction with an environment, where $r_t$ is the reward at step $t$ and $\gamma \in [0, 1]$ is a discount factor.
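The reinforcement-learning objective in the list above can be made concrete with a small sketch. The function names `discounted_return` and `estimate_objective` are illustrative rather than taken from any library, and the reward sequences are made up. The expectation is approximated by a plain average over a few sampled episodes.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a single episode."""
    total = 0.0
    # Work backwards so each step is one multiply-add: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def estimate_objective(episodes, gamma):
    """Monte Carlo estimate of the expected discounted return:
    the average return over the sampled episodes (hypothetical data)."""
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)


# Two made-up episodes, each a list of per-step rewards.
episodes = [[1.0, 1.0, 1.0], [0.0, 0.0, 8.0]]
print(discounted_return(episodes[0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
print(estimate_objective(episodes, gamma=0.5))    # (1.75 + 2.0) / 2 = 1.875
```

A smaller $\gamma$ makes the agent short-sighted: the reward of 8 in the second episode arrives two steps late and is worth only 2 at $\gamma = 0.5$.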
A fourth paradigm, self-supervised learning, has risen to prominence with large language models: the model derives its own supervisory signal from the structure of the data, for example by predicting the next token, reconstructing masked words, or matching contrastive views of the same input. Self-supervision has been the engine of the foundation-model era.
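As a rough illustration of how a supervisory signal can be derived from unlabelled data, the sketch below turns a raw token sequence into training pairs for next-token prediction and for masked-word prediction. The helper names, the whitespace tokenisation, and the `[MASK]` placeholder are assumptions made for the example. Real systems use learned tokenisers and far larger corpora.

```python
def next_token_pairs(tokens, context_size):
    """Build (context, target) pairs: each token is predicted from
    up to `context_size` tokens that precede it."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


def masked_example(tokens, position, mask="[MASK]"):
    """Hide one token; the hidden token becomes the label."""
    corrupted = list(tokens)
    target = corrupted[position]
    corrupted[position] = mask
    return corrupted, target


# Toy corpus with naive whitespace tokenisation (an assumption of this sketch).
tokens = "the cat sat on the mat".split()

for context, target in next_token_pairs(tokens, context_size=2):
    print(context, "->", target)

print(masked_example(tokens, position=1))
```

No human labelling is involved in either case: the targets are simply parts of the input that were held back.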
Generalisation: the central challenge
The central challenge of machine learning is generalisation: performing well on data the model has never seen. A model that perfectly memorises its training set but fails on new examples is overfitting; one too rigid to capture even the training data is underfitting. The art of ML lies in balancing capacity and regularisation. Classical statistical learning theory bounds generalisation error in terms of capacity measures such as VC dimension and Rademacher complexity; modern deep networks, which often interpolate the training set yet generalise well, have driven a re-examination of these bounds and motivated the study of phenomena such as double descent and implicit regularisation.
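The contrast between underfitting, overfitting, and a well-matched model can be sketched on synthetic data. The example below assumes a noisy linear relationship $y = 2x + \varepsilon$ and compares three hypothetical learners: a constant predictor (too rigid), a least-squares line (matched to the data), and a one-nearest-neighbour memoriser (which reproduces the training set exactly). The data-generating process, noise level, and sample sizes are all choices made for illustration.

```python
import random


def mse(model, data):
    """Mean squared error of `model` on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def make_data(n, rng, noise=0.3):
    """Sample n points from y = 2x + Gaussian noise, with x uniform on [0, 1)."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 2.0 * x + rng.gauss(0.0, noise)))
    return data


def fit_constant(train):
    """Underfits: ignores x and always predicts the mean training label."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Ordinary least squares for a single feature (closed form)."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_memoriser(train):
    """Overfits: returns the label of the nearest training point, noise included."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(100, rng)
test = make_data(500, rng)    # held-out data the models never see

results = {}
for name, fit in [("constant", fit_constant),
                  ("line", fit_line),
                  ("memoriser", fit_memoriser)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))
    print(f"{name:10s} train MSE={results[name][0]:.3f}  test MSE={results[name][1]:.3f}")
```

In this setup the memoriser reaches zero training error but does worse than the line on held-out data, because it has also memorised the noise. The constant predictor is poor on both sets. Only the held-out error reveals which model generalises.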
Historical arc
ML's intellectual roots stretch from Bayes (1763) and Legendre's least-squares (1805) through Fisher's statistics (1920s) to the perceptron (Rosenblatt, 1958). The connectionist winter following Minsky and Papert's Perceptrons (1969) gave way to the rediscovery of backpropagation (Rumelhart, Hinton, Williams, 1986) and to the pivot from symbolic to statistical methods in the 1990s, when support vector machines, boosting, and graphical models matured. AlexNet's 2012 ImageNet result detonated the deep learning era, which in turn produced the transformer (2017) and the foundation models of the 2020s.
Modern landscape
Today ML powers ranking and recommendation, machine translation, speech recognition, protein-structure prediction (AlphaFold), code generation, autonomous driving, and the conversational AI of large language models. It is increasingly indistinguishable from "AI" in popular usage, and the distinction between ML and statistics has blurred to the point that many practitioners regard them as a single discipline viewed from different vantages.
Related terms: Supervised Learning, Unsupervised Learning, Reinforcement Learning, Deep Learning, Generalisation
Discussed in:
- Chapter 6: ML Fundamentals, What is Machine Learning?