Glossary

Recall (classification)

Recall, also called sensitivity or the true positive rate (TPR), is the fraction of actual positives that a binary classifier correctly identifies:

$$\text{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}},$$

where $\mathrm{TP}$ and $\mathrm{FN}$ are the counts of true positives and false negatives in the confusion matrix. Recall is one of two complementary metrics of classifier performance on the positive class, the other being precision $\mathrm{TP}/(\mathrm{TP} + \mathrm{FP})$.

Recall $= 1$ means every actual positive was found, although some predicted positives may be false alarms (low precision). Recall $= 0$ means none was found. Recall is undefined when the test set contains no actual positives.
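
A minimal sketch in plain Python (the counts and example values are illustrative, not from any real dataset), including the undefined case:

    import math

    def recall(tp: int, fn: int) -> float:
        """Recall = TP / (TP + FN); NaN when there are no actual positives."""
        actual_positives = tp + fn
        if actual_positives == 0:
            return math.nan  # undefined: the test set contains no actual positives
        return tp / actual_positives

    print(recall(tp=8, fn=2))  # 0.8: 8 of 10 actual positives were found
    print(recall(tp=0, fn=0))  # nan: recall is undefined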

Medical and signal-detection terminology

In medicine and epidemiology, recall is called sensitivity; its complement is the false-negative rate, $1 - \text{recall} = \mathrm{FN}/(\mathrm{TP} + \mathrm{FN})$. The companion measure on the negative class is specificity $= \mathrm{TN}/(\mathrm{TN} + \mathrm{FP})$, the true negative rate, whose complement is the false-positive rate $\mathrm{FPR} = 1 - \text{specificity}$. That the same quantity is called recall in machine learning and sensitivity in medicine is a common source of cross-disciplinary confusion.
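
All four rates can be read off a single confusion matrix. A sketch assuming scikit-learn is available (the labels are toy values); for binary labels {0, 1}, confusion_matrix(...).ravel() yields the counts in the order TN, FP, FN, TP:

    from sklearn.metrics import confusion_matrix

    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # toy ground truth
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]  # toy predictions

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    sensitivity = tp / (tp + fn)   # recall / TPR
    specificity = tn / (tn + fp)   # TNR
    fnr = 1 - sensitivity          # false-negative rate, FN / (TP + FN)
    fpr = 1 - specificity          # false-positive rate, FP / (TN + FP)
    print(sensitivity, specificity, fnr, fpr)  # 0.75 0.667 0.25 0.333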

Threshold curves

Most classifiers output a continuous score; a decision threshold $\tau$ converts this to a binary prediction. As $\tau$ varies, recall and other metrics trace out curves:

  • Receiver operating characteristic (ROC) plots TPR (recall) against FPR; the AUC-ROC is the area under this curve and equals the probability that a random positive scores higher than a random negative.
  • Precision–recall (PR) curve plots precision against recall; the AUC-PR, commonly reported as average precision, is its area, and it is more informative than AUC-ROC under heavy class imbalance (see the sketch after this list).
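
A sketch of both curves with scikit-learn's curve utilities; the score array is an illustrative assumption, not real model output:

    import numpy as np
    from sklearn.metrics import (roc_curve, roc_auc_score,
                                 precision_recall_curve, average_precision_score)

    y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])  # classifier scores

    # One (FPR, TPR) point per candidate threshold.
    fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
    # One (precision, recall) point per candidate threshold.
    precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)

    print("AUC-ROC:", roc_auc_score(y_true, y_score))                 # P(random + scores above random -)
    print("average precision:", average_precision_score(y_true, y_score))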

When recall matters

Recall is the headline metric when the cost of a false negative dominates:

  • Medical screening: don't miss the cancer, don't miss the sepsis.
  • Fraud detection: don't miss the fraudulent transaction.
  • Legal evidence retrieval (e-discovery): don't miss the smoking gun.
  • Information retrieval: don't miss relevant documents in a comprehensive search.
  • Safety-critical perception: don't miss the pedestrian in an autonomous vehicle's field of view.

Trade-off with precision

Lowering the threshold typically raises recall and lowers precision; raising it does the opposite. The right operating point depends on the relative costs of false negatives and false positives. The F1-score, the harmonic mean $F_1 = 2 \cdot \mathrm{precision} \cdot \mathrm{recall} / (\mathrm{precision} + \mathrm{recall})$, summarises both with equal weight; $F_\beta = (1 + \beta^2) \cdot \mathrm{precision} \cdot \mathrm{recall} / (\beta^2 \cdot \mathrm{precision} + \mathrm{recall})$ generalises this, weighting recall $\beta$ times as heavily as precision.
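
A sketch of the trade-off, sweeping the threshold $\tau$ over the same kind of toy scores (illustrative data; scikit-learn assumed available):

    import numpy as np
    from sklearn.metrics import precision_score, recall_score, fbeta_score

    y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

    for tau in (0.3, 0.5, 0.7):
        y_pred = (y_score >= tau).astype(int)     # lower tau -> more predicted positives
        p = precision_score(y_true, y_pred, zero_division=0)
        r = recall_score(y_true, y_pred)
        f1 = fbeta_score(y_true, y_pred, beta=1)  # equal weight
        f2 = fbeta_score(y_true, y_pred, beta=2)  # recall weighted twice as heavily
        print(f"tau={tau}: precision={p:.2f} recall={r:.2f} F1={f1:.2f} F2={f2:.2f}")

On this toy data, lowering $\tau$ from 0.7 to 0.3 lifts recall from 0.75 to 1.0 while precision falls from 1.0 to 0.67, exactly the trade-off described above.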

Multi-class extensions

For $K$-class problems, recall is computed per class from a one-vs-rest view of the confusion matrix and aggregated as macro-recall (the unweighted mean over classes), micro-recall (pool the counts across classes, then compute), or weighted recall (the mean weighted by class support). Analogous variants exist for precision and F1.
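
A sketch of the three aggregations using scikit-learn's recall_score on toy multi-class labels (illustrative data only):

    from sklearn.metrics import recall_score

    y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
    y_pred = [0, 0, 1, 1, 0, 2, 2, 2, 1, 0]

    print(recall_score(y_true, y_pred, average=None))        # per-class: [0.667, 0.5, 0.6]
    print(recall_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
    print(recall_score(y_true, y_pred, average="micro"))     # pool counts, then compute
    print(recall_score(y_true, y_pred, average="weighted"))  # mean weighted by class support

For single-label problems, micro-recall pools all correct predictions over all samples and therefore equals accuracy (0.6 here).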

Related terms: Precision (classification), F1 Score, AUC-ROC
