Precision in binary classification is the fraction of predicted positives that are actually positive:
$$\text{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$
In epidemiology, precision is also called the positive predictive value (PPV).
It contrasts with recall (also known as sensitivity or the true positive rate), the fraction of actual positives that are correctly predicted:
$$\text{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$
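As a concrete check on these definitions, here is a minimal Python sketch that computes both metrics from raw counts; the function name `precision_recall` and the toy labels are illustrative, not taken from any library:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

# 2 of 3 predicted positives are correct; 2 of 3 actual positives are found.
print(precision_recall([1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # (0.666..., 0.666...)
```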
Trade-off: lowering the decision threshold typically raises recall and lowers precision (more positives are flagged, but with more false positives among them). Raising the threshold does the reverse. The precision-recall curve plots precision against recall across all thresholds.
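A short sketch of that sweep, assuming scikit-learn is available; `precision_recall_curve` returns a precision/recall pair for each candidate threshold, and the labels and scores below are made up:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                   # illustrative labels
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.5, 0.9, 0.6, 0.3])  # illustrative model scores

# precision/recall have one more entry than thresholds; zip truncates the extra.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```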
The F1 score is the harmonic mean of the two: $F_1 = \frac{2PR}{P + R}$. The more general $F_\beta$ score, $F_\beta = (1 + \beta^2)\,\frac{PR}{\beta^2 P + R}$, weights recall $\beta$ times as much as precision, so $\beta > 1$ favors recall and $\beta < 1$ favors precision.
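A minimal sketch of both scores computed directly from $P$ and $R$; the helper `f_beta` is illustrative:

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta from precision P and recall R; beta=1 gives F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.5, 0.8))          # F1 ~ 0.615, the harmonic mean
print(f_beta(0.5, 0.8, beta=2))  # F2 ~ 0.714, pulled toward the higher recall
```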
When precision matters more:
- Spam filtering: a false positive (a legitimate email marked as spam) is much more costly than a false negative (spam in the inbox).
- Search engine top-1 results: users often see only the top result, so it must be relevant.
- Recommendation systems: showing irrelevant items annoys users.
When recall matters more:
- Medical screening: a false negative (a missed disease) is far more costly than a false positive (additional testing).
- Fraud detection: missing a fraudulent transaction is often costlier than investigating a legitimate one.
- Evidence retrieval in law: missing relevant evidence may lose a case.
Multi-class extensions: macro-precision (the unweighted mean of per-class precisions), micro-precision (pool TP and FP across all classes, then compute precision once), and weighted precision (per-class precisions averaged in proportion to class support).
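A sketch of the three averaging modes, assuming scikit-learn; `precision_score` selects the mode via its `average` argument, and the labels are made up:

```python
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2, 1, 0, 2]   # illustrative 3-class labels
y_pred = [0, 1, 2, 1, 1, 2, 2]

for avg in ("macro", "micro", "weighted"):
    print(avg, precision_score(y_true, y_pred, average=avg, zero_division=0))
```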
Precision is one of the most widely used evaluation metrics in classification.
Related terms: Recall (classification), F1 Score, AUC-ROC
Discussed in:
- Chapter 7: Supervised Learning, Evaluation Metrics