Precision in binary classification is the fraction of predicted positives that are actually positive:
$$\text{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$
In epidemiology, precision is also called the positive predictive value (PPV).
It contrasts with recall (also known as sensitivity or the true positive rate), the fraction of actual positives that are correctly predicted:
$$\text{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$
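As a concrete check on these definitions, here is a minimal Python sketch that computes both metrics from raw counts; the function name `precision_recall` and the toy labels are illustrative, not taken from any library:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

# 2 of 3 predicted positives are correct; 2 of 3 actual positives are found.
print(precision_recall([1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # (0.666..., 0.666...)
```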
Trade-off: lowering the decision threshold typically raises recall and lowers precision (more positives are flagged, but with more false positives among them). Raising the threshold does the reverse. The precision-recall curve plots precision against recall across all thresholds.
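A short sketch of that sweep, assuming scikit-learn is available; `precision_recall_curve` returns a precision/recall pair for each candidate threshold, and the labels and scores below are made up:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                   # illustrative labels
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.5, 0.9, 0.6, 0.3])  # illustrative model scores

# precision/recall have one more entry than thresholds; zip truncates the extra.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```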
The F1 score is the harmonic mean of the two: $F_1 = \frac{2PR}{P + R}$. The more general $F_\beta$ score, $F_\beta = (1 + \beta^2)\,\frac{PR}{\beta^2 P + R}$, weights recall $\beta$ times as much as precision, so $\beta > 1$ favors recall and $\beta < 1$ favors precision.
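A minimal sketch of both scores computed directly from $P$ and $R$; the helper `f_beta` is illustrative:

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta from precision P and recall R; beta=1 gives F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.5, 0.8))          # F1 ~ 0.615, the harmonic mean
print(f_beta(0.5, 0.8, beta=2))  # F2 ~ 0.714, pulled toward the higher recall
```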
When precision matters more:
- Spam filtering: a false positive (a legitimate email marked as spam) is much more costly than a false negative (spam in the inbox).
- Search engine top-1 results: users often see only the top result, so it must be relevant.
- Recommendation systems: showing irrelevant items annoys users.
When recall matters more:
- Medical screening: a false negative (a missed disease) is far more costly than a false positive (additional testing).
- Fraud detection: missing a fraudulent transaction is often costlier than investigating a legitimate one.
- Evidence retrieval in law: missing relevant evidence may lose a case.
Multi-class extensions: macro-precision (the unweighted mean of per-class precisions), micro-precision (pool TP and FP across all classes, then compute precision once), and weighted precision (per-class precisions averaged in proportion to class support).
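A sketch of the three averaging modes, assuming scikit-learn; `precision_score` selects the mode via its `average` argument, and the labels are made up:

```python
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2, 1, 0, 2]   # illustrative 3-class labels
y_pred = [0, 1, 2, 1, 1, 2, 2]

for avg in ("macro", "micro", "weighted"):
    print(avg, precision_score(y_true, y_pred, average=avg, zero_division=0))
```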
Precision is one of the most widely used evaluation metrics in classification.
Related terms: Recall (classification), F1 Score, AUC-ROC
Discussed in:
- Chapter 7: Supervised Learning, Evaluation Metrics