Glossary

P-value

The p-value is the probability, assuming the null hypothesis $H_0$ is true, of obtaining a test statistic at least as extreme as the one observed. A small p-value indicates that the observed data are unlikely under $H_0$, providing evidence against it. Conventionally, a p-value below a pre-chosen significance level $\alpha$ (typically 0.05) leads to rejection of the null hypothesis.
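As a minimal sketch of the definition, the following computes a two-sided p-value for a one-sample z-test. The test choice, the function name `two_sided_z_test`, and all numbers (sample mean 103, null mean 100, known standard deviation 15, n = 100) are illustrative assumptions, not taken from any dataset.

```python
import math


def two_sided_z_test(sample_mean, null_mean, sigma, n):
    """Return (z, p) for a two-sided one-sample z-test with known sigma.

    Hypothetical helper for illustration only.
    """
    # Test statistic: distance of the sample mean from the null mean,
    # measured in standard errors.
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    # Two-sided p-value: P(|Z| >= |z|) under H0, with Z standard normal.
    # Equals 2 * (1 - Phi(|z|)), written with erfc for numerical accuracy.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Illustrative (made-up) inputs.
z, p = two_sided_z_test(sample_mean=103.0, null_mean=100.0, sigma=15.0, n=100)
alpha = 0.05  # pre-chosen significance level
reject = p < alpha
print(f"z = {z:.2f}, p = {p:.4f}, reject H0 at alpha = {alpha}: {reject}")
```

Here the p-value answers one question only: how often a statistic this extreme would arise if $H_0$ held. It says nothing directly about how probable $H_0$ is.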

The p-value is one of the most misinterpreted concepts in statistics. Crucially, it is not the probability that the null hypothesis is true. Nor is a p-value below 0.05 by itself strong evidence that a result is real: when many tests are run, some low p-values occur by chance alone. Statistical significance does not imply practical significance: with enormous sample sizes, even trivially small effects yield minuscule p-values. The American Statistical Association issued a formal statement in 2016 warning against mechanical use of p-values and calling for richer reporting practices.
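Two of these pitfalls can be made concrete with a short sketch. It assumes independent tests and a z-test with known variance; the helper names and the effect size of 0.01 standard deviations are hypothetical choices for illustration.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def prob_any_false_positive(alpha, m):
    """Chance that at least one of m independent tests of true nulls
    gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** m


# Pitfall 1: many tests. With 20 independent true nulls at alpha = 0.05,
# at least one "significant" result is more likely than not.
fwer_20 = prob_any_false_positive(alpha=0.05, m=20)
print(f"P(at least one p < 0.05 among 20 null tests) = {fwer_20:.3f}")

# Pitfall 2: huge samples. Hold a trivial observed effect fixed
# (0.01 standard deviations, an assumed value) and grow n.
effect_in_sd = 0.01
p_by_n = {}
for n in (100, 10_000, 1_000_000):
    z = effect_in_sd * math.sqrt(n)  # z grows with sqrt(n)
    p_by_n[n] = two_sided_p(z)
    print(f"n = {n:>9,d}  z = {z:5.2f}  p = {p_by_n[n]:.3g}")
```

The effect is identical in every row; only the sample size changes, which is why a small p-value alone cannot indicate practical importance.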

The machine learning community has increasingly moved away from rigid p-value thresholds in favour of reporting effect sizes, confidence intervals, and Bayesian posterior probabilities. Nonetheless, p-values remain prevalent in clinical AI and pharmacovigilance, where regulatory frameworks mandate controlled trials with pre-specified Type I error rates. Understanding both the mechanics and the limitations of p-values is essential for any practitioner who must interpret or communicate statistical results.
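As a rough illustration of richer reporting, the sketch below reports an effect estimate, a standardized effect size, and a 95% confidence interval alongside the p-value. It assumes a one-sample setting with known standard deviation; every number and variable name is hypothetical.

```python
import math
from statistics import NormalDist

# Illustrative (made-up) summary statistics.
sample_mean, null_mean, sigma, n = 103.0, 100.0, 15.0, 100

se = sigma / math.sqrt(n)                # standard error of the mean
effect = sample_mean - null_mean         # raw effect estimate
standardized = effect / sigma            # effect in standard-deviation units

# 95% confidence interval for the mean, using the normal critical value.
z_crit = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z_crit * se
ci_high = sample_mean + z_crit * se

# Two-sided p-value for H0: mean == null_mean.
p = math.erfc(abs(effect / se) / math.sqrt(2.0))

print(f"effect = {effect:.2f} ({standardized:.2f} SD)")
print(f"95% CI for the mean = ({ci_low:.2f}, {ci_high:.2f})")
print(f"p = {p:.4f}")
```

The interval conveys both the size of the estimated effect and its uncertainty, information that a bare "p < 0.05" omits.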

Related terms: Hypothesis Testing, Confidence Interval


Also defined in: Textbook of AI, Textbook of Medical AI, Textbook of Medical Statistics