Under the null hypothesis, the statistic has a known distribution. Extreme values lie in a small tail.
From the chapter: Chapter 5: Statistics
Glossary: hypothesis testing, p value
Transcript
We have a question. Does this drug lower blood pressure?
The null hypothesis says it does not. Under the null, the difference between treatment and control is just sampling noise.
We compute a test statistic from our data, for example a standardised difference of means. Under the null, this statistic follows a known distribution, often Gaussian or t.
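As a rough sketch of that step, here is one way to compute a standardised difference of means: the difference in sample means divided by its estimated standard error. The function name and the blood-pressure numbers are invented for illustration, and the unpooled standard error is one assumed choice among several.

```python
import math
import statistics

def standardised_difference(treatment, control):
    """Difference of sample means divided by its estimated standard error."""
    mean_diff = statistics.mean(treatment) - statistics.mean(control)
    # Unpooled (Welch-style) standard error: an assumption of this sketch.
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return mean_diff / se

# Hypothetical changes in systolic blood pressure (mmHg), made up for illustration.
treatment = [-12.0, -8.0, -10.0, -6.0, -9.0, -11.0]
control = [-2.0, 1.0, -3.0, 0.0, -1.0, 2.0]

z = standardised_difference(treatment, control)
print(f"test statistic: {z:.3f}")
```

A negative value here means the treatment group's pressure fell more than the control group's.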
Plot that distribution. Most of its mass sits in the middle. Values in the far tails are rare.
We choose a significance level, typically five percent. For a two-sided test, that is two and a half percent in each tail. The cutoffs become rejection thresholds.
If our observed test statistic lands inside the central region, we cannot reject the null. The data are compatible with no effect.
If it lands in a tail, we reject the null. The data would be unlikely to have arisen if the drug did nothing.
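The decision rule can be sketched in a few lines, assuming the null distribution is a standard Gaussian and the test is two-sided. The function name `decide` is hypothetical; the cutoffs come from the inverse CDF in Python's standard library.

```python
from statistics import NormalDist

def decide(z, alpha=0.05):
    """Two-sided decision rule, assuming a standard Gaussian null."""
    # Put alpha / 2 in each tail; the null is symmetric, so lower = -upper.
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    lower = -upper
    if lower <= z <= upper:
        return "fail to reject"  # statistic is in the central region
    return "reject"              # statistic is in a tail

for z in (0.5, 1.9, 2.5, -7.57):
    print(z, decide(z))
```

At five percent the cutoffs sit close to plus and minus 1.96, so a statistic of 1.9 stays in the central region while 2.5 lands in the upper tail.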
The p-value is the probability of seeing a statistic at least as extreme, under the null. Small p-values mean the data are surprising under no effect.
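Under the same assumptions (standard Gaussian null, two-sided test), the p-value is twice the area beyond the absolute value of the observed statistic. This is a minimal sketch; `two_sided_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability, under a standard Gaussian null, of a statistic at least as extreme as z."""
    one_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * one_tail  # count both tails

for z in (0.0, 1.0, 1.96, 3.0):
    print(f"z = {z:4.2f}  p = {two_sided_p_value(z):.4f}")
```

Rejecting when the p-value falls below the significance level is the same rule as rejecting when the statistic lands beyond the cutoffs.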
This procedure controls the false-positive rate at the chosen level. It does not measure the size of an effect or the probability that the alternative is true. It is a calibrated decision rule, no more, no less.
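One way to see the false-positive control is a small simulation: draw both groups from the same distribution, so the null is true by construction, and count how often the rule rejects. The group size, trial count, and seed below are arbitrary choices for this sketch, and the Gaussian cutoff is only an approximation when the variance is estimated from the data.

```python
import math
import random
from statistics import NormalDist

def z_statistic(a, b):
    """Standardised difference of means with an unpooled standard error."""
    mean_a = math.fsum(a) / len(a)
    mean_b = math.fsum(b) / len(b)
    var_a = math.fsum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = math.fsum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

def false_positive_rate(trials=2000, n=100, alpha=0.05, seed=0):
    """Fraction of null experiments in which the two-sided rule rejects."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Both groups share one distribution: there is no real effect.
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(z_statistic(a, b)) > cutoff:
            rejections += 1
    return rejections / trials

rate = false_positive_rate()
print(f"observed false-positive rate: {rate:.3f}")
```

The observed rate should land near the chosen five percent, up to simulation noise. It says nothing about how large a real effect would be.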