Bootstrap: resample with replacement, Textbook of AI

Build a sampling distribution from one dataset by drawing thousands of resamples.

From the chapter: Chapter 5: Statistics

Glossary: bootstrap

Transcript

Suppose we have one sample of thirty observations, and we want to know the uncertainty in the sample mean.

Classical theory says the standard error of the mean is sigma over root n, where sigma is the population standard deviation. But sigma is unknown.
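For the mean specifically, the usual workaround is to substitute the sample standard deviation s for sigma. Here is a minimal Python sketch. It assumes a hypothetical sample of thirty draws from a normal distribution, generated with a fixed seed; the data are illustrative and not from the lesson.

```python
import math
import random
import statistics

# Hypothetical data: 30 draws from a normal distribution (mean 10, sd 2),
# generated with a fixed seed so the example is reproducible.
rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(30)]

# sigma is unknown, so the classical recipe plugs in the sample
# standard deviation s (computed with an n - 1 denominator).
n = len(sample)
s = statistics.stdev(sample)
classical_se = s / math.sqrt(n)
print(f"mean = {statistics.fmean(sample):.3f}, classical SE = {classical_se:.3f}")
```

This shortcut exists only because the mean has a known formula. Most other statistics have nothing comparable.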

Enter the bootstrap: treat the sample as if it were the population, and draw new samples from it with replacement.

Each bootstrap sample has thirty observations, the same size as the original. Some original points appear multiple times, some not at all. On average, about sixty-three percent of the original points appear at least once in any one bootstrap sample.
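A short simulation can check both the resampling step and the sixty-three percent figure. The sketch below uses only Python's standard library, and the helper name `resample_indices` is hypothetical. It resamples positions rather than values, which makes duplicates easy to count. It then compares the simulated fraction with the expectation 1 - (1 - 1/n)^n. That expectation is about 0.638 for n = 30 and tends to 1 - 1/e ≈ 0.632 as n grows.

```python
import random

# Resample indices rather than values so we can tell which original
# observations were drawn, even if two observations share a value.
n = 30
rng = random.Random(1)

def resample_indices(n, rng):
    """Draw n indices from 0..n-1 uniformly, with replacement (hypothetical helper)."""
    return rng.choices(range(n), k=n)

idx = resample_indices(n, rng)
print("distinct originals in one resample:", len(set(idx)))

# Average fraction of distinct originals over many resamples,
# compared with the exact expectation 1 - (1 - 1/n)**n.
trials = 10_000
avg_fraction = sum(len(set(resample_indices(n, rng))) for _ in range(trials)) / (trials * n)
expected = 1 - (1 - 1 / n) ** n
print(f"simulated {avg_fraction:.3f}, expected {expected:.3f}")
```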

Compute the statistic of interest, the mean, on each bootstrap sample. We get one estimate per resample.

Repeat one thousand times. We now have one thousand bootstrap estimates of the mean.
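The loop itself takes only a few lines. Below is a sketch that assumes a hypothetical helper `bootstrap_estimates` and a synthetic thirty-point sample generated with a fixed seed. Each resample is drawn with `random.Random.choices`, which samples with replacement.

```python
import random
import statistics

def bootstrap_estimates(data, statistic, n_resamples=1000, seed=0):
    """Return one value of `statistic` per bootstrap resample of `data` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the original and is drawn with replacement.
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical sample of 30 observations (fixed seed for reproducibility).
data_rng = random.Random(0)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(30)]

boot_means = bootstrap_estimates(sample, statistics.fmean)
print(len(boot_means), "bootstrap estimates of the mean")
```

The statistic is passed in as a function, so the loop stays independent of what is being estimated.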

Their distribution approximates the sampling distribution of the original estimator. The standard deviation across them is the bootstrap standard error. The 2.5th and 97.5th percentiles give an approximate 95 percent confidence interval, known as the percentile interval.
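The next sketch turns a thousand bootstrap estimates into a standard error and a percentile interval with Python's `statistics` module. The synthetic sample and the seeds are illustrative assumptions. Calling `statistics.quantiles` with `n=40` returns cut points in 2.5 percent steps, so the first and last cut points are the 95 percent bounds.

```python
import math
import random
import statistics

# Hypothetical sample of 30 observations (fixed seed for reproducibility).
data_rng = random.Random(0)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(30)]

# 1000 bootstrap estimates of the mean.
rng = random.Random(42)
n = len(sample)
boot_means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(1000)]

# Bootstrap standard error: spread of the estimates across resamples.
boot_se = statistics.stdev(boot_means)

# Percentile interval: with n=40, quantiles() returns 39 cut points at
# 2.5%, 5%, ..., 97.5%, so the first and last are the 95% bounds.
cuts = statistics.quantiles(boot_means, n=40, method="inclusive")
ci_low, ci_high = cuts[0], cuts[-1]

classical_se = statistics.stdev(sample) / math.sqrt(n)
print(f"bootstrap SE = {boot_se:.3f} (classical s/sqrt(n) = {classical_se:.3f})")
print(f"95% percentile interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

For the mean, the bootstrap standard error should come out close to the classical s over root n. That agreement is a useful sanity check before applying the same code to statistics that have no formula.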

This works for almost any statistic. Median, correlation, regression coefficient, ratio of means. No closed-form formula required.

The bootstrap is one of the great inventions of computational statistics. It turns a hard analytical problem into a simple resampling loop.

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).