Each tree sees a different bootstrap sample and a different random feature subset; the trees then vote.
From the chapter: Chapter 7: Supervised Learning
Glossary: random forest, decision tree, ensemble learning
Transcript
A single decision tree splits the input space into rectangles. It is fast, interpretable, and prone to overfitting.
A random forest grows many trees from the same data, then averages them.
Tree one sees a bootstrap sample, drawn with replacement from the training set. About sixty-three percent of the original points appear at least once; the rest of the sample is filled with duplicates, and roughly a third of the points are left out of this tree entirely. At each split, the tree considers only a random subset of features.
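The sampling mechanics fit in a few lines. This is a minimal sketch, assuming only NumPy; the sixty-three percent figure is just 1 - 1/e for a large sample, and the feature count per split is the usual square-root rule for classification.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of training points (made up for illustration)

# Bootstrap: draw n indices with replacement from the n training points.
bootstrap_idx = rng.integers(0, n, size=n)

# Roughly 1 - 1/e ~ 63% of the original points appear at least once.
unique_fraction = len(np.unique(bootstrap_idx)) / n
print(f"unique points in bootstrap: {unique_fraction:.2%}")

# At each split, consider only a random subset of the features,
# e.g. sqrt(d) of them for classification.
d = 100
split_features = rng.choice(d, size=int(np.sqrt(d)), replace=False)
print("features considered at this split:", split_features)
```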
Tree two: another bootstrap, another random feature subset. Different splits, different shape.
Tree three. Tree four. Two hundred trees.
Each tree on its own is a bit overfit. Each tree's errors are mostly its own.
Average their predictions for regression, or take a majority vote for classification. Idiosyncratic errors cancel. The shared signal survives.
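The whole recipe, bootstrap plus random features per split plus a vote, is short enough to hand-roll. A minimal sketch, assuming scikit-learn's DecisionTreeClassifier and a toy dataset made up for illustration; a production forest adds out-of-bag scoring and parallelism, but the aggregation step is just this.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

trees = []
for _ in range(200):                                     # two hundred trees
    idx = rng.integers(0, len(X), size=len(X))           # bootstrap sample
    tree = DecisionTreeClassifier(max_features="sqrt")   # random feature subset at each split
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Majority vote for classification (average the raw predictions for regression).
votes = np.stack([t.predict(X) for t in trees])          # shape (200, n_samples)
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)     # majority for 0/1 labels
print("accuracy of the vote on the training set:", (forest_pred == y).mean())
```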
The variance drops. The bias barely moves. Generalisation improves.
Random forests need almost no tuning. They handle mixed data types, missing values, and irrelevant features without breaking a sweat. For tabular data they were the strongest off-the-shelf method for two decades, until gradient boosting eventually edged them out.
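Off the shelf it is even shorter. A sketch assuming scikit-learn's RandomForestClassifier and the same toy dataset as above; the defaults are usually a reasonable starting point, which is the point about tuning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Default hyperparameters, five-fold cross-validation.
print(cross_val_score(forest, X, y, cv=5).mean())
```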
The lesson stayed the same: many noisy learners, decorrelated, beat one careful learner.