A Random Forest is an ensemble of decision trees trained with two sources of randomness. First, each tree is fit on a bootstrap sample, a sample of the same size as the training set drawn from it with replacement (this is bagging). Second, at each split only a random subset of the $d$ features is considered (typically $\sqrt{d}$ for classification and $d/3$ for regression). Predictions are aggregated by majority vote (classification) or by averaging (regression).
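Both mechanisms can be made concrete in a minimal sketch. To keep the code short, each "tree" below is reduced to a decision stump (a single split) with a mean-valued threshold; all names here (`train_stump`, `forest_predict`, etc.) are illustrative, not any library's API:

```python
import random
from collections import Counter

def bootstrap_sample(data):
    """Bagging: draw len(data) examples with replacement."""
    return [random.choice(data) for _ in data]

def feature_subset(d, k):
    """Pick k of the d feature indices at random for a split."""
    return random.sample(range(d), k)

def train_stump(data, k):
    """One 'tree', reduced to a single split so the two randomisation
    mechanisms stay visible: it splits on the best of k randomly
    chosen features, using the feature mean as the threshold."""
    d = len(data[0][0])
    best = None
    for f in feature_subset(d, k):
        t = sum(x[f] for x, _ in data) / len(data)
        left = [y for x, y in data if x[f] <= t]
        right = [y for x, y in data if x[f] > t]
        if not left or not right:
            continue
        # Accuracy if each side predicts its majority class.
        acc = (Counter(left).most_common(1)[0][1]
               + Counter(right).most_common(1)[0][1]) / len(data)
        if best is None or acc > best[0]:
            best = (acc, f, t,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    return best  # None if no feature produced a usable split

def predict_stump(stump, x):
    _, f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def forest_predict(stumps, x):
    """Aggregate the ensemble by majority vote (classification)."""
    votes = Counter(predict_stump(s, x) for s in stumps if s is not None)
    return votes.most_common(1)[0][0]

# Toy data: 4 identical binary features, label equals the feature value.
data = [((0.0,) * 4, 0)] * 4 + [((1.0,) * 4, 1)] * 4
stumps = [train_stump(bootstrap_sample(data), k=2)  # k = sqrt(d) for d=4
          for _ in range(25)]
forest_predict(stumps, (1.0, 1.0, 1.0, 1.0))  # → 1
```

A real implementation would grow full recursive trees and redraw the feature subset at every split; the bagging, per-split feature subsampling, and vote aggregation are the parts the definition above pins down.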
The two randomisation mechanisms work in concert to reduce variance without substantially increasing bias. Decision trees are high-variance learners: small perturbations of the training data can produce very different trees, so averaging many trees fit on different samples yields a far more stable predictor. The random feature subsets additionally decorrelate the trees, letting the averaging cancel more of their errors than bagging alone would. The trees in a random forest are typically grown deep without pruning, because the ensemble averaging provides the regularisation that pruning would otherwise supply.
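The variance-reduction effect can be seen numerically without any trees at all. The sketch below (illustrative names, stdlib only) replaces each high-variance tree with a noisy estimate of a true value and compares the variance of one predictor against an average of 25 independent ones:

```python
import random
from statistics import mean, variance

random.seed(0)

def noisy_predictor():
    """Stand-in for one high-variance tree: the true value (0) plus noise."""
    return 0.0 + random.gauss(0, 1)

def ensemble_prediction(n_trees):
    """Average n independent predictors, as a forest averages its trees."""
    return mean(noisy_predictor() for _ in range(n_trees))

# Variance of a single predictor vs. an average of 25, measured
# empirically over 2000 repeated "fits".
var_single = variance([noisy_predictor() for _ in range(2000)])
var_forest = variance([ensemble_prediction(25) for _ in range(2000)])
# For independent predictors, var_forest is close to var_single / 25.
```

Real trees are correlated because they share training data, so the reduction is smaller in practice: with pairwise correlation $\rho$ and per-tree variance $\sigma^2$, the ensemble variance approaches $\rho\sigma^2$ rather than $\sigma^2/n$ as trees are added, which is precisely why the decorrelating feature subsets matter.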
Random forests are remarkably versatile and require minimal hyperparameter tuning. They are resistant to overfitting (adding trees rarely hurts, given the compute), handle mixed feature types gracefully, and provide natural feature-importance estimates via mean decrease in impurity. Out-of-bag (OOB) error—computed by predicting each training example using only the trees that did not see it in their bootstrap sample—gives a nearly unbiased estimate of generalisation error at no extra cost, without a separate validation set. These qualities have made random forests a default choice across scientific, commercial, and competition applications.
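The OOB procedure can be sketched independently of the tree learner. In this minimal sketch (all names hypothetical), `majority_train` and `majority_predict` stand in for a real tree learner, which a forest would replace with a deep tree grown per bootstrap sample:

```python
import random
from collections import Counter

def bootstrap_indices(n):
    """Indices of one bootstrap sample; each example has roughly a 37%
    chance of being left out of any given tree's sample."""
    return [random.randrange(n) for _ in range(n)]

def oob_error(data, n_trees, train, predict):
    """Score each training example using only the trees whose bootstrap
    sample excluded it, then report the resulting error rate."""
    n = len(data)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = bootstrap_indices(n)
        model = train([data[i] for i in idx])
        in_bag = set(idx)
        for i, (x, y) in enumerate(data):
            if i not in in_bag:
                votes[i][predict(model, x)] += 1
    # Majority OOB vote per example, skipping any example no tree missed.
    scored = [(v.most_common(1)[0][0], data[i][1])
              for i, v in enumerate(votes) if v]
    return sum(pred != y for pred, y in scored) / len(scored)

# Degenerate stand-in learner: predict the bootstrap sample's majority class.
def majority_train(sample):
    return Counter(y for _, y in sample).most_common(1)[0][0]

def majority_predict(model, x):
    return model

data = [((i,), 1) for i in range(14)] + [((i,), 0) for i in range(6)]
oob_error(data, 50, majority_train, majority_predict)
# ≈ the minority-class fraction, as expected for this degenerate learner
```

Because every example is held out by roughly a third of the trees, the OOB estimate behaves like a built-in cross-validation that reuses the bootstrap draws the forest needed anyway.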
Related terms: Decision Tree, Ensemble Methods, Gradient Boosting
Discussed in:
- Chapter 7: Supervised Learning — Ensemble Methods
Also defined in: Textbook of AI, Textbook of Medical AI