Glossary

Ensemble Methods

Ensemble Methods combine multiple base learners into a single, stronger model. The fundamental insight is that a collection of individually weak or unstable models can, when combined judiciously, produce predictions that are substantially more accurate and robust than any single constituent. The bias–variance decomposition formalises this: if the base learners' errors are sufficiently diverse, averaging reduces variance without increasing bias.
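The variance-reduction argument can be checked numerically. The sketch below (illustrative, using NumPy; the learner count and noise level are assumptions, not from the source) simulates 50 unbiased "learners" whose errors are independent noise, and compares the mean squared error of one learner against the error of their average:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 1.0

# 50 unbiased learners, each perturbed by independent noise with std 1,
# evaluated over 10,000 repetitions.
preds = truth + rng.normal(0.0, 1.0, size=(10_000, 50))

# MSE of a single learner: in expectation equal to the noise variance (1).
single_mse = np.mean((preds[:, 0] - truth) ** 2)

# MSE of the 50-learner average: variance shrinks by a factor of 50
# because the errors are independent; the bias stays zero.
ensemble_mse = np.mean((preds.mean(axis=1) - truth) ** 2)
```

With correlated errors the reduction is smaller, which is why ensemble methods work to encourage diversity among base learners.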

Bagging (bootstrap aggregating) trains each base learner on a different bootstrap sample and combines predictions by voting or averaging; it primarily reduces variance. Random forests extend bagging with feature subsampling, which further decorrelates the trees. Boosting builds the ensemble sequentially, with each new learner focusing on examples the current ensemble gets wrong; it primarily reduces bias. AdaBoost upweights misclassified points; gradient boosting fits each new tree to the negative gradient of the loss function in function space. XGBoost, LightGBM, and CatBoost are highly optimised gradient boosting implementations that frequently top tabular-data competitions.
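A minimal bagging sketch, assuming scikit-learn is available (the dataset, tree depth, and ensemble size are illustrative choices, not prescribed by the source). It fits one high-variance decision tree and a bagged ensemble of 100 trees, each trained on a bootstrap sample and combined by majority vote:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification task with some label noise.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single unpruned tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# 100 trees, each fit on a bootstrap resample of the training set,
# predictions combined by voting.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=0).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)
bag_acc = bag.score(X_te, y_te)
```

On noisy data like this, the bagged ensemble typically generalises better than the single tree, consistent with the variance-reduction account above.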

Stacking (stacked generalisation) uses the outputs of several diverse base learners as features for a meta-learner, which learns how best to combine them. Unlike bagging and boosting, whose combination rules are fixed, stacking learns the combination from data. To avoid overfitting, base-level predictions are generated via cross-validation so the meta-learner only sees predictions the base models did not train on. Well-constructed ensembles are usually stronger than any of their constituent models, at the cost of increased complexity and compute.
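The cross-validation step can be made concrete with `cross_val_predict`, which yields out-of-fold predictions for every training point. A sketch, assuming scikit-learn; the choice of base learners and meta-learner here is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bases = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# Out-of-fold probabilities: each base model's prediction for a training
# point comes from a fold it was NOT fitted on, so the meta-learner never
# sees predictions the base models memorised.
meta_train = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in bases
])
meta = LogisticRegression().fit(meta_train, y_tr)

# At test time, base models are refit on the full training set.
meta_test = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in bases
])
acc = accuracy_score(y_te, meta.predict(meta_test))
```

scikit-learn's `StackingClassifier` packages this same cross-validated scheme behind one estimator interface.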

Related terms: Random Forest, Gradient Boosting

Also defined in: Textbook of AI