Glossary

Boosting

Boosting is a family of ensemble methods that combine many "weak" learners (models only slightly better than random guessing) into a "strong" learner that achieves arbitrarily small error. The foundational result is Schapire's boosting theorem (1990): in the PAC framework, a concept class is weakly learnable if and only if it is strongly learnable.

The first practical boosting algorithm was AdaBoost (Freund and Schapire, 1995), which iteratively reweights the training data so that each new learner concentrates on the examples its predecessors misclassified. Gradient boosting (Friedman, 2001) generalised this idea to arbitrary differentiable loss functions: each new learner is trained to predict the negative gradient of the loss with respect to the current ensemble's prediction.
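
The loop below is a minimal sketch of that idea for squared-error loss, where the negative gradient is simply the residual. It assumes scikit-learn's DecisionTreeRegressor as the weak learner; the function names and hyperparameter values (n_rounds, learning_rate, max_depth) are illustrative choices, not part of any particular library.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    # Start from the constant prediction that minimises squared error: the mean of y.
    f0 = float(np.mean(y))
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is just the residual y - pred.
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Shrink each tree's contribution (the learning rate) to regularise.
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def boosted_predict(f0, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred

Swapping in a different loss (log-loss for classification, absolute error, quantile loss) changes only the residuals line: the tree is then fit to that loss's negative gradient instead.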

Modern gradient-boosted-tree libraries such as XGBoost (Chen and Guestrin, 2016), LightGBM (Microsoft, 2017), and CatBoost (Yandex, 2018) combine gradient boosting with decision trees, several forms of regularisation, and engineering optimisations (parallelisation, GPU support, efficient memory layouts). They dominate tabular-data competitions on Kaggle and are widely deployed in industry.
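
As a concrete illustration, here is a minimal sketch of fitting one of these libraries, assuming the xgboost Python package and scikit-learn are installed; the synthetic data and every hyperparameter value are illustrative choices rather than recommendations.

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(int)   # synthetic binary target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = xgb.XGBClassifier(
    n_estimators=200,    # number of boosting rounds (trees)
    max_depth=4,         # depth of each tree (the weak learner)
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
    reg_lambda=1.0,      # L2 regularisation on leaf weights
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))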

Boosting is contrasted with bagging (Breiman, 1996), which trains each model on an independent bootstrap sample. Bagging reduces variance; boosting reduces bias. Random forests combine bagging with feature subsampling; XGBoost combines boosting with various forms of regularisation. The two ensemble traditions remain complementary.
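
For contrast, a minimal sketch of bagging under the same assumptions (scikit-learn trees, illustrative function and parameter names): each tree is fit on an independent bootstrap sample and the predictions are averaged, which is what reduces variance.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        # Each tree gets its own bootstrap sample: n rows drawn with replacement.
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeRegressor()   # deep, high-variance trees are fine here
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bagged_predict(trees, X):
    # Averaging many independently trained trees reduces the ensemble's variance.
    return np.mean([tree.predict(X) for tree in trees], axis=0)

A random forest adds one more ingredient: at each split only a random subset of features is considered (max_features in scikit-learn), which further decorrelates the trees.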

Related terms: AdaBoost, Gradient Boosting, Bagging, Random Forest, Ensemble Methods
