Glossary

Deep Belief Network

A deep belief network (DBN), introduced by Geoffrey Hinton, Simon Osindero and Yee-Whye Teh in 2006, is a stack of restricted Boltzmann machines trained layer by layer. Each successive RBM is trained on the hidden activations of the previous one, building up a deep hierarchy of features. After greedy unsupervised pre-training, the entire network can be fine-tuned end-to-end on a labelled task.
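The greedy layer-by-layer procedure can be sketched in a few lines of NumPy. This is an illustrative toy, not Hinton, Osindero and Teh's original code: the names (`RBM`, `pretrain_dbn`), the CD-1 training step and the hyperparameters are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine with binary units, trained by CD-1."""
    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one step of Gibbs sampling (reconstruction).
        v1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(v1)
        # Contrastive-divergence update: data statistics minus
        # reconstruction statistics.
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)

def pretrain_dbn(data, layer_sizes, epochs=5):
    """Greedy pre-training: each RBM learns on the hidden
    activations produced by the layer below it."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # activations feed the next layer
    return rbms

# Toy usage: 64-dimensional binary data, two stacked hidden layers.
data = (rng.random((100, 64)) < 0.3).astype(float)
dbn = pretrain_dbn(data, [32, 16])
```

In a full DBN recipe, the stacked weights would then initialise a feed-forward network that is fine-tuned with backpropagation on the labelled task.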

DBNs are widely credited with launching the modern deep-learning era. Before 2006, attempts to train deep networks by direct backpropagation typically failed: the gradient signal vanished before reaching the early layers, and optimisation got stuck in poor local minima. Hinton, Osindero and Teh's pre-training-then-fine-tuning recipe demonstrated that deep architectures could be trained successfully, and the resulting networks substantially outperformed shallow alternatives on MNIST.

The pre-training-then-fine-tuning recipe defined deep-learning practice for the next several years. From around 2010, however, advances in initialisation (Glorot and Bengio 2010), activation functions (ReLU; Nair and Hinton 2010), regularisation (dropout; Srivastava et al. 2014) and hardware (GPU training) made fully supervised training of deep networks practical without the pre-training step. AlexNet's 2012 ImageNet result was trained from random initialisation; the era of pre-training with RBMs was effectively over.

DBNs are now of historical interest rather than active practice, but the idea of pre-training survives: the modern field has built enormous pre-training infrastructure for language models and vision-language models, and the conceptual lineage from DBNs through BERT to GPT is direct.

Related terms: Geoffrey Hinton, Restricted Boltzmann Machine, Contrastive Divergence, Deep Learning
