Glossary

Boltzmann Machine

A Boltzmann machine, introduced by David Ackley, Geoffrey Hinton and Terry Sejnowski in 1985, is a stochastic recurrent neural network: the stochastic generalisation of the deterministic Hopfield network. Units take binary values $s_i \in \{0, 1\}$, and the symmetrically connected network defines an energy function

$$E(\mathbf{s}) = -\sum_{i<j} w_{ij} s_i s_j - \sum_i b_i s_i,$$

where $w_{ij}$ are pairwise weights and $b_i$ are biases. At thermal equilibrium with temperature $T$, the probability of any global state follows the Boltzmann distribution

$$P(\mathbf{s}) = \frac{1}{Z} \exp\!\left(-\frac{E(\mathbf{s})}{T}\right),$$

with partition function $Z = \sum_{\mathbf{s}} \exp(-E(\mathbf{s})/T)$. Each unit is updated by a stochastic rule: unit $i$ switches on with probability $\sigma\!\left(\frac{1}{T}\left(\sum_j w_{ij} s_j + b_i\right)\right)$, where $\sigma$ is the logistic sigmoid and the argument of $\sigma$ (before dividing by $T$) is the energy gap between the unit's off and on states. Repeated asynchronous updates, combined with simulated annealing from a high temperature, draw samples from $P(\mathbf{s})$.
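To make the sampling procedure concrete, here is a minimal NumPy sketch of asynchronous Gibbs updates with annealing on a toy network. The network size, weight initialisation, annealing schedule and sample counts are illustrative choices, not prescribed by the definition above; the final check simply compares empirical state frequencies against the exact Boltzmann distribution, which is tractable only because the network is tiny.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-unit network (sizes and weight scales are illustrative assumptions).
n = 5
W = rng.normal(0.0, 0.5, (n, n))
W = np.triu(W, 1) + np.triu(W, 1).T     # symmetric weights, zero diagonal
b = rng.normal(0.0, 0.1, n)

def energy(s):
    """E(s) = -sum_{i<j} w_ij s_i s_j - sum_i b_i s_i."""
    return -0.5 * s @ W @ s - b @ s     # 0.5 corrects for counting each pair twice

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sweep(s, T):
    """Asynchronous sweep: unit i turns on with probability sigma(gap_i / T)."""
    for i in rng.permutation(n):
        gap = W[i] @ s + b[i]           # energy gap E(s_i=0) - E(s_i=1)
        s[i] = float(rng.random() < sigmoid(gap / T))
    return s

# Simulated annealing: cool from a high temperature down to T = 1, then sample.
s = rng.integers(0, 2, n).astype(float)
for T in np.linspace(5.0, 1.0, 500):
    s = gibbs_sweep(s, T)
samples = np.array([gibbs_sweep(s, 1.0).copy() for _ in range(20000)])

# Sanity check: empirical frequencies should approach exp(-E)/Z at T = 1.
states = np.array([[(k >> i) & 1 for i in range(n)] for k in range(2 ** n)], float)
p = np.exp(-np.array([energy(x) for x in states]))
p /= p.sum()                            # the normaliser is the partition function Z
codes = (samples @ (2.0 ** np.arange(n))).astype(int)
p_hat = np.bincount(codes, minlength=2 ** n) / len(samples)
print(np.abs(p - p_hat).max())          # should be small
```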

The Boltzmann learning rule updates each weight by the difference between two correlations:

$$\Delta w_{ij} = \eta \left( \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}} \right),$$

where $\langle \cdot \rangle_{\text{data}}$ is the correlation measured with the visible units clamped to a training example (the positive, or "wake", phase) and $\langle \cdot \rangle_{\text{model}}$ is the correlation when the network runs freely (the negative, or "sleep", phase). The update is purely local, depending only on the co-activation of the two units it connects, but estimating these correlations reliably requires extensive Markov chain Monte Carlo sampling, which made the original Boltzmann machine prohibitively slow to train.
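As a deliberately tiny sketch of the two-phase procedure, the following assumes a fully connected network of four visible and two hidden units at $T = 1$, with biases omitted for brevity; the sample counts, learning rate and toy data are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical setup: 4 visible + 2 hidden units, fully connected, T = 1.
n_v, n_h = 4, 2
n = n_v + n_h
W = 0.01 * rng.normal(size=(n, n))
W = np.triu(W, 1) + np.triu(W, 1).T     # symmetric, zero diagonal
data = rng.integers(0, 2, (20, n_v)).astype(float)   # toy training vectors

def gibbs_sweep(s, clamped):
    """One asynchronous sweep, skipping any clamped (fixed) units."""
    for i in rng.permutation(n):
        if i not in clamped:
            s[i] = float(rng.random() < sigmoid(W[i] @ s))
    return s

def correlations(clamp_to=None, n_samples=50, burn_in=5):
    """Estimate <s_i s_j> by Gibbs sampling, optionally clamping the visibles."""
    s = rng.integers(0, 2, n).astype(float)
    clamped = set(range(n_v)) if clamp_to is not None else set()
    if clamp_to is not None:
        s[:n_v] = clamp_to
    corr = np.zeros((n, n))
    for _ in range(n_samples):
        for _ in range(burn_in):
            s = gibbs_sweep(s, clamped)
        corr += np.outer(s, s)
    return corr / n_samples

eta = 0.05
for epoch in range(10):
    pos = np.mean([correlations(v) for v in data], axis=0)  # clamped ("wake")
    neg = correlations()                                     # free-running ("sleep")
    dW = eta * (pos - neg)          # local rule: difference of co-activations
    np.fill_diagonal(dW, 0.0)
    W += dW                         # dW is symmetric, so W stays symmetric
```

Even at this scale, most of the work goes into the Gibbs sampling inside `correlations`, which is exactly the bottleneck the paragraph above describes.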

The restricted Boltzmann machine (RBM), in which connections are restricted to lie between a visible layer and a hidden layer with no within-layer connections, allowed a much faster approximate learning rule: contrastive divergence (Hinton, 2002). Because the hidden units are conditionally independent given the visible units (and vice versa), Gibbs sampling alternates between two block updates, and even a single step gives useful gradient estimates. RBMs were the building blocks of the deep belief network of Hinton, Osindero and Teh (2006), in which RBMs are stacked and trained greedily layer by layer; Hinton and Salakhutdinov's Science paper of the same year used this pre-training to initialise deep autoencoders. This work is widely credited with kickstarting the deep-learning renaissance: it showed that deep architectures could be trained successfully when initialised by unsupervised pre-training.
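A minimal CD-1 training loop for a binary RBM might look as follows; the layer sizes, learning rate and random toy data are assumptions for illustration. Note how the bipartite structure lets each phase update a whole layer in one vectorised step:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical RBM: 6 visible, 3 hidden units; sizes are illustrative.
n_v, n_h = 6, 3
W = 0.01 * rng.normal(size=(n_v, n_h))
b_v, b_h = np.zeros(n_v), np.zeros(n_h)
data = rng.integers(0, 2, (100, n_v)).astype(float)  # toy binary training data

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def cd1_step(v0, lr=0.1):
    """One contrastive-divergence (CD-1) update on a batch of visible vectors."""
    # Up pass: hiddens are conditionally independent given the visibles.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = sample(ph0)
    # Down-up pass: one block Gibbs step reconstructs v, then re-infers h.
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = sample(pv1)
    ph1 = sigmoid(v1 @ W + b_h)
    # Approximate gradient: data correlations minus one-step reconstruction ones.
    batch = v0.shape[0]
    dW = lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    return dW, lr * (v0 - v1).mean(0), lr * (ph0 - ph1).mean(0)

for epoch in range(20):
    dW, db_v, db_h = cd1_step(data)
    W += dW
    b_v += db_v
    b_h += db_h
```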

Conceptually, Boltzmann machines connect statistical physics, neural networks and probabilistic inference. The energy-based view they introduced underpins energy-based models more broadly, and the idea of learning by matching data and model statistics resurfaces in score matching, noise-contrastive estimation and modern diffusion models, which likewise draw on statistical physics and learn to invert a gradual noising process. Although classical Boltzmann machines are rarely deployed today, their conceptual lineage permeates contemporary generative AI.

Related terms: Hopfield Network, Restricted Boltzmann Machine, Contrastive Divergence, Deep Belief Network, Diffusion Model, Energy-Based Model
