Belief propagation (BP), introduced by Judea Pearl in 1982 and developed in his 1988 book Probabilistic Reasoning in Intelligent Systems, is a message-passing inference algorithm for graphical models. Each node passes local probability updates, called messages, to its neighbours; on tree-structured graphs, two passes (root-to-leaves and leaves-to-root) compute exact posterior marginals at every node in linear time.
Sum-product on a tree
For a factor graph with variables $\mathbf{x} = (x_1, \ldots, x_n)$ and factors $\{f_a\}$, the joint factorises as $p(\mathbf{x}) = \frac{1}{Z} \prod_a f_a(\mathbf{x}_a)$. The sum-product algorithm sends two kinds of messages:
$$\mu_{x \to f}(x) = \prod_{g \in \mathcal{N}(x) \setminus f} \mu_{g \to x}(x)$$
$$\mu_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus x} f(\mathbf{x}_f) \prod_{y \in \mathcal{N}(f) \setminus x} \mu_{y \to f}(y)$$
The marginal at any variable is then $p(x) \propto \prod_{f \in \mathcal{N}(x)} \mu_{f \to x}(x)$. On a tree, a single sweep in each direction computes all marginals exactly. The cost is linear in the number of nodes and exponential in the maximum factor size.
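To make the recursions concrete, here is a minimal sketch of sum-product on a three-variable binary chain (the simplest tree). The unary and pairwise factor tables are made up for illustration; on a chain the messages collapse to matrix-vector products.

```python
# Minimal sum-product sketch on a binary chain x1 - x2 - x3 (a tree).
# The factor tables below are illustrative, not from the text.
import numpy as np

unary = [np.array([0.7, 0.3]),
         np.array([0.5, 0.5]),
         np.array([0.2, 0.8])]                    # f_i(x_i)
pair = [np.array([[0.9, 0.1], [0.1, 0.9]]),       # f(x1, x2)
        np.array([[0.8, 0.2], [0.2, 0.8]])]       # f(x2, x3)
n = len(unary)

# Left-to-right sweep: mu_fwd[i] is the message arriving at x_i from the left.
mu_fwd = [np.ones(2) for _ in range(n)]
for i in range(1, n):
    # The f -> x rule: multiply incoming messages by the factor, sum out x_{i-1}.
    mu_fwd[i] = (mu_fwd[i - 1] * unary[i - 1]) @ pair[i - 1]

# Right-to-left sweep: mu_bwd[i] is the message arriving at x_i from the right.
mu_bwd = [np.ones(2) for _ in range(n)]
for i in range(n - 2, -1, -1):
    mu_bwd[i] = pair[i] @ (mu_bwd[i + 1] * unary[i + 1])

# Exact marginal at each node: product of all incoming factor messages.
for i in range(n):
    b = unary[i] * mu_fwd[i] * mu_bwd[i]
    print(f"p(x{i + 1}) =", b / b.sum())
```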
The max-product variant
Replacing $\sum$ with $\max$ gives the max-product algorithm, which computes the maximum a posteriori (MAP) assignment rather than marginals. The Viterbi algorithm for hidden Markov models is max-product on a chain; the forward–backward algorithm is sum-product on the same chain.
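Continuing the chain sketch above, swapping the sum for a max and keeping argmax back-pointers yields the Viterbi recursion; the factor tables are again illustrative.

```python
# Max-product on the same illustrative chain: replace the sum with a max
# and keep back-pointers to recover the MAP assignment (Viterbi).
import numpy as np

unary = [np.array([0.7, 0.3]),
         np.array([0.5, 0.5]),
         np.array([0.2, 0.8])]
pair = [np.array([[0.9, 0.1], [0.1, 0.9]]),
        np.array([[0.8, 0.2], [0.2, 0.8]])]
n = len(unary)

mu = [unary[0]]          # running max-product messages
back = []                # argmax back-pointers, one per transition
for i in range(1, n):
    scores = mu[i - 1][:, None] * pair[i - 1]   # shape: (x_{i-1}, x_i)
    back.append(scores.argmax(axis=0))
    mu.append(scores.max(axis=0) * unary[i])

# Backtrack from the last node to read off the MAP assignment.
x = [int(mu[-1].argmax())]
for i in range(n - 2, -1, -1):
    x.append(int(back[i][x[-1]]))
print("MAP assignment:", x[::-1])
```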
Loopy belief propagation
On graphs with cycles, repeatedly applying the message-update rules until convergence is called loopy belief propagation. There is no general convergence guarantee; the fixed points correspond to stationary points of the Bethe free energy (Yedidia, Freeman & Weiss, 2001), and convergence is sensitive to the message-update schedule.
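Here is a minimal sketch of loopy BP on a three-node cycle of binary variables, using a synchronous (flooding) schedule with damping; the potentials and the damping factor are assumptions for illustration.

```python
# Loopy BP sketch: a 3-cycle of binary variables, synchronous updates
# with damping. All potentials and the damping factor are illustrative.
import numpy as np

psi = np.array([[1.0, 0.5],
                [0.5, 1.0]])                 # pairwise potential on every edge
unary = [np.array([0.8, 0.2]),               # a field pulling x0 toward state 0
         np.array([0.5, 0.5]),
         np.array([0.5, 0.5])]
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}     # the cycle's adjacency

# msg[(i, j)] is the message from node i to node j, initialised uniform.
msg = {(i, j): np.ones(2) / 2 for i in nbrs for j in nbrs[i]}

damping = 0.5
for sweep in range(200):
    new = {}
    for (i, j) in msg:
        # Product of the local field and messages into i from everyone
        # except j, then marginalise x_i out through the edge potential.
        incoming = unary[i]
        for k in nbrs[i]:
            if k != j:
                incoming = incoming * msg[(k, i)]
        m = incoming @ psi
        new[(i, j)] = damping * msg[(i, j)] + (1 - damping) * m / m.sum()
    delta = max(np.abs(new[e] - msg[e]).max() for e in msg)
    msg = new
    if delta < 1e-8:
        break

# Approximate marginals ("beliefs") from the converged messages.
for i in nbrs:
    b = unary[i] * msg[(nbrs[i][0], i)] * msg[(nbrs[i][1], i)]
    print(f"approximate p(x{i}):", b / b.sum())
```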
Despite the lack of general guarantees, loopy BP often produces good approximate posteriors. The iterative decoders behind turbo codes (Berrou, Glavieux & Thitimajshima, 1993) and the rediscovered low-density parity-check codes (Gallager, 1962; MacKay, 1996) were later recognised as instances of loopy BP. Both families of near-Shannon-capacity error-correcting codes are deployed in 4G/5G cellular standards, Wi-Fi 802.11n/ac, DVB-S2 satellite TV and deep-space communications including the Mars rovers.
Modern legacy
Belief propagation appears throughout AI:
- Turbo decoding and LDPC decoding in communications.
- Stereo vision and image denoising via Markov random fields.
- Probabilistic programming systems such as Pyro use sum-product-style message passing for some queries over discrete latent variables.
- Graph neural networks are a direct descendant: the same algorithmic skeleton, with learned message functions replacing hand-derived probabilistic ones. A GNN layer sends a learned message $\phi(\mathbf{h}_u, \mathbf{h}_v)$ along each edge instead of a probabilistic factor; see the sketch after this list.
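To make the analogy concrete, here is a toy GNN-style message-passing layer. The weight matrices stand in for learned parameters, and the particular architecture (concatenate-then-transform with sum aggregation) is one common choice, assumed for illustration.

```python
# Toy GNN-style message passing: a learned message function phi replaces
# the hand-derived factor message. Weights are random stand-ins here.
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_msg = rng.normal(size=(2 * d, d))   # stands in for learned message weights
W_upd = rng.normal(size=(2 * d, d))   # stands in for learned update weights

def phi(h_u, h_v):
    # Learned message along edge (u, v): a one-layer net on the
    # concatenated node states.
    return np.tanh(np.concatenate([h_u, h_v]) @ W_msg)

def gnn_layer(h, edges):
    # Sum incoming messages at each node (the analogue of BP's product
    # of incoming messages), then update every node state.
    agg = np.zeros_like(h)
    for (u, v) in edges:
        agg[v] += phi(h[u], h[v])
    return np.tanh(np.concatenate([h, agg], axis=1) @ W_upd)

h = rng.normal(size=(3, d))                              # states for a 3-node graph
edges = [(0, 1), (1, 2), (2, 0), (1, 0), (2, 1), (0, 2)]
print(gnn_layer(h, edges).shape)                         # -> (3, 4)
```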
Pearl's 2011 Turing Award citation singled out belief propagation and the development of Bayesian networks as transformative contributions to AI.
Related terms: Bayesian Network, Hidden Markov Model, Graph Neural Network, Judea Pearl
Discussed in:
- Chapter 5: Statistics, Probabilistic Inference