A Hopfield network, introduced by John Hopfield in 1982, is a fully connected recurrent neural network of binary threshold units ($\pm 1$) with symmetric weights ($w_{ij} = w_{ji}$, $w_{ii} = 0$) and asynchronous update dynamics. Each unit's state is updated to $s_i = \mathrm{sign}\big(\sum_j w_{ij} s_j\big)$. The crucial property is that the network's dynamics monotonically minimise an explicit energy function $E = -\tfrac{1}{2} \sum_{i,j} w_{ij} s_i s_j$ until reaching a local minimum.
Hopfield's insight was that this energy descent can be exploited as an associative memory. Choosing weights via the Hebbian rule $w_{ij} = \frac{1}{N} \sum_\mu \xi_i^{(\mu)} \xi_j^{(\mu)}$, where $\{\xi^{(\mu)}\}$ is the set of patterns to store, makes each pattern (approximately) a local minimum of the energy. The network then retrieves stored patterns from partial or noisy cues by descending to the nearest minimum.
The storage capacity is approximately $0.14\,N$ for a network of $N$ units before catastrophic interference between patterns sets in (Amit, Gutfreund, and Sompolinsky, 1985, using methods of statistical mechanics). Capacity can be increased by other learning rules (pseudoinverse, perceptron-style), but at the cost of basin-of-attraction quality.
Hopfield networks are the conceptual ancestor of the Boltzmann machine (stochastic units; Ackley, Hinton, and Sejnowski, 1985), of every subsequent energy-based model in deep learning, and of modern Hopfield networks (Ramsauer et al., 2020), which connect the model to attention mechanisms, establishing that, in the right scaling regime, one step of Hopfield retrieval is mathematically equivalent to attention and the number of storable patterns grows exponentially with dimension.
Mathematics
A Hopfield network has $N$ binary units $s_i \in \{-1, +1\}$ with symmetric weights $w_{ij} = w_{ji}$ and zero self-connections $w_{ii} = 0$. The energy function is
$$E(s) = -\frac{1}{2} \sum_{i,j} w_{ij} s_i s_j - \sum_i b_i s_i.$$
Asynchronous updates flip a unit to minimise local energy:
$$s_i \leftarrow \mathrm{sign}\!\left(\sum_j w_{ij} s_j + b_i\right).$$
Because the weights are symmetric and the diagonal is zero, each update either decreases the energy or leaves it unchanged, so the dynamics monotonically descend and terminate in a local minimum of $E$.
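A minimal NumPy sketch of the energy and the asynchronous dynamics (the function names `energy` and `update_async` are illustrative, not from any library):

```python
import numpy as np

def energy(W, b, s):
    """E(s) = -1/2 s^T W s - b^T s for a state s in {-1, +1}^N."""
    return -0.5 * s @ W @ s - b @ s

def update_async(W, b, s, rng):
    """One sweep of asynchronous updates in random order.

    With symmetric W and zero diagonal, each flip can only lower
    (or preserve) the energy, so repeated sweeps reach a fixed point.
    """
    for i in rng.permutation(len(s)):
        h = W[i] @ s + b[i]            # local field on unit i
        s[i] = 1 if h >= 0 else -1     # sign, with ties broken to +1
    return s
```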
Hebbian storage of a set of patterns $\{\xi^{(\mu)}\}_{\mu=1}^P$ uses
$$w_{ij} = \frac{1}{N} \sum_{\mu=1}^P \xi_i^{(\mu)} \xi_j^{(\mu)}, \quad i \neq j.$$
Each pattern $\xi^{(\mu)}$ is then approximately a local minimum of the energy, and the network converges from a noisy starting state to the nearest stored pattern, implementing content-addressable memory.
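Continuing the sketch, Hebbian storage and retrieval from a corrupted cue might look like this (illustrative, building on `update_async` above):

```python
def hebbian_weights(patterns):
    """W = (1/N) sum_mu xi^mu (xi^mu)^T with zero diagonal.

    `patterns` is a (P, N) array of +-1 rows.
    """
    P, N = patterns.shape
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)   # enforce w_ii = 0
    return W

rng = np.random.default_rng(0)
N, P = 200, 10                                # P well below ~0.138 N
patterns = rng.choice([-1, 1], size=(P, N))
W, b = hebbian_weights(patterns), np.zeros(N)

cue = patterns[0].copy()
cue[rng.choice(N, 20, replace=False)] *= -1   # corrupt 10% of the bits
for _ in range(5):                            # a few sweeps suffice
    cue = update_async(W, b, cue, rng)
print(np.array_equal(cue, patterns[0]))       # typically True below capacity
```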
Storage capacity (Amit, Gutfreund, and Sompolinsky, 1985, using methods of statistical mechanics): $P_{\max} \approx 0.138\,N$ patterns can be stored with the Hebbian rule before catastrophic interference sets in. Above this threshold retrieval breaks down abruptly and all stored memories are corrupted; below it retrieval is reliable.
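The threshold can be observed empirically with a rough experiment along these lines (an illustrative sketch reusing the helpers above, not a reproduction of the 1985 analysis):

```python
def recall_fraction(N, P, trials=20, sweeps=5, seed=1):
    """Fraction of trials in which a 10%-corrupted cue is recovered exactly."""
    rng = np.random.default_rng(seed)
    ok = 0
    for _ in range(trials):
        pats = rng.choice([-1, 1], size=(P, N))
        W, b = hebbian_weights(pats), np.zeros(N)
        cue = pats[0].copy()
        cue[rng.choice(N, N // 10, replace=False)] *= -1
        for _ in range(sweeps):
            cue = update_async(W, b, cue, rng)
        ok += np.array_equal(cue, pats[0])
    return ok / trials

print(recall_fraction(200, 10))   # well below 0.138 * 200 ≈ 28: near 1.0
print(recall_fraction(200, 60))   # well above it: near 0.0
```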
Modern Hopfield networks (Ramsauer et al., 2020) store continuous patterns as the columns of a matrix $X$ and use the continuous energy function
$$E(\xi) = -\mathrm{lse}(\beta, X^\top \xi) + \tfrac{1}{2} \xi^\top \xi + \mathrm{const}$$
where $\mathrm{lse}$ is the log-sum-exp function and $\beta$ an inverse temperature, together with the update rule
$$\xi^{\mathrm{new}} = X \, \mathrm{softmax}(\beta X^\top \xi)$$
which typically converges in a single step; that one step coincides with the attention operation of Transformers, with $\xi$ playing the role of the query and the stored patterns $X$ supplying keys and values. This connects classical associative memory to modern Transformers and shows that, in the storage-capacity sense, attention can hold a number of "stored patterns" exponential in the pattern dimension.
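A sketch of the continuous update (shapes and names are illustrative; `X` holds $M$ stored patterns as columns):

```python
def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def modern_hopfield_update(X, xi, beta=8.0):
    """One step xi_new = X softmax(beta X^T xi).

    X: (d, M) matrix of M continuous stored patterns (columns).
    For well-separated patterns, one step already lands (nearly) on
    the closest pattern; that step is the attention read-out.
    """
    return X @ softmax(beta * (X.T @ xi))

d, M = 16, 50
rng = np.random.default_rng(2)
X = rng.standard_normal((d, M))
xi = X[:, 0] + 0.1 * rng.standard_normal(d)   # noisy query near pattern 0
xi_new = modern_hopfield_update(X, xi)
print(np.argmax(X.T @ xi_new))                # typically 0
```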
Related terms: john-hopfield, Boltzmann Machine, Energy-Based Model, Attention Mechanism
Discussed in:
- Chapter 9: Neural Networks, Recurrent Networks