Glossary

Graph Neural Network

A graph neural network (GNN) is a neural architecture designed to learn representations of nodes, edges, and whole graphs by propagating information along graph structure. Where convolutional networks exploit the regular grid of an image and recurrent networks exploit the sequence order of text, GNNs exploit the irregular adjacency of an arbitrary graph $G = (V, E)$.

The defining computation is message passing. Each node $v$ holds a hidden state $h_v^{(l)} \in \mathbb{R}^d$ at layer $l$. At each layer, $v$ aggregates messages from its neighbours $\mathcal{N}(v)$ and updates its own state:

$$h_v^{(l+1)} = \mathrm{UPDATE}\!\left(h_v^{(l)},\; \mathrm{AGG}\{h_u^{(l)} : u \in \mathcal{N}(v)\}\right)$$

The aggregator $\mathrm{AGG}$ must be permutation invariant because a node's neighbours have no canonical ordering; common choices are sum, mean, max, or attention-weighted sum. The update function is typically a small MLP or a gated unit. After $L$ layers, $h_v^{(L)}$ encodes information from the $L$-hop neighbourhood of $v$. For graph-level tasks a readout $z_G = \mathrm{POOL}\{h_v^{(L)} : v \in V\}$ produces a single vector.
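
The template is short enough to write out directly. The following is a minimal NumPy sketch, assuming a toy four-node graph, a sum aggregator, and a one-linear-layer update; the weights `W_self` and `W_agg` are random placeholders that a real model would learn by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph on 4 nodes, stored as an adjacency list.
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
d = 8                          # hidden dimension
h = rng.normal(size=(4, d))    # h_v^(0) for every node

# Parameters of one layer (random placeholders, not trained).
W_self = rng.normal(size=(d, d)) * 0.1
W_agg = rng.normal(size=(d, d)) * 0.1

def message_passing_layer(h):
    """One layer of the template: sum-aggregate neighbours, then update."""
    h_next = np.empty_like(h)
    for v in range(h.shape[0]):
        # AGG: a permutation-invariant sum over neighbour states.
        m = np.sum(h[neighbours[v]], axis=0)
        # UPDATE: combine own state with the aggregated message, then ReLU.
        h_next[v] = np.maximum(0.0, h[v] @ W_self + m @ W_agg)
    return h_next

# After L = 2 layers, each node's state reflects its 2-hop neighbourhood.
for _ in range(2):
    h = message_passing_layer(h)

# Graph-level readout: mean-pool node states into a single vector z_G.
z_G = h.mean(axis=0)
print(z_G.shape)   # (8,)
```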

GNNs unify many earlier ideas. The graph convolutional network uses a symmetric degree-normalised aggregator; the graph attention network learns its aggregation weights; GraphSAGE samples a fixed number of neighbours for scalability; the graph isomorphism network is designed to be as discriminative as the Weisfeiler–Lehman (1-WL) isomorphism test. All fit the message-passing template.
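
As a concrete instance of one variant, here is a sketch of the graph convolutional network's aggregator, $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$, applied to the same toy graph; the features and weights are again illustrative placeholders rather than a trained model.

```python
import numpy as np

# The toy graph above as a dense adjacency matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Add self-loops, then normalise symmetrically by node degree.
A_tilde = A + np.eye(len(A))
deg = A_tilde.sum(axis=1)
A_hat = np.diag(deg ** -0.5) @ A_tilde @ np.diag(deg ** -0.5)

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))         # node features
W = rng.normal(size=(8, 8)) * 0.1   # layer weights (illustrative)

# One GCN layer: a degree-normalised average over each node's closed
# neighbourhood, followed by a shared linear map and a nonlinearity.
H_next = np.maximum(0.0, A_hat @ H @ W)
```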

Applications span chemistry (molecular property prediction, where atoms are nodes and bonds are edges), physics simulation (particles and contacts), traffic forecasting (road segments), recommender systems (user–item bipartite graphs), drug discovery, and combinatorial optimisation. AlphaFold uses graph-style attention over residue pairs, and DeepMind's GraphCast uses a GNN on a global mesh to forecast weather more accurately than the leading operational numerical model on most evaluation targets, at a fraction of the compute.

Two limitations recur. Over-smoothing: stacking many message-passing layers makes node representations converge toward one another, since repeated neighbourhood averaging acts as a low-pass filter on the graph. Over-squashing: information from exponentially many distant nodes must be compressed through the graph's narrow bottlenecks into fixed-size vectors, losing detail. Workarounds include residual connections (sketched below), graph rewiring, and graph transformers that attend over all node pairs.
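
As an illustration of the first workaround, a residual connection is a one-line change to the message-passing sketch above: the layer's output is added to its input, so stacking layers refines node states instead of repeatedly averaging them away.

```python
# Residual message-passing layer, reusing message_passing_layer from the
# sketch above. Adding h back preserves node-specific detail as depth grows.
def residual_layer(h):
    return h + message_passing_layer(h)
```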

GNNs are best understood as a generalisation of convolution from grids to arbitrary topologies, and as the natural neural architecture whenever data carry explicit relational structure rather than fitting neatly into a sequence or grid.

Related terms: Graph Convolutional Network, Graph Attention Network, Message Passing Neural Network, Convolution, Attention Mechanism
