Glossary

Recurrent Neural Network

Also known as: RNN

A Recurrent Neural Network (RNN) processes a sequence one element at a time, maintaining a hidden state that serves as a summary of the information seen so far. At each time step $t$, the RNN receives the current input $\mathbf{x}_t$ and the previous hidden state $\mathbf{h}_{t-1}$, and produces a new hidden state $\mathbf{h}_t = f(W_{hh}\mathbf{h}_{t-1} + W_{xh}\mathbf{x}_t + \mathbf{b})$, typically with a tanh activation. The weight matrices are shared across time steps, so the parameter count is independent of sequence length.
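The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trainable implementation; the dimensions, initialization scale, and sequence length are arbitrary choices for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_hh, W_xh, b):
    """One recurrent step: h_t = tanh(W_hh @ h_prev + W_xh @ x_t + b)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b)

# Illustrative sizes: input dimension 3, hidden dimension 4.
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(4, 4))
W_xh = rng.normal(scale=0.1, size=(4, 3))
b = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_hh, W_xh, b)  # same weights reused at every step
```

Note that the loop reuses the same `W_hh`, `W_xh`, and `b` at every step, which is the weight sharing that keeps the parameter count independent of sequence length.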

Training uses Backpropagation Through Time (BPTT), which unrolls the network over the sequence and applies the chain rule through the resulting deep computational graph. Unfortunately, simple RNNs suffer severely from vanishing and exploding gradients: because the chain rule involves multiplying many Jacobians together as gradients flow backward through time, gradients shrink or grow exponentially. This limits simple RNNs' ability to learn dependencies beyond about 10–20 time steps.
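The exponential decay of gradients can be seen directly by accumulating the per-step Jacobian $\mathrm{diag}(1-\mathbf{h}_t^2)\,W_{hh}$ of a tanh RNN over many steps. The sketch below (illustrative sizes and a deliberately small weight scale) tracks the norm of the product of Jacobians as it flows backward through time.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
W_hh = rng.normal(scale=0.3, size=(n, n))  # small weights => vanishing regime

h = np.zeros(n)
J = np.eye(n)   # accumulated Jacobian of h_T with respect to h_0
norms = []
for t in range(50):
    a = W_hh @ h + rng.normal(size=n)      # pre-activation with random input
    h = np.tanh(a)
    # Jacobian of one step: diag(1 - h^2) @ W_hh, chained onto the product
    J = np.diag(1 - h**2) @ W_hh @ J
    norms.append(np.linalg.norm(J))
```

Each factor in the product has norm below 1 here, so `norms` shrinks roughly geometrically; with large weights the same product instead explodes.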

LSTM and GRU cells were designed to mitigate this. They use gating mechanisms to control information flow through a dedicated memory channel, allowing gradients to propagate over hundreds of time steps with far less attenuation. From roughly 2014 to 2017, LSTMs dominated sequence modelling in machine translation, speech recognition, and language modelling. They have since been largely supplanted by transformers, which process sequences in parallel and capture long-range dependencies more effectively, though RNNs retain niche advantages in streaming and low-latency applications, and recent state-space models (S4, Mamba) revive RNN-like architectures with improved efficiency.
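The gated memory channel can be illustrated with a single LSTM step. This is a minimal sketch with illustrative dimensions and stacked gate parameters (`W`, `U`, `b` are hypothetical names, not from any particular library); the key point is that the cell state `c` is updated additively through the forget and input gates rather than being repeatedly squashed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters for the forget (f),
    input (i), output (o), and candidate (g) transformations."""
    z = W @ x_t + U @ h_prev + b          # shape (4 * hidden,)
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g                # gated, additive memory update
    h = o * np.tanh(c)                    # gated read-out of the memory
    return h, c

hidden, inp = 4, 3
rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(4 * hidden, inp))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h = np.zeros(hidden)
c = np.zeros(hidden)
for x_t in rng.normal(size=(5, inp)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Because the update `c = f * c_prev + i * g` is elementwise and additive, the gradient along the cell state is scaled by the forget gate rather than by a full weight matrix at every step, which is what lets information persist over long spans.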

Related terms: LSTM, GRU

Discussed in:

Also defined in: Textbook of AI, Textbook of Medical AI