The recurrence unrolled across time becomes a deep feed-forward network with shared weights.
From the chapter: Chapter 12: Sequence Models
Glossary: rnn, recurrent neural network
Transcript
A recurrent neural network has one cell. The cell takes a hidden state and an input, and returns a new hidden state.
We feed a sequence in. At time one, the cell sees x one and the initial hidden state. It outputs h one.
At time two, the cell sees x two and h one. It outputs h two. It uses the same weights as at time one.
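As a rough sketch in code, assuming a tanh cell with weight matrices W_h and W_x and a bias b (names chosen here for illustration, not from the transcript), one step looks like this:

```python
import numpy as np

def rnn_cell(h_prev, x_t, W_h, W_x, b):
    # One step: new hidden state from the previous hidden state and the input.
    # The same W_h, W_x, b are reused at every time step.
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)
```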
Unfold this picture across time. The same cell, copied at every step, hidden state passing forward.
What looked like a recurrence is now a feed-forward network. As deep as the sequence is long. With weights shared across every layer.
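A minimal sketch of that unrolling, reusing the hypothetical rnn_cell above: each loop iteration is one "layer", and the same weights appear in every iteration.

```python
def unroll(x_seq, h0, W_h, W_x, b):
    # One iteration per time step: a feed-forward stack as deep as the
    # sequence is long, with W_h, W_x, b shared across every "layer".
    h, hs = h0, []
    for x_t in x_seq:
        h = rnn_cell(h, x_t, W_h, W_x, b)
        hs.append(h)
    return hs
```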
This is the trick that makes RNNs trainable by backpropagation. We backprop through the unrolled graph, summing the gradient contributions across every time step. This is backpropagation through time.
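A sketch of backpropagation through time under the same assumptions (tanh cell, loss taken on the final hidden state only): the shared-weight gradients are accumulated at every step, which is the summing across time steps just described.

```python
def bptt(x_seq, h0, W_h, W_x, b, dL_dh_last):
    # Forward pass, keeping every hidden state for the backward pass.
    hs = [h0]
    for x_t in x_seq:
        hs.append(np.tanh(W_h @ hs[-1] + W_x @ x_t + b))

    dW_h, dW_x, db = np.zeros_like(W_h), np.zeros_like(W_x), np.zeros_like(b)
    dh = dL_dh_last                        # gradient flowing into the last hidden state
    for t in reversed(range(len(x_seq))):
        h_t, h_prev = hs[t + 1], hs[t]
        dpre = dh * (1.0 - h_t ** 2)       # backprop through tanh
        dW_h += np.outer(dpre, h_prev)     # shared weights: contributions summed over time
        dW_x += np.outer(dpre, x_seq[t])
        db   += dpre
        dh = W_h.T @ dpre                  # pass the gradient back one time step
    return dW_h, dW_x, db
```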
Backpropagating through T steps multiplies T per-step derivatives together, and that chain of multiplications is exactly the chain rule applied to a recursive function. Many small numbers multiplied together vanish. Many large numbers multiplied together explode.
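A toy illustration of why: repeatedly multiplying by a per-step factor, as the backward pass does, shrinks or blows up exponentially with sequence length (the factors here are arbitrary, chosen only to show the effect).

```python
for w in (0.9, 1.1):
    g = 1.0
    for _ in range(100):   # 100 time steps of backprop, each multiplying by w
        g *= w
    print(f"per-step factor {w}: gradient scale after 100 steps = {g:.1e}")
# 0.9 -> about 2.7e-05 (vanishes); 1.1 -> about 1.4e+04 (explodes)
```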
Vanishing and exploding gradients are why long sequences are hard for plain RNNs. LSTMs and GRUs add gates and a separate cell state to keep gradients flowing. Transformers replace the recurrence entirely with attention.
But the unfolding picture is the foundation.