The Gated Recurrent Unit (GRU), proposed by Cho et al. in 2014, simplifies the LSTM by merging the cell state and hidden state into a single state vector and using only two gates instead of three. The update gate $\mathbf{z}_t$ plays a role analogous to both the LSTM's forget and input gates, controlling the balance between retaining the previous state and incorporating a new candidate. The reset gate $\mathbf{r}_t$ determines how much of the previous state contributes to the candidate update.
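In one common formulation (following Cho et al.; $\mathbf{W}$, $\mathbf{U}$, $\mathbf{b}$ are learned weights and biases, $\sigma$ the logistic sigmoid, and $\odot$ elementwise multiplication), the GRU computes:

$$
\begin{aligned}
\mathbf{z}_t &= \sigma(\mathbf{W}_z \mathbf{x}_t + \mathbf{U}_z \mathbf{h}_{t-1} + \mathbf{b}_z) \\
\mathbf{r}_t &= \sigma(\mathbf{W}_r \mathbf{x}_t + \mathbf{U}_r \mathbf{h}_{t-1} + \mathbf{b}_r) \\
\tilde{\mathbf{h}}_t &= \tanh(\mathbf{W}_h \mathbf{x}_t + \mathbf{U}_h (\mathbf{r}_t \odot \mathbf{h}_{t-1}) + \mathbf{b}_h) \\
\mathbf{h}_t &= \mathbf{z}_t \odot \mathbf{h}_{t-1} + (1 - \mathbf{z}_t) \odot \tilde{\mathbf{h}}_t
\end{aligned}
$$

Here $\mathbf{z}_t = 1$ fully retains the previous state; some texts adopt the opposite convention, swapping the roles of $\mathbf{z}_t$ and $1 - \mathbf{z}_t$ in the final line.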
A GRU has fewer parameters than an LSTM of the same hidden size and is therefore faster to train, which can be advantageous with limited compute or data. Empirical comparisons typically find that neither architecture consistently dominates: GRUs sometimes edge ahead on smaller datasets or simpler tasks, while LSTMs may hold an advantage for very long-range dependencies. The choice is often made by cross-validation on the specific task.
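The parameter saving can be made concrete. Each gate or candidate block contributes an input-to-hidden matrix, a hidden-to-hidden matrix, and a bias; a GRU has three such blocks, an LSTM four. A quick sketch (the dimensions are chosen arbitrarily for illustration):

```python
def rnn_params(d_x, d_h, n_blocks):
    # Each block: W (d_h x d_x), U (d_h x d_h), bias b (d_h)
    return n_blocks * (d_h * d_x + d_h * d_h + d_h)

d_x, d_h = 300, 512  # illustrative input and hidden sizes

gru_params = rnn_params(d_x, d_h, 3)   # update, reset, candidate
lstm_params = rnn_params(d_x, d_h, 4)  # forget, input, output, candidate

print(gru_params)   # 1248768
print(lstm_params)  # 1665024
```

Per layer, the GRU uses exactly three quarters of the LSTM's gate parameters; in practice the saving matters most when many layers are stacked or memory is tight.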
Both GRU and LSTM can be enhanced by stacking multiple recurrent layers (deep RNNs) and by processing the sequence in both directions (bidirectional RNNs) to capture context from past and future simultaneously. In most modern applications, transformers have replaced both GRUs and LSTMs, but recurrent architectures remain relevant in streaming, low-latency, and embedded settings, and their gating insights have informed the design of more recent sequence models.
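The bidirectional idea can be sketched with a minimal NumPy GRU cell: one cell reads the sequence left to right, a second reads it right to left, and the two hidden states are concatenated at each step. Weights here are random for illustration; a real model would learn them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    # One GRU step; z = 1 fully retains the previous state h.
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return z * h + (1.0 - z) * h_cand

def init_params(d_x, d_h, rng):
    # One (W, U, b) triple per block: update (z), reset (r), candidate (h).
    p = {}
    for g in "zrh":
        p["W" + g] = 0.1 * rng.standard_normal((d_h, d_x))
        p["U" + g] = 0.1 * rng.standard_normal((d_h, d_h))
        p["b" + g] = np.zeros(d_h)
    return p

def bidirectional_gru(xs, p_fwd, p_bwd, d_h):
    # Forward pass over the sequence.
    h, fwd = np.zeros(d_h), []
    for x in xs:
        h = gru_step(x, h, p_fwd)
        fwd.append(h)
    # Backward pass over the reversed sequence.
    h, bwd = np.zeros(d_h), []
    for x in reversed(xs):
        h = gru_step(x, h, p_bwd)
        bwd.append(h)
    bwd.reverse()
    # Concatenate past-facing and future-facing states per time step.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
d_x, d_h, T = 4, 8, 5
xs = [rng.standard_normal(d_x) for _ in range(T)]
outs = bidirectional_gru(xs, init_params(d_x, d_h, rng),
                         init_params(d_x, d_h, rng), d_h)
print(len(outs), outs[0].shape)  # 5 steps, each of dimension 2 * d_h
```

Stacking is the same idea applied vertically: the concatenated outputs of one bidirectional layer become the input sequence of the next.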
Related terms: LSTM, Recurrent Neural Network
Discussed in:
- Chapter 12: Sequence Models — LSTMs & GRUs
Also defined in: Textbook of AI