References

On the difficulty of training recurrent neural networks

Razvan Pascanu, Tomas Mikolov, & Yoshua Bengio (2013)

International Conference on Machine Learning.

URL: https://arxiv.org/abs/1211.5063

Abstract. A clean exposition of the vanishing- and exploding-gradient problem in recurrent networks, building on the earlier qualitative analysis of Bengio, Simard and Frasconi (1994). The product of step-Jacobians $\prod_j J_j$ governs how a perturbation propagates through time; in generic networks it grows or shrinks exponentially with sequence length. The paper introduces gradient clipping as the standard mitigation for the exploding case and shows that it stabilises training across a range of architectures. Gradient clipping remains a default safeguard in modern deep-learning libraries.
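The two mechanisms summarised above can be sketched numerically. This is a minimal illustration, not the paper's experimental setup: random Gaussian matrices stand in for generic step-Jacobians, and the `scale`, `n`, and `T` values are arbitrary choices made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 32, 50  # hidden-state size and sequence length (illustrative values)

def jacobian_product_norm(scale):
    """Spectral norm of a product of T random n-by-n 'step-Jacobians'."""
    prod = np.eye(n)
    for _ in range(T):
        # Gaussian matrix scaled so its spectral norm is roughly 2 * scale.
        J = scale * rng.standard_normal((n, n)) / np.sqrt(n)
        prod = J @ prod
    return np.linalg.norm(prod, 2)

# Per-step gain above 1 compounds into explosion; below 1, into vanishing.
explode = jacobian_product_norm(2.0)
vanish = jacobian_product_norm(0.1)

def clip_by_norm(grad, max_norm):
    """Gradient clipping: rescale grad so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

clipped = clip_by_norm(np.full(10, 10.0), max_norm=5.0)
```

Running this shows `explode` many orders of magnitude above 1 and `vanish` many below, matching the exponential growth/decay the abstract describes, while `clip_by_norm` caps the update magnitude without changing its direction.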

Tags: rnn optimisation sequence-models

