Further reading
- Goodfellow, Bengio and Courville (2016), Deep Learning, Chapter 10, recurrent and recursive networks.
- Jurafsky and Martin (forthcoming third edition), Speech and Language Processing, chapters on language models, embeddings, RNNs, and attention.
- Olah (2015), "Understanding LSTM Networks", the canonical visual exposition.
- Karpathy (2015), "The Unreasonable Effectiveness of Recurrent Neural Networks", the min-char-rnn blog post.
- Bahdanau, Cho and Bengio (2015), the original attention paper.
- Sutskever, Vinyals and Le (2014), sequence-to-sequence learning.
- Hochreiter and Schmidhuber (1997), the LSTM paper.
- Mikolov et al. (2013), Pennington et al. (2014), and Bojanowski et al. (2017), the word-embedding triumvirate: word2vec, GloVe, and fastText.
- Sennrich, Haddow and Birch (2016), byte-pair encoding for neural machine translation.
- Graves et al. (2006), Connectionist Temporal Classification.
- Holtzman et al. (2019), nucleus sampling.
- Vaswani et al. (2017), "Attention Is All You Need" (preview of Chapter 13).
This site is currently in Beta. Contact: Chris Paton
AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).