Further reading

  • Goodfellow, Bengio and Courville (2016), Deep Learning, Chapter 10 on recurrent and recursive networks.
  • Jurafsky and Martin (forthcoming third edition), Speech and Language Processing, chapters on language models, embeddings, RNNs, and attention.
  • Olah (2015), "Understanding LSTM Networks", the canonical visual exposition.
  • Karpathy (2015), "The Unreasonable Effectiveness of Recurrent Neural Networks", the min-char-rnn blog post.
  • Bahdanau, Cho and Bengio (2015), "Neural Machine Translation by Jointly Learning to Align and Translate", the original attention paper.
  • Sutskever, Vinyals and Le (2014), sequence-to-sequence learning.
  • Hochreiter and Schmidhuber (1997), the LSTM paper.
  • Mikolov et al. (2013), Pennington et al. (2014) and Bojanowski et al. (2017), the word-embedding triumvirate: word2vec, GloVe and fastText.
  • Sennrich, Haddow and Birch (2016), byte-pair encoding (BPE) for neural machine translation.
  • Graves et al. (2006), connectionist temporal classification (CTC).
  • Holtzman et al. (2019), nucleus (top-p) sampling.
  • Vaswani et al. (2017), "Attention Is All You Need" (a preview of Chapter 13).
