1989–, Computer scientist
Dzmitry Bahdanau is a Belarusian computer scientist whose 2014 paper with Kyunghyun Cho and Yoshua Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate", introduced the attention mechanism for sequence-to-sequence learning. Where standard seq2seq models compressed the entire source sentence into a single fixed-size vector, Bahdanau attention let the decoder look back at every source word at each output step, with learned weights determining how much each source word contributed to the next output.
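Concretely, at each step the decoder scores every encoder state against its own previous state, normalizes the scores with a softmax into attention weights, and feeds the weighted sum (the "context vector") into the next decoding step. The following is a minimal NumPy sketch of one such step; the parameter names and dimensions are illustrative, not taken from any released code.

```python
import numpy as np

def softmax(x):
    x = x - x.max()  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

def bahdanau_attention_step(s_prev, H, W_a, U_a, v_a):
    """One decoder step of additive (Bahdanau-style) attention.

    s_prev : (d_dec,)   previous decoder hidden state
    H      : (T, d_enc) encoder hidden states, one row per source word
    W_a, U_a, v_a       learned projection parameters (hypothetical names)
    """
    # Score every source position against the decoder state
    # with a small learned network: v_a . tanh(W_a s + U_a h_i).
    scores = np.tanh(s_prev @ W_a + H @ U_a) @ v_a   # (T,)
    # Normalize into weights: how much each source word matters now.
    alpha = softmax(scores)                          # (T,)
    # Context vector: weighted sum of encoder states, fed to the decoder.
    context = alpha @ H                              # (d_enc,)
    return context, alpha

# Toy usage with random parameters (dimensions are arbitrary).
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 8, 16
H = rng.standard_normal((T, d_enc))
s_prev = rng.standard_normal(d_dec)
W_a = rng.standard_normal((d_dec, d_att))
U_a = rng.standard_normal((d_enc, d_att))
v_a = rng.standard_normal(d_att)

context, alpha = bahdanau_attention_step(s_prev, H, W_a, U_a, v_a)
assert np.isclose(alpha.sum(), 1.0)  # weights form a distribution over source words
```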
The mechanism dramatically improved neural machine translation, especially for long sentences, and was the conceptual ancestor of the self-attention at the heart of the Transformer (Vaswani et al., 2017). The lineage from Bahdanau attention to modern LLMs is direct: every attention head in every modern Transformer is a refinement of Bahdanau's 2014 idea.
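The refinement lies mainly in the scoring function: Bahdanau attention scores a decoder state against each encoder state with a small learned network (additive attention), while the Transformer replaced this with a scaled dot product between query and key vectors, which is cheaper and parallelizes better. In simplified notation:

$$e_{t,i} = v_a^\top \tanh(W_a s_{t-1} + U_a h_i) \qquad \text{(additive; Bahdanau et al., 2014)}$$

$$e_{t,i} = \frac{q_t^\top k_i}{\sqrt{d_k}} \qquad \text{(scaled dot-product; Vaswani et al., 2017)}$$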
Bahdanau completed his PhD at the Université de Montréal under Bengio, has worked at Element AI (later ServiceNow Research), and is now at Mila.
Related people: Yoshua Bengio, Ashish Vaswani
Works cited in this book:
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (2014) (with Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, Yoshua Bengio)
- Neural Machine Translation by Jointly Learning to Align and Translate (2014) (with Kyunghyun Cho, Yoshua Bengio)
Discussed in:
- Chapter 13: Attention and Transformers