1964–, Computer scientist
Yoshua Bengio is a French-Canadian computer scientist and one of the central figures in the development of modern deep learning. His 2003 paper A Neural Probabilistic Language Model (with Ducharme, Vincent and Jauvin) introduced the first influential neural language model: continuous word embeddings form the input layer of a feed-forward network that predicts the next word. It anticipated word2vec, GloVe and the word-embedding layer of every subsequent language model.
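The architecture just described can be sketched in a few lines. This is a minimal, untrained illustration of the 2003 model's forward pass (all sizes are toy values chosen here, not the paper's), omitting the paper's optional direct input-to-output connections:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative only): vocabulary V, embedding dim d,
# context length n-1, hidden units h.
V, d, context, h = 50, 8, 3, 16

C = rng.normal(0, 0.1, (V, d))            # shared word-embedding table
H = rng.normal(0, 0.1, (context * d, h))  # input-to-hidden weights
U = rng.normal(0, 0.1, (h, V))            # hidden-to-output weights

def next_word_probs(context_ids):
    """P(w_t | previous n-1 words), as in Bengio et al. (2003)."""
    x = C[context_ids].reshape(-1)     # look up and concatenate embeddings
    a = np.tanh(x @ H)                 # hidden layer
    logits = a @ U
    e = np.exp(logits - logits.max())  # softmax over the vocabulary
    return e / e.sum()

p = next_word_probs([4, 17, 29])       # a distribution over all V words
```

Training adjusts C, H and U jointly by gradient descent, which is why the embeddings in C end up encoding word similarity.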
Bengio's later contributions span deep architectures (his 2009 monograph Learning Deep Architectures for AI was an early synthesis), generative models (he co-supervised Ian Goodfellow's doctoral work, which produced generative adversarial networks in 2014), attention mechanisms in machine translation (Bahdanau, Cho and Bengio, 2014, the precursor to the Transformer's attention), curriculum learning, and the theory of generalisation in deep networks.
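The attention mechanism of Bahdanau, Cho and Bengio can likewise be sketched as a single scoring step. All dimensions and weight matrices below are toy placeholders: the decoder state is scored against every encoder state with a small tanh network ("additive" attention), the scores are softmaxed, and the result is a weighted average of the encoder states:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions (illustrative only): T encoder states of size d_h,
# a decoder state of size d_s, attention dim d_a.
T, d_h, d_s, d_a = 5, 6, 6, 4

H = rng.normal(size=(T, d_h))   # encoder hidden states h_1..h_T
s = rng.normal(size=d_s)        # current decoder state

W_a = rng.normal(size=(d_s, d_a))
U_a = rng.normal(size=(d_h, d_a))
v_a = rng.normal(size=d_a)

# Additive ("Bahdanau") attention: score, softmax, weighted sum.
scores = np.tanh(s @ W_a + H @ U_a) @ v_a
weights = np.exp(scores - scores.max())
weights /= weights.sum()
context_vec = weights @ H       # context vector fed to the decoder
```

The Transformer keeps this score-softmax-sum pattern but replaces the tanh network with a scaled dot product.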
He founded the Montreal Institute for Learning Algorithms (MILA), one of the most productive deep-learning research centres in the world. His brother Samy Bengio is a senior machine-learning researcher at Apple (since 2021, after a long career at Google Brain). He shared the 2018 Turing Award with Geoffrey Hinton and Yann LeCun. Since 2023 he has been an active advocate for AI safety research and government regulation of frontier AI.
Related people: Geoffrey Hinton, Yann LeCun, Ian Goodfellow
Works cited in this book:
- Gradient-based learning applied to document recognition (1998) (with Yann LeCun, Léon Bottou, Patrick Haffner)
- A Neural Probabilistic Language Model (2003) (with Réjean Ducharme, Pascal Vincent, Christian Jauvin)
- Understanding the Difficulty of Training Deep Feedforward Neural Networks (2010) (with Xavier Glorot)
- Random Search for Hyper-Parameter Optimization (2012) (with James Bergstra)
- On the difficulty of training recurrent neural networks (2013) (with Razvan Pascanu, Tomas Mikolov)
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (2014) (with Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk)
- Neural Machine Translation by Jointly Learning to Align and Translate (2014) (with Dzmitry Bahdanau, Kyunghyun Cho)
- Generative Adversarial Nets (2014) (with Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville)
- Deep Learning (2016) (with Ian Goodfellow, Aaron Courville)
Discussed in:
- Chapter 1: What Is AI?, A Brief History of AI