1981–, Computer scientist
Jacob Devlin is an American computer scientist whose 2018 paper with Ming-Wei Chang, Kenton Lee and Kristina Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", introduced BERT, an encoder-only Transformer pre-trained on a masked language modeling objective (predict randomly masked tokens) and a next-sentence prediction objective.
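The masked language modeling objective can be sketched in a few lines. Per the BERT paper, about 15% of input positions are selected for prediction; of those, 80% are replaced with a [MASK] token, 10% with a random vocabulary token, and 10% are left unchanged. The sketch below is illustrative, not Devlin et al.'s code; the function name and interface are invented for this example.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style input masking (illustrative sketch).

    Selects ~mask_prob of positions as prediction targets; of those,
    80% become "[MASK]", 10% a random token from `vocab`, and 10%
    stay unchanged. Returns the corrupted sequence and a parallel
    list of labels (None = position not predicted)."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok          # the model must recover this token
            r = rng.random()
            if r < 0.8:
                masked[i] = "[MASK]"
            elif r < 0.9:
                masked[i] = rng.choice(vocab)
            # else: keep the original token (model still predicts it)
    return masked, labels
```

Because some targets keep their original token, the encoder cannot rely on [MASK] alone to signal which positions matter, which reduces the mismatch between pre-training (masks present) and fine-tuning (no masks).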
BERT achieved state-of-the-art results on nearly every NLP benchmark in 2018 and reshaped the field. The pre-train-then-fine-tune paradigm BERT demonstrated (pre-train on unlabelled text at scale, then fine-tune on small labelled task data) became the template for an entire generation of NLP models (RoBERTa, ALBERT, DeBERTa, XLNet, ELECTRA, etc.).
BERT was the encoder-side counterpart of GPT, and from 2018 to 2022 the two architectural traditions developed in parallel. The decoder-only autoregressive Transformer (the GPT line) ultimately proved dominant for the largest models and for tasks beyond classification, but BERT-style encoders remain widely used for retrieval, classification and feature extraction. Devlin left Google for OpenAI in February 2023 and returned to Google's Gemini team in 2024.
Related people: Ashish Vaswani, Alec Radford
Works cited in this book:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019) (with Ming-Wei Chang, Kenton Lee, Kristina Toutanova)
- PaLM: Scaling Language Modeling with Pathways (2022) (with Aakanksha Chowdhery, Sharan Narang, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel)
Discussed in:
- Chapter 13: Attention and Transformers