1976–, Computer scientist
Noam Shazeer is an American computer scientist who has been one of the most consistently influential individual contributors to large-scale machine learning. A long-time Google researcher, he co-authored Attention Is All You Need (the Transformer, 2017) and was the lead author of the Mixture-of-Experts paper Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017), which introduced sparse routing as an alternative scaling strategy.
He left Google in 2021 to co-found Character.AI with Daniel De Freitas, building on Google's LaMDA work to create a consumer-facing dialogue product. In 2024 he and De Freitas returned to Google as part of a major licensing deal, with Shazeer becoming a co-lead of the Gemini large-language-model effort. He is one of the few researchers who have been at the cutting edge of LLMs continuously from 2015 to the present.
Related people: Ashish Vaswani, Jeff Dean
Works cited in this book:
- Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks (2015) (with Samy Bengio, Oriol Vinyals, Navdeep Jaitly)
- Attention Is All You Need (2017) (with Ashish Vaswani, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019) (with Colin Raffel, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu)
- Fast Transformer Decoding: One Write-Head is All You Need (2019)
- GLU Variants Improve Transformer (2020)
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (2021) (with William Fedus, Barret Zoph)
- PaLM: Scaling Language Modeling with Pathways (2022) (with Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel)
Discussed in:
- Chapter 13: Attention and Transformers