1995–, Computer scientist
Tri Dao is a Vietnamese-American computer scientist whose 2022 paper FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness introduced the FlashAttention algorithm, a tiled implementation of attention that sharply reduces reads and writes to GPU high-bandwidth memory by keeping blocks of the computation in fast on-chip memory, while producing exactly the same output as standard attention. FlashAttention is now standard in nearly every Transformer training and inference framework.
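The core idea can be seen in a few lines: process the keys and values in blocks and keep running softmax statistics, so the full attention matrix is never stored. The sketch below is illustrative only, written in plain NumPy with an invented block size; it is not the authors' GPU kernel, only the tiling-plus-online-softmax pattern that kernel relies on.

```python
# Illustrative sketch only: tiled, online-softmax attention in NumPy.
# It mirrors the idea behind FlashAttention (process K/V in blocks, keep
# running softmax statistics, never materialise the full N x N score
# matrix), but it is not the authors' kernel; shapes and the block size
# are invented for the example.
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Exact softmax attention computed block-by-block over K/V."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    row_max = np.full(N, -np.inf)   # running max of scores per query row
    row_sum = np.zeros(N)           # running softmax denominator per query row
    for start in range(0, N, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale      # scores against this K/V block only
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)   # rescale what we accumulated so far
        P = np.exp(S - new_max[:, None])
        out = out * correction[:, None] + P @ Vb
        row_sum = row_sum * correction + P.sum(axis=1)
        row_max = new_max
    return out / row_sum[:, None]

# Matches the naive computation that materialises the full N x N matrix:
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(32)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref, atol=1e-6)
```

The final assertion is the point: the tiled pass touches only one block of keys and values at a time yet returns the same result as standard attention, which is what "exact" means in the paper's title.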
With Albert Gu, Dao co-developed the Mamba architecture (2023), a state-space model that achieves Transformer-quality language modelling while scaling linearly, rather than quadratically, with sequence length (see the sketch below). Mamba and its successors represent the most serious challenge to the Transformer's dominance to emerge in the post-2017 era.
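To see where the linear scaling comes from, here is a toy, per-channel linear state-space recurrence: each step does constant work and carries a fixed-size hidden state, so the whole sequence is processed in a single pass. This is only an illustration under simplifying assumptions; it omits Mamba's input-dependent selection mechanism and hardware-aware scan, and every parameter name and shape here is invented.

```python
# Illustrative sketch only: a toy diagonal state-space recurrence, not Mamba.
# h_t = A * h_{t-1} + B * x_t ;  y_t = C * h_t   (element-wise, per channel)
import numpy as np

def ssm_scan(x, A, B, C):
    """Run the recurrence over a (T, d) sequence in O(T) time with O(d) state."""
    T, d = x.shape
    h = np.zeros(d)
    y = np.empty_like(x)
    for t in range(T):          # one pass over the sequence
        h = A * h + B * x[t]    # constant work per step
        y[t] = C * h
    return y

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((1024, d))
y = ssm_scan(x, A=np.full(d, 0.9), B=np.ones(d), C=rng.standard_normal(d))
print(y.shape)  # (1024, 8): cost grew linearly with the 1024-step sequence
```

Attention, by contrast, must compare every position with every other, which is where its quadratic cost in sequence length comes from.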
Dao completed his Stanford PhD under Christopher Ré in 2023 and joined Princeton as an assistant professor of computer science, while also serving as chief scientist of Together AI.
Related people: Albert Gu, Ashish Vaswani
Works cited in this book:
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022) (with Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (2024) (with Albert Gu)
Discussed in:
- Chapter 13: Attention and Transformers