People

Tri Dao

1995–, Computer scientist

Tri Dao is a Vietnamese-American computer scientist whose 2022 paper FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness introduced the FlashAttention algorithm, a tile-based, IO-aware implementation of attention that avoids materialising the full attention matrix, sharply reducing reads and writes to GPU memory while computing exactly the same output as standard attention. FlashAttention is now standard in nearly every Transformer training and inference framework.
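The key idea can be illustrated with a minimal sketch (not Dao's implementation, which is a fused GPU kernel): process keys and values in tiles, maintaining a running row-wise maximum and softmax normaliser so the N×N score matrix is never stored. The tile size and array shapes here are illustrative assumptions.

```python
import numpy as np

def standard_attention(Q, K, V):
    # Reference: materialises the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Tile-based "online softmax" sketch: scan K/V in blocks,
    # keeping a running max (m) and normaliser (l) per query row,
    # so only one tile of scores exists at a time.
    N, d = Q.shape
    O = np.zeros((N, V.shape[-1]))
    m = np.full(N, -np.inf)      # running row-wise max of scores
    l = np.zeros(N)              # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)              # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)              # rescale earlier partial sums
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Because each tile's contribution is rescaled when a larger score maximum appears, the result matches the reference computation exactly, up to floating-point rounding.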

With Albert Gu, Dao co-developed the Mamba architecture (2023), a state-space model that achieves Transformer-quality language modelling while scaling linearly, rather than quadratically, with sequence length. Mamba and its successors represent the most serious challenge to the Transformer's dominance to emerge in the post-2017 era.
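The linear-time claim rests on the state-space recurrence at Mamba's core. The sketch below shows a plain (non-selective) diagonal SSM, not Mamba itself, which additionally makes the parameters input-dependent and uses a parallel scan; the parameter names follow the standard SSM notation and the shapes are illustrative assumptions.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    # Diagonal state-space recurrence:
    #   h_t = A * h_{t-1} + B * x_t
    #   y_t = C . h_t
    # One O(state_size) update per step, so total cost is linear in
    # sequence length, versus quadratic for full attention.
    T = x.shape[0]
    h = np.zeros_like(A, dtype=float)
    y = np.empty(T)
    for t in range(T):
        h = A * h + B * x[t]
        y[t] = C @ h
    return y
```

With a single decaying state (A = 0.5, B = C = 1) and an impulse input, the output is simply the geometric decay 1, 0.5, 0.25, ..., which makes the recurrence easy to check by hand.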

Dao completed his Stanford PhD under Christopher Ré in 2023 and joined Princeton as a faculty member while continuing affiliations with Together AI and other organisations.

Related people: Albert Gu, Ashish Vaswani
