Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, & Hao Ma (2020). Linformer: Self-Attention with Linear Complexity.
arXiv:2006.04768.
URL: https://arxiv.org/abs/2006.04768
Abstract. Introduces Linformer, an efficient Transformer that approximates self-attention with linear complexity, exploiting the empirical observation that the attention matrices produced by trained Transformers are approximately low-rank. Linformer applies learned linear projections along the sequence-length dimension to compress the keys and values from $n$ positions down to a fixed $k \ll n$, then performs standard scaled dot-product attention against the compressed keys and values. The result is $\mathcal{O}(nk)$ time and memory rather than $\mathcal{O}(n^2)$. Linformer was one of the early efficient-attention variants and informed the broader literature on low-rank, kernel, and sparse attention.
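A minimal single-head sketch of the projection trick in PyTorch, to make the $\mathcal{O}(nk)$ shape bookkeeping concrete. The function name and the unbatched projection matrices `e_proj`/`f_proj` are illustrative assumptions, not the paper's exact module layout (the paper folds the projections into multi-head attention and discusses sharing them across heads and layers):

```python
import math
import torch

def linformer_attention(q, k, v, e_proj, f_proj):
    """Single-head attention with Linformer-style sequence-length projection.

    q, k, v:         (batch, n, d)   queries, keys, values
    e_proj, f_proj:  (k_dim, n)      learned projections with k_dim << n
    """
    k_low = torch.matmul(e_proj, k)   # (batch, k_dim, d): compress keys along the sequence dim
    v_low = torch.matmul(f_proj, v)   # (batch, k_dim, d): compress values along the sequence dim
    scores = torch.matmul(q, k_low.transpose(-2, -1)) / math.sqrt(q.shape[-1])  # (batch, n, k_dim)
    attn = torch.softmax(scores, dim=-1)   # softmax over k_dim projected slots, not n positions
    return torch.matmul(attn, v_low)       # (batch, n, d); cost scales as O(n * k_dim)

# Toy usage: n = 1024 tokens compressed to k_dim = 64 projected positions.
batch, n, d, k_dim = 2, 1024, 64, 64
q, k, v = (torch.randn(batch, n, d) for _ in range(3))
e_proj = torch.randn(k_dim, n) / math.sqrt(n)
f_proj = torch.randn(k_dim, n) / math.sqrt(n)
out = linformer_attention(q, k, v, e_proj, f_proj)   # -> torch.Size([2, 1024, 64])
```

The intermediate attention matrix is $n \times k$ rather than $n \times n$, which is where the linear memory footprint comes from.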
Tags: transformers efficient-attention
Cited in: