Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, & Hao Ma (2020). Linformer: Self-Attention with Linear Complexity.
arXiv:2006.04768.
URL: https://arxiv.org/abs/2006.04768
Abstract. Introduces Linformer, an efficient Transformer that approximates self-attention with linear complexity, exploiting the empirical observation that the attention matrices produced by trained Transformers are approximately low-rank. Linformer applies learned linear projections along the sequence-length dimension to compress the keys and values from $n$ positions down to a fixed $k \ll n$, then performs standard scaled dot-product attention against the compressed keys and values. The result is $\mathcal{O}(nk)$ time and memory rather than $\mathcal{O}(n^2)$. Linformer was one of the early efficient-attention variants and informed the broader literature on low-rank, kernel, and sparse attention.
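A minimal single-head sketch of the projection trick in PyTorch, to make the $\mathcal{O}(nk)$ shape bookkeeping concrete. The function name and the unbatched projection matrices `e_proj`/`f_proj` are illustrative assumptions, not the paper's exact module layout (the paper folds the projections into multi-head attention and discusses sharing them across heads and layers):

```python
import math
import torch

def linformer_attention(q, k, v, e_proj, f_proj):
    """Single-head attention with Linformer-style sequence-length projection.

    q, k, v:         (batch, n, d)   queries, keys, values
    e_proj, f_proj:  (k_dim, n)      learned projections with k_dim << n
    """
    k_low = torch.matmul(e_proj, k)   # (batch, k_dim, d): compress keys along the sequence dim
    v_low = torch.matmul(f_proj, v)   # (batch, k_dim, d): compress values along the sequence dim
    scores = torch.matmul(q, k_low.transpose(-2, -1)) / math.sqrt(q.shape[-1])  # (batch, n, k_dim)
    attn = torch.softmax(scores, dim=-1)   # softmax over k_dim projected slots, not n positions
    return torch.matmul(attn, v_low)       # (batch, n, d); cost scales as O(n * k_dim)

# Toy usage: n = 1024 tokens compressed to k_dim = 64 projected positions.
batch, n, d, k_dim = 2, 1024, 64, 64
q, k, v = (torch.randn(batch, n, d) for _ in range(3))
e_proj = torch.randn(k_dim, n) / math.sqrt(n)
f_proj = torch.randn(k_dim, n) / math.sqrt(n)
out = linformer_attention(q, k, v, e_proj, f_proj)   # -> torch.Size([2, 1024, 64])
```

The intermediate attention matrix is $n \times k$ rather than $n \times n$, which is where the linear memory footprint comes from.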
Tags: transformers efficient-attention
Cited in: