Iz Beltagy, Matthew E. Peters, & Arman Cohan (2020). Longformer: The Long-Document Transformer.
arXiv.
DOI: https://doi.org/10.48550/arXiv.2004.05150
Abstract. Introduces Longformer, which combines local sliding-window attention with a small number of globally attending tokens, reducing self-attention's quadratic cost in sequence length to linear and enabling documents thousands of tokens long to be processed (the attention pattern is sketched below).
Tags: transformer attention efficiency longformer
Cited in:
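Sketch. A minimal NumPy illustration of the attention pattern the annotation describes: each token attends within a fixed sliding window, and a handful of global tokens attend to (and are attended by) every position, so per-row work stays bounded and total cost grows linearly with sequence length. The function name, window size, and the choice of token 0 as the single global token are assumptions for illustration, not the authors' implementation.

    import numpy as np

    def longformer_attention_mask(seq_len, window, global_idx):
        """Boolean mask where mask[i, j] == True means token i may attend to j.

        Combines a +/- `window` sliding window with a few global tokens that
        attend to, and are attended by, every position. Illustrative only.
        """
        mask = np.zeros((seq_len, seq_len), dtype=bool)
        for i in range(seq_len):
            lo, hi = max(0, i - window), min(seq_len, i + window + 1)
            mask[i, lo:hi] = True      # local sliding-window attention
        mask[global_idx, :] = True     # global tokens attend everywhere
        mask[:, global_idx] = True     # all tokens attend to global tokens
        return mask

    # A typical row allows roughly 2*window + 1 positions plus the global
    # tokens, so attention cost is linear in seq_len, not quadratic.
    m = longformer_attention_mask(seq_len=4096, window=256, global_idx=[0])
    print(int(m[1000].sum()))  # 2*256 + 1 window slots + 1 global token = 514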