Ofir Press, Noah A. Smith, & Mike Lewis (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.
arXiv.
DOI: https://doi.org/10.48550/arxiv.2108.12409
Abstract. Introduces ALiBi (Attention with Linear Biases), which replaces positional embeddings by subtracting from each attention logit a penalty proportional to the distance between the query and key positions, with no learned parameters. ALiBi enables transformers trained on short sequences to extrapolate to sequence lengths far beyond those seen during training.
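A minimal NumPy sketch of the bias described above, under the assumptions that the head count is a power of two (the paper's geometric slope schedule 2^(-8/n), 2^(-16/n), ... applies exactly in that case) and that attention is causal; the function names (`alibi_slopes`, `alibi_bias`, `attention_with_alibi`) are illustrative, not from the paper.

```python
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    # Geometric slope schedule: for n heads, slopes are 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    # (exact when n is a power of two; the paper interpolates otherwise).
    start = 2.0 ** (-8.0 / n_heads)
    return start ** np.arange(1, n_heads + 1)

def alibi_bias(seq_len: int, n_heads: int) -> np.ndarray:
    # Per-head bias matrix: entry [h, i, j] = slope_h * (j - i) for keys j <= i,
    # i.e. a linear penalty that grows with the query-key distance, no learned parameters.
    pos = np.arange(seq_len)
    distance = pos[None, :] - pos[:, None]      # (seq_len, seq_len), value j - i
    distance = np.minimum(distance, 0)          # keep only past/current keys
    slopes = alibi_slopes(n_heads)              # (n_heads,)
    return slopes[:, None, None] * distance     # (n_heads, seq_len, seq_len)

def attention_with_alibi(q, k, v, bias):
    # q, k, v: (n_heads, seq_len, d_head); bias: (n_heads, seq_len, seq_len).
    d_head = q.shape[-1]
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d_head) + bias
    # Causal mask: future positions are excluded from the softmax.
    seq_len = q.shape[1]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    logits = np.where(future, -np.inf, logits)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the bias depends only on relative distance, the same `alibi_bias` construction can be applied at inference to sequences longer than any seen during training.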
Tags: transformer positional-encoding alibi