Abstract. Introduces the Retentive Network (RetNet) architecture. Reformulates the sequence layer as a retention operator that admits three mathematically equivalent computational forms: a parallel form for efficient training (resembling attention with a complex-exponential decay), a recurrent form for $O(1)$-memory inference, and a chunkwise form that combines the two for long-sequence modelling. RetNet matches Transformer perplexity at scale while removing the $O(n^2)$ inference cost of attention. The architecture belongs to the cluster of "linear attention with structured decay" models that includes Mamba, RWKV, and Gated Linear Attention.
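The parallel/recurrent equivalence is the core trick, so a minimal sketch may help. The snippet below (an illustrative reconstruction, not the paper's code) implements single-head retention with a scalar decay $\gamma$, omitting RetNet's complex rotation (xPos) and multi-scale heads; the function names are made up for this example. The parallel form computes $(QK^\top \odot D)V$ with $D_{nm} = \gamma^{n-m}$ for $m \le n$, while the recurrent form carries a single $d \times d$ state $S_n = \gamma S_{n-1} + k_n v_n^\top$ and reads out $q_n S_n$; both yield the same output.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    # Parallel (training) form: attention-like matmul with a causal decay mask.
    n = Q.shape[0]
    idx = np.arange(n)
    # D[a, b] = gamma**(a - b) for b <= a, else 0 (causal exponential decay).
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    # Recurrent (inference) form: one d x d state, O(1) memory per step.
    S = np.zeros((K.shape[1], V.shape[1]))
    outs = []
    for q, k, v in zip(Q, K, V):
        S = gamma * S + np.outer(k, v)   # decay old state, write rank-1 update
        outs.append(q @ S)               # readout for the current position
    return np.stack(outs)

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
assert np.allclose(retention_parallel(Q, K, V, 0.9),
                   retention_recurrent(Q, K, V, 0.9))
```

The chunkwise form interpolates between these: run the parallel form within fixed-size chunks and pass the decayed state $S$ between chunks, trading the recurrent form's serial loop for blockwise matmuls on long sequences.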