Abstract. Introduces RWKV (Receptance Weighted Key Value), a community-developed recurrent architecture that admits a parallelisable training form and an RNN-like inference form. The core block pairs a time-mixing sub-layer (a linear-attention-style WKV recurrence) with a channel-mixing sub-layer, both of which use token shift to blend each token with its predecessor. RWKV scales to 14B parameters, matches contemporary GPT-class models on language modelling, and offers $O(1)$-memory inference (state size independent of sequence length). The architecture has been the most heavily community-iterated non-attention model of the post-Transformer era and had reached version 6 by 2024.
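
A minimal sketch of the two mechanisms named above, in the RNN-like inference form: `wkv_recurrence` follows the per-channel RWKV-4-style WKV recurrence (decay `w`, current-token bonus `u`), and `token_shift` is the interpolation of each token with its predecessor. The function names, shapes, and the omission of the log-space numerical stabilisation used in real implementations are simplifying assumptions here, not the project's actual API.

```python
import numpy as np

def wkv_recurrence(k, v, w, u):
    """Per-channel RWKV-4-style WKV recurrence (simplified sketch:
    real implementations track a running maximum and work in log
    space to avoid overflow in the exponentials).

    k, v : (T, C) key and value sequences
    w    : (C,)  per-channel decay rate (state decays by e^{-w})
    u    : (C,)  per-channel bonus applied to the current token
    Returns the (T, C) wkv outputs.
    """
    T, C = k.shape
    a = np.zeros(C)                # running exp-weighted sum of values
    b = np.zeros(C)                # running exp-weighted sum of weights
    out = np.empty((T, C))
    for t in range(T):
        bonus = np.exp(u + k[t])
        # output mixes the decayed history with a boosted current token
        out[t] = (a + bonus * v[t]) / (b + bonus)
        # decay the state, then absorb the current token (without bonus)
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return out

def token_shift(x, mu):
    """Token shift: per-channel interpolation of each token with the
    previous one; mu is a learned (C,) mixing coefficient."""
    x_prev = np.vstack([np.zeros((1, x.shape[1])), x[:-1]])
    return mu * x + (1 - mu) * x_prev
```

Note that the loop carries only the pair `(a, b)` per channel between steps; this fixed-size state is what underlies the $O(1)$-memory inference claim, in contrast to attention's cache growing with sequence length.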