Transformer Feed-Forward Layers Are Key-Value Memories. Mor Geva, Roei Schuster, Jonathan Berant, & Omer Levy (2021)
Conference on Empirical Methods in Natural Language Processing.
URL: https://arxiv.org/abs/2012.14913
Abstract. Re-interprets the Transformer feed-forward layer as a key-value memory: each row of the first matrix $\mathbf{W}_1$ acts as a key, and each column of the second matrix $\mathbf{W}_2$ is the corresponding value. The hidden activations measure how strongly each key matches the input, and the output is the activation-weighted sum of values. Shows empirically that individual FFN neurons (keys) respond to interpretable input patterns, shallow n-gram patterns in lower layers and semantic categories in upper layers, and that the corresponding value vectors promote specific output tokens. The paper is foundational for the mechanistic-interpretability view of MLPs as factual lookup memory.
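A minimal NumPy sketch of this reading (not the authors' code; dimensions and the ReLU nonlinearity are illustrative): the FFN output $\mathbf{W}_2\, f(\mathbf{W}_1 \mathbf{x})$ is identical to an explicit key-value lookup, where each hidden activation weights one value column of $\mathbf{W}_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # illustrative sizes

W1 = rng.standard_normal((d_ff, d_model))  # each row is a key
W2 = rng.standard_normal((d_model, d_ff))  # each column is a value

x = rng.standard_normal(d_model)           # input representation

# Standard FFN forward pass: hidden activations select matching keys.
coeffs = np.maximum(W1 @ x, 0.0)           # ReLU(W1 x): key-match strengths
out = W2 @ coeffs                          # weighted combination of values

# The same computation written as an explicit key-value summation.
out_kv = sum(coeffs[i] * W2[:, i] for i in range(d_ff))
assert np.allclose(out, out_kv)
```

The equivalence is just the column interpretation of matrix-vector multiplication; the paper's contribution is showing that these keys and values are individually interpretable.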
Tags: interpretability transformers