Shivam Garg, Dimitris Tsipras, Percy Liang, and Gregory Valiant (2022)
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes.
Advances in Neural Information Processing Systems 35.
URL: https://arxiv.org/abs/2208.01066
Abstract. A controlled empirical study of in-context learning. The authors train Transformers on a synthetic in-context regression task: given a sequence of $(\mathbf{x}, y)$ pairs sampled from an unknown linear function, predict $y$ for a new query $\mathbf{x}$. They show that the trained Transformer matches the performance of ordinary least squares on Gaussian inputs and adapts gracefully to distribution shifts between training and inference prompts. The results extend to sparse linear functions, two-layer ReLU networks, and decision trees. The paper is the empirical companion to the "Transformers implement learning algorithms in their forward pass" hypothesis.
Tags: language-models in-context-learning transformers
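A minimal sketch of the task setup, to make the abstract concrete. The dimensions, prompt length, and random seed here are illustrative choices, not the paper's exact settings; the ordinary-least-squares computation is the baseline the trained Transformer is compared against.

```python
import numpy as np

# Illustrative settings (not the paper's exact configuration).
d, n_points = 20, 40          # input dimension, number of in-context examples
rng = np.random.default_rng(0)

w = rng.standard_normal(d)              # unknown linear function f(x) = w . x
X = rng.standard_normal((n_points, d))  # Gaussian inputs, as in the paper
y = X @ w

# A Transformer would receive the prompt (x_1, y_1, ..., x_n, y_n, x_query)
# and predict y_query. Here we compute the ordinary-least-squares baseline
# that the trained model's predictions are measured against.
x_query = rng.standard_normal(d)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS prediction:", w_ols @ x_query)
print("true value:    ", w @ x_query)
```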