Shivam Garg, Dimitris Tsipras, Percy Liang, and Gregory Valiant (2022)
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes.
Advances in Neural Information Processing Systems 35.
URL: https://arxiv.org/abs/2208.01066
Abstract. A controlled empirical study of in-context learning. The authors train Transformers on a synthetic in-context regression task: given a sequence of $(\mathbf{x}, y)$ pairs sampled from an unknown linear function, predict $y$ for a new query $\mathbf{x}$. They show that the trained Transformer matches the performance of ordinary least squares on Gaussian inputs and adapts gracefully to distribution shifts between training and inference prompts. The results extend to sparse linear functions, two-layer ReLU networks, and decision trees. The paper is the empirical companion to the "Transformers implement learning algorithms in their forward pass" hypothesis.
Tags: language-models in-context-learning transformers
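A minimal sketch of the task setup, to make the abstract concrete. The dimensions, prompt length, and random seed here are illustrative choices, not the paper's exact settings; the ordinary-least-squares computation is the baseline the trained Transformer is compared against.

```python
import numpy as np

# Illustrative settings (not the paper's exact configuration).
d, n_points = 20, 40          # input dimension, number of in-context examples
rng = np.random.default_rng(0)

w = rng.standard_normal(d)              # unknown linear function f(x) = w . x
X = rng.standard_normal((n_points, d))  # Gaussian inputs, as in the paper
y = X @ w

# A Transformer would receive the prompt (x_1, y_1, ..., x_n, y_n, x_query)
# and predict y_query. Here we compute the ordinary-least-squares baseline
# that the trained model's predictions are measured against.
x_query = rng.standard_normal(d)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS prediction:", w_ols @ x_query)
print("true value:    ", w @ x_query)
```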