References

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Arthur Jacot, Franck Gabriel, & Clément Hongler (2018)

Advances in Neural Information Processing Systems 31.

URL: https://arxiv.org/abs/1806.07572

Abstract. Establishes that in the infinite-width limit, training a neural network by gradient descent with an infinitesimally small learning rate (gradient flow) is equivalent to kernel regression with the Neural Tangent Kernel (NTK), a kernel determined by the network architecture at initialisation. In this limit the NTK remains constant throughout training, so the training dynamics become linear and analytically tractable. The paper gave one of the first rigorous theoretical handles on the optimisation and generalisation of overparameterised networks, and seeded a substantial follow-up literature on lazy training, feature learning, and the gap between NTK predictions and finite-width network behaviour.
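To make the object concrete: the empirical NTK is Θ(x, x′) = ∇θ f(x) · ∇θ f(x′), and under gradient flow on squared loss the network outputs on the training set obey df_t/dt = −Θ (f_t − y), which is the linear dynamic the abstract refers to. Below is a minimal sketch in JAX, not code from the paper: the network, widths, and synthetic data are illustrative assumptions, and the final lines show the kernel-regression predictor of the linearised (lazy) regime with the initial-function term omitted for brevity.

    # Minimal empirical-NTK sketch in JAX (illustrative; not code from the paper).
    import jax
    import jax.numpy as jnp

    def init_params(key, widths=(3, 64, 1)):
        # NTK parameterisation: weights drawn from N(0, 1); the 1/sqrt(fan_in)
        # scaling is applied in the forward pass rather than at initialisation.
        params = []
        for d_in, d_out in zip(widths[:-1], widths[1:]):
            key, sub = jax.random.split(key)
            params.append(jax.random.normal(sub, (d_in, d_out)))
        return params

    def mlp(params, x):
        # Scalar-output MLP; returns shape (batch,).
        h = x
        for w in params[:-1]:
            h = jnp.tanh(h @ w / jnp.sqrt(w.shape[0]))
        return (h @ params[-1] / jnp.sqrt(params[-1].shape[0])).squeeze(-1)

    def empirical_ntk(params, x1, x2):
        # Theta(x1, x2) = J(x1) @ J(x2).T, where J is the Jacobian of the
        # network outputs with respect to all parameters, flattened per input.
        j1 = jax.jacobian(mlp)(params, x1)
        j2 = jax.jacobian(mlp)(params, x2)
        flat = lambda j, n: jnp.concatenate(
            [leaf.reshape(n, -1) for leaf in jax.tree_util.tree_leaves(j)], axis=1)
        return flat(j1, x1.shape[0]) @ flat(j2, x2.shape[0]).T

    # Toy regression problem (synthetic data, purely for illustration).
    params = init_params(jax.random.PRNGKey(0))
    x_train = jax.random.normal(jax.random.PRNGKey(1), (16, 3))
    y_train = jnp.sin(x_train[:, 0])
    x_test = jax.random.normal(jax.random.PRNGKey(2), (4, 3))

    # Kernel-regression predictor of the linearised (lazy) regime for squared
    # loss, omitting the initial-function term f_0 for brevity:
    k_train = empirical_ntk(params, x_train, x_train)
    preds = empirical_ntk(params, x_test, x_train) @ jnp.linalg.solve(k_train, y_train)
    print(preds.shape)  # (4,)

In the infinite-width limit the paper shows this kernel is fixed throughout training; at finite width one can re-evaluate empirical_ntk after some gradient steps and observe it drift, which is one common way of probing the gap between NTK predictions and finite-width behaviour mentioned above.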

Tags: theory deep-learning generalisation
