Abstract. An empirical demonstration that adaptive methods such as Adam can converge to solutions that generalise worse than those found by well-tuned SGD with momentum, particularly on image classification tasks, motivating caution in their use and the development of improved variants.
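To make the contrast concrete, here is a minimal sketch of the two update rules being compared: classical SGD with momentum, and Adam, whose per-coordinate rescaling by a second-moment estimate is the adaptivity at issue. The toy quadratic objective, learning rates, and iteration count are illustrative choices, not taken from the paper.

```python
import math

def sgd_momentum_step(x, v, grad, lr=0.1, beta=0.9):
    # Classical momentum: accumulate a velocity, then step along it.
    v = beta * v + grad
    x = x - lr * v
    return x, v

def adam_step(x, m, s, grad, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first- and second-moment estimates; dividing
    # by sqrt(s_hat) rescales each coordinate's step adaptively.
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    x = x - lr * m_hat / (math.sqrt(s_hat) + eps)
    return x, m, s

# Toy problem: minimise f(x) = x^2, so grad = 2x.
x_sgd, v = 3.0, 0.0
x_adam, m, s = 3.0, 0.0, 0.0
for t in range(1, 51):
    x_sgd, v = sgd_momentum_step(x_sgd, v, 2 * x_sgd)
    x_adam, m, s = adam_step(x_adam, m, s, 2 * x_adam, t)

print(abs(x_sgd), abs(x_adam))  # both iterates move towards the minimum at 0
```

Both optimisers reduce the loss on this convex toy problem; the paper's point is that on non-convex problems such as deep image classifiers, the solutions they reach can differ in generalisation quality even when training loss is comparable.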
Tags: optimisation, adam, generalisation