References

The Marginal Value of Adaptive Gradient Methods in Machine Learning

Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, & Benjamin Recht (2017)

arXiv.

DOI: https://doi.org/10.48550/arxiv.1705.08292

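Abstract. Empirically demonstrates that adaptive gradient methods such as AdaGrad, RMSProp and Adam can converge to solutions that generalise worse than those found by well-tuned SGD with momentum, even when the adaptive solutions achieve better training performance. The gap is most pronounced on image classification. The results motivate caution in using adaptive methods and the development of improved variants.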

Tags: optimisation adam generalisation
