Diederik P. Kingma & Jimmy Ba (2014). "Adam: A Method for Stochastic Optimization."
arXiv preprint arXiv:1412.6980.
DOI: https://doi.org/10.48550/arXiv.1412.6980
Abstract. Introduces Adam, which combines momentum (an exponential moving average of the gradient, the first-moment estimate) with RMSProp-style per-parameter scaling (an exponential moving average of the squared gradient, the second-moment estimate), plus bias correction for both zero-initialised averages. Adam's robustness and sensible default hyperparameters have made it the most widely used optimiser in deep learning.
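As a quick reference, here is a minimal NumPy sketch of the update rule the abstract describes; the function and variable names are illustrative rather than taken from the paper, though the defaults match the paper's suggested hyperparameters (α = 0.001, β₁ = 0.9, β₂ = 0.999, ε = 1e-8).

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (RMSProp-style) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction: m starts at zero
    v_hat = v / (1 - beta2 ** t)              # bias correction: v starts at zero
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter scaled step
    return theta, m, v
```

The bias correction matters early in training: with m and v initialised to zero, the raw moving averages are biased toward zero for small t, and dividing by (1 - βᵗ) undoes that.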
Tags: optimisation adam sgd