Boris T. Polyak (1964)
USSR Computational Mathematics and Mathematical Physics.
DOI: https://doi.org/10.1016/0041-5553(64)90137-5
Abstract. Introduces the heavy-ball method, gradient descent augmented with a momentum term that pushes the iterate in the direction of an exponential moving average of past gradients. Polyak proved that for strongly convex quadratics the heavy-ball method achieves the accelerated convergence rate $\mathcal{O}((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1))^t$, a quadratic improvement over plain gradient descent's $\mathcal{O}((\kappa-1)/(\kappa+1))^t$. The method is the historical ancestor of every modern momentum-based optimiser including SGD-with-momentum, Nesterov's accelerated gradient, Adam and AdamW.
Tags: optimisation history
Cited in: