References

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, & Kaiming He (2017)

arXiv:1706.02677.

URL: https://arxiv.org/abs/1706.02677

Abstract. Facebook AI Research's recipe for scaling minibatch SGD to thousands of GPUs while preserving generalisation. Introduces the linear scaling rule (multiply the learning rate by the same factor as the batch size), paired with a learning-rate warm-up to handle the early-training instability that the scaled learning rate produces. Trains ResNet-50 on ImageNet to 76.3% top-1 accuracy in one hour using 256 GPUs and a batch size of 8,192. The recipe became standard in the large-batch deep-learning era and quantitatively related batch size to the noise injected by minibatch gradients.
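The linear scaling rule and gradual warm-up can be sketched as a simple schedule. This is a minimal illustration, not the paper's implementation; it assumes the paper's reference setting of base learning rate 0.1 at batch size 256 and a 5-epoch linear warm-up.

```python
def learning_rate(epoch, batch_size,
                  base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Linear scaling rule with gradual warm-up (sketch).

    epoch: training progress in epochs (may be fractional).
    The target LR scales linearly with batch size relative to the
    reference setting (base_lr at base_batch); during warm-up the
    LR ramps linearly from base_lr up to the target.
    """
    target_lr = base_lr * batch_size / base_batch  # linear scaling rule
    if epoch < warmup_epochs:
        # gradual warm-up: linear ramp from base_lr to target_lr
        alpha = epoch / warmup_epochs
        return base_lr + (target_lr - base_lr) * alpha
    return target_lr
```

For batch size 8,192 this yields a post-warm-up learning rate of 0.1 × 32 = 3.2, reached after five epochs.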

Tags: optimisation distributed-training imagenet

Cited in:
