Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

Akiba, Takuya; Suzuki, Shuji; Fukuda, Keisuke

Computer Science > Distributed, Parallel, and Cluster Computing

(Submitted on 12 Nov 2017)

Abstract: We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.
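The headline numbers imply a small training budget in optimizer steps. The sketch below does that back-of-the-envelope arithmetic under two assumptions that the abstract does not state. First, "32k" is read as 32,768, which is 32 images per GPU on 1024 GPUs. Second, "ImageNet" is taken to be the ILSVRC-2012 training set of 1,281,167 images. The function name `training_budget` is hypothetical, and the calculation ignores data-loading, evaluation and start-up overheads.

```python
import math

# Figures taken from the abstract.
EPOCHS = 90
NUM_GPUS = 1024
WALL_CLOCK_SECONDS = 15 * 60  # "15 minutes"

# Assumptions, not stated in the abstract:
#   * "32k" is read as 32,768 (32 images per GPU on 1024 GPUs).
#   * "ImageNet" is the ILSVRC-2012 training set of 1,281,167 images.
MINIBATCH = 32 * 1024
TRAIN_IMAGES = 1_281_167


def training_budget(epochs, minibatch, num_gpus, train_images, seconds):
    """Hypothetical helper: rough step counts and throughput for a data-parallel run."""
    per_gpu_batch = minibatch // num_gpus
    # Count a partial final minibatch as a full step (a simplifying assumption).
    steps_per_epoch = math.ceil(train_images / minibatch)
    total_steps = steps_per_epoch * epochs
    images_per_second = epochs * train_images / seconds
    return {
        "per_gpu_batch": per_gpu_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "seconds_per_step": seconds / total_steps,
        "images_per_second": images_per_second,
        "images_per_second_per_gpu": images_per_second / num_gpus,
    }


if __name__ == "__main__":
    budget = training_budget(EPOCHS, MINIBATCH, NUM_GPUS, TRAIN_IMAGES, WALL_CLOCK_SECONDS)
    for name, value in budget.items():
        # Floats get two decimals; integers are printed with thousands separators.
        print(f"{name}: {value:,.2f}" if isinstance(value, float) else f"{name}: {value:,}")
```

Under these assumptions, the whole 90-epoch run takes about 3,600 optimizer steps of roughly a quarter of a second each. That corresponds to about 128,000 images per second across the cluster, or about 125 per GPU.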

Comments:	NIPS'17 Workshop: Deep Learning at Supercomputer Scale
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Cite as:	arXiv:1711.04325 [cs.DC]
	(or arXiv:1711.04325v1 [cs.DC] for this version)

Submission history

From: Takuya Akiba
[v1] Sun, 12 Nov 2017 17:36:46 GMT (21kb,D)
