Current browse context:
cs.DC
Change to browse by:
References & Citations
Computer Science > Distributed, Parallel, and Cluster Computing
Title: Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
(Submitted on 12 Nov 2017)
Abstract: We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.