Facebook develops new technique for accelerating deep learning for computer vision
A newly released paper, based on a collaboration between Facebook's Artificial Intelligence Research and Applied Machine Learning groups, details how Facebook researchers significantly reduced the time needed to train an image classification model.
Facebook explains in the paper that deep learning techniques thrive with large neural networks and large datasets, but these tend to involve longer training times that may impede research and development progress. Distributed synchronous stochastic gradient descent (SGD) offers a potential solution by dividing SGD minibatches over a pool of parallel workers, but to make this method efficient, the per-worker workload must be large, which implies nontrivial growth in the overall SGD minibatch size, according to Facebook.
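To make that parallelization concrete, the following is a minimal sketch of distributed synchronous SGD written in NumPy. The toy least-squares model, the simulated workers, and all parameter values are illustrative assumptions, not Facebook's Caffe2/Gloo implementation: each worker computes gradients on its own shard of the global minibatch, the gradients are averaged (an all-reduce in a real cluster), and one shared update is applied.

# Minimal sketch of distributed synchronous SGD on a toy least-squares problem.
# The "workers", data, and model are illustrative stand-ins, not Facebook's
# Caffe2/Gloo implementation.
import numpy as np

rng = np.random.default_rng(0)
num_workers = 4
per_worker_batch = 8            # per-worker workload; total minibatch = 32
w = np.zeros(10)                # shared model parameters
w_true = rng.normal(size=10)    # ground-truth weights for the toy problem
lr = 0.1

for step in range(100):
    grads = []
    for _ in range(num_workers):
        # Each worker draws its own shard of the global minibatch.
        X = rng.normal(size=(per_worker_batch, 10))
        y = X @ w_true
        # Gradient of the mean squared error on this shard.
        grads.append(2 * X.T @ (X @ w - y) / per_worker_batch)
    # All-reduce: average the per-worker gradients, then apply one SGD step.
    w -= lr * np.mean(grads, axis=0)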
In the paper, the researchers explain that large minibatches cause optimization difficulties on the ImageNet dataset, but that when these difficulties are addressed, the trained networks exhibit good generalization. Training a model on the ImageNet-1k dataset of over 1.2 million images previously took multiple days, but Facebook has found a way to reduce this time to one hour while maintaining classification accuracy.
"They can say 'OK, let’s start my day, start one of my training runs, have a cup of coffee, figure out how it did,'" Pieter Noordhuis, a software engineer on Facebook’s Applied Machine Learning team, told VentureBeat.“And using the performance that [they] get out of that, form a new hypothesis, run a new experiment, and do that until the day ends. And using that, [they] can probably do six sequenced experiments in a day, whereas otherwise that would set them back a week."
Specifically, the researchers note that with a large minibatch size of 8192 spread across 256 GPUs, they trained ResNet-50 in one hour while maintaining the same level of accuracy as a 256-image minibatch baseline. This was accomplished with two techniques: a linear scaling rule that adjusts the learning rate as a function of minibatch size, and a new warmup scheme that overcomes optimization challenges early in training by gradually ramping the learning rate from a small value up to the full rate, which helps maintain accuracy.
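As a rough illustration of those two heuristics, the sketch below computes a learning-rate schedule that combines the linear scaling rule with gradual warmup. The base learning rate of 0.1 for a 256-image minibatch and the five-epoch warmup follow the ResNet-50 setup described in the paper; the function itself is a simplified stand-in, not Facebook's Caffe2 code.

# Sketch of the paper's two learning-rate heuristics: the linear scaling rule
# and gradual warmup. Simplified stand-in, not Facebook's Caffe2 code.
BASE_LR = 0.1          # reference learning rate for the 256-image baseline
BASE_BATCH = 256       # reference minibatch size
WARMUP_EPOCHS = 5      # warmup length described in the paper

def learning_rate(epoch: float, batch_size: int) -> float:
    """Learning rate at a (possibly fractional) epoch for a given minibatch size."""
    # Linear scaling rule: multiplying the minibatch by k multiplies the lr by k.
    target_lr = BASE_LR * batch_size / BASE_BATCH
    if epoch < WARMUP_EPOCHS:
        # Gradual warmup: ramp linearly from the small baseline lr to the target lr.
        return BASE_LR + (target_lr - BASE_LR) * epoch / WARMUP_EPOCHS
    return target_lr

# For a minibatch of 8192, the target lr is 0.1 * 8192 / 256 = 3.2,
# reached only after the five-epoch warmup.
print(learning_rate(0, 8192))    # 0.1  (start of warmup)
print(learning_rate(2.5, 8192))  # 1.65 (midway through the ramp)
print(learning_rate(10, 8192))   # 3.2  (after warmup)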
With these techniques, the paper notes, a Caffe2-based system trained ResNet-50 with a minibatch size of 8192 on 256 GPUs in one hour while matching small-minibatch accuracy.
"Using commodity hardware, our implementation achieves ∼90% scaling efficiency when moving from 8 to 256 GPUs," notes the paper’s abstract. "This system enables us to train visual recognition models on internet-scale data with high efficiency."
The team achieved these results with the aforementioned Caffe2, along with the Gloo library for collective communication, and Big Basin, which is Facebook's next-generation GPU server.
In summary, according to Facebook's Lauren Rugani, the paper demonstrates how creative infrastructure design can contribute to more efficient deep learning at scale.
"With these findings, machine learning researchers will be able to experiment, test hypotheses, and drive the evolution of a range of dependent technologies — everything from fun face filters to 360 video to augmented reality," wrote Rugani.
View the paper.
View the Facebook article.
About the Author
James Carroll
Former VSD Editor James Carroll joined the team in 2013. Carroll covered machine vision and imaging from numerous angles, including application stories, industry news, market updates, and new products. In addition to writing and editing articles, Carroll managed the Innovators Awards program and webcasts.