Deep learning for vision processing: Google's perspective
In a column published two months ago, I discussed the compelling potential, and increasingly proven reality, of convolutional neural networks and other deep learning techniques for solving a variety of longstanding computer vision challenges. And in a column published two weeks ago, I mentioned that I'd be attending my organization's Embedded Vision Summit that same week, and pointed you to presentations on deep learning and other computer vision topics from past events.
At the Summit, I gained yet another perspective on deep learning, thanks to a keynote from Google Senior Fellow Jeff Dean, entitled "Large-Scale Deep Learning for Building Intelligent Computer Systems." Here's a preview, straight from the Alliance's YouTube channel:
As background, over the past few years, Google has built two generations of large-scale systems for training neural networks, and has then applied these systems to a wide variety of problems that have traditionally been very difficult for computers. Google has released its second-generation software library for machine learning, TensorFlow, as an open source project, and is now collaborating with a growing community on improving and extending its functionality. Using TensorFlow, Google's research group has made significant improvements to the state of the art in many areas, and dozens of different groups at Google use it to train state-of-the-art models for speech recognition, image recognition, various visual detection tasks, language modeling, language translation, and many other tasks. As part of this work, multiple groups have ported their existing projects from other deep learning systems to TensorFlow.
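For readers who haven't yet tried the library, here's a minimal sketch of the define-a-graph-then-run-it workflow that the v0.8-era TensorFlow API used, modeled on the softmax-regression example from TensorFlow's own introductory tutorials; the batch_images and batch_labels feed data are hypothetical placeholders, not anything from Dean's talk:

```python
# Minimal TensorFlow sketch (v0.8-era graph-and-session API).
import tensorflow as tf

# Build the computation graph: a single-layer softmax classifier.
x = tf.placeholder(tf.float32, [None, 784])   # batch of flattened 28x28 images
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross-entropy loss and a gradient-descent training step.
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# Execute the graph in a session; tf.initialize_all_variables() was the
# variable initializer in the v0.8-era API.
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    # Per training step, feed in a (hypothetical) data batch:
    # sess.run(train_step, feed_dict={x: batch_images, y_: batch_labels})
```

The key idea is the separation of concerns: the graph describes the computation once, and the runtime is then free to schedule it across CPUs, GPUs, or (as of v0.8) multiple machines.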
In his talk, Dean highlighted some of the ways that Google trains large models quickly on large datasets, and discussed different approaches for deploying machine learning models in environments ranging from large datacenters to mobile devices. He then described how Google has applied this work to a variety of problems in its products, usually in close collaboration with other teams. Among other topics, Dean showcased the just-introduced TensorFlow v0.8, which adds distributed processing support for model training. He also discussed (expanding on a subject I introduced in another recent column) how high-precision floating-point network models are not only often overkill from a required-accuracy standpoint, but also incur undesirable performance, power consumption, and bill-of-materials cost penalties.
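To make the reduced-precision point concrete, here's a minimal sketch (my own illustration, not code from Dean's talk) of linear 8-bit quantization applied to a hypothetical float32 weight tensor. The round-trip error is bounded by half the quantization step, which trained networks typically tolerate with little accuracy loss, while storage and memory bandwidth shrink four-fold:

```python
# Illustrative 8-bit linear quantization of a hypothetical weight tensor.
import numpy as np

weights = np.random.randn(256, 256).astype(np.float32)  # stand-in fp32 weights

# Map the float range [w_min, w_max] onto 256 integer levels.
w_min, w_max = weights.min(), weights.max()
scale = (w_max - w_min) / 255.0
quantized = np.round((weights - w_min) / scale).astype(np.uint8)  # 4x smaller

# Dequantize and measure the error the conversion introduced.
restored = quantized.astype(np.float32) * scale + w_min
print("max abs error:", np.abs(weights - restored).max())  # at most scale / 2
```

Integer arithmetic of this sort is also cheaper to implement in silicon than floating point, which is the bill-of-materials angle Dean alluded to for embedded and mobile deployments.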
The full video of Dean's talk (the first of what will end up being around 100 presentation and demonstration videos from the multi-day event) can be found on the Alliance website, along with a slide set of the Embedded Vision Summit's business and technical presentations in Adobe Acrobat PDF format (note that website registration and login are required for access). And for even more information on deep learning for vision, I encourage you to attend Embedded Vision Alliance founder Jeff Bier's upcoming presentation, "Using Deep Learning to Extract New Value from Sensors," taking place on Tuesday, June 21 from 2:00-2:30 PM PT during Sensors Expo's Pre-Conference Symposia in San Jose, California. Here's an abstract of Bier's planned June 21 talk (he's also giving a presentation on embedded vision from 10:20-11:00 AM on June 22, during the main conference):
After decades of development, artificial neural network algorithms have emerged as a powerful technology for extracting insights from many different types of data, including Internet searches, audio signals, and images. For companies making and using sensors, deep learning offers the ability to obtain new value from existing sensors. For example, a microphone in a vehicle can detect when the road surface is wet; an image sensor can measure heart rate and emotional state from face images. Sensor suppliers and users who are first to exploit deep learning have the opportunity to gain significant competitive advantage. In this presentation, we introduce the basic concepts of deep neural networks, illustrate how they can be used to expand the utility of a range of different sensor types, and provide recommendations for how to implement them in products.
I'll have more to tell you about Embedded Vision Summit topics, trends, and associated talks in the coming weeks. Until then, as always, I welcome your feedback!
Regards,
Brian Dipert
Editor-in-Chief, Embedded Vision Alliance
[email protected]