The following blog post is from Jeff Bier, Founder of the Embedded Vision Alliance, Co-Founder & President of Berkeley Design Technology, Inc.
About eight years ago, my colleagues and I realized that it would soon become practical to incorporate computer vision into cost- and power-constrained embedded systems. We recognized that this would be a world-changing development, due to the vast range of valuable capabilities that vision enables. It’s been gratifying to see this potential come to fruition, with a growing number of innovative vision-enabled products finding market success.
What we didn’t anticipate in 2011 was the important role that cloud computing would play in the proliferation of visual intelligence into new applications. Today, cloud compute providers increasingly offer GPU and FPGA co-processors to accelerate parallelizable workloads, and many also offer their own APIs providing a variety of vision capabilities, including Amazon’s Rekognition, Google’s Cloud Vision API, IBM’s Watson Visual Recognition, and Microsoft’s Azure Computer Vision API, among others.
Improvements in cloud application development support are not limited to vision-specific elements, of course. The competition to attract developers, and the massive investments in software by cloud compute providers, are driving rapid advances in all sorts of cloud software development tools, APIs and frameworks. In comparison, software development tools for embedded processors are improving at a more modest pace.
This raises the question: Given the increasingly awesome software development tools available in the cloud, and the ubiquity of Internet connectivity, why would anyone subject themselves to the challenges typically associated with implementing demanding computer vision and deep learning applications on an embedded processor? Wouldn’t it be much easier to ship the data to the cloud and do all of the heavy lifting there?
The answer is: It depends. Whether cloud or edge processing (or a combination of the two, or something in-between) is best depends entirely on the requirements and constraints of the application.
To illustrate this, consider two automotive applications. The first is vision-based driver assistance, which provides capabilities such as forward collision avoidance; such systems are increasingly becoming mainstream. It seems clear that applications like this should use local processing exclusively, because they require maximum reliability and minimum latency.
But not all automotive vision applications have these requirements. Consider a parking-space finding system that uses cameras on many cars to keep track of available parking spaces and direct drivers to the nearest available space. (In San Francisco, this would be an instant success.) Here, it’s obvious that the cloud is a natural fit—the system requires aggregation of data from multiple vehicles, while reliability and latency are not critical.
But even in this case it doesn’t necessarily follow that all of our computing should be done in the cloud. For example, we could choose to do all of the vision processing in the car, limiting our cloud data uplink to information about the location of available parking spaces. This illustrates the situation we see frequently today: In most applications, developers have at least some freedom to choose where vision processing will take place.
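To make the "vision in the car, metadata in the cloud" partitioning concrete, here is a minimal Python sketch of what such an uplink might carry. The message format, the field names (`vehicle`, `t`, `free`), and the `build_uplink_message` helper are hypothetical and not taken from any real parking system. The point is only the size contrast between a few coordinates and a single uncompressed camera frame.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical record for one free parking space found by the car's
# on-board vision pipeline (the detection step itself is not shown).
@dataclass
class FreeSpace:
    lat: float
    lon: float

# Size of one uncompressed 1280x720 RGB frame at 3 bytes per pixel.
RAW_FRAME_BYTES = 1280 * 720 * 3  # 2,764,800 bytes

def build_uplink_message(vehicle_id: str, timestamp: int,
                         spaces: List[FreeSpace]) -> bytes:
    """Serialize only the detection results (not the video) as compact JSON."""
    payload = {
        "vehicle": vehicle_id,
        "t": timestamp,
        "free": [asdict(s) for s in spaces],
    }
    # separators=(",", ":") drops the optional whitespace json.dumps adds.
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")

if __name__ == "__main__":
    msg = build_uplink_message("car-42", 1700000000, [FreeSpace(37.7749, -122.4194)])
    print(len(msg), "bytes uplinked vs", RAW_FRAME_BYTES, "bytes for one raw frame")
```

In practice video would be compressed before any upload, which narrows the gap, but a short list of coordinates per report would still be far smaller than a continuous video stream from every car.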
In some cases, a partitioning of vision processing between edge and cloud may be ideal. Consider Camio, a provider of video monitoring software. Camio puts just one piece of vision processing at the edge: the camera or other local hardware has responsibility for determining whether each video frame is interesting. If a frame is interesting, that frame (and neighboring frames) is sent to the cloud, where more sophisticated algorithms figure out specifically what was interesting about the frame (for example, a person approaching the camera). This is a clever approach. By limiting the extent of analysis required by the camera, Camio makes it possible to keep the camera inexpensive. And by filtering out uninteresting frames, Camio dramatically cuts down on the amount of video data sent to the cloud, and on the amount of cloud computing required to analyze that video. And, of course, having the interesting data in the cloud means that data from multiple cameras can be aggregated.
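Camio's actual test for an "interesting" frame isn't described here, so the following Python sketch swaps in a deliberately simple stand-in: the mean absolute pixel difference between consecutive grayscale frames. The names `select_frames_for_upload`, `threshold`, and `context` are hypothetical. The sketch only illustrates the shape of an edge-side filter: flag frames whose change exceeds a threshold, then keep each flagged frame plus a few neighbors for upload.

```python
from typing import List, Sequence

Frame = Sequence[int]  # a frame as a flat list of 8-bit grayscale pixel values

def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-size frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_frames_for_upload(frames: List[Frame], threshold: float = 10.0,
                             context: int = 1) -> List[int]:
    """Return indices of frames to send to the cloud.

    A frame is flagged as interesting (hypothetical criterion) when it differs
    from the previous frame by more than `threshold`; each flagged frame is kept
    together with up to `context` neighbors on either side.
    """
    keep = set()
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            lo = max(0, i - context)
            hi = min(len(frames) - 1, i + context)
            keep.update(range(lo, hi + 1))
    return sorted(keep)

if __name__ == "__main__":
    # Four-pixel toy frames: a static dark scene, then a sudden bright change at frame 3.
    frames = [[0] * 4] * 3 + [[200] * 4] * 4
    print(select_frames_for_upload(frames, threshold=10.0, context=1))
```

With `context=1`, the single change at frame 3 yields frames 2 through 4 for upload, and every unchanged stretch stays on the device. A real camera would likely need a more robust detector (for example, one that ignores lighting flicker), but the division of labor is the same: a cheap test at the edge, heavier analysis in the cloud.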
So edge-vs-cloud looks like a binary decision only at first glance, and there are other, intermediate options. A great example of this is Anki’s Cozmo robot, a cute little social, interactive robot toy. To achieve meaningful social interactivity, Cozmo requires computer vision capabilities such as face recognition. But to meet the size, power, and cost constraints associated with a battery-powered toy, Anki’s engineers could not incorporate all of the required processing power into the robot itself. Their clever solution was to harness the user’s mobile phone as an off-board processor: in effect, a miniature, free, and nearby “cloud compute node” that leverages the substantial processing power of today’s mobile phones to augment that of the robot itself. The result is impressive.
Earlier I mentioned that development tools, libraries, and frameworks for cloud software development are improving at a much faster pace than their embedded counterparts. For applications where developers have substantial discretion to choose edge or cloud processing, the developer productivity advantages of the cloud will increasingly tilt the balance towards the cloud. On the other hand, for many applications that are sensitive to latency, cost and power, edge processing will still be preferable, or even required. But even in these cases, it may make sense to begin development in the cloud (and possibly even to deploy an initial product using the cloud) in order to speed development. Later, some or all of the processing can be migrated to the edge.
Edge-vs-cloud trade-offs are complex. If you want to learn more about these trade-offs from experts, you’ll want to attend the 2019 Embedded Vision Summit, taking place May 20-23 in Santa Clara, California. Over the past six years, the Summit has become the preeminent event for people building products incorporating vision. In 2019, edge-cloud trade-offs will once again be one of the focus areas for the Summit program. Mark your calendar and plan to be there. Registration is now open on the Summit website.
Jeff Bier
Founder, Embedded Vision Alliance
Jeff Bier is the founder of the Embedded Vision Alliance, a partnership of 90+ technology companies that works to enable the widespread use of practical computer vision. The Alliance’s annual conference, the Embedded Vision Summit (May 20-23, 2019 in Santa Clara, California), is the preeminent event where engineers, product designers, and business people gather to make vision-based products a reality.
When not running the Alliance, Jeff is the president of BDTI, an engineering services firm: for over 25 years BDTI has helped hundreds of companies select the right technologies and develop optimized, custom algorithms and software for demanding applications in audio, video, machine learning and computer vision. If you are choosing between processor options for your next design, need a custom algorithm to solve a unique visual perception problem, or need to fit demanding algorithms into a small cost/size/power envelope, BDTI can help.