Deep learning chip enables efficient embedded device processing
Optimized for deep learning workloads, the Hailo-8 processor from Hailo (Tel Aviv, Israel; www.hailo.ai) is smaller than a penny and features a number of efficiency and processing advantages over similar devices available on the market today.
Chief Technology Officer Avi Baum says development of the product began in February 2017, with the goal of creating a new type of processor designed for deep learning applications.
“Generally speaking, sensing is becoming more capable than ever before, and is deployed with low costs and low power, but the compute behind these technologies has not necessarily scaled up accordingly,” he says. “GPUs of course are widely available and are a great solution for general processing requirements, but these were not designed for deep learning.”
Hailo’s structure-defined data flow architecture is built with distributed on-chip memory fabric, novel control schemes, efficient interconnect, and a full-stack software toolchain co-designed with the hardware architecture, all relying on the fundamental properties of neural networks, says Baum.
“Our architecture consists of a set of building blocks for compute, memory, and control, and these are physically distributed along the device and allocated at compile time to address the different layers of a neural network,” he says.
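To make that idea concrete, the toy sketch below mimics assigning the layers of a network to a fixed pool of on-chip compute and memory blocks at compile time. It is a conceptual illustration only; the block counts, layer demands, and greedy allocation scheme are assumptions and do not represent Hailo's actual compiler or architecture.

```python
# Conceptual toy: map network layers onto a fixed pool of on-chip resource
# blocks at "compile time." All names and numbers are illustrative and are
# not Hailo's actual allocation scheme.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    compute_units: int   # hypothetical compute-block demand
    memory_kb: int       # hypothetical on-chip memory demand

AVAILABLE_COMPUTE_BLOCKS = 64      # hypothetical distributed compute blocks
AVAILABLE_MEMORY_KB = 8 * 1024     # hypothetical distributed on-chip memory

def allocate(layers):
    """Greedily give each layer its own share of compute and memory blocks."""
    used_compute, used_memory = 0, 0
    plan = {}
    for layer in layers:
        if (used_compute + layer.compute_units > AVAILABLE_COMPUTE_BLOCKS or
                used_memory + layer.memory_kb > AVAILABLE_MEMORY_KB):
            raise RuntimeError(f"Network does not fit on-chip at layer {layer.name}")
        plan[layer.name] = {
            "compute_blocks": range(used_compute, used_compute + layer.compute_units),
            "memory_kb": layer.memory_kb,
        }
        used_compute += layer.compute_units
        used_memory += layer.memory_kb
    return plan

# Example: a toy three-layer network.
network = [Layer("conv1", 8, 512), Layer("conv2", 16, 1024), Layer("fc", 4, 256)]
for name, resources in allocate(network).items():
    print(name, resources)
```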
An accompanying software development kit (SDK) enables integration with existing neural network and deep learning frameworks, such as TensorFlow, Caffe, and ONNX, and provides model translation from these frameworks into Hailo's format. In addition to debug and analysis tools for profiling and emulation without hardware, the SDK aims to lower the entry barrier for developers onboarding the platform who want to reap the benefits of highly efficient neural network inference, suggests Baum.
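As a rough illustration of the framework side of that workflow, the sketch below exports a standard ResNet-50 from PyTorch to ONNX, one of the interchange formats the SDK is said to translate from. The Hailo-specific compilation step is omitted, since its API is not described here; the export call and model come from PyTorch and torchvision, not from Hailo's toolchain.

```python
# Export a standard ResNet-50 to ONNX so it can be handed to a downstream
# translation/compilation toolchain. The Hailo-specific step is not shown.
import torch
import torchvision

model = torchvision.models.resnet50(weights=None)  # or load pretrained weights
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # NCHW, matching the 224 x 224 benchmark
torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=13,
)
print("Exported resnet50.onnx for downstream translation/compilation.")
```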
Compared to similar devices available on the market today, the Hailo-8 offers a number of advantages, according to the company. For example, when running the ResNet-50 pre-trained neural network (without pruning) at 8-bit precision and 224 x 224 resolution, the Hailo-8 processor achieves a frame rate of 780 fps at 2 W of power and 2.9 TOPS/W. TOPS (tera operations per second) is a common performance metric for systems on chip, and TOPS per watt (TOPS/W) extends that measurement to describe performance efficiency; the higher the TOPS/W, the better.
Meanwhile, the Jetson AGX Xavier embedded processor from NVIDIA (Santa Clara, CA, USA; www.nvidia.com) achieved a frame rate of 656 fps at 31 W of power and 0.14 TOPS/W. The Hailo-8 processor consumes almost 20 times less power while performing the same task, says the company.
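A back-of-the-envelope calculation shows how the quoted frame rates and power figures relate to TOPS/W. The per-inference operation count used below (~7.7 GOPs for ResNet-50 at 224 x 224, roughly twice the commonly cited ~3.9 GMACs) is an outside assumption, not a figure from the article.

```python
# Back-of-the-envelope check of the quoted efficiency figures.
OPS_PER_INFERENCE = 7.7e9  # assumed operations per ResNet-50 (224 x 224) frame

def tops_per_watt(fps, watts, ops_per_inference=OPS_PER_INFERENCE):
    """Effective tera-operations per second, and per watt, at a given frame rate."""
    effective_tops = fps * ops_per_inference / 1e12
    return effective_tops, effective_tops / watts

for name, fps, watts in [("Hailo-8", 780, 2), ("Jetson AGX Xavier", 656, 31)]:
    tops, efficiency = tops_per_watt(fps, watts)
    print(f"{name}: ~{tops:.1f} effective TOPS, ~{efficiency:.2f} TOPS/W")
# The results land roughly in line with the quoted 2.9 and 0.14 TOPS/W.
```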
“With the growing demand for higher resolution and more capable networks, the typical operation that a vision system would carry out in a neural network requires more than 2 TOPS in real scenarios,” says Baum.
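As a rough sanity check on that claim, the sketch below estimates the sustained throughput needed for a single real-time video stream. The per-frame operation count (in the ballpark of a mid-size object-detection network at 416 x 416) and the 30 fps rate are illustrative assumptions, not figures from the article.

```python
# Rough illustration of the ">2 TOPS in real scenarios" claim.
OPS_PER_FRAME = 65e9   # assumed operations per frame for a detection network
FPS = 30               # typical real-time video rate

required_tops = OPS_PER_FRAME * FPS / 1e12
print(f"Required sustained throughput: ~{required_tops:.1f} TOPS")  # ~2.0 TOPS
```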
Furthermore, the device is capable of performing up to 26 TOPS. “Hailo’s processor enables edge devices to run applications and operate more effectively at lower costs, allowing devices such as autonomous vehicles, drones, smart home appliances, AR/VR platforms, and wearables to operate more efficiently. In short, Hailo aims to empower a new era of AI computing,” he says.
“For those markets where the need is established and well understood, such as automotive, we are basically trying to lower the system complexity in such scenarios,” he says. “On the other hand, there are endless use cases that could benefit from the deployment of sensors and machine learning processing.”
James Carroll
Former VSD Editor James Carroll joined the team in 2013. Carroll covered machine vision and imaging from numerous angles, including application stories, industry news, market updates, and new products. In addition to writing and editing articles, Carroll managed the Innovators Awards program and webcasts.