Embedded deep learning system automates retail payment terminals
With increased competition from e-commerce, retail stores of all types must find ways to stay competitive. To that end, a team of embedded vision and deep learning companies partnered to develop an automated retail payment and inventory management system that offers instant checkout, shortens lines, and allows stores to stay open 24/7.
Current self-checkout methods may use barcode reading to detect and identify products, while some recent systems identify products by classifying features such as color or type. Neither method, however, is robust when deployed in uncontrolled environments. Artificial intelligence (AI) can accomplish this goal more accurately by detecting products without barcodes, while also allowing the product portfolio to scale up more easily over time.
For edge AI and computer vision software company Irida Labs (Patras, Greece; www.iridalabs.com), collaborations such as this one materialize through partnerships with well-established hardware companies that share similar goals. In this case, according to Demetris Anastassiou, Product Marketing Specialist and Senior Computer Vision Engineer at Irida Labs, creating a low-cost, low-power, scalable system represented the main goal (Figure 1).
“The main objective and most interesting aspect of the application is the fact that it uses low-power devices to perform deep learning tasks in real time, with real time referring to five frames per second, in this case,” says Anastassiou.
A prototype of the system made its debut at the Hannover Messe show. A customer places items of food or beverages onto a tray, carries it to a checkout terminal, and places the tray under the smart retail system (Figure 2). A Basler (Ahrensburg, Germany; www.baslerweb.com) dart BCON for MIPI CSI-2 camera with S-Mount lens and an AR1335 CMOS image sensor from ON Semiconductor (Phoenix, AZ, USA; www.onsemi.com) captures a video stream of the tray and its contents. This color camera captures 13 MPixel images at up to 30 fps, measures 29 x 29 mm, and weighs just 15 g.
Irida Labs’ convolutional neural network-based software EV Lib identifies and classifies each item and displays the price for the customer. The software and camera run on a conga-SMX8 SMARC 2.0 Computer-on-Module from Congatec (Deggendorf, Germany; www.congatec.com), which is based on an i.MX 8 applications processor from NXP Semiconductors (Eindhoven, Netherlands; www.nxp.com) with Arm Cortex-A72, Cortex-A53, and Cortex-M4 cores. In the prototype system, a constant LED light provides illumination, but similar client solutions have relied on ambient and on-premises lighting fixtures. Most important, explains Anastassiou, is providing consistent lighting conditions and limiting variability.
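While EV Lib itself is proprietary, the overall flow described here (grab a frame, run a detector, look up prices, report a total at roughly five frames per second) can be pictured with open tooling. The following Python sketch is illustrative only; the model file, label list, and prices are assumptions, not Irida Labs' actual implementation.

```python
# Minimal sketch of a capture -> detect -> price pipeline, using a
# TFLite detector as a stand-in for the proprietary EV Lib software.
# Model path, labels, and prices below are illustrative assumptions.
import time
import cv2
import numpy as np
import tflite_runtime.interpreter as tflite

LABELS = ["sandwich", "soda_can", "apple"]               # assumed catalog
PRICES = {"sandwich": 3.50, "soda_can": 1.20, "apple": 0.80}

interpreter = tflite.Interpreter(model_path="tray_detector.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()

cap = cv2.VideoCapture(0)  # MIPI CSI-2 camera exposed as a V4L2 device
while True:
    start = time.time()
    ok, frame = cap.read()
    if not ok:
        break
    # Resize the frame to the model's expected input shape
    h, w = inp["shape"][1], inp["shape"][2]
    blob = cv2.resize(frame, (w, h))[np.newaxis, ...].astype(np.uint8)
    interpreter.set_tensor(inp["index"], blob)
    interpreter.invoke()
    # A typical SSD-style TFLite detector returns boxes, classes, scores
    classes = interpreter.get_tensor(out[1]["index"])[0]
    scores = interpreter.get_tensor(out[2]["index"])[0]
    found = [LABELS[int(c)] for c, s in zip(classes, scores) if s > 0.5]
    total = sum(PRICES[name] for name in found)
    print(f"{len(found)} items, total {total:.2f} EUR")
    time.sleep(max(0.0, 0.2 - (time.time() - start)))  # ~5 fps target
```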
In addition, an alternate version of the system that aims for greater speed and performance was developed in collaboration with MediaTek (Hsinchu, Taiwan; www.mediatek.com). This version leverages the company’s i500 (MT8385) Artificial Intelligence of Things (AIoT) processing device, which offers high-performance neural network acceleration and a dedicated artificial intelligence processor to meet client demands for faster processing and checkout, according to Anastassiou.
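On a device like the i500, the speedup comes from routing inference through the dedicated AI processor rather than the CPU. A heavily hedged sketch of that idea, using TensorFlow Lite's delegate mechanism, follows; the delegate library name is a placeholder, as the real binding would come from the vendor's SDK or board support package.

```python
# Hedged sketch: offloading the same TFLite model to a neural-network
# accelerator via a delegate. The library name below is an assumed
# placeholder, not an actual MediaTek artifact.
import tflite_runtime.interpreter as tflite

try:
    delegate = tflite.load_delegate("libvendor_npu_delegate.so")  # assumed
    interpreter = tflite.Interpreter(
        model_path="tray_detector.tflite",
        experimental_delegates=[delegate],  # supported ops run on the NPU
    )
except (OSError, ValueError):
    # Fall back to CPU inference if the accelerator is unavailable
    interpreter = tflite.Interpreter(model_path="tray_detector.tflite")
interpreter.allocate_tensors()
```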
Irida Labs takes a flexible, case-by-case approach when deploying its deep learning software. The company works with well-known machine learning frameworks such as TensorFlow and PyTorch, along with network architectures like YOLO and MobileNet. However, the company also maintains a proprietary framework for machine learning and inference, and the choice of which option to deploy depends on individual application needs, according to Anastassiou.
“In this specific example, we aren’t dealing with something as complex as the Amazon Go store, so we narrowed the scope to the requirements of the individual deployments. For example, the prototype shown at Hannover Messe and Embedded World is used in the real world in the catering industry in Germany and the Netherlands, so we train the system based on the inventory at these individual locations,” he says.
For this system, development, optimization, and training took a few months and relied on no more than approximately 10 images of each product in inventory.
“The main difference between vast machine learning models and models designed for specific applications is that the latter do not involve a massive catalog,” says Anastassiou. “One does not need to distinguish, let’s say, an apple from a cat. It does not require a million images to ensure that the apple is different from a cat, because a cat will never appear on the tray.”
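A plausible way to reconcile good accuracy with roughly 10 images per product is transfer learning: reuse a backbone pretrained on a large dataset and retrain only a small classifier head on the store's catalog. The sketch below illustrates that idea with MobileNet in PyTorch; the directory layout, class count, and hyperparameters are assumptions rather than Irida Labs' actual recipe.

```python
# Fine-tuning sketch consistent with the "~10 images per product" claim:
# freeze a pretrained MobileNet backbone and retrain only the classifier
# head on a tiny, augmented dataset. Paths and settings are assumed.
import torch
from torchvision import datasets, models, transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.ColorJitter(brightness=0.3),   # tolerate lighting changes
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("catalog/", transform=augment)  # ~10 imgs/class
loader = torch.utils.data.DataLoader(data, batch_size=8, shuffle=True)

model = models.mobilenet_v2(weights="IMAGENET1K_V1")
for p in model.features.parameters():
    p.requires_grad = False   # keep the pretrained backbone frozen
model.classifier[1] = torch.nn.Linear(model.last_channel, len(data.classes))

opt = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
for epoch in range(20):       # tiny dataset, so a few quick epochs suffice
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```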
Irida Labs’ secret sauce, according to Anastassiou, is that it aims to solve real-world problems instead of casting too wide a net.
“Today it is quite easy for people to take a deep learning model provided by a university or a tech giant and try to solve a specific task, such as detecting people crossing a street, only to be disappointed by the lack of robustness,” he says. “The reason is that each vision application has very specific requirements and a wide variety of underlying hardware. Understanding that and having the engineering know-how makes the difference in real-world problems.”
“Plus,” he adds, “environmental conditions will always be different. It makes no sense to deploy a system trained to identify people walking in the Sahara Desert, where it is always sunny and very bright, and then use that exact same system in Norway.”
Additionally, all processing for the system takes place at the edge, removing the need for the cloud.
“Deploying a vision sensor system like this in the cloud does not make sense due to runtime data volume, and in most cases everything should remain local so that data does not leave the premises, due to GDPR concerns,” says Anastassiou. “The system was designed in a way that it can be used and updated as easily as possible, which is why we use a small processing device and a small MIPI camera. The whole system is a single, unified, versatile, and extensible AIoT sensor that generates application-specific metadata only.”
He continues, “This system works in very confined, small areas, which makes it possible to install it into appliances in the catering industry. But other applications of the system could include gas stations, mini markets, or general stores with a low inventory count and a controllable catalog.”
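The “metadata only” output Anastassiou describes can be pictured as a compact, application-specific record per checkout, with no image data ever leaving the device. A minimal sketch, with assumed field names:

```python
# Sketch of the "metadata only" output idea: the AIoT sensor publishes a
# compact JSON record per checkout rather than raw images, keeping pixel
# data on the premises. Field names and values are assumptions.
import json
import time

def checkout_event(items, prices):
    """Build an application-specific metadata record; no pixels leave
    the device, which limits GDPR exposure."""
    return json.dumps({
        "timestamp": time.time(),
        "items": items,                       # e.g. ["sandwich", "soda_can"]
        "total": round(sum(prices[i] for i in items), 2),
        "currency": "EUR",
    })

print(checkout_event(["sandwich", "soda_can"],
                     {"sandwich": 3.50, "soda_can": 1.20}))
```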
About the Author
James Carroll
Former VSD Editor James Carroll joined the team in 2013. Carroll covered machine vision and imaging from numerous angles, including application stories, industry news, market updates, and new products. In addition to writing and editing articles, Carroll managed the Innovators Awards program and webcasts.