Deep learning powers traffic enforcement system
Creating a device on which a deep learning system can determine, with at least 90% accuracy, whether a driver is wearing a seat belt or using a cell phone while driving is a challenge. Developing a system that can capture clear images from cars moving at up to 80 km/h, on roads up to three lanes across, in all varieties of environmental conditions, compounds the difficulty.
For Hazen.ai (Makkah al-Mukarramah, Saudi Arabia; www.hazen.ai), a developer of artificial intelligence (AI) systems for road safety applications, flexible convolutional neural networks (CNNs) and a robust observability and monitoring stack, supported in part by a close working relationship with local law enforcement, provide the keys to success.
The Mobile Phone and Seatbelt Detection device, designed for traffic enforcement authorities, mounts on either a pole on the side of the road or a gantry above the road. Gantry installs generate less occlusion but are more expensive. Roadside installs are more challenging from a video analytics perspective but are cheaper and more convenient.
Hazen.ai works with clients to provide guidance on site selection and hardware placement for the device. Khurram Amin, Chief Technical Officer, notes that the optimal placement is usually very clear.
Related: What is deep learning and how do I deploy it in imaging?
The device captures images of the driver as their car enters the camera’s field of view. A CNN running on the device then analyzes the image and determines whether the driver has committed a violation by not wearing a seatbelt or by using a cell phone while behind the wheel.
The system generates a simple yes or no classification for seat belt use. For cell phone use, the results are more nuanced: the CNN distinguishes between a phone used for talking and a phone used for texting, based on the position in which the driver holds the phone. The model also classifies a third category, indeterminate, indicating an image so blurry that the model cannot assign a yes or no label with confidence.
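Hazen.ai has not published its output format, but the confidence-based fallback described above can be sketched as follows. The class names and the 0.6 threshold are illustrative assumptions, not the company's actual values:

```python
def classify_with_confidence(probs, threshold=0.6):
    """Map class probabilities to a label, falling back to
    'indeterminate' when no class clears the confidence threshold.
    Class names and threshold are illustrative, not Hazen.ai's."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else "indeterminate"

# A clear image of a driver talking on a phone:
print(classify_with_confidence({"no_phone": 0.1, "talking": 0.8, "texting": 0.1}))
# → talking

# A blurry image where no single class is confident:
print(classify_with_confidence({"no_phone": 0.4, "talking": 0.35, "texting": 0.25}))
# → indeterminate
```

The same pattern handles the binary seat belt decision by restricting the dictionary to yes/no classes.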
Each image classified as a violation is stored in memory and transmitted to a back-end system at intervals determined by the client (e.g., a law enforcement agency). Images that do not capture a violation are deleted from the device. The client can then review the images and issue citations as appropriate, using the vehicle's license plate number to identify the driver.
When development of the Mobile Phone and Seatbelt Detection device began, there were three challenges to overcome. First, the device required image quality high enough to reliably allow a consensus of human reviewers to agree as to whether or not a driver wore a seat belt or used a smartphone while driving.
The engineers experimented with a selection of Sony (Brooklands, Surrey, UK; www.image-sensing-solutions.eu) sensors including the IMX267, and cameras including the XNO-6120R by Hanwha Techwin (Changwon, South Korea; www.hanwha-security.com), the IBP831-1ER from Pelco (Fresno, CA, USA; www.pelco.com), and DS-2DE54321W-AE from Hikvision (Hangzhou, China; us.hikvision.com), before choosing the Axis Communications (Lund, Sweden; www.axis.com/en-us) P1375-E Network Camera as the preferred camera used in internal testing.
According to Amin, the P1375-E was chosen for its optimal price point as well as for image quality, as the company sought a camera it could recommend to clients. Hazen.ai is primarily a software company, however, Amin stresses, and therefore supports as many different cameras as it can to provide clients with freedom of choice.
Building an edge system that did not require constant communication with a network presented the second challenge during the design phase. This meant ensuring the device had an appropriate amount of computing power and data storage.
Two primary chipset options exist: embedded devices from NVIDIA (Santa Clara, CA, USA; www.nvidia.com) equipped with ARM processors and embedded GPUs, such as the Xavier NX and Jetson TX2, or industrial PCs with at least an Intel (Santa Clara, CA, USA; www.intel.com) Core i7-9700K CPU and 8 GB of DDR4 RAM. The company recommends an Intel Neural Compute Stick to further enhance industrial PC performance.
Each violation image usually requires 5 MB of data, says Amin. Internal testing revealed that less than 1% of images captured resulted in a violation. If 100,000 cars drove past a Mobile Phone and Seatbelt Detection device, that might result in 1000 violations, requiring only 5 GB of storage, a paltry amount for a solid-state drive (SSD).
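The storage arithmetic can be checked directly, using the figures from Amin's example:

```python
image_size_mb = 5        # per violation image, per Amin
violation_rate = 0.01    # under 1% of captured images show a violation
cars_passing = 100_000

violations = int(cars_passing * violation_rate)
storage_gb = violations * image_size_mb / 1000  # decimal gigabytes

print(violations, storage_gb)  # → 1000 5.0
```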
Furthermore, once the device transmits its data to the back end, that data is deleted from the device, continually freeing storage space. A proprietary API allows the device to interact with any violation-processing back end maintained by a law enforcement agency.
The third challenge was made harder by the device running on the edge. Indications of errors, such as a sudden rise in the percentage of images resulting in violations, or an absence of violations over periods that normally generated a healthy number, would be clear in the data. Because the device operates autonomously, however, such errors could go unnoticed until the next data transmission, resulting in potentially long periods in which the device did not function properly. Hazen.ai therefore developed a robust internal observability and monitoring stack that continuously monitors the device and generates automatic alerts when such anomalies occur.
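Hazen.ai has not disclosed how its monitoring stack works; one simple way to flag the anomalies described, a sudden spike or drop in the violation rate relative to recent history, is a z-score check like this sketch, where the 24-hour window and 3-sigma threshold are assumptions:

```python
import statistics

def violation_rate_anomaly(hourly_rates, window=24, z=3.0):
    """Return True if the most recent hourly violation rate deviates
    more than z standard deviations from the preceding window.
    Window size and threshold are illustrative, not Hazen.ai's."""
    history = hourly_rates[-window - 1:-1]
    latest = hourly_rates[-1]
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1e-9  # guard against zero spread
    return abs(latest - mean) > z * stdev

normal = [0.010, 0.011, 0.009, 0.010] * 6     # ~1% violation rate history
print(violation_rate_anomaly(normal + [0.05]))   # → True: sudden spike
print(violation_rate_anomaly(normal + [0.010]))  # → False: within range
```

The same check, inverted, catches the "absence of violations" failure mode: a rate near zero over periods that normally produce citations.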
The company developed its base CNN, the core prediction model for the device, for almost two years, using tens of thousands of training images. Cooperation with local authorities determined the best locations for camera placements to gather the images. A partner data company annotated the ground truth classifications to create the training dataset and verified the accuracy of the CNN results during internal testing.
High-level frameworks like Caffe (caffe.berkeleyvision.org), TensorFlow (www.tensorflow.org), and PyTorch (www.pytorch.org) provide the initial scaffolds for CNN design and training. The model is then coded in C, C++, or CUDA, depending on the specific hardware used, to extract as much compute power as possible. Further optimization for individual chipsets also takes place.
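The article does not disclose Hazen.ai's architecture; as an illustration of the framework scaffolding stage, a minimal PyTorch image classifier might start out like this, where the layer sizes and the four-class output head are assumptions:

```python
import torch
import torch.nn as nn

class SeatbeltPhoneCNN(nn.Module):
    """Hypothetical minimal CNN scaffold, not Hazen.ai's actual model."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SeatbeltPhoneCNN()
logits = model(torch.randn(1, 3, 224, 224))  # one dummy RGB frame
print(logits.shape)  # → torch.Size([1, 4])
```

A scaffold like this would be iterated on in the framework, then ported to C/C++/CUDA for the target chipset as the article describes.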
Related: Deep learning helps detect distracted driving behavior
Hazen.ai conducts field tests to compare the CNN classifications against ground truth. If required, the CNN at each site can be fine-tuned to cater to the unique conditions of a deployment and ensure consistent accuracy greater than 90%. The angle of the sun shining into the camera, glare, shadows cast by buildings, or common weather conditions like consistent, heavy rainfall all may warrant fine-tuning.
If necessary, the company can complete the procedure in one to two days, with around 1000 violation images, according to Amin. Extra lights for additional illumination or shutter speed alterations can also provide the corrections needed for accurate results.
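Fine-tuning on the order of 1000 site images typically means updating only part of a pretrained network. A common transfer-learning pattern, shown here in PyTorch as an assumption rather than Hazen.ai's actual procedure, freezes the feature extractor and retrains only the classification head:

```python
import torch
import torch.nn as nn

# Hypothetical pretrained model: convolutional features plus a linear head.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),
)

# Freeze everything except the final classification layer.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

# One illustrative training step on a dummy batch of site images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()   # gradients flow only into the unfrozen head
optimizer.step()
```

With only the head trainable, roughly a thousand labeled site images can shift the decision boundary for local glare, shadow, or weather conditions without disturbing the base features.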
A robust monitoring system constantly operates at each deployment site: the traffic enforcement officers issuing the citations based on the CNN classifications. In most jurisdictions, by law, officers must review an image captured by the Mobile Phone and Seatbelt Detection device before issuing a citation based on that image. False positives therefore always register and provide a metric for the model’s prediction accuracy.
False negatives are more difficult to detect, however, because only positive results transmit to the back end, with all other data deleted. A pipeline currently in development will allow law enforcement officers to review images and assign labels, generating images used to fine-tune the model at specific deployments.
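The review pipeline's design has not been published; as a sketch of the core idea, officer-assigned labels that disagree with the model's predictions are the most valuable fine-tuning examples. The record fields and labels below are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewItem:
    image_id: str                        # hypothetical identifier
    predicted: str                       # label the on-device CNN assigned
    officer_label: Optional[str] = None  # label assigned on human review

def fine_tuning_candidates(queue):
    """Reviewed images whose officer label contradicts the model's
    prediction become targeted fine-tuning examples."""
    return [item for item in queue
            if item.officer_label and item.officer_label != item.predicted]

queue = [
    ReviewItem("img-001", "texting", "texting"),   # confirmed violation
    ReviewItem("img-002", "talking", "no_phone"),  # false positive
    ReviewItem("img-003", "texting"),              # not yet reviewed
]
print([i.image_id for i in fine_tuning_candidates(queue)])  # → ['img-002']
```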
The device has been field tested at between 5 and 10 different sites, according to Amin, and has one live deployment for a client in Egypt, which began in July. Accuracy for seat belt violations and cell phone usage violations at that commercial deployment currently stands at 95% and 92% respectively.
Each contract with a client stipulates whether or not Hazen.ai may use data generated at the site. If the client allows use of the data, the tweaks made to the model at that site can be incorporated into the core CNN, resulting in a more robust baseline model.
About the Author
Dennis Scimeca
Dennis Scimeca is a veteran technology journalist with expertise in interactive entertainment and virtual reality. At Vision Systems Design, Dennis covered machine vision and image processing with an eye toward leading-edge technologies and practical applications for making a better world. Currently, he is the senior editor for technology at IndustryWeek, a partner publication to Vision Systems Design.