How to get started with deep learning in machine vision
Mike Fussell
Problems with high variability or subjectivity can be difficult to solve with traditional rules-based machine vision techniques. A seemingly simple problem like grading produce relies on a complex interplay of subjective and highly variable criteria, including size, shape, color, and uniformity. By training a neural network with examples of each grade, developers can use deep learning to accomplish such a task.
Verifying that a package contains all required items can be difficult if the shape, color, or form of each item changes from package to package. For instance, the same cable can be folded in different ways and look different each time. A neural network can be trained to recognize objects accurately in such challenging and subjective situations, where the appearance of the same object varies.
Getting started
Classifying images based on content is an ideal starting point for developers who are new to deep learning for machine vision. Classification can solve a variety of tasks, including grading objects, subjective quality inspection, checking for the presence or absence of materials in packaging, monitoring vehicle driver wakefulness, and estimating age. Instead of building a new model from scratch, start with a reputable, generalizable pretrained model and apply transfer learning, which combines supervised learning with a pretrained neural network.
While building a new network from scratch might sound interesting, it can significantly increase the scope of a project, often without delivering any real performance advantage. Starting from a known deep learning-based solution gives a useful head start. Rather than training an entirely new network, transfer learning retrains only a few layers of a pretrained network to adapt it to a new task. It can often deliver results as accurate as those of a network built from scratch while requiring much less training data.
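To make the idea concrete, the sketch below mimics the feature-extraction style of transfer learning in pure Python, without TensorFlow. A hand-written function, `frozen_features` (hypothetical), stands in for the pretrained layers and is never updated; only a small logistic-regression "head" is trained on the new task. The features, function names, and sample data are illustrative assumptions, not the tutorial's method. In Keras, the same pattern is typically expressed by setting a pretrained base model's `trainable` attribute to `False` and adding new trainable layers on top.

```python
import math
import random

def frozen_features(pixels):
    """Hypothetical stand-in for a pretrained backbone: fixed, never trained.

    Maps a flat list of 0-255 grayscale pixels to two summary features.
    """
    n = len(pixels)
    mean = sum(pixels) / n / 255.0
    contrast = (max(pixels) - min(pixels)) / 255.0
    return [mean, contrast]

def train_head(samples, labels, epochs=500, lr=0.5, seed=0):
    """Train only a logistic-regression head on top of the frozen features."""
    rng = random.Random(seed)
    # Features are computed once because the backbone's weights never change.
    feats = [frozen_features(s) for s in samples]
    w = [rng.uniform(-0.1, 0.1) for _ in feats[0]]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of binary cross-entropy with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, pixels):
    """Probability that an image belongs to class 1."""
    x = frozen_features(pixels)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Because only the head's few parameters are learned, a handful of labeled examples is enough in this toy. That is the same reason transfer learning needs far less data than training every layer.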
For developers familiar with OpenCV, TensorFlow is a flexible and powerful framework for building and training deep neural networks. It makes it possible to automate complex tasks quickly with Python code, and it is supported by a large, active community that provides examples, tutorials, and forums.
Depending on the complexity of the task, NVIDIA's (Santa Clara, CA, USA; www.nvidia.com) CUDA parallel computing platform and programming model for GPU computing can accelerate network training in TensorFlow. A GPU speeds up training large networks from scratch, but it isn't necessary for getting started, because most examples and tutorials still run quickly enough on a standard desktop CPU.
Many deep learning tools are available only for Linux, and Ubuntu 18.04 is the standard distribution; most current TensorFlow tutorials use it. A Docker image (bit.ly/VSD-DCKR) can simplify installing and configuring TensorFlow and its dependencies, but be sure to select the correct image, as separate CPU-only and GPU-accelerated versions exist.
Best practices for training and deploying neural networks
When designing a deep learning-based machine vision system, software is only half of the solution. The positioning of the target, the type and direction of lighting, the optics, and the camera must also be considered. Optimizing the physical components of a deep learning-based system can simplify the problem, minimizing both the amount of training data and the size of the network required to solve it. This can translate into faster training during development and higher accuracy and operating speed once deployed.
The training images must resemble the inference images as closely as possible. When working with small training datasets, even small differences in object positioning and lighting can have a big impact on application performance. Consistent target positioning and lighting reduce the variation between images, which in turn reduces the amount of training data needed. 3D printing offers a practical way to quickly build custom mounts that hold samples securely and precisely in place.
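One simple way to catch a mismatch between training and field conditions, such as a lighting change, is to compare basic image statistics. The sketch below assumes 8-bit grayscale images stored as flat lists of pixel values. It flags a field image whose mean brightness lies more than `max_z` standard deviations from the mean brightness of the training images. The function names and the default threshold of 3.0 are illustrative choices, not an established rule, and a global brightness check will not catch every kind of mismatch (for example, a shifted target position).

```python
import statistics

def image_mean(pixels):
    """Mean intensity of one image given as a flat list of pixel values."""
    return sum(pixels) / len(pixels)

def brightness_drift(train_images, field_image, max_z=3.0):
    """Return (z_score, drifted) for a field image versus the training set.

    Requires at least two training images so a standard deviation exists.
    """
    train_means = [image_mean(img) for img in train_images]
    mu = statistics.mean(train_means)
    sigma = statistics.stdev(train_means)
    field = image_mean(field_image)
    if sigma == 0:
        # Identical training brightness: any difference counts as drift.
        z = 0.0 if field == mu else float("inf")
    else:
        z = abs(field - mu) / sigma
    return z, z > max_z
```

Running a check like this on images captured at the deployment site, before and after installation, is a cheap way to confirm that the physical setup still matches the one used to collect training data.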
The lighting for the target should accentuate the differences between classes. Avoid highlights so bright, or shadows so dark, that detail in those regions is lost. Many color-related machine vision problems can be solved more effectively with a monochrome camera and the right combination of colored lighting and filters than with a color camera. A good-quality camera with enough resolution and dynamic range to capture fine detail will produce high-quality training data and perform well in the field.
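While adjusting lighting, it can help to measure how much of the image has been clipped to pure black or pure white, since detail in those pixels is gone. The sketch below assumes an 8-bit monochrome image as a flat list of values. The 1% tolerance in `exposure_ok` is a hypothetical starting point to tune per application.

```python
def clipped_fractions(pixels, low=0, high=255):
    """Return (shadow_fraction, highlight_fraction) for an 8-bit image.

    Pixels at or below `low`, or at or above `high`, have likely lost detail.
    """
    n = len(pixels)
    shadows = sum(1 for p in pixels if p <= low)
    highlights = sum(1 for p in pixels if p >= high)
    return shadows / n, highlights / n

def exposure_ok(pixels, max_fraction=0.01):
    """True if neither shadows nor highlights exceed `max_fraction` of pixels."""
    shadow, highlight = clipped_fractions(pixels)
    return shadow <= max_fraction and highlight <= max_fraction
```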
Processing images consistently between the training data and the data captured in the field is crucial. For example, differences in how anti-aliasing is applied during image resizing can affect network performance (Figure 1). Two images rescaled using different methods may look identical to the eye yet differ at the pixel level, which can result in lower-confidence predictions or incorrect classification decisions.
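The effect is easy to reproduce. The sketch below uses pure Python on a small grayscale image stored as nested lists, with dimensions assumed divisible by the downsampling factor. It downsamples the same fine checkerboard pattern two ways: by dropping pixels (no anti-aliasing) and by averaging each block (a basic box-filter form of anti-aliasing). Libraries such as OpenCV and TensorFlow offer several interpolation modes whose exact behavior differs. The point here is only that the choice changes pixel values, so it belongs in one shared function, here the hypothetical `preprocess`, that both the training and inference code call.

```python
def downsample_nearest(img, factor):
    """Keep every `factor`-th pixel in each direction (no anti-aliasing)."""
    return [row[::factor] for row in img[::factor]]

def downsample_box(img, factor):
    """Average each factor x factor block (simple box-filter anti-aliasing)."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [img[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))  # integer result, like 8-bit data
        out.append(row)
    return out

def preprocess(img):
    """Single resizing path shared by training and inference (hypothetical)."""
    return downsample_box(img, 2)
```

On a 4 x 4 black-and-white checkerboard, `downsample_nearest(img, 2)` returns all-white pixels (255), while `downsample_box(img, 2)` returns uniform mid-gray (127), even though both are "half-size" versions of the same image.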
Tutorial and troubleshooting
Many excellent tutorials can serve as starting points for a first deep learning project. FLIR's "How to build a deep learning classification system" tutorial (bit.ly/VSD-DLR), for example, explains how to build an image classifier using transfer learning with TensorFlow. It also covers deploying a trained model onto a single-board computer using FLIR cameras and the Spinnaker software development kit to solve real problems in the field.
Just as designing deep learning-based machine vision systems requires new skills and new ways of thinking about problems, troubleshooting these systems also requires a different approach. When investigating a network that is not performing as well as expected, look for patterns in the incorrect results. Training data is often the source of unexpected results and poor performance (Figure 2). While expanding the training dataset can often improve accuracy, a more systematic approach can reveal which dimensions of the dataset need to be expanded. Unexpected results frequently stem from edge cases that are underrepresented or incorrectly labeled in the training dataset.
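Looking for patterns can start with tallying which true classes are confused with which predicted classes. The sketch below assumes you already have lists of true labels and model predictions for a validation set; the class names in any usage are hypothetical. It reports the most frequent confusions and the accuracy of each class, which can point to underrepresented or mislabeled classes worth inspecting by eye.

```python
from collections import Counter

def error_patterns(true_labels, predicted_labels):
    """Count (true, predicted) pairs for misclassified samples, most common first."""
    errors = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return errors.most_common()

def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}
```

A single dominant confusion pair, or one class with much lower accuracy than the rest, suggests where to collect more examples or re-check labels first.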
While the accuracy of a network can generally be improved by increasing the size of the dataset, performance eventually levels off at a point where adding more data yields no further gains in accuracy. To be successful, the application's accuracy requirement must fall at or below this limit.
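Whether more data is still paying off can be checked with a learning curve: train on progressively larger subsets of the data and record validation accuracy at each size. The sketch below assumes such a list of accuracies already exists. The `min_gain` and `window` defaults are arbitrary illustrative values, not recommended settings. Once the curve has flattened, the best observed accuracy approximates the limit described above, and it can be compared directly with the application's requirement.

```python
def has_plateaued(accuracies, min_gain=0.005, window=2):
    """True if each of the last `window` dataset-size increases improved
    validation accuracy by less than `min_gain`.

    `accuracies` holds validation accuracy measured at increasing dataset sizes.
    """
    if len(accuracies) < window + 1:
        return False  # not enough points to judge the trend
    recent = accuracies[-(window + 1):]
    gains = [b - a for a, b in zip(recent, recent[1:])]
    return all(g < min_gain for g in gains)

def meets_requirement(accuracies, required):
    """Check whether the best observed accuracy reaches the required level."""
    return max(accuracies) >= required
```

If the curve has plateaued and `meets_requirement` is still false, collecting more of the same kind of data is unlikely to help. Revisiting lighting, optics, labeling, or the problem definition is a better next step.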
Conclusion
Many deep learning frameworks, network types, and tools exist. Starting with an existing solution to a classification problem and adapting it using transfer learning is a great first step in deploying deep learning. Another option for getting started is FLIR's inference cameras. With these cameras, the neural network loads directly onto the camera, so inference runs on the device itself and no host PC or board is needed.
Mike Fussell is the Product Manager at FLIR Machine Vision (Richmond, BC, Canada; www.flir.com/mv).
This story was originally printed in the November/December 2019 issue of Vision Systems Design magazine.