Support vector machines speed pattern recognition
Numerous image-processing and machine-vision libraries are available that use search algorithms. Despite this, many of these software packages cannot recognize objects that are, by nature, subject to deformation or naturally occurring variations. Examples include handwritten, needled, or embossed characters.
To accomplish these recognition tasks in the past, systems integrators turned to neural-network-based systems. Trained against large numbers of known good and bad images, neural networks mimic the brain by using a network of interconnected processing elements that act as learning mechanisms. This training enables neural networks to recognize, and to understand, subtle or complex patterns, such as distinguishing between dogs and cats, apples and pears, and male and female faces.
Building on these developments, Manto object-classification software from Stemmer Imaging (Puchheim, Germany; www.stemmer-imaging.de) uses a relatively new mathematical technique called Support Vector Machines (SVMs). Invented by Vladimir Vapnik, currently a professor of computer science at the University of London (Royal Holloway, University of London, Egham, UK; www.clrc.rhul.ac.uk), SVMs have already been successfully used in applications ranging from handwriting recognition to facial recognition to surface inspection.
The principle of the SVM technique is relatively simple. First, a set of known data is captured. In a machine-vision system, for example, this could be images of three different types of wood. Once the training data have been captured, the information is nonlinearly mapped into a higher-dimensional feature space. Then, a separating hyperplane is constructed in that feature space; seen from the original input space, this hyperplane corresponds to a nonlinear decision boundary. The training data are separated by this boundary such that any new data (such as a new image) will fall into a specific class. Interestingly, by using a kernel function, this separating hyperplane can be computed without explicitly mapping the data into that feature space.
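The kernel idea can be illustrated with a small numerical check. The sketch below is not taken from Manto; it assumes a degree-2 polynomial kernel and a hand-written feature map `phi` (both chosen only for illustration) and shows that the kernel value computed in the 2-D input space matches the dot product computed after explicitly mapping both points into a 3-D feature space.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative assumption)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


# Two arbitrary sample points in the 2-D input space.
x = (1.0, 2.0)
y = (3.0, -1.0)

explicit = dot(phi(x), phi(y))   # map to feature space first, then dot product
implicit = poly2_kernel(x, y)    # same quantity, never leaving the input space

print(explicit, implicit)
```

The two printed values agree up to floating-point rounding. For kernels whose feature space is very large or infinite-dimensional, only the second route is practical, which is why the hyperplane can be found without performing the mapping.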
"To generate the vectors needed to place individual image data in vector space," says Volker Gimple of Stemmer Imaging, "a sequence series of multiresolution filters is applied to the image using the same principles used in wavelet transforms. In the latest version of Manto, three such filters are used. The first is a reduction filter that suppresses insignificant data such as noise. The second is a filter, similar to a Sobel operator, designed to enhance the geometric information inherent in the image. Last, a filter can be applied to suppress the nonrotation invariant information (or asymmetry) within the image.
"If the object is rotation-invariant, has clear geometric features (such as a torus or disk), and has small scale variations that are insignificant (such as noise)," says Gimple, "then all three filters would be applied to the image." In applications such as optical character recognition, however, the object may not be rotationally invariant, so there is no need to suppress the asymmetry within the image. In such cases, it may only be necessary to reduce the noise in the image and enhance geometric features.
"After filters are applied to images, SVMs expand the information as a linear combination of a subset of the data. These support vectors locate hyper surfaces that separate the data into different classes. The SVM then tries to find the separation of the multidimensional spaces between the classes.
According to Martin Kersting, vice president of engineering at Stemmer, one of Manto's particular strengths is recognizing handwriting, the most difficult form of character recognition. During a test using the MNIST database of handwritten digits, which is available on the Web at yann.lecun.com/exdb/mnist/index.html, Manto needed 11 minutes to learn the characteristic features of 60,000 handwritten training samples. After that, the software classified 10,000 digits of a test set not included in the training samples with a recognition rate of 99.35%. The processing speed was 1 ms/character using a system based on a 500-MHz AMD Athlon processor. At present, Manto is also being tested for use in a system for car-license-plate recognition (see figure on p. 7).
More details on SVM theory and applications can be found in IEEE Intelligent Systems (July/August 1998; guppy.mpe.nus.edu.sg/~mspessk/svm/x4018.pdf). Better still, Thorsten Joachims of the Department of Computer Science at Cornell University (Ithaca, NY, USA; www.cornell.edu) has already implemented a C version of Vapnik's SVM for pattern recognition. Available at svmlight.joachims.org, the SVMlight program compiles on SunOS 3.1.4, Solaris 2.7, Linux, IRIX, Windows NT, and Powermac and is free for scientific use.