Machine vision software: OCR software helps digitize moon lander data
As one of the few places engaged in solar system studies at its founding, the University of Arizona’s Lunar and Planetary Laboratory (LPL) is home to the Space Imagery Center, a NASA Regional Planetary Image Facility. In 2015, NASA partnered with the University of Arizona to digitize more than 90,000 film images and data stored from 1960s-era Surveyor moon landers.
The goal? Create an archive for inclusion in the NASA Planetary Data System (PDS), a collection of data products from NASA planetary missions. Says John Anderson, senior media technician at LPL, “my responsibility is the digital recording of the images, extracting and decoding of encoded image data optically recorded on each film frame, and processing the pictures for viewing in a digital format.”
Surveyor missions returned over 92,000 individual images of the moon’s surface between 1966 and 1968, recorded by focusing a 70-mm film camera at a display. The files and video tape records of the missions have long since disappeared or become obsolete, with only the film rolls remaining.
Because many frames carried human-readable text that conventional optical character recognition (OCR) software could not easily read, the original scanning process demanded intense operator involvement and manual error checking.
After Lorne Trottier, co-owner of Matrox Imaging (Dorval, QC, Canada; www.matrox.com/imaging), saw an article in Planetary Report about the NASA PDS project, he reached out to the university through Arnaud Lina, Matrox Imaging’s director of research and innovation, offering to help read LPL’s text information with Matrox’s OCR software.
Mission data fields to be read; note the dot-matrix nature of the text.
“LPL selected some cropped images to upload for a test and the results were amazing. It was very encouraging, especially with the failure of other OCR products to read the human readable text (HRT),” says Anderson.
The overall project involves creating a searchable archive that will outlast conventional physical media repositories. Given the possible long-term reference potential of the images and data, there is a need for careful and accurate treatment of the resources. The workflow centered on an image scanning system from Stokes Imaging (Austin, TX, USA; www.stokesimaging.com) capable of capturing between four and eight frames per minute as high-resolution TIFF images.
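For a rough sense of scale, the quoted scan rate can be turned into total scanner time. The short Python sketch below is only a back-of-the-envelope estimate: it assumes roughly 92,000 frames (the article's approximate image count) and continuous capture at a constant rate, ignoring roll changes, rescans, and operator time. The function name scan_hours is hypothetical.

```python
def scan_hours(frames: int, frames_per_minute: float) -> float:
    """Hours of pure capture time at a constant scan rate (an idealized assumption)."""
    if frames_per_minute <= 0:
        raise ValueError("frames_per_minute must be positive")
    return frames / frames_per_minute / 60


# Assumption: about 92,000 frames in total, scanned without interruption.
TOTAL_FRAMES = 92_000

for rate in (4, 8):  # the two ends of the quoted scan-rate range
    print(f"{rate} frames/min: about {scan_hours(TOTAL_FRAMES, rate):.0f} hours")
```

Under these assumptions the capture step alone works out to roughly 190 to 380 hours of scanner time, before any decoding or cataloging.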
Typical film image from Surveyor mission, with a CRT display (left) and associated data fields (right).
However, during scanning, the film itself was not uniform in spacing, indexing, exposure, or processing. Once scanned, Photoshop from Adobe Systems Inc. (San Jose, CA, USA; www.adobe.com) and MATLAB from The MathWorks (Natick, MA, USA; www.mathworks.com) were used to pick out the details and create large composite mosaics from the image files.
The project began in February 2015 with the assembly of the Stokes Scanner, and the team continues to process, catalog, and data-mine the information contained within the images. Inconsistent frame spacing, as well as frames drifting with respect to the edge perforations, challenged the LPL team’s ability to consistently advance the film. With each new roll of film, the spacing of the frames and the lateral positioning of the image shifted, resulting in overall images with text in different places and some images tainted with artifacts. Moreover, the data fields contain HRT with a varying number of characters.
Nonetheless, Matrox’s solution, based on one of its efficient and accurate OCR software tools, addressed the problem of reading dot-matrix characters and reduced the time expenditure to a few minutes per roll. The initial review of the Matrox OCR solution showed an almost perfect read from nearly 4,500 different image files. For example, for roll 1 of Mission 5, the Matrox OCR solution scanned 846 files, reading 15,191 individual fields for a staggering 99.77% accuracy. Rolls 2 and 9 of Mission 5 did even better, yielding accuracy rates of 99.92% and 100%, respectively.
Anderson notes, “Compared with accuracy rates of 75% to 85% achieved with the original approach, there is no doubt as to the better result. Our project has been greatly enhanced and the progress of reading and cataloging the data with high accuracy would not have been possible without the gracious assistance of the Matrox team.”
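To put the reported rates in perspective, the Python sketch below converts an accuracy percentage into an approximate count of misread fields for roll 1 of Mission 5. It assumes that accuracy means the fraction of fields read correctly, which the article does not state explicitly. Applying the original approach's 75% to 85% range to the same roll is purely illustrative, and the function names are hypothetical.

```python
def implied_misreads(total_fields: int, accuracy_pct: float) -> int:
    """Approximate number of misread fields implied by an accuracy percentage.

    Assumes accuracy = correctly read fields / total fields.
    """
    return round(total_fields * (1 - accuracy_pct / 100))


def accuracy_pct(correct_fields: int, total_fields: int) -> float:
    """Field-level accuracy as a percentage."""
    if total_fields <= 0:
        raise ValueError("total_fields must be positive")
    return 100 * correct_fields / total_fields


ROLL_1_FIELDS = 15_191  # fields read on roll 1 of Mission 5, per the article

matrox_errors = implied_misreads(ROLL_1_FIELDS, 99.77)
print(matrox_errors)  # 35

# Illustrative only: the earlier approach's 75% to 85% range applied to the same roll.
print(implied_misreads(ROLL_1_FIELDS, 75), implied_misreads(ROLL_1_FIELDS, 85))  # 3798 2279

# Sanity check: 35 misreads out of 15,191 rounds back to the reported figure.
print(round(accuracy_pct(ROLL_1_FIELDS - matrox_errors, ROLL_1_FIELDS), 2))  # 99.77
```

On that reading, the reported accuracy leaves about 35 fields on the roll needing manual correction, versus a few thousand at the earlier rates.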