IMAGE PROCESSING: Multiprocessing system supports FPGAs, GPUs, and CPUs
To achieve the maximum performance in any image-processing system, developers must carefully partition their algorithms to run across a number of different hardware components. While FPGAs provide the optimum performance for pipelined preprocessing tasks such as convolutions, general-purpose CPUs are more effective in performing series of sequential operations where some form of image analysis is required.
Originally designed for 3-D modeling and rendering, GPUs provide an alternative means of image processing, allowing per-pixel and texturing operations to be performed rapidly. While each of these methods has its limitations and benefits, balancing an image-processing task among all three architectures can result in a dramatic increase in system performance.
“In a typical image-processing task,” says Dwayne Crawford, product manager with Matrox Imaging (Dorval, QC, Canada; www.matrox.com/imaging), “image-processing operations can be classified as those for preprocessing, image processing, and image analysis.”
Because preprocessing functions such as image formatting, color conversion, and image filtering are typically repetitive operations that require little or no branching of code, they are best implemented in FPGAs. By using FPGAs, system developers can leverage the large I/O bandwidth of these devices to increase the speed of functions while at the same time offloading these tasks from the host CPU.
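To illustrate what "little or no branching" means in practice, the following is a minimal sketch of a 3 x 3 filter written in plain Python. It is not Matrox code and not an FPGA implementation; the function name and the list-of-lists image layout are assumptions made for illustration. The point is that the same fixed sequence of multiply-accumulate steps runs for every output pixel, regardless of the pixel values, which is the property that makes such filters easy to pipeline in hardware.

```python
def filter3x3(image, kernel):
    """Apply a 3x3 kernel to a list-of-lists image (valid region only).

    Computed in correlation form (the kernel is not flipped), which equals
    convolution for symmetric kernels. Hypothetical helper for illustration.
    """
    height, width = len(image), len(image[0])
    out = [[0] * (width - 2) for _ in range(height - 2)]
    for y in range(height - 2):
        for x in range(width - 2):
            acc = 0
            # Nine multiply-accumulates per output pixel; the control flow
            # never depends on the pixel data, only on fixed loop bounds.
            for ky in range(3):
                for kx in range(3):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out


BOX = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]        # unnormalized box filter
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # passes the center pixel through
```

Because no step inspects a pixel value to decide what to do next, each of the nine products could in principle be computed by a dedicated hardware stage as pixels stream past.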
Of course, FPGAs are not ideally suited to functions with a higher pixel-to-pixel dependency such as finding and quantifying objects within an image. Since such functions may require branching of code, their implementation is better suited to general-purpose CPUs. Alternatively, functions such as image warping that require massive parallelism but operate with non-branching code are best implemented in GPUs.
Image-analysis functions such as geometric pattern matching are a further case: the interdependency of features within an object mandates heavy code branching, relegating the task to more conventional CPUs. Achieving the maximum performance from any image-processing or machine-vision system requires balancing each of these approaches. More important, perhaps, it requires systems with carefully partitioned I/O and software that supports every type of processor.
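The partitioning rules described above can be summarized as a simple decision procedure. The sketch below is only an illustration of that reasoning, not a Matrox tool or part of any real API: the `Operation` class, its three flags, and the `assign_target` function are hypothetical, and the flag values assigned to each operation are assumptions based on the descriptions in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    branching: bool           # control flow depends on image content
    streaming: bool           # fixed, repetitive work on pixels in arrival order
    massively_parallel: bool  # output pixels can be computed independently


def assign_target(op: Operation) -> str:
    """Pick a processor type using the rules of thumb from the article."""
    if op.branching:
        return "CPU"    # heavy branching favors general-purpose CPUs
    if op.streaming:
        return "FPGA"   # pipelined, non-branching preprocessing
    if op.massively_parallel:
        return "GPU"    # parallel, non-branching, but not a simple stream
    return "CPU"        # default when nothing else applies


# Flag values below are illustrative assumptions, not measured properties.
PIPELINE = [
    Operation("color conversion", branching=False, streaming=True, massively_parallel=True),
    Operation("image filtering", branching=False, streaming=True, massively_parallel=True),
    Operation("image warping", branching=False, streaming=False, massively_parallel=True),
    Operation("object finding", branching=True, streaming=False, massively_parallel=False),
    Operation("geometric pattern matching", branching=True, streaming=False, massively_parallel=False),
]


def partition(ops):
    """Map each operation name to its suggested processor type."""
    return {op.name: assign_target(op) for op in ops}
```

Calling `partition(PIPELINE)` places the two preprocessing steps on the FPGA, warping on the GPU, and the two analysis steps on the CPU. A real system would also have to weigh data-transfer costs between devices, which this sketch ignores.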
To date, very few companies have developed commercially available systems that offer such functionality. At VISION 2008 in Stuttgart, Germany, however, Matrox previewed the Matrox Supersight system, which enables developers to incorporate all these technologies into a complete industrial imaging system. At the heart of the system is the company’s custom-designed active backplane that incorporates two PCI Express and nine PCI-X slots that allow the system to be configured in a number of different ways.
To support this architecture, Matrox will offer a number of different boards including a double-wide, dual quad-core Intel Xeon x86 accelerator board (XAB) with an x16 PCI Express interface. “In a typical configuration,” says Crawford, “the system uses this XAB card as a host computer, AMD stream processor for GPU acceleration, and up to six Matrox Odyssey Xpro+ boards for both CPU and FPGA processing.” To increase the performance, the company is introducing an upgraded version of the system.
To provide x16 PCI Express Gen 2 interconnects between the available slots, the system uses multiple PCI Express Gen 2 switches. In a similar manner to the original Matrox Supersight design, this switched fabric allows the system to be configured in a number of different ways. “Depending on the application,” says Crawford, “the system could be configured with up to four interconnected XAB boards, resulting in a total of 16 CPU cores. Alternatively, two XAB boards could be configured with two FPGA and four GPU boards or one XAB board could be configured with one FPGA and acquisition board and six GPUs” (see Figs. 1 and 2).
FIGURE 1. The Matrox Supersight system allows computing elements such as frame grabbers, FPGA boards, CPUs, and GPUs to be incorporated in the same backplane as a single system.
To complement this architecture, Matrox will introduce a series of Camera Link frame grabbers and I/O modules for the system. The Matrox Radient series of frame grabbers will be offered as dual or quad Base and single or dual Full Camera Link boards with an x8 PCI Express interface and will also feature Altera Stratix III FPGAs. In this way, systems can be configured that support FPGA, CPU, and GPU-based processing.
For system developers who demand even greater performance, the company will also offer the Matrox e2link, an x4 PCI Express short card designed to transfer data bidirectionally between systems at up to 2 Gbytes/s. To support up to four systems in a master-slave configuration, the company will also offer a two-slot, four-port, x8 PCI Express interconnect card.
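The 2-Gbytes/s figure is consistent with the theoretical rate of an x4 link if one assumes PCI Express Gen 2 signaling and counts each direction separately; the article does not state the generation of this particular card, so that is an assumption. The short calculation below uses the published per-lane rates (2.5 GT/s for Gen 1, 5 GT/s for Gen 2) and the 8b/10b line encoding both generations use. The function name is hypothetical, and the result ignores packet and protocol overhead, so sustained throughput would be lower.

```python
# Raw signaling rate per lane, in gigatransfers per second.
LANE_RATE_GT_PER_S = {1: 2.5, 2: 5.0}


def pcie_peak_gbytes_per_s(lanes: int, gen: int = 2) -> float:
    """Theoretical peak payload rate in Gbytes/s, per direction.

    Gen 1 and Gen 2 use 8b/10b encoding: 8 data bits per 10 bits on the wire.
    Protocol overhead is not modeled.
    """
    data_gbits_per_s = lanes * LANE_RATE_GT_PER_S[gen] * 8 / 10
    return data_gbits_per_s / 8  # bits to bytes


for width in (4, 8, 16):
    print(f"x{width} Gen 2: {pcie_peak_gbytes_per_s(width):.1f} Gbytes/s per direction")
```

Under the same assumptions, the x8 and x16 Gen 2 links mentioned elsewhere in the system work out to 4 and 8 Gbytes/s per direction, respectively.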
To leverage the architecture of these systems, Matrox’s distributed MIL (DMIL) technology provides a means to access and control MIL image capture and processing functionality across multiple systems. While support of multicore CPUs allows these functions to be seamlessly accelerated, DMIL also allows the functions to be distributed over multiple FPGA boards and GPUs using the existing MIL API.