PCI Express speeds data transfer
Implementing Intel’s PCI Express architecture will allow data-transfer rates to be increased dramatically.
By Andrew Wilson, Editor
Intel’s established PCI parallel bus architecture has matured to meet the increased data rates required to transfer high-speed image data to host-based computer systems. Over the years its initial incarnation, a 32-bit, 33-MHz bus, has grown to accommodate higher data rates, higher clock frequencies, and more I/O pins. The final version of the bus, specified as PCI-X, is now being used by many frame-grabber vendors to transfer a theoretical 1 Gbyte/s from cameras to host-based CPUs.
For manufacturers of server-based computing systems, the limited bandwidth, scalability, and clock frequencies of the PCI bus and its derivatives presented a problem. To increase the bandwidth, reduce pin count, and lower systems costs, Intel turned to a serial interconnect architecture called PCI Express. From a hardware point of view, PCI Express is unlike PCI in every respect. PCI Express devices communicate with each other over links. In the minimal PCI Express configuration, this link is a single (x1) interface that uses a dual simplex (full duplex) channel to transmit data at 2.5 Gbits/s in each direction.
From a software perspective, PCI Express is compatible with PCI. “PCI Express supports full PCI functionality without the need for changing software drivers, although implementing additional specific PCI Express features does require software modifications to be made,” says Rob Giesen, R&D manager of National Instruments (NI) Vision Group. “This software compatibility is a huge benefit of PCI Express and, together with the cost savings compared to PCI and PCI-X, the reason for the relatively fast market adoption.”
Electrically, each lane comprises two LVDS pairs, one for each direction. At 2.5 Gbits/s, each direction could theoretically carry 312.5 Mbytes/s/lane. However, to easily achieve bit synchronization and improve error detection, 8-bit data bytes are encoded as 10-bit transmission characters using an 8B/10B encoding scheme. This leaves 20% less data bandwidth, or 250 Mbytes/s/lane in each direction. Since more than image data is transmitted across the two pairs, the total overhead may be closer to 25%.
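The per-lane arithmetic can be checked with a short Python sketch. It uses only the 2.5-Gbit/s line rate and the 8B/10B ratio quoted above; the function names are illustrative and not part of any PCI Express software interface.

```python
# Per-lane PCI Express bandwidth in one direction, from the figures in the text.
LINE_RATE_BITS_PER_S = 2.5e9   # raw signaling rate per lane, per direction
PAYLOAD_BITS = 8               # data bits carried by each transmission character
CHARACTER_BITS = 10            # bits actually sent per 8B/10B character


def raw_mbytes_per_s(line_rate=LINE_RATE_BITS_PER_S):
    """Rate if every transmitted bit were data (no encoding)."""
    return line_rate / 8 / 1e6


def encoded_mbytes_per_s(line_rate=LINE_RATE_BITS_PER_S):
    """Usable rate after 8B/10B encoding: only 8 of every 10 bits are data."""
    return line_rate * PAYLOAD_BITS / CHARACTER_BITS / 8 / 1e6


if __name__ == "__main__":
    raw = raw_mbytes_per_s()
    usable = encoded_mbytes_per_s()
    print(f"raw:    {raw:.1f} Mbytes/s/lane")
    print(f"usable: {usable:.1f} Mbytes/s/lane")
    print(f"encoding overhead: {100 * (1 - usable / raw):.0f}%")
```

The sketch covers the encoding overhead only; packet headers and link-management traffic, which account for the "closer to 25%" figure, are not modeled.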
Increased bandwidth
To increase the bandwidth of devices, multiple lanes can be grouped into different configurations of x1 (“by-one”), x2, x4, x8, x12, x16, and x32 lane widths. For the x32 implementation, this results in a maximum data rate of 16 Gbytes/s when both directions are counted (8 Gbytes/s in each direction). Currently, however, most motherboards support x1, x4, x8, and x16 implementations. Comparing the bandwidth attainable using x1, x2, x4, x8, and x16 PCI Express implementations with PCI (64 bit/66 MHz) and PCI-X (64 bit/133 MHz) is interesting (see Fig. 1). The 533 Mbytes/s of the PCI-64 implementation is roughly comparable to that of a x2 PCI Express implementation. Similarly, the 1-Gbyte/s data rate of PCI-X is only matched by the performance of the x4 PCI Express.
FIGURE 1. Comparing the bandwidth attainable using x1, x2, x4, x8, and x16 PCI Express implementations with PCI (64 bit/66 MHz) and PCI-X (64 bit/133 MHz) shows that the 533 Mbytes/s of the PCI-64 implementation is roughly comparable to that of a x2 PCI Express implementation (top). Similarly, the 1-Gbyte/s data rate of PCI-X is only matched by the performance of the x4 PCI Express (bottom).
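The comparison can be reproduced with a few lines of Python. This is a sketch under stated assumptions: it takes 250 Mbytes/s per lane per direction, uses the 533-Mbyte/s and 1-Gbyte/s parallel-bus figures as quoted in the text, and treats "roughly comparable" as being within 10% of the parallel-bus rate. That tolerance is an assumption made for illustration only.

```python
MBYTES_PER_LANE = 250  # usable Mbytes/s per lane, per direction

LANE_WIDTHS = (1, 2, 4, 8, 12, 16, 32)

# Parallel-bus rates as quoted in the text, in Mbytes/s.
PARALLEL_BUSES = {
    "PCI (64 bit/66 MHz)": 533,
    "PCI-X (64 bit/133 MHz)": 1000,
}


def link_mbytes_per_s(lanes):
    """Per-direction rate of a link with the given number of lanes."""
    return lanes * MBYTES_PER_LANE


def smallest_matching_width(bus_rate, tolerance=0.10):
    """Smallest lane width whose rate reaches the bus rate, within tolerance."""
    for lanes in LANE_WIDTHS:
        if link_mbytes_per_s(lanes) >= bus_rate * (1 - tolerance):
            return lanes
    return None


if __name__ == "__main__":
    for lanes in LANE_WIDTHS:
        print(f"x{lanes:<2} {link_mbytes_per_s(lanes):>5} Mbytes/s per direction")
    for name, rate in PARALLEL_BUSES.items():
        print(f"{name}: matched by x{smallest_matching_width(rate)}")
```

For a x32 link the sketch gives 8,000 Mbytes/s per direction, which corresponds to 16 Gbytes/s only when both directions are added together.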
With these data, it can be seen why, until now, products incorporating the PCI Express standard have been slow to reach the market and why, perhaps, system integrators are not demanding such interfaces from their suppliers. PCI Express, however, does have many advantages over PCI’s parallel bus architecture. For example, unlike PCI, where all devices on the bus share a limited bandwidth, each PCI Express lane is given a dedicated bandwidth.
To implement a PCI Express-based frame-grabber board, designers must interface existing logic to the PCI Express bus. This entails formatting data in a correct fashion so that the information can be properly transmitted over the PCI Express interface. This is accomplished in three stages using a transaction layer, a data link layer, and finally the physical interface (see Fig. 2).
To properly format data, the transaction layer builds packets from the requests it receives from the software layer, using 32- or 64-bit addressing. Once this is done, the data link layer performs a cyclic redundancy check and generates a packet sequence number before packets are framed and encoded in 8B/10B format pending physical transmission over a SerDes interface.
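The three stages can be illustrated with a toy Python model. It is a simplified sketch, not the PCI Express packet format: the header layout and the framing bytes are invented for illustration, and zlib's CRC-32 stands in for the link CRC defined by the specification. The 8B/10B encoding and serialization steps are omitted.

```python
import struct
import zlib

# Placeholder framing bytes for this sketch only; a real link frames packets
# with special 8B/10B control characters, not ordinary data bytes.
FRAME_START = b"\x02"
FRAME_END = b"\x03"


def transaction_layer(address, payload):
    """Build a toy request packet with a 32- or 64-bit address field."""
    if address < 2**32:
        header = struct.pack(">BI", 32, address)  # width tag + 32-bit address
    else:
        header = struct.pack(">BQ", 64, address)  # width tag + 64-bit address
    return header + payload


def data_link_layer(packet, sequence_number):
    """Prefix a sequence number and append a CRC covering both."""
    body = struct.pack(">H", sequence_number & 0x0FFF) + packet
    return body + struct.pack(">I", zlib.crc32(body))


def physical_layer(frame):
    """Add start and end markers ahead of encoding and serialization."""
    return FRAME_START + frame + FRAME_END


def receive(framed):
    """Undo the framing, verify the CRC, and return (sequence_number, packet)."""
    if framed[:1] != FRAME_START or framed[-1:] != FRAME_END:
        raise ValueError("bad framing")
    body, crc_bytes = framed[1:-5], framed[-5:-1]
    if zlib.crc32(body) != struct.unpack(">I", crc_bytes)[0]:
        raise ValueError("CRC mismatch")
    return struct.unpack(">H", body[:2])[0], body[2:]


if __name__ == "__main__":
    packet = transaction_layer(0x0010_0000, b"image line")
    framed = physical_layer(data_link_layer(packet, sequence_number=1))
    print(receive(framed))
```

The receive function shows why the sequence number and CRC are added: a corrupted or out-of-order packet can be detected at the link level before it reaches the transaction layer.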
One interesting feature of PCI Express is the way that bus bandwidth can be allocated to specific devices. In a machine-vision system, for example, it may be important to specify that data from a frame grabber be transmitted over the PCI Express interface with medium priority, while error signals resulting from image analysis are given a higher priority. This is done using traffic classes (TCs) and virtual channels (VCs). The TCs can be assigned individual priorities (from one to eight) to specify the level of importance of the data. These are then mapped into a VC that is assigned a specific bandwidth.
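A minimal Python sketch of the idea follows. The two virtual channels, their bandwidth shares, and the class assignments are hypothetical values chosen for the machine-vision example; real arbitration between channels is more involved. The sketch keeps the one-to-eight numbering used in the text, although the specification itself labels the classes TC0 through TC7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualChannel:
    name: str
    bandwidth_share: float  # hypothetical fraction of link bandwidth


# Hypothetical setup: two virtual channels sharing one link.
BULK = VirtualChannel("VC0", 0.75)
URGENT = VirtualChannel("VC1", 0.25)

# Map each traffic class (1 = lowest, 8 = highest, as numbered in the text)
# onto a virtual channel; the two highest classes get their own channel.
TC_TO_VC = {tc: (URGENT if tc >= 7 else BULK) for tc in range(1, 9)}


def channel_for(traffic_class):
    """Look up the virtual channel that carries a given traffic class."""
    if traffic_class not in TC_TO_VC:
        raise ValueError(f"unknown traffic class: {traffic_class}")
    return TC_TO_VC[traffic_class]


def channel_mbytes_per_s(traffic_class, link_mbytes_per_s):
    """Bandwidth assigned to the channel that carries this traffic class."""
    return channel_for(traffic_class).bandwidth_share * link_mbytes_per_s


def order_by_priority(pending):
    """Highest class first; packets of equal class keep their arrival order."""
    return sorted(pending, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    queue = [(4, "frame 1"), (8, "analysis error"), (4, "frame 2")]
    for tc, label in order_by_priority(queue):
        print(f"TC {tc} on {channel_for(tc).name}: {label}")
```

Here image data at medium priority (class 4) shares the bulk channel, while an error signal at class 8 travels on its own channel and is sent first.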
FIGURE 3. While the NI PCIe-1429 (top) can support the Base, Medium, and Full Camera Link standard, the Leutron PicPort-Express-CL (bottom) can support two Base or one Medium Camera Link cameras.
In their next generation of PCI Express-based frame grabbers, designers can use PCI-X-to-PCI Express bridge devices or implement the interface in an FPGA. Vendors such as BitFlow, Leutron, and NI have opted for the former (see Fig. 3). Both NI’s PCIe-1429 and Leutron’s PicPort-Express-CL are Camera Link frame grabbers based on a x4 implementation of the PCI Express standard. While the PCIe-1429 can support the Base, Medium, and Full Camera Link standard, the PicPort-Express-CL can support two Base or one Medium Camera Link cameras. BitFlow’s R64eCL is a Base, Dual Base, Medium, and Full Camera Link frame grabber that is offered in a x8 configuration.
Using bridges
FIGURE 4. The Intel 41210 serial-to-parallel PCI bridge connects parallel-bus PCI and PCI-X technology-based systems to the PCI Express serial I/O architecture. Configured with a x4 or x8 lane upstream port connection to host PCI Express slots, the device features two 133-MHz PCI-X bus segments downstream for attaching legacy PCI and/or PCI-X devices.
BitFlow, Leutron, and NI have implemented the PCI Express interface with Intel’s 41210 serial-to-parallel PCI bridge chip (see Fig. 4). While providing an easy way for existing PCI and PCI-X designers to re-engineer existing frame grabbers to PCI Express, the device has several limitations. Because the device is a bridge, it can only transfer data as fast as the information can be supplied from the dual 133-MHz PCI-X interfaces. While such designs may find favor with designers building dual Gigabit fiber-channel interfaces, the addition of two PCI-X interfaces may be overkill for a frame grabber. Such bridge devices also require additional logic to support frame-grabber designs. In BitFlow, Leutron, and NI implementations, for example, FPGAs interface the image capture and control logic with the 41210.
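The bridge limitation amounts to taking the slowest stage in the path. The Python sketch below assumes 250 Mbytes/s per PCI Express lane and roughly 1,000 Mbytes/s per 133-MHz PCI-X segment, the figures used earlier in the article; the source rates passed in are hypothetical, and real designs lose further bandwidth to protocol overhead.

```python
MBYTES_PER_LANE = 250           # PCI Express, per lane, per direction
MBYTES_PER_PCIX_SEGMENT = 1000  # 64-bit/133-MHz PCI-X, theoretical peak


def bridge_throughput(upstream_lanes, pcix_segments_used, source_mbytes_per_s):
    """Peak rate through a bridge: limited by the slowest of the three stages."""
    pcie_side = upstream_lanes * MBYTES_PER_LANE
    pcix_side = pcix_segments_used * MBYTES_PER_PCIX_SEGMENT
    return min(pcie_side, pcix_side, source_mbytes_per_s)


if __name__ == "__main__":
    # A grabber that feeds only one PCI-X segment cannot fill a x8 link.
    print(bridge_throughput(upstream_lanes=8, pcix_segments_used=1,
                            source_mbytes_per_s=5000))
    # Using both segments is needed to reach the x8 rate.
    print(bridge_throughput(upstream_lanes=8, pcix_segments_used=2,
                            source_mbytes_per_s=5000))
```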
“This, of course,” says Joseph Sgro, president of Alacron, “depends on one’s functional definition of a frame grabber. Combining image-capture logic and image-data archiving in one device can eliminate the burdensome and unnecessary data traffic between frame grabber and a JBOD/RAID disk controller from the host bus.”
Philip Colet, vice president of sales and marketing at Dalsa Coreco, is not enamored with Intel’s offering. “It is a bridge chip for a x4 implementation and one that we have tested extensively,” he says. “Our results were extremely disappointing (since there were severe compatibility issues with host CPUs), and we elected to adopt a different approach.” What that approach would be, Colet would not say. But Dalsa Coreco will soon be introducing a line of PCI Express boards based on its X64-CL iPro product line. According to Colet, this product will feature two independent Base Camera Link inputs and on-board image preprocessing such as Bayer filter conversion.
The problem, however, may not lie entirely with the Intel 41210. BitFlow’s R64eCL is a Base, Dual Base, Medium, and Full Camera Link frame grabber that also uses the Intel bridge chip to transfer data to the PCI Express bus in a x8 configuration. According to Bill Carson, BitFlow’s vice president of sales and marketing, the problem lies not with the device, but in the design of PCI Express motherboards. “On some motherboards,” he says, “inserting a x8 frame grabber into a x16 slot can result in the system recognizing the board by default as a x1 card.” This, of course, drastically reduces system performance.
“To enable scalability from low-cost to high-performance motherboards, the PCI Express specification does not require the motherboard to provide full bandwidth support to x4 or x8 boards in a x16 slot,” says NI’s Giesen. “It is the motherboard manufacturer’s decision to make this trade-off between cost and range of support. Therefore, independent of the design of the frame grabber, this always can be an issue depending on the motherboard used.” Apparently, NI has tested the 41210-based NI PCIe-1429 extensively and found no motherboard compatibility issues that were not resolved by an updated BIOS from the motherboard manufacturer.
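The cost of such a fallback is easy to quantify. The Python sketch below compares the bandwidth a board was designed for with the bandwidth of the link width the system actually grants, again assuming 250 Mbytes/s per lane per direction. The function and field names are illustrative; how the negotiated width is read from a given system is outside the scope of the sketch.

```python
MBYTES_PER_LANE = 250  # usable Mbytes/s per lane, per direction


def link_report(board_lanes, negotiated_lanes):
    """Compare the board's designed link width with the width actually granted."""
    if negotiated_lanes < 1 or negotiated_lanes > board_lanes:
        raise ValueError("negotiated width must be between 1 and the board width")
    expected = board_lanes * MBYTES_PER_LANE
    actual = negotiated_lanes * MBYTES_PER_LANE
    return {
        "expected_mbytes_per_s": expected,
        "actual_mbytes_per_s": actual,
        "fraction": actual / expected,
        "degraded": negotiated_lanes < board_lanes,
    }


if __name__ == "__main__":
    # A x8 frame grabber that the motherboard recognizes as a x1 card.
    print(link_report(board_lanes=8, negotiated_lanes=1))
```

A x8 board recognized as a x1 card is left with 250 of its 2,000 Mbytes/s, one-eighth of the designed bandwidth.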
Despite the criticism leveled at Intel’s 41210, it still offers the lowest-cost route to PCI Express available. For this reason, Alacron will use the device in its next generation of PCI Express-based frame-grabber/image-processor boards. According to Alacron’s Sgro, the first product will be capable of digitizing images from Camera Link-compatible cameras. “While an FPGA will be used to control camera functions,” says Sgro, “the board will use the S500 processor from Stretch to attain 12-GFLOP performance.” Optionally, the board will allow developers to add a field-programmable object array from MathStar to the board, adding 400-GOPs, 16-bit processing.
FPGA solutions
Many developers are looking to implement native-mode PCI Express on their boards using a single gate array. In this way, the logic associated with bridging is eliminated, while additional logic for image capture, control, and analysis can be incorporated into the same device.
American Eltec, the first company to incorporate such a device, has introduced a frame grabber with four simultaneous asynchronous input channels and a x1 PCI Express bus interface (see Fig. 5). The company’s PC_EYE/ASYNC board allows four independent (nonsynchronized) camera signals to be digitized at 25 MHz/8 bits/channel.
The design of the board is based on two programmable FPGAs. A Stratix GX FPGA from Altera contains the PCI Express interface, while a second FPGA from Xilinx contains FIFOs, sequencers, and sync generators for frame-grabber control. Because the frame grabber supports the x1 (single-lane) interface, it is about twice as fast as the initial 32-bit/33-MHz PCI.
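A quick calculation shows why a single lane is sufficient here. The Python sketch below assumes each channel delivers one 8-bit pixel per 25-MHz clock and takes the theoretical peak of 32-bit/33-MHz PCI as bus width times clock; the function names are illustrative.

```python
X1_MBYTES_PER_S = 250  # one PCI Express lane, per direction


def camera_mbytes_per_s(channels, pixel_clock_mhz, bits_per_pixel):
    """Aggregate input rate, assuming one pixel per clock on every channel."""
    return channels * pixel_clock_mhz * bits_per_pixel / 8


def parallel_bus_mbytes_per_s(width_bits, clock_mhz):
    """Theoretical peak of a parallel bus: bytes per transfer times clock."""
    return width_bits / 8 * clock_mhz


if __name__ == "__main__":
    cameras = camera_mbytes_per_s(channels=4, pixel_clock_mhz=25, bits_per_pixel=8)
    pci = parallel_bus_mbytes_per_s(width_bits=32, clock_mhz=33)
    print(f"four channels: {cameras:.0f} Mbytes/s")
    print(f"32-bit/33-MHz PCI: {pci:.0f} Mbytes/s")
    print(f"x1 PCI Express vs. PCI: {X1_MBYTES_PER_S / pci:.1f}x")
```

Under these assumptions the four channels together produce about 100 Mbytes/s, well within the 250 Mbytes/s of a x1 link.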
To support these developments, companies such as Altera offer core-based PCI Express products. Perhaps the most popular is the PCI Express core from PLD Applications (PLDA). Its PCI Express Core is an example of how designers have overcome the limitations and power consumption associated with bridging devices (see Fig. 6).
FIGURE 6. Companies such as Altera offer core-based PCI Express products such as the PCI Express core from PLD Applications (top). The Altera PCI Express Core is a good example of how designers such as American Eltec have overcome the limitations and power consumption associated with bridging devices (bottom). The Altera Stratix GX FPGA device is also available as a development board for prototyping purposes.
In the design of the core, PLDA offers developers a way to interface on-board programmable logic directly with frame-grabber circuitry in x1, x4, and x8 configurations. The reference design shows how the core can be implemented as an endpoint with a Master/Target Application Layer to interface to host bus cores, although, according to the company, it can also be used in root complex designs that connect the processor and memory subsystem to PCI Express, or in bridge designs.
Because PLDA supplies the register-transfer-level source code of the design, designers can simulate and evaluate the PCI Express Core, modify the design, and perform hardware testing using a prototyping board. According to Chuck Petersen, president of Epix, this type of PLDA development board is being used by his company to ready a version of a low-cost PCI Express frame-grabber board that should be available before the end of the year.
“Both bridge and core-based approaches have their pros and cons,” says Giesen. “Although we initially faced some challenges with Intel’s 41210, currently its only drawback is its ‘overkill’ for vision applications. On the other hand, Intel’s bridge technology has been available for over a year and has been extensively tested while PCI Express-compliant FPGA cores have only recently become available.”
Many companies are now offering bridge devices that allow PCI-X, 1394a, and Gigabit Ethernet controllers to connect to the PCI Express interface (www.pcisig.com/developers/compliance_program/integrators_list/pcie). While many of these are offered as IP cores, other companies use these cores and off-the-shelf ICs to build graphics boards, disk controllers, and network interface cards. Although the benefits of the PCI Express architecture will not immediately be felt by those in the industry, its advantages will solidify its position as the next-generation architecture for years to come.
Company Info
Alacron
Nashua, NH, USA
www.alacron.com
American Eltec
Las Vegas, NV, USA
www.americaneltec.com
BitFlow
Woburn, MA, USA
www.bitflow.com
Dalsa Coreco
St.-Laurent, QC, Canada
www.imaging.com
Epix
Chicago, IL, USA
www.epixinc.com
Leutron
Glattbrugg, Switzerland
www.leutron.com
MathStar
Minneapolis, MN, USA
www.mathstar.com
National Instruments
Austin, TX, USA
www.ni.com
PLD Applications
Aix-en-Provence, France
www.plda.com
Stretch
Mountain View, CA, USA
www.stretchinc.com