For many automation tasks such as bin picking in factories or case picking in warehouses, it is important for vision-guided robots to accurately locate objects in 3D space. But most picking robots are still quite limited in the types of objects and object arrangements they can work with. Amazon's automated fulfillment centers, for example, employ robots that bring racks of goods to a human picker for item selection. However, Amazon has yet to automate object picking from these racks.
To encourage development of such fully automated picking solutions, the company created the Amazon Picking Challenge (http://amazonpickingchallenge.org). In 2015, the MIT team that took second place in this competition used an early prototype of CapSen Robotics' (Pittsburgh, PA, USA; www.capsenrobotics.com) image detection software, which recognizes the poses of multiple objects in a cluttered scene simultaneously. Even if objects are partially occluded by other objects in the scene, the software can still detect them, as long as enough of each occluded object is visible to uniquely identify it.
"Over the past few years," says Dr. Jared Glover, Co-Founder of CapSen Robotics, "the field of robotic computer vision has undergone a 3D revolution. One of the biggest challenges in dealing with 3D geometry lies in appropriately handling 3D rotational data. To specify where an object is in space, one must provide both a position and an orientation for the object. Noise and ambiguity in the robot's sensory data necessitate developing robust models to accomplish this task."
"There are many ways to specify a 3-D orientation mathematically," says Glover, "as a set of three Euler angles, as a 3 x 3 rotation matrix, as an axis of rotation and an angle or as a unit quaternion." First described by Irish mathematician William Rowan Hamilton in 1843, these unit quaternions combine the axis and angle representation into a single 4-D unit vector.
Once orientations are represented in this manner, the CapSen Detection software uses a probability model known as the quaternion Bingham distribution to perform robust processing of noisy orientation information, such as surface normals or edge orientations.
"The Bingham distribution is perfectly suited to modeling uncertainty (large or small) on 3D rotations when they are represented as unit quaternions," says Glover, "and can be used to help locate specific objects in cluttered 3D images. We use the quaternion Bingham (QBingham) distribution throughout our software suite for a variety of tasks-from 3D alignment of corresponding point sets to encoding prior information about object orientation in a scene."
Because of the statistical geometry tools used, the software can detect objects in any orientation, with any shape, and with or without unique visual patterns, such as logos or words on consumer products, which many previous systems relied upon for cluttered object detection. "This makes it possible to more easily automate factory or warehouse tasks like bin picking or case picking, with less special-purpose equipment, such as the bowl feeders or shake tables currently used to separate objects," he says.
The main remaining obstacle to a complete vision solution for the Amazon picking problem is that each warehouse contains several million different types of objects of widely varying sizes, shapes, and materials. Furthermore, the picking robots will need a complete physical description of objects from the vision system, including not only object poses but also shapes and segmentations (for deformable objects) and relationships between objects. "Although progress has been made in these areas, there is still a great deal of work to be done," concludes Glover.