Facebook uses computer vision algorithms to caption your photos
Most Facebook users know by now that when you upload an image of yourself or one of your friends, the social networking website uses facial recognition algorithms to suggest whom you might “tag” in the image. What some users may not know is that Facebook also tags photos with data such as how many people are in a photo, the setting of a photo, and even whether or not someone is smiling.
In April 2016, Facebook rolled out automated alternative (alt) text on Facebook for iOS, which provides visually impaired and blind people with a text description of a photo using computer vision algorithms that enable object recognition. Users with a screen reader can access Facebook on an iOS device and hear a list of items that may be shown in a photo.
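Mechanically, the feature boils down to putting machine-generated keywords where screen readers already look: the image's alt attribute. Below is a minimal, hypothetical TypeScript sketch of that write side; the /caption endpoint and its response shape are illustrative assumptions, not a real Facebook API.

```typescript
// Hypothetical sketch of automated alt text: ask a vision service for
// keywords, then expose them in the alt attribute that screen readers
// announce. The /caption endpoint and its response shape are
// illustrative assumptions, not a real Facebook API.
async function addGeneratedAltText(img: HTMLImageElement): Promise<void> {
  const response = await fetch("https://example.com/caption", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ imageUrl: img.src }),
  });
  // Assumed response shape, e.g. { "keywords": ["two people", "smiling"] }
  const { keywords } = (await response.json()) as { keywords: string[] };
  // Facebook's generated descriptions took roughly this form.
  img.alt = `Image may contain: ${keywords.join(", ")}`;
}

// Caption every photo that has no alt text yet.
document
  .querySelectorAll<HTMLImageElement>("img:not([alt])")
  .forEach((img) => void addGeneratedAltText(img));
```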
A new Chrome extension now shows what objects Facebook has automatically detected in your photos using deep learning. “Show Facebook Computer Vision for Chrome” displays the alt tags that are added to images you upload, populated with keywords representing the content of your images. According to the extension's developer, Adam Geitgey, Facebook labels your images using a deep convolutional network built by Facebook's AI Research (FAIR) team.
"On one hand, this is really great,"said the developer. "It improves accessibility for blind users who depend on screen readers which are only capable of processing text. But I think a lot of internet users don't realize the amount of information that is now routinely extracted from photographs."
I put the extension to the test myself and searched for an image that might produce interesting results: a photo of myself and three friends at a wedding, two of whom were the bride and groom. The word "wedding" did not appear in the caption or anywhere in the image. The results:
While the computer vision algorithms may leave out some details and do not always produce perfectly accurate results, in this case they were correct, as they were for the image above, which I took with my phone on a vacation in Maine this past summer. Other companies are doing this as well, including Google, which has developed machine learning software that can automatically produce captions describing images as they are presented to the user. According to a May Vision Systems Design article, the software may eventually help visually impaired people understand pictures, provide cues to operators examining faulty parts, and automatically tag such parts in machine vision systems.
To be clear, both Facebook and Google indicate that their intentions here are to assist visually impaired or blind people in understanding pictures. But to play devil's advocate, perhaps the folks who err on the side of paranoia and caution when it comes to facial recognition and databases are on to something. For those who have seen it, a scene from the 2002 science fiction movie Minority Report comes to mind. Check it out here.
About the Author
James Carroll
Former VSD Editor James Carroll joined the team in 2013. Carroll covered machine vision and imaging from numerous angles, including application stories, industry news, market updates, and new products. In addition to writing and editing articles, Carroll managed the Innovators Awards program and webcasts.