
Computers can now see images

Jack Clark
Last Updated : May 23 2015 | 12:13 AM IST
Artificial intelligence has graduated past the infancy stage of figuring out what's in an image. Computers have previously been capable of little more than a simple game of I Spy: Name a specific object or person, and they'll show you an image containing it. But thanks to new developments in AI research, machines can now answer more complex questions, like, "What is there on the grass, except the person?"

A research paper at Cornell University outlines a system that learns to identify fine-grained visual features of images and the words associated with them. It combines the two into a dictionary in its digital brain, then references that dictionary to answer new questions about never-before-seen images.

The research was conducted by a team made up of experts from the Chinese Internet search company Baidu and a student at the University of California at Los Angeles. It coincides with similar recent research from Microsoft, Virginia Tech, and various other academic institutions. "Our goal is to enable the computer to connect language with experiences in the physical world," says Wei Xu, a distinguished scientist in Baidu's research group. "This is important for solving the problem of common sense reasoning."

Bloomberg put the Baidu and UCLA system to its own test. I took a picture of a small citrus fruit in the palm of my hand, and sent it to Baidu with the question, "What is in the centre of the hand?" The software answered: "An orange." (It's actually a satsuma, but we'll let it slide.)

This development may sound small, but teaching computers to discern what's inside images and associate it with language has proved immensely challenging. Such research draws on different disciplines that have only recently started to converge. Advances in this field bring us closer to a day when we may be able to ask a search engine like Google or Baidu to ferret through millions of images and find only the ones containing a Volkswagen bus with a flat tire, or seven oranges in a bowl.

The development from Baidu and UCLA, while important, is far from perfect. The system can't handle multiple questions in a row, like asking what types of fruit are in a basket, then asking it to count the number of apples. In tests, it gave the correct answer 64.7 per cent of the time, the paper says. People answered the questions with 94.8 per cent accuracy. "In its current stage, the system is not ready for serious applications, as it still makes errors," Xu says.

Creating computers that can look at images and answer specific questions about them "has the distinctive advantage of pushing the frontiers on 'AI-complete' problems," Microsoft says. "Given the recent progress in the community, we believe the time is ripe to take on such an endeavour."

Work done by UCLA and two startups has focused on the analysis of surveillance videos. One day, AI may be able to monitor security camera footage to quickly and automatically spot unmarked vans that have been parked outside banks for four hours without moving. Baidu is interested in other aspects, too. "In the future, potential applications are education and mobile image search," Xu says. AI might tailor lessons to students by, for instance, quizzing them on the types of animals in a photo their parents shot on a weekend trip to the zoo.

With the new research, computers have reached a milestone, not unlike that of many young kids figuring out the world. You can now show a machine a Dr Seuss book, and it can tell you: On the cover of this book is a cat wearing a red and white striped hat.
Bloomberg


