Business Standard

This Microsoft bot can sketch an image from caption-like descriptions

The core of this bot is a technology known as a 'Generative Adversarial Network' or GAN

IANS San Francisco

Microsoft is developing a bot that can draw what you want it to by leveraging Artificial Intelligence (AI) technology -- programmed to pay close attention to individual words when generating images from caption-like text descriptions.

The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes -- such as grazing livestock -- to the absurd, such as a floating double-decker bus.

Each image contains details that are absent from the text descriptions, indicating that this AI contains an artificial imagination.

"If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch. These birds may not exist in the real world -- they are just an aspect of our computer's imagination of birds," Xiaodong He from Microsoft's research lab said in a blog post late on Thursday.

 

According to results on an industry standard test, reported in a research paper posted on arXiv.org, the bot produced a nearly three-fold boost in image quality compared to the previous state-of-the-art technique for text-to-image generation.

The core of this bot is a technology known as a "Generative Adversarial Network" or GAN.

The network consists of two Machine Learning models -- one that generates images from text descriptions and another, known as a discriminator, that uses text descriptions to judge the authenticity of generated images.
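
The two-model arrangement described above can be illustrated with a toy sketch. This is not the researchers' model: the layer sizes, the random weights standing in for trained parameters, and the function names `generate`, `discriminate` and `adversarial_losses` are all assumptions made for illustration, and the losses are the textbook GAN cross-entropy losses rather than anything stated in the blog post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen arbitrarily for illustration.
TEXT_DIM, NOISE_DIM, IMAGE_DIM = 8, 4, 16

# Randomly initialised weights stand in for trained parameters (assumption).
W_gen = rng.normal(scale=0.1, size=(TEXT_DIM + NOISE_DIM, IMAGE_DIM))
W_disc = rng.normal(scale=0.1, size=(IMAGE_DIM + TEXT_DIM, 1))


def generate(text_vec, noise):
    """Generator: map a text embedding plus noise to a flat 'image' in [-1, 1]."""
    return np.tanh(np.concatenate([text_vec, noise]) @ W_gen)


def discriminate(image, text_vec):
    """Discriminator: probability that `image` is a real match for `text_vec`."""
    logit = (np.concatenate([image, text_vec]) @ W_disc)[0]
    return 1.0 / (1.0 + np.exp(-logit))


def adversarial_losses(real_image, text_vec, noise):
    """Textbook GAN cross-entropy losses for one (image, caption) pair."""
    fake_image = generate(text_vec, noise)
    p_real = discriminate(real_image, text_vec)
    p_fake = discriminate(fake_image, text_vec)
    # The discriminator wants p_real near 1 and p_fake near 0.
    d_loss = -(np.log(p_real) + np.log(1.0 - p_fake))
    # The generator wants the discriminator to rate its output as real.
    g_loss = -np.log(p_fake)
    return d_loss, g_loss


text_vec = rng.normal(size=TEXT_DIM)          # hypothetical caption embedding
real_image = rng.uniform(-1, 1, IMAGE_DIM)    # hypothetical real image
noise = rng.normal(size=NOISE_DIM)
d_loss, g_loss = adversarial_losses(real_image, text_vec, noise)
print(f"discriminator loss: {d_loss:.3f}, generator loss: {g_loss:.3f}")
```

In training, the two losses would be minimised in alternation, so that the generator improves only by producing images the discriminator accepts as matching the caption.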

The researchers said that text-to-image generation technology could find practical applications as a sort of sketch assistant for painters and interior designers, or as a tool for voice-activated photo refinement.

For now, the technology is imperfect.

"For AI and humans to live in the same world, they have to have a way to interact with each other. The language and vision are the two most important modalities for humans and machines to interact with each other," the blog post explained.

 

First Published: Jan 19 2018 | 12:54 PM IST
