Microsoft researchers are developing an artificial intelligence (AI) enabled 'drawing bot' that can create images from text descriptions of an object.
The technology can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus, Microsoft said in a blog post. Each image contains details that are absent from the text descriptions, indicating that this artificial intelligence contains an artificial imagination, it said.
The technology, under development in Microsoft’s research labs, is programmed to pay close attention to individual words when generating images from caption-like text descriptions, the company said. This deliberate focus produces a nearly three-fold boost in image quality compared with the previous state-of-the-art technique for text-to-image generation, according to results on an industry-standard test reported in a research paper posted on arXiv.org.
“If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch,” said Xiaodong He, a principal researcher at Microsoft’s research lab in Washington.
He and colleagues started with technology that automatically writes photo captions, the CaptionBot, and then moved to technology that answers questions humans ask about images, such as the location or attributes of objects, which can be especially helpful for blind people.
“Now we want to use the text to generate the image,” said Qiuyuan Huang, a postdoctoral researcher in He’s group. Text-to-image generation technology could find practical applications as a sort of sketch assistant to painters and interior designers, or as a tool for voice-activated photo refinement, the researchers said.
At the core of Microsoft’s drawing bot is a technology known as a Generative Adversarial Network, or GAN. The network consists of two machine learning models, one that generates images from text descriptions and another, known as a discriminator, that uses text descriptions to judge the authenticity of generated images.
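For readers who want to see how the two models relate, the following is a minimal sketch of a text-conditioned GAN's forward pass, not Microsoft's implementation. Everything in it is an assumption made for illustration: the toy caption encoder `embed_caption`, the single-layer `LinearModel`, the six-pixel "images", and the standard binary cross-entropy GAN losses. The article does not describe the drawing bot's actual architecture or loss function. The sketch only shows that the generator turns a caption plus random noise into an image, and that the discriminator scores an image together with the caption.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def embed_caption(caption, dim=4):
    """Hypothetical toy text encoder: count words per character-code bucket."""
    vec = [0.0] * dim
    for word in caption.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec


class LinearModel:
    """One linear layer followed by a sigmoid (a stand-in for a deep network)."""

    def __init__(self, n_in, n_out, rng):
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                        for _ in range(n_out)]

    def forward(self, inputs):
        return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
                for row in self.weights]


def generate(generator, text_vec, noise):
    """Generator: caption embedding plus noise -> pixel intensities in (0, 1)."""
    return generator.forward(text_vec + noise)


def discriminate(discriminator, image, text_vec):
    """Discriminator: image plus caption -> estimated probability it is real."""
    return discriminator.forward(image + text_vec)[0]


def gan_losses(d_real, d_fake):
    """Standard GAN cross-entropy losses (assumed, not taken from the article)."""
    d_loss = -math.log(d_real) - math.log(1.0 - d_fake)  # discriminator objective
    g_loss = -math.log(d_fake)                           # generator objective
    return d_loss, g_loss


rng = random.Random(0)
TEXT_DIM, NOISE_DIM, PIXELS = 4, 3, 6
generator = LinearModel(TEXT_DIM + NOISE_DIM, PIXELS, rng)
discriminator = LinearModel(PIXELS + TEXT_DIM, 1, rng)

text_vec = embed_caption("a floating double-decker bus", TEXT_DIM)
noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
fake_image = generate(generator, text_vec, noise)
real_image = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]  # made-up stand-in for a real photo

d_real = discriminate(discriminator, real_image, text_vec)
d_fake = discriminate(discriminator, fake_image, text_vec)
d_loss, g_loss = gan_losses(d_real, d_fake)
print("discriminator loss:", d_loss, "generator loss:", g_loss)
```

In training, which this sketch omits, the discriminator's weights would be updated to lower `d_loss` while the generator's weights would be updated to lower `g_loss`, so each model improves by competing against the other.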