An AI Learns To Imagine A Scene From Just One Image

DeepMind, the Google subsidiary, has unveiled a new kind of computer vision algorithm that can build 3D models of a scene from 2D snapshots: the GQN (Generative Query Network).

The GQN can “imagine” and render a scene from any angle without human-labeled training data or supervision. Given only a handful of images of a scene (a wallpapered room with a colored ball on the floor, for instance), the algorithm can render previously unseen sides of objects and produce a 3D view from several vantage points, even accounting for details such as lighting and shadows.

It aims to imitate the way the human brain learns about its surroundings and the physical relations between objects, and to remove the need for AI researchers to annotate images in datasets. Most visual recognition systems need a human to label every aspect of each object in every scene in a dataset, a painstaking and expensive process.

The two-part system consists of a representation network and a generation network. The representation network takes the input images and converts them into a vector (a mathematical representation) describing the scene, while the generation network uses that vector to render the scene from a requested viewpoint.
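To make the two-part split concrete, here is a minimal toy sketch in Python. It is not DeepMind's implementation: the class names, the tiny vector sizes, the random untrained weights, and the choice to sum the per-image encodings into one scene vector are all assumptions made for illustration.

```python
import random

# All sizes below are toy values chosen for this sketch (assumptions).
IMAGE_SIZE = 4   # an "image" is a flat list of 4 pixel values
VIEW_SIZE = 2    # a "viewpoint" is an (x, y) camera position
SCENE_SIZE = 3   # length of the scene vector


def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RepresentationNetwork:
    """Turns observed (image, viewpoint) pairs into a single scene vector."""

    def __init__(self, rng):
        self.weights = random_matrix(SCENE_SIZE, IMAGE_SIZE + VIEW_SIZE, rng)

    def encode(self, observations):
        scene = [0.0] * SCENE_SIZE
        for image, viewpoint in observations:
            encoded = matvec(self.weights, image + viewpoint)
            # Summing the per-view encodings is an assumption of this sketch.
            scene = [s + e for s, e in zip(scene, encoded)]
        return scene


class GenerationNetwork:
    """Renders a predicted image from a scene vector and a query viewpoint."""

    def __init__(self, rng):
        self.weights = random_matrix(IMAGE_SIZE, SCENE_SIZE + VIEW_SIZE, rng)

    def render(self, scene, query_viewpoint):
        return matvec(self.weights, scene + query_viewpoint)


rng = random.Random(0)
representation = RepresentationNetwork(rng)
generation = GenerationNetwork(rng)

# Two made-up observations of the same scene from different camera positions.
observations = [
    ([0.1, 0.9, 0.3, 0.5], [0.0, 1.0]),
    ([0.4, 0.2, 0.8, 0.6], [1.0, 0.0]),
]
scene_vector = representation.encode(observations)
predicted_image = generation.render(scene_vector, [0.5, 0.5])
print(len(scene_vector), len(predicted_image))
```

With untrained weights the rendered pixels are meaningless; the point is only the data flow: images and viewpoints go in, one scene vector comes out, and that vector plus a new viewpoint yields a predicted image.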

To train the system, DeepMind researchers fed the GQN images of scenes taken from different angles, which it used to teach itself about the colors, lighting, and textures of objects independently of one another, as well as the spatial relationships between them. It then predicted what those objects would look like from behind or from the side.
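The idea of learning from a few viewing angles and then predicting an unseen one can be illustrated with a deliberately tiny example. The sketch below is not the GQN's training procedure: it swaps in plain gradient descent on a two-weight linear model, and the "scene" (a single brightness value that depends on the viewing angle), the function names, and the constants are all hypothetical.

```python
import math


def features(angle):
    """Describe a viewing angle (in radians) as two numbers."""
    return [math.cos(angle), math.sin(angle)]


def true_brightness(angle):
    """Hypothetical scene: what the camera really sees from this angle."""
    return 0.7 * math.cos(angle) + 0.2 * math.sin(angle)


def predict(weights, angle):
    """The model's guess at the brightness seen from this angle."""
    return sum(w * f for w, f in zip(weights, features(angle)))


def train(angles, steps=2000, learning_rate=0.1):
    """Fit the weights by gradient descent on the squared prediction error."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        for angle in angles:
            error = predict(weights, angle) - true_brightness(angle)
            # Nudge each weight against the error; this is the self-correction.
            weights = [
                w - learning_rate * error * f
                for w, f in zip(weights, features(angle))
            ]
    return weights


observed_angles = [0.0, 0.5, 1.0]   # views from roughly the front
weights = train(observed_angles)

unseen_angle = math.pi              # a view from behind, never shown in training
print(predict(weights, unseen_angle), true_brightness(unseen_angle))
```

Because this toy scene is exactly linear in the chosen features, three front views are enough to predict the view from behind. Real scenes are far more complex, which is why the GQN relies on deep networks rather than a two-weight model.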

Using its spatial understanding, the GQN can manipulate the objects (using a virtual robot arm, for instance, to pick up a ball). It also self-corrects as it moves around the scene, adjusting its predictions when they prove wrong.

Separately, DeepMind also recently developed a training technique that teaches AI to play video games on the Atari platform.