Google’s Gemini Robotics AI Model Reaches Into the Physical World


Google DeepMind's new Gemini Robotics AI model integrates language, vision, and physical action to control robots, enabling them to perform complex tasks based on verbal commands.

In sci-fi tales, artificial intelligence often powers all sorts of clever, capable, and occasionally homicidal robots. A revealing limitation of today’s best AI is that, for now, it remains squarely trapped inside the chat window.

Google DeepMind signaled a plan to change that today—presumably minus the homicidal part—by announcing a new version of its AI model Gemini that fuses language, vision, and physical action together to power a range of more capable, adaptive, and potentially useful robots.

In a series of demonstration videos, the company showed several robots equipped with the new model, called Gemini Robotics, manipulating items in response to spoken commands: Robot arms fold paper, hand over vegetables, gently put a pair of glasses into a case, and complete other tasks. The robots rely on the new model to connect visible objects with possible actions in order to do what they’re told. The model is trained in a way that allows its behavior to generalize across very different robot hardware.

Google DeepMind also announced a version of its model called Gemini Robotics-ER (for embodied reasoning), which offers only visual and spatial understanding. The idea is for other robotics researchers to use this model to train their own models for controlling robots’ actions.

In a video demonstration, Google DeepMind’s researchers used the model to control a humanoid robot called Apollo, from the startup Apptronik. The robot converses with a human and moves letters around a tabletop when instructed to.

“We've been able to bring the world-understanding—the general-concept understanding—of Gemini 2.0 to robotics,” said Kanishka Rao, a robotics researcher at Google DeepMind who led the work, at a briefing ahead of today’s announcement.

Google DeepMind says the new model is able to control different robots successfully in hundreds of specific scenarios not previously included in their training. “Once the robot model has general-concept understanding, it becomes much more general and useful,” Rao said.

The breakthroughs that gave rise to powerful chatbots, including OpenAI’s ChatGPT and Google’s Gemini, have in recent years raised hope of a similar revolution in robotics, but big hurdles remain.
