Generative AI has already shown great potential in robotics, where applications include natural language interaction, robot learning, no-code programming, and even design. This week, Google's DeepMind Robotics team is showcasing another potential sweet spot between the two: navigation.
In a paper titled “Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs,” the team demonstrates how it used Google Gemini 1.5 Pro to teach a robot to respond to commands and navigate around an office. Naturally, DeepMind used some of the Every Day Robots machines that were left over after Google shut down the project last year amid widespread layoffs.
In a series of videos accompanying the project, a DeepMind employee begins with a smart-assistant-esque “OK, robot,” and then commands the system to perform various tasks around the 9,000-square-foot office space.
In one example, a Google employee asks the robot to take him somewhere to draw a picture. “OK,” the robot, wearing a smart yellow bow tie, replies. “Hold on. I'm thinking about it in Gemini…” The robot then guides the human to a wall of whiteboards. In a second video, another person instructs the robot to follow the instructions on the whiteboard.
A simple map on the whiteboard shows the robot how to get to the “blue area.” The robot again thinks for a moment, then takes a long route to what turns out to be the robot testing area. “I followed the instructions on the whiteboard very well,” the robot announces with a level of confidence most humans can only dream of.
Prior to these videos, the robot was familiarized with the space using what the team calls “Multimodal Instruction Navigation with Demonstration Tours (MINT).” In practice, this means walking the robot around the office while pointing out different landmarks by voice. Then, the team used a hierarchical Vision-Language-Action (VLA) approach that combines “environmental understanding and common sense reasoning.” Together, these processes enable a robot to respond to written and drawn commands as well as gestures.
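To make that two-level idea concrete, here is a minimal sketch in Python, not DeepMind's implementation. It assumes the demonstration tour is stored as a list of spoken landmark notes, one per tour frame, and that consecutive frames are linked into a topological graph. The tour contents and the functions `build_graph`, `pick_goal`, `shortest_path`, and `navigate` are all hypothetical. In particular, `pick_goal` is a crude word-overlap stand-in for the step where a long-context VLM such as Gemini would choose the goal frame, so it cannot handle a request like “somewhere to draw a picture” that needs real reasoning.

```python
from collections import deque

# Hypothetical demonstration tour: one spoken landmark note per tour frame.
TOUR = [
    "entrance lobby",
    "kitchen with coffee machine",
    "wall of whiteboards",
    "robot testing area",
]


def build_graph(n_frames, extra_edges=()):
    """Link consecutive tour frames; extra_edges adds shortcuts between frames."""
    graph = {i: set() for i in range(n_frames)}
    pairs = [(i, i + 1) for i in range(n_frames - 1)] + list(extra_edges)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pick_goal(instruction, notes):
    """Stand-in for the VLM: pick the frame whose note shares the most words
    with the instruction (an assumption made purely for illustration)."""
    words = set(instruction.lower().split())
    scores = [len(words & set(note.split())) for note in notes]
    return scores.index(max(scores))


def shortest_path(graph, start, goal):
    """Breadth-first search over the topological graph."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal is not reachable from start


def navigate(instruction, start=0):
    """High level picks a goal frame; low level plans a route to it."""
    graph = build_graph(len(TOUR))
    goal = pick_goal(instruction, TOUR)
    return shortest_path(graph, start, goal)
```

Calling `navigate("take me to the whiteboards")` returns the sequence of tour frames the robot would pass through on its way to the whiteboard wall. Splitting the work this way keeps the language model responsible only for choosing where to go, while a conventional graph search decides how to get there.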
Google says the robots had a success rate of around 90% over more than 50 interactions with employees.