With the help of a robot, Google has found a new way to show what its Gemini AI model is capable of.
The robot comes from Google's Everyday Robots division, which was shut down last year. The robots apparently still exist, though: Google fitted one of them with a yellow bow tie and used Gemini to teach it how to respond to commands and navigate the DeepMind office space.
To achieve this, Google uses vision language models (VLMs), which are trained not only on text but also on images and videos, to answer questions and perform tasks that require perception.
For example, in one video, a Google employee asks the robot to take him somewhere he can draw a picture. The robot says it needs to think for a moment, then leads the employee to a whiteboard. In another video, the robot is told to follow the instructions on a whiteboard, which shows a map with directions to a place called the Blue Area. The robot follows the directions, arrives at the robotics testing area, and announces, “I successfully followed the instructions on the whiteboard.”
Press play to see the robot in action and let us know what you think in the comments.