DeepMind, Google's AI research organization, has unveiled a model that can generate an “infinite” variety of playable 3D worlds.
The model, called Genie 2, is the successor to DeepMind's Genie, released earlier this year. It can generate interactive, real-time scenes from a single image and text description (such as "cute humanoid robot in the forest"). In this respect, it's similar to models being developed by Fei-Fei Li's company World Labs and Israeli startup Decart.
DeepMind claims that Genie 2 can generate “a wide variety of rich 3D worlds,” including worlds where users can use the mouse and keyboard to perform actions such as jumping and swimming. Trained on video, this model can simulate object interactions, animations, lighting, physics, reflections, and “NPC” behavior.
Many of Genie 2's simulations look like AAA video games, likely because the model's training data includes playthroughs of popular titles. But like many AI labs, DeepMind didn't reveal much about where it sources that data, partly for competitive reasons.
Some may wonder about the implications for intellectual property. DeepMind, a Google subsidiary, has unfettered access to YouTube, and Google has previously hinted that YouTube's terms of service give it permission to use the platform's videos to train its models. But is Genie 2 essentially making unauthorized copies of the games it has "watched"? That's a question for the courts to decide.
DeepMind says Genie 2 can generate a consistent world for up to a minute in a variety of perspectives, including first-person and isometric views, though most simulations last 10 to 20 seconds.
"Genie 2 responds intelligently to actions taken by pressing keys on a keyboard, identifying the character and moving it correctly," DeepMind wrote in a blog post. "For example, our model [can] understand that the arrow keys should move the robot and not the trees or clouds."
Most models like Genie 2, known as world models, can simulate games and 3D environments, but they suffer from artifacts, inconsistencies, and hallucinations. For example, Oasis, Decart's Minecraft simulator, runs at low resolution and quickly "forgets" the layout of levels.
Genie 2, by contrast, can remember parts of a simulated scene that fall out of view and render them accurately when they become visible again. (World Labs' model can do this, too.)
Granted, games made with Genie 2 wouldn't be all that fun, considering your progress is erased every minute. That's likely why DeepMind positions the model as a research and creative tool, one for prototyping "interactive experiences" and evaluating AI agents.
“Thanks to Genie 2's out-of-distribution generalization capabilities, you can turn your concept art and drawings into fully interactive environments,” DeepMind writes. “Also, by using Genie 2 to quickly create rich and diverse environments for AI agents, researchers can generate assessment tasks that the agents never saw during training.”
DeepMind says that although Genie 2 is in its early stages, the lab believes it will be a key component in developing future AI agents.
Creators, especially those in the video game industry, may have mixed feelings. A recent Wired investigation found that major studios like Activision Blizzard, which has laid off large numbers of employees, are using AI to cut corners, boost productivity, and compensate for those layoffs.
Google is pouring more resources into world modeling research, which some expect to be the next big thing in generative AI. In October, DeepMind hired Tim Brooks, who helped lead development of OpenAI's Sora video generator, to work on video generation technologies and world simulators. And two years ago, the lab poached Tim Rocktäschel from Meta, best known for his experiments with "open-endedness" in video games like NetHack.