Can today's AI models really remember, think, plan, and reason like the human brain? Some AI labs may believe so, but according to Yann LeCun, Chief AI Scientist at Meta, the answer is no. He thinks they could get there in about 10 years, however, by pursuing a new approach called the “world model.”
Earlier this year, OpenAI released a new feature called “Memory” that allows ChatGPT to “remember” conversations. The startup's latest generation of models, o1, displays the word “thinking” while generating output, and OpenAI says the same models are capable of “complex reasoning.”
This all makes it sound like we're pretty close to AGI. However, in a recent talk at the Hudson Forum, LeCun pushed back on optimists such as xAI founder Elon Musk and Google DeepMind co-founder Shane Legg, who suggest that human-level AI is just around the corner.
“We need machines that understand the world; [machines] that can remember things, that have intuition, have common sense, and can reason and plan at the same level as humans,” LeCun said during the talk. “Despite what you might have heard from some of the most enthusiastic people, current AI systems simply don't have these capabilities.”
LeCun says that today's large language models, such as those powering ChatGPT and Meta AI, are far from “human-level AI.” Achieving it could take “years to decades,” he later added. (That doesn't stop his boss, Mark Zuckerberg, from asking him when AGI will happen, though.)
The reason is simple. These LLMs work by predicting the next token (usually a few letters or a short word), while today's image and video models predict the next pixel. In other words, language models are one-dimensional predictors and AI image/video models are two-dimensional predictors. These models are very good at making predictions in their respective dimensions, but they don't really understand the three-dimensional world.
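The one-dimensional loop LeCun describes can be sketched in a few lines. The “model” below is just a hypothetical bigram lookup table, not a real LLM; the point is only the structure of generation: each step predicts a single next token from the previous one, with no representation of the world behind it.

```python
# Toy stand-in for a language model: a bigram table mapping each token
# to the single most likely next token. A real LLM learns a probability
# distribution over a large vocabulary, but the generation loop is the same.
BIGRAMS = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "a",
    "a": "mat",
}

def generate(prompt: str, max_tokens: int = 5) -> list[str]:
    """Autoregressive generation: repeatedly predict only the next token."""
    tokens = [prompt]
    for _ in range(max_tokens):
        nxt = BIGRAMS.get(tokens[-1])  # one step of next-token prediction
        if nxt is None:
            break
        tokens.append(nxt)
    return tokens

print(generate("the"))
```

Each iteration looks exactly one token ahead, which is why LeCun calls these systems one-dimensional predictors: there is no plan for the sentence, only a chain of local guesses.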
This is why modern AI systems cannot perform simple tasks that most humans can. LeCun notes how humans learn to clear the dinner table by age 10 and to drive a car by age 17, both in a matter of hours. Yet even the world's most advanced AI systems, built on thousands or millions of hours of data, cannot reliably operate in the physical world.
To accomplish more complex tasks, LeCun suggests we need to build a new type of AI architecture around world models, three-dimensional models that can perceive the world around us.
“A world model is a mental model of how the world behaves,” he explained. “You can imagine a course of action that you would take, and using a world model, you can predict what effect that course of action will have on the world.”
Consider the world model in your own head. For example, imagine you see a messy bedroom and want to clean it. You can picture how effective picking up all the clothes would be without actually doing it. You don't have to try multiple methods or learn how to clean a room first: your brain observes the three-dimensional space and creates an action plan that achieves your goal on the first try. That action plan is the secret sauce AI world models promise.
One benefit here is that world models can take in significantly more data than LLMs. That also makes them computationally intensive, which is why cloud providers are racing to partner with AI companies.
World models are a big idea that several AI labs are now pursuing, and the term is quickly becoming the next buzzword for attracting venture funding. A group of highly regarded AI researchers, including Fei-Fei Li and Justin Johnson, just raised $230 million for their startup, World Labs. The “Godmother of AI” and her team also believe world models will unlock significantly smarter AI systems. OpenAI, too, describes its unreleased Sora video generator as a world model, but hasn't gone into details.
LeCun outlined an idea for using world models to create human-level AI in a 2022 paper on “objective-driven AI,” though he notes the concept itself is more than 60 years old. In short: a base representation of the world (for example, video of a dirty room) and memory are fed into a world model. The world model then predicts what the world will look like as a result. Next, you give the world model objectives, including an altered state of the world you'd like to achieve (such as a clean room), as well as guardrails to ensure the model doesn't harm humans in pursuit of an objective (don't kill me in the process of cleaning my room, please). The world model then finds a sequence of actions that achieves these objectives.
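The loop described above can be sketched as a simple planner. Everything here is a toy stand-in, not Meta's actual architecture: `predict()` plays the role of the world model, `cost()` encodes the objective, and planning is a brute-force search over short action sequences, scoring each by its predicted end state.

```python
# Hypothetical sketch of an objective-driven planning loop: simulate
# candidate action sequences with a world model and keep the cheapest one.
from itertools import product

STATE = {"clothes_on_floor": True, "bed_made": False}
ACTIONS = ["pick_up_clothes", "make_bed", "wait"]

def predict(state: dict, action: str) -> dict:
    """Toy world model: predicts the next state given an action."""
    nxt = dict(state)
    if action == "pick_up_clothes":
        nxt["clothes_on_floor"] = False
    elif action == "make_bed":
        nxt["bed_made"] = True
    return nxt

def cost(state: dict) -> int:
    """Objective: a clean room. Lower is better; 0 means done."""
    return int(state["clothes_on_floor"]) + int(not state["bed_made"])

def plan(state: dict, horizon: int = 2) -> tuple:
    """Search all action sequences up to `horizon` steps, roll each one
    forward through the world model, and return the best sequence."""
    best, best_cost = (), cost(state)
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:
            s = predict(s, a)
        if cost(s) < best_cost:
            best, best_cost = seq, cost(s)
    return best

print(plan(STATE))  # a two-step sequence that tidies the room
```

In a real system the guardrails would be extra cost terms penalizing harmful states, and the exhaustive search would be replaced by something far more efficient; the structure of the loop, predict then score then act, is the part that mirrors LeCun's proposal.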
According to LeCun, Meta's long-term AI research lab, FAIR (Fundamental AI Research), is actively working on building goal-driven AI and world models. While FAIR once worked on AI for Meta's upcoming products, LeCun said the lab has shifted its focus to purely long-term AI research in recent years. LeCun said FAIR doesn't even use LLMs these days.
World models are an intriguing idea, but not much progress has been made toward making these systems a reality, LeCun says. There are several very difficult problems to solve to get there from where we are today, he says, and it's likely more complicated than we think.
“It's going to take years, if not a decade, to get everything here working,” LeCun said. “Mark Zuckerberg keeps asking me how long it will take.”