Google has released what it calls a new "reasoning" AI model. However, the model is still experimental, and judging from our brief testing, there is certainly room for improvement.
The new model, called Gemini 2.0 Flash Thinking Experimental (a mouthful, to be sure), is available in AI Studio, Google's AI prototyping platform. The model card describes it as being “ideal for multimodal understanding, reasoning, and coding” and capable of “reasoning through the most complex problems” in fields such as programming, mathematics, and physics.
In a post on X, AI Studio product lead Logan Kilpatrick called Gemini 2.0 Flash Thinking Experimental "the first step in [Google's] reasoning journey." Jeff Dean, chief scientist at Google DeepMind, Google's AI research division, said in his own post that the model is "trained to use thinking to enhance its reasoning."
"Increasing inference-time computation yields promising results," Dean said, referring to the amount of computing used to "run" the model as it considers a question.
This is still an early version, but check out how the model handles difficult puzzles that include both visual and text clues: (2/3) pic.twitter.com/JltHeK7Fo7
— Logan Kilpatrick (@OfficialLoganK) December 19, 2024
Built on Google's recently announced Gemini 2.0 Flash model, Gemini 2.0 Flash Thinking Experimental appears similar in design to OpenAI's o1 and other so-called reasoning models. Unlike most AI, reasoning models effectively fact-check themselves, which helps them avoid some of the pitfalls that typically trip up AI models.
On the downside, reasoning models often take longer to arrive at a solution, typically seconds to minutes longer.
When given a prompt, Gemini 2.0 Flash Thinking Experimental pauses before responding, considering a number of related prompts and "explaining" its reasoning along the way. After a while, the model summarizes what it considers to be the most accurate answer.
Well, that's what is supposed to happen. When I asked Gemini 2.0 Flash Thinking Experimental how many R's are in the word "strawberry," it answered "two."
Google's new reasoning model can sometimes struggle to count the letters in a word. Image credit: Google
Your mileage may vary.
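For the record, this is the kind of question a single line of code settles; a quick check (shown in Python purely for illustration) confirms the correct count:

```python
# Count occurrences of the letter "r" in "strawberry".
word = "strawberry"
r_count = word.count("r")
print(r_count)  # prints 3 — not the "two" the model answered
```

The failure is a known quirk of models that see text as tokens rather than individual characters, which is partly why letter-counting prompts have become a popular informal stress test.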
The release of o1 led to an explosion of reasoning models from Google as well as rival AI labs. In early November, DeepSeek, an AI research company funded by quantitative traders, previewed its first reasoning model, DeepSeek-R1. That same month, Alibaba's Qwen team announced what it claimed was the first "open" challenger to o1.
Bloomberg reported in October that Google had multiple teams developing reasoning models. Then, in November, The Information reported that the company had at least 200 researchers working on the technology.
What opened the floodgates for reasoning models? For one, the search for fresh approaches to improving generative AI. As my colleague Max Zeff recently reported, "brute force" methods of scaling up models no longer yield the improvements they once did.
Not everyone is convinced that reasoning models are the best path forward. For one thing, they tend to be expensive because of the large amount of computing power required to run them. And while they have performed well on benchmarks so far, it remains to be seen whether reasoning models can sustain this rate of progress.