A new “reasoning” AI model, QwQ-32B-Preview, has arrived. It is one of the few comparable to OpenAI's o1, and the first such model to be made available for download under a permissive license.
Developed by Alibaba's Qwen team, QwQ-32B-Preview contains 32.5 billion parameters and can consider prompts up to roughly 32,000 words in length; on certain benchmarks it performs better than o1-preview and o1-mini, the two reasoning models OpenAI has released so far. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than models with fewer parameters.
According to Alibaba's testing, QwQ-32B-Preview outperforms OpenAI's o1 models on the AIME and MATH tests. AIME uses other AI models to evaluate a model's performance, while MATH is a collection of word problems.
QwQ-32B-Preview can solve logic puzzles and answer fairly difficult math questions thanks to its “reasoning” capabilities. But it isn't perfect. Alibaba said in a blog post that the model can switch languages unexpectedly, get stuck in loops, and underperform on tasks that require “common sense reasoning.”
Unlike most AI models, QwQ-32B-Preview and other reasoning models effectively fact-check themselves. This helps them avoid some of the pitfalls that typically trip up models, with the downside that they often take longer to arrive at a solution. Similar to o1, QwQ-32B-Preview reasons through tasks, planning ahead and performing a series of steps that help it arrive at an answer.
QwQ-32B-Preview, which can be downloaded and run via the AI development platform Hugging Face, appears to be similar to the recently released DeepSeek reasoning model in that it downplays certain political topics. As Chinese companies, Alibaba and DeepSeek are subject to benchmarking by China's internet regulator to ensure their models' responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that could anger regulators, such as speculation about Xi Jinping's government.
When asked “Is Taiwan part of China?”, QwQ-32B-Preview answered that it is, a view out of step with most of the world but in line with that of China's ruling party. Questions about Tiananmen Square, meanwhile, received no response.
QwQ-32B-Preview is “openly” available under an Apache 2.0 license, meaning it can be used for commercial applications. However, only certain components of the model have been released, making it impossible to replicate QwQ-32B-Preview or gain much insight into the system's inner workings.
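For readers who want to try it, here is a minimal sketch of how the model might be loaded and prompted with the Hugging Face transformers library. It assumes the weights are published under the ID "Qwen/QwQ-32B-Preview" and follow the library's standard chat-template conventions; running a 32.5-billion-parameter model locally also requires substantial GPU memory or quantization.

```python
# Minimal sketch, assuming the model is published on Hugging Face as
# "Qwen/QwQ-32B-Preview" and supports the standard transformers chat API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # spread layers across available GPUs
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long intermediate "thinking" text before the final
# answer, so allow plenty of new tokens.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```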
The increased focus on reasoning models comes as the viability of “scaling laws,” the long-held theory that a model's capabilities will keep improving as you feed it more data and computing power, comes under scrutiny. A flurry of reports suggests that models from major AI labs such as OpenAI, Google, and Anthropic aren't improving as dramatically as they once did.
This has led to a scramble for new AI approaches, architectures, and development techniques. One is test-time compute, which underpins models like QwQ-32B-Preview. Also known as inference compute, test-time compute essentially gives a model additional processing time to complete a task, for example by generating a long chain of reasoning before committing to an answer.
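To illustrate the general idea only (this is not a description of Alibaba's or OpenAI's specific methods), the sketch below shows one simple way to spend extra compute at inference time: sampling several candidate answers and keeping the most common one, a technique often called self-consistency or best-of-N voting. The noisy_model function is a hypothetical stand-in for a real language model call.

```python
# Toy sketch of test-time compute via self-consistency (majority voting).
# More samples means more inference-time compute, and usually a more reliable answer.
import random
from collections import Counter
from typing import Callable

def answer_with_test_time_compute(
    sample_answer: Callable[[str], str],  # any function that samples one answer from a model
    prompt: str,
    num_samples: int = 16,
) -> str:
    """Sample several independent answers and return the most common one."""
    answers = [sample_answer(prompt) for _ in range(num_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer

# Usage with a stand-in "model" that answers correctly only 60% of the time:
def noisy_model(prompt: str) -> str:
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

print(answer_with_test_time_compute(noisy_model, "What is 6 * 7?"))  # almost always "42"
```

The point of the sketch is the trade-off the article describes: each additional sample costs more compute at inference time, but the aggregated answer tends to be more reliable than any single attempt.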
Major labs besides OpenAI, as well as Chinese firms, are betting that test-time compute is the future. According to a recent report from The Information, Google has expanded its internal team focused on reasoning models to about 200 people and added significant computing power to the effort.