Chinese AI lab DeepSeek may be attracting the tech industry's attention this week, but Alibaba, one of its top domestic rivals, is not sitting idly by.
On Monday, Alibaba's Qwen team released Qwen2.5-VL, a new series of AI models that can perform a range of text and image analysis tasks. The models can parse files, understand videos, count objects in images, and control PCs.
According to the Qwen team's benchmarks, the best Qwen2.5-VL model outperforms OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 2.0 Flash on a range of video understanding, math, document analysis, and question-answering evaluations.
Image credit: Alibaba
Qwen2.5-VL, which can be tested in Alibaba's Qwen Chat app and on the AI dev platform Hugging Face, can analyze charts and graphics, extract data from scans of invoices and forms, and “understand” multi-hour videos, the Qwen team says. Qwen2.5-VL can also recognize “IPs from film and TV series, as well as a wide variety of products,” according to the team. This suggests that the models may have been trained on some copyrighted works.
As an AI developed by a Chinese company, Qwen2.5-VL has certain limits on the topics it will discuss in Qwen Chat. When asked to have Qwen2.5-VL-72B, the largest and most capable Qwen2.5-VL model, talk about “Xi Jinping's mistakes,” Qwen Chat threw an error message.
China's internet regulator benchmarks many models developed in China to ensure that they “embody core socialist values.” Many Chinese AI systems refuse to respond to topics that might anger regulators, such as the autonomy of Taiwan.
One of the more interesting features of Qwen2.5-VL is its ability to interact with software on both PCs and mobile devices. In a video posted by Philipp Schmid, a technical lead at Hugging Face, Qwen2.5-VL launches the Booking.com app for Android and books a flight from Chongqing to Beijing.
Don't miss @alibaba_qwen 2.5 VL! Despite DeepSeek's hype, Qwen has dropped the best open multimodal! Qwen 2.5 VL is a vision language model that can control computers, like @openai Operator, and extract structured information from charts!
TL;DR:
3️⃣… pic.twitter.com/geegvdl0ti
- Philipp Schmid (@_philschmid) January 27, 2025
In the following video, a Qwen2.5-VL model controls applications on a Linux desktop, though it does not appear to accomplish much beyond switching tabs. Qwen's benchmarks do report a Qwen2.5-VL score on OSWorld, a benchmark that attempts to mimic real computer environments.
LMAO, Qwen 2.5 VL can do computer use out of the box. Move over, OpenAI Operator! pic.twitter.com/lwmecxznsu
- Vaibhav (VB) Srivastav (@reach_vb) January 27, 2025
The two smaller models in the Qwen2.5-VL series, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are available under a permissive license. The flagship Qwen2.5-VL-72B, however, is under Alibaba's custom license, which requires companies and developers with more than 100 million active users to request permission from Qwen/Alibaba before using the model commercially.