French AI startup Mistral has released its first model that can process not only text but also images.
Pixtral 12B, a 12-billion-parameter model, is about 24 GB in size. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer.
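The 24 GB figure is consistent with some quick arithmetic. The sketch below assumes each parameter is stored as a 16-bit (2-byte) value such as bfloat16. Mistral's announcement, as described here, does not state the storage format, so treat this as a back-of-the-envelope check rather than a spec. The helper name `checkpoint_size_gb` is hypothetical.

```python
# Back-of-the-envelope checkpoint size estimate.
# Assumption: every parameter is stored as a 2-byte (16-bit) value, e.g. bfloat16.
def checkpoint_size_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Return the approximate checkpoint size in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 10**9

print(checkpoint_size_gb(12_000_000_000))  # 24.0
```

At 4 bytes per parameter (32-bit floats), the same model would come to roughly twice that size.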
Built on one of Mistral's text models, Nemo 12B, the new model can answer questions about an arbitrary number of images of arbitrary size, supplied either as image URLs or as images encoded with base64 (a binary-to-text encoding scheme). Like other multimodal models such as Anthropic's Claude family and OpenAI's GPT-4o, Pixtral 12B should, at least in theory, be able to perform tasks such as captioning images or counting the number of objects in a photo.
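For readers unfamiliar with base64, here is a minimal sketch of how an image's raw bytes can be turned into a base64 string and wrapped in a `data:` URL, a common way to inline images in API requests. The function name `image_to_data_url` is hypothetical. The exact request format Mistral's API expects is not covered here, so the `data:` URL wrapping is an assumption.

```python
import base64

def image_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (RFC 2397 style).

    Assumption: the consuming API accepts images in data-URL form.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# The four bytes below are just the start of a PNG file's signature,
# standing in for a real image; in practice you would read the file
# with open(path, "rb").read().
print(image_to_data_url(b"\x89PNG"))  # data:image/png;base64,iVBORw==
```

Base64 inflates the payload by about a third, which is one reason APIs typically also accept plain image URLs.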
Pixtral 12B is available via GitHub, a torrent link, and the AI and machine learning development platform Hugging Face, and it can be downloaded, fine-tuned, and used.
However, Mistral has not yet clarified which license applies to Pixtral 12B. The startup offers some models (but not all) without restrictions under the Apache 2.0 license; others fall under Mistral's standard development license, which requires a paid license for commercial applications but not for research or academic use. We've reached out to Mistral PR for more information and will update this post if we hear back.
Unfortunately, I wasn't able to try out Pixtral 12B because no working web demo was available at the time of publication. In a post on X, Mistral's head of developer relations, Sophia Yang, said that Pixtral 12B will soon be available for testing on Le Chat and La Plateforme, Mistral's chatbot and API-serving platforms.
It is unclear what image data Mistral used to develop Pixtral 12B.
Most generative AI models, including Mistral's other models, are trained on vast amounts of public data from the web, much of which is copyrighted. Some model vendors argue that “fair use” rights entitle them to scrape any public data, but many copyright holders disagree and have filed lawsuits against major vendors such as OpenAI and Midjourney to stop the practice.
Pixtral 12B was announced after Mistral closed a $645 million funding round led by General Catalyst, which valued the company at $6 billion. Mistral, which was founded just over a year ago and counts Microsoft as a minority investor, is seen by many in the AI community as Europe's answer to OpenAI. The young company's strategy thus far has been to release free “open” models, charge for managed versions of those models, and offer consulting services to enterprise customers.