Ahead of the holiday season, Microsoft announced that it was upgrading the AI model behind Bing Image Creator, the AI-powered image generation tool built into its Bing search engine. The new model (the latest version of OpenAI's DALL-E 3, codenamed PR16) would allow users to create “higher quality” images “twice as fast as before,” Microsoft promised.
But the upgrade did not deliver. Complaints quickly flooded X and Reddit.
“The DALL-E we loved is gone forever,” said one Redditor. “Now that Bing is no longer working for me, I use ChatGPT,” another wrote.
The backlash was so severe that Microsoft said it would restore the previous model to Bing Image Creator until the problems could be addressed.
Let's bring back the old Dal 3! The image quality of the older model is definitely better. For example like these images. The images produced by the new model are the worst 🙁 pic.twitter.com/BjIM8MS4ng
— Zeᡣ𐭩ྀིྀི (@riegrowl) December 28, 2024
“We've been able to [reproduce] some of the issues reported and plan to revert to [DALL-E 3] PR13 until we can fix them,” Jordi Ribas, head of search at Microsoft, said in a post on X on Tuesday night. “Unfortunately, the implementation process has been very slow. It started over a week ago and will take another two to three weeks to reach 100%.”
So what went wrong?
Comparing model outputs anecdotally is difficult, especially when prompts are not standardized. However, many users said that PR16 tends to produce less realistic images. Windows Latest contributor Mayank Parmar noted that images produced by PR16 lacked detail and polish, making them look oddly cartoonish and “lifeless.”
I don't know who thinks this is a joke. DALL-E is objectively worse than ever since this “update” and has been surpassed by other companies such as Google. Comparing the image quality now to what it was just a few months ago, it's like night and day. pic.twitter.com/EdSdk7aign
— Outward (@roccynoxy) December 19, 2024
This is not the first time that an image model that supposedly passed internal checks has been poorly received by the public. Back in February, Google was forced to pause the ability of its AI chatbot Gemini to generate images of people after users complained of historical inaccuracies.
This failure shows how difficult it is to measure model improvements in the real world. According to Ribas, Microsoft's benchmarks found that PR16's quality was “slightly better on average” compared to the previous Bing Image Creator model.
It seems clear that whatever internal metrics the company uses do not line up with most people's preferences.