Following a series of controversies stemming from technical issues and licensing changes, AI startup Stability AI has announced its latest family of image generation models.
The company claims that the new Stable Diffusion 3.5 series is more customizable and versatile than its previous-generation technology, and offers improved performance. There are three models in total:

Stable Diffusion 3.5 Large: The most powerful model in the series, with 8 billion parameters, capable of generating images at up to 1-megapixel resolution. (Parameters roughly correspond to a model's problem-solving skills; models with more parameters generally perform better than those with fewer.)

Stable Diffusion 3.5 Large Turbo: A distilled version of Stable Diffusion 3.5 Large that generates images more quickly, at the cost of some quality.

Stable Diffusion 3.5 Medium: A model optimized to run on edge devices such as smartphones and laptops, capable of producing images ranging in resolution from 0.25 to 2 megapixels.
Stable Diffusion 3.5 Large and 3.5 Large Turbo are available now, but 3.5 Medium won't be released until October 29th.
Stability says the Stable Diffusion 3.5 models should produce more “diverse” output, i.e., images depicting people with different skin tones and features, without the need for “extensive” prompting.
“During training, each image is accompanied by multiple versions of the prompt, with shorter prompts taking precedence,” Hanno Basse, chief technology officer at Stability, told TechCrunch in an interview. “This ensures a broader and more diverse distribution of image concepts for any given text description. Like most generative AI companies, we train on a wide variety of data, including filtered publicly available datasets and synthetic data.”
Companies that have built this kind of “diversification” feature into their image generators in the past have sparked outcries on social media. Older versions of Google's Gemini chatbot, for example, displayed anachronistic groups of people in response to historical prompts like “Roman legionaries” and “U.S. senators.” Google was forced to pause image generation of people for nearly six months while it developed a fix.
With any luck, Stability's approach will be more thoughtful than others'. Unfortunately, we weren't given early access, so we can't offer any impressions.
Image credit: Stability AI
Stability's previous flagship image generator, Stable Diffusion 3 Medium, was heavily criticized for its odd artifacts and poor adherence to prompts. The company warns that the Stable Diffusion 3.5 models may suffer from similar prompting errors, which it attributes to engineering and architecture trade-offs. However, Stability also claims the new models are more robust than their predecessors at producing images in different styles, including 3D art.
“There can be large variations in output from the same prompt with different seeds. This is intentional, as it helps maintain a broader knowledge base and diverse styles in the base model,” Stability wrote in a blog post shared with TechCrunch. “As a result, however, prompts that lack specificity may lead to greater uncertainty in the output, and the aesthetic quality may vary.”
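To illustrate what that seed-dependent variation looks like in practice, here is a minimal sketch using the Hugging Face diffusers library, assuming its StableDiffusion3Pipeline class supports the stabilityai/stable-diffusion-3.5-large checkpoint; the prompt and seed values are arbitrary examples.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the Stable Diffusion 3.5 Large checkpoint from Hugging Face.
# (Assumes a GPU with enough memory and an accepted model license.)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "a watercolor painting of a lighthouse at dusk"  # arbitrary example prompt

# The same prompt with different seeds can yield noticeably different images;
# fixing the seed makes a given output reproducible.
for seed in (0, 42):
    image = pipe(
        prompt,
        num_inference_steps=28,
        guidance_scale=3.5,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"lighthouse_seed_{seed}.png")
```

Comparing the saved outputs side by side gives a rough sense of how much the model's aesthetics drift across seeds for a loosely specified prompt.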
Image credit: Stability AI
One thing that hasn't changed with the new models is Stability's licensing.
As with previous Stability models, the Stable Diffusion 3.5 series is free to use for “non-commercial” purposes, including research. Businesses with less than $1 million in annual revenue can also commercialize the models at no cost. Organizations with more than $1 million in annual revenue, however, must contract with Stability for an enterprise license.
Stability sparked controversy this summer over restrictive fine-tuning terms that gave (or at least appeared to give) the company the right to extract fees for models trained on images from its image generators. In response to the backlash, the company adjusted its terms to allow for more liberal commercial use. Today, Stability reaffirmed that users own the media they generate with its models.
“We encourage creators to distribute and monetize their work across the pipeline,” Anna Gillen, Stability's vice president of marketing and communications, said in an emailed statement, adding that creators should prominently display “Powered by Stability AI” on relevant websites, user interfaces, blog posts, about pages, or product documentation.
Stable Diffusion 3.5 Large and 3.5 Large Turbo can be self-hosted or used via Stability's API or third-party platforms such as Hugging Face, Fireworks, Replicate, and ComfyUI. Stability says it plans to release ControlNets, which allow for more fine-grained control over generation, for the models in the coming days.
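As an alternative to self-hosting the weights (sketched above), the models can also be reached through a hosted platform. The snippet below is a sketch using the huggingface_hub client, assuming the stabilityai/stable-diffusion-3.5-large checkpoint is available through Hugging Face's hosted inference endpoints and that a valid access token is configured; the prompt is an arbitrary example.

```python
from huggingface_hub import InferenceClient

# Assumes a Hugging Face token is available (e.g. via `huggingface-cli login`
# or the HF_TOKEN environment variable) and that this checkpoint is served
# by a hosted inference provider.
client = InferenceClient()

image = client.text_to_image(
    "an isometric 3D render of a small island village",  # arbitrary example prompt
    model="stabilityai/stable-diffusion-3.5-large",
)
image.save("island.png")  # text_to_image returns a PIL.Image object
```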
Stability's models, like most AI models, are trained on public web data, some of which may be copyrighted or subject to restrictive licenses. Stability, like many other AI vendors, argues that the fair-use doctrine shields it from copyright infringement claims. However, the number of class-action lawsuits filed by data owners continues to rise.
Image credit: Stability AI
Stability leaves it to customers to defend themselves against copyright infringement claims, and, unlike some other vendors, it offers no payout carve-out should they be found liable.
Stability does, however, allow data owners to request that their data be removed from its training datasets. As of March 2023, artists had removed 80 million images from Stable Diffusion's training data, according to the company.
When asked about safeguards against misinformation in light of the upcoming U.S. general election, Stability said it has “taken reasonable measures to prevent the misuse of Stable Diffusion by malicious parties, and will continue to do so.” However, the startup did not provide specific technical details about those measures.
As of March, Stability prohibited only explicitly “misleading” content created using its generative AI tools; it did not prohibit content that could influence elections or undermine election integrity, or content featuring politicians or celebrities.