Stability has announced Stable Diffusion 3, the latest and most powerful version of its image generation AI model. Details are sparse, but it's clear that this is an attempt to fend off the hype surrounding competitors recently announced by OpenAI and Google.
We'll be covering the technical details soon, but for now, know that Stable Diffusion 3 is based on a new architecture and will work on a variety of hardware (though you'll still need something reasonably powerful). It hasn't been released yet, but you can sign up for the waitlist.
SD3 uses an updated "diffusion transformer," a technique pioneered in 2022, revised in 2023, and only now reaching scalability. OpenAI's impressive video generator, Sora, appears to work on similar principles (Will Peebles, a co-author of the diffusion transformer paper, went on to co-lead the Sora project). SD3 also employs flow matching, another new technique that likewise improves quality without adding much overhead.
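The announcement doesn't say how Stability applies flow matching. To illustrate the general idea only, here is a minimal sketch of a flow-matching objective in plain Python. It assumes a straight-line path between a data point (at t=0) and Gaussian noise (at t=1), the "rectified flow" variant described in the research literature, so the velocity the model learns to predict is simply noise minus data. The function names, the toy single-point dataset, and the Euler sampler are illustrative assumptions, not Stability's training code.

```python
import random


def interpolate(x0, noise, t):
    """Point on the straight path from data x0 (t=0) to noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity_target(x0, noise):
    """Time derivative of the straight path: constant, noise - x0."""
    return [b - a for a, b in zip(x0, noise)]


def flow_matching_loss(model, x0, rng):
    """Mean squared error between the model's predicted velocity and the
    target velocity, at a random time t and a fresh Gaussian noise sample."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    t = rng.random()
    x_t = interpolate(x0, noise, t)
    pred = model(x_t, t)
    target = velocity_target(x0, noise)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(x0)


def euler_sample(velocity, x1, num_steps=50):
    """Generate by integrating dx/dt = velocity(x, t) from t=1 (noise) to t=0 (data)."""
    x = list(x1)
    for i in range(num_steps):
        t_cur = 1.0 - i / num_steps
        t_next = 1.0 - (i + 1) / num_steps
        v = velocity(x, t_cur)  # t_cur is never 0 inside the loop
        x = [xi + (t_next - t_cur) * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    c = [2.0, -1.0]  # hypothetical toy "dataset" containing a single point

    def exact_velocity(x, t):
        # For a single data point c, x_t = (1 - t) * c + t * noise,
        # so the ideal velocity noise - c equals (x_t - c) / t.
        return [(xi - ci) / t for xi, ci in zip(x, c)]

    print(flow_matching_loss(exact_velocity, c, rng))  # approximately 0
    start = [rng.gauss(0.0, 1.0) for _ in c]
    print(euler_sample(exact_velocity, start, num_steps=10))  # close to [2.0, -1.0]
```

A real model replaces `exact_velocity` with a neural network (in SD3's case, reportedly the diffusion transformer) trained to minimize this loss over a large dataset. Because the path is straight, the velocity along it is constant, which is one intuition for why fewer sampling steps can suffice.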
The model suite ranges from 800 million parameters (fewer than the commonly used SD 1.5) to 8 billion parameters (more than SD XL), with the goal of running on a variety of hardware. You'll probably still want a serious GPU and a setup intended for machine learning work, but you're not limited to an API, as you are with OpenAI's and Google's models. (Anthropic isn't really part of this conversation, since it hasn't publicly focused on producing images or video.)
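To put those parameter counts in perspective, here is a back-of-the-envelope sketch of how much memory the weights alone would occupy. It assumes 2 bytes per parameter at 16-bit precision (4 at 32-bit) and deliberately ignores activations and any other components loaded alongside the model, so actual requirements will be higher. The helper name is hypothetical.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str = "fp16") -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Smallest and largest SD3 sizes mentioned in the announcement.
    for label, n in [("800M", 800_000_000), ("8B", 8_000_000_000)]:
        for precision in ("fp16", "fp32"):
            print(f"{label} @ {precision}: {weight_memory_gb(n, precision):.1f} GB")
    # 800M @ fp16: 1.6 GB
    # 800M @ fp32: 3.2 GB
    # 8B @ fp16: 16.0 GB
    # 8B @ fp32: 32.0 GB
```

Even at 16-bit precision, the largest model's weights would take up most of a 16 GB consumer card before any working memory is counted. The smallest model, by contrast, should fit comfortably on far more modest hardware.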
On Twitter, Stability CEO Emad Mostaque said the new model is capable of multimodal understanding as well as video input and generation, all features that competitors have emphasized in their API-based products. These features are still theoretical, but there appears to be no technical barrier to including them in future releases.
Of course, it's impossible to compare these models, since none has actually been released; all we have to go on are competing claims and cherry-picked examples. But Stable Diffusion has one decisive advantage: it is established in the zeitgeist as the go-to model for any kind of image generation, anywhere, with few inherent limitations on method or content. (Indeed, once people find ways around its safety mechanisms, SD3 will almost certainly usher in a new era of AI-generated porn.)
Stable Diffusion seems to be aiming to be the white-label generative AI you can't do without, rather than boutique generative AI you may or may not need. To that end, the company is also upgrading its tooling to lower the barrier to entry, though, as with the rest of the announcement, these improvements are left to the imagination.
Interestingly, the company put safety at the forefront of its announcement, stating:
We have taken and will continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by malicious parties. Safety starts when we begin training our model and continues through testing, evaluation, and deployment. In preparation for this early preview, we have introduced a number of safeguards. By continually collaborating with researchers, experts, and the community, we expect to innovate further as we approach the model's public release.
What exactly are these safeguards? No doubt the preview will outline them to some extent, and the public release will then be further refined, or further censored, depending on your perspective on these things. We'll have more details soon; in the meantime, we'll dive into the technical side of things to better understand the theory and methodology behind this new generation of models.