Stability AI, the startup behind the AI-powered art generator Stable Diffusion, has released an open AI model for generating sounds and songs that it says was trained exclusively on royalty-free recordings.
The generative model, called “Stable Audio Open,” takes a text description (e.g., “processed studio-played rock beats, session drums on an acoustic kit”) and outputs recordings up to 47 seconds long. The model was trained using approximately 486,000 samples from the free music libraries FreeSound and Free Music Archive.
Stability AI says the model can create drum beats, instrumental riffs, ambient sounds and “production elements” for videos, films and TV shows, and can even be used to “edit” existing songs by applying the style of one song (e.g., smooth jazz) to another.
“A key benefit of this open-source release is that it enables users to fine-tune the models with their own custom audio data,” Stability AI said in a company blog post. “For example, a drummer could tweak samples of their own drum recordings to generate new beats.”
But Stable Audio Open has limitations. It can't generate complete songs, melodies, or vocals — at least not good ones. Stability AI says it's not optimized for this, and suggests users looking for that functionality opt for its premium Stable Audio service.
Stable Audio Open also can't be used commercially; its terms of use prohibit it. And it doesn't perform equally well across musical styles and cultures, or with descriptions in languages other than English, which Stability AI attributes to biases in its training data.
“Data sources may lack diversity and all cultures may not be equally represented in the dataset,” Stability AI said in its model description, “so examples generated from the model will reflect biases in the training data.”
Stability AI, which has long struggled to turn around its flagging business, has been the subject of controversy recently: its vice president of generative audio, Ed Newton-Rex, resigned over his disagreement with the company's position that training generative AI models on copyrighted works constitutes “fair use.” Stable Audio Open appears to be an attempt to counter that controversy, while also serving as a subtle promotion for Stability AI's paid products.