Like many AI companies, Udio and Suno relied on large-scale theft to create their generative AI models. This they have all but admitted, even before the music industry's new lawsuits against them have gone before a judge. If the case reaches a jury, the trial could prove both a damaging exposé and a highly useful precedent for pursuing similarly unethical AI companies facing legal peril.
The lawsuits, filed Monday with great fanfare by the Recording Industry Association of America (RIAA), put us all in the awkward position of rooting for an outfit that has been the villain of digital media for decades. I've received hate mail from the RIAA myself! The case is simply that clear-cut.
The crux of the two very similar lawsuits is that Suno and Udio (or, more precisely, Uncharted Labs, doing business as Udio) indiscriminately plundered nearly the entire history of recorded music to create datasets, which they then used to train music-generating AIs.
It's worth briefly mentioning here that these AIs don't really “generate” anything; rather, they match a user's prompt to patterns in their training data and then attempt to complete those patterns. In a sense, all these models do is perform covers or mashups of the songs they ingested.
That Suno and Udio ingested this data is beyond question in every practical sense, if not yet the legal one, and both companies' leadership and investors have been foolishly loose-lipped about copyright issues in this space.
They have acknowledged that the only way to create a good music-generation model is to ingest large amounts of high-quality, copyrighted music; there is simply no way around that step for this kind of machine learning model.
And they have admitted to doing so without permission from the copyright holders, as investor Antonio Rodriguez told Rolling Stone's Brian Hiatt just a few months ago:
“Honestly, if we'd had deals with labels when this company got started, I probably wouldn't have invested in it. I think they needed to make this product without the constraints.”
Notice what he did there: he said they stole a century's worth of music without ever saying they stole a century's worth of music. And just to be clear, the “constraints” he's referring to are copyright law.
Finally, the companies have told the RIAA's lawyers that they believe ingesting all this media falls under the doctrine of fair use, which, notably, only comes into play when a work is used without authorization. Fair use is certainly a complex and fuzzy concept, both in idea and in execution, but a corporation with $100 million in funding copying essentially every song ever recorded, then mass-producing and selling the results, seems a bit outside the safe harbor intended for, say, a seventh grader using a Pearl Jam song in the background of a video about global warming.
Frankly, it seems the game is up for these companies. They clearly hoped to follow OpenAI's playbook: secretly use copyrighted works, then deploy evasive and misleading language to stall less well-funded critics like authors and journalists. If, by the time the wrongdoing comes to light, the AI company is the only distribution option left, the wrongdoing no longer matters.
In other words: deny, deflect, delay. Ideally, you drag the case out until the tables turn and you can cut deals with your critics. For the LLM makers, those critics are news outlets and the like; in this case, they are the record labels, which the music generators clearly hoped to approach eventually from a position of strength. “Sure, we stole your stuff, but now we're a big business. Wouldn't you rather play with us than against us?” This is a common strategy in Silicon Valley, and a winning one, mainly because it just costs money.
But that's hard to pull off when you're caught red-handed. And unfortunately for Udio and Suno, the RIAA included thousands of pieces of hard evidence in its lawsuits: songs owned by its member labels that the music models clearly reproduce. The “generated” songs, imitating everyone from the Jackson 5 to Maroon 5, are merely slightly garbled versions of the originals, something that would be impossible if the originals were not in the training data.
The nature of LLMs, specifically their tendency to hallucinate and lose the plot the longer they write, means they can't regurgitate, say, an entire book. That has muddied the waters for the authors suing OpenAI, which can plausibly argue that the snippets its models quote were taken from reviews, first pages posted online, and so on. (The latest goalpost move is that the companies did use copyrighted works early on but have since stopped, which is a bit like saying you squeezed the orange just once but are still drinking the juice.)
You can't plausibly claim that a music generator heard only a few bars of “Great Balls of Fire” and then spat out the rest note for note, chord for chord. Any judge or jury would laugh in your face, and with any luck a courtroom sketch artist would get the chance to illustrate the moment.
This is not only intuitively obvious but legally consequential, because it is clear that the models are recreating entire works (whole songs, sometimes poorly, but in full). That lets the RIAA argue that Udio and Suno are doing real and substantial harm to the business of the copyright holders and of the artists being imitated, which in turn lets it ask the judge to shut down the AI companies' entire operation at the outset of the trial with an injunction.
The opening paragraphs of a book coming out of an LLM? That's an intellectual question to be debated at length. A dollar-store “Call Me Maybe” generated on demand? Shut it down. I'm not saying that's right, but it's likely.
The companies' predictable rejoinder is that the system is not intended to reproduce copyrighted works: a desperate, naked attempt to offload liability onto users under the Section 230 safe harbor, the same way Instagram isn't liable if you use a copyrighted song as the backing for a Reel. Here, coupled with the aforementioned admissions that the companies ignored copyright from the start, that argument seems unlikely to gain traction.
What will the outcome of these lawsuits be? As with all things AI, it is all but impossible to say in advance, since there is little in the way of precedent or settled doctrine to apply.
My prediction, offered without any legal expertise, is that the companies will be compelled to reveal their training data and methods, which are clearly of evidentiary interest. Those revelations, the obvious misuse of copyrighted material they will expose, and (probably) internal communications showing knowledge that the law was being broken will likely precipitate settlements or attempts to dodge trial, and/or speedy judgments against Udio and Suno. They will also be forced to stop any operations that rely on the theft-based models. At least one of the two will likely try to carry on using legal (or at least legally adjacent) sources of music, but the resulting models will be such a steep drop in quality that users will flee.
The investors? Ideally, they will lose their shirts, having bet on something that was provably illegal and unethical, not just in the eyes of timid authors' associations but according to the legal minds of the notoriously, ruthlessly litigious RIAA. Whether the damages amount to cash on hand or promised funding is anyone's guess.
The consequences could be far-reaching. If investors in a hot generative media startup suddenly see a hundred million dollars vaporized because of the fundamental nature of generative media, a different level of diligence starts to seem appropriate. Companies will learn from the trials (if there are any), the settlement documents, and so on what they should have said, or perhaps more importantly should not have said, to avoid liability and keep copyright holders guessing.
This lawsuit seems almost a foregone conclusion, but not every AI company has left its fingerprints around the crime scene quite so liberally. It will be less a playbook for prosecuting or squeezing settlements out of other generative AI companies than an object lesson in hubris. It's good to have one of those every once in a while, even if the teacher is the RIAA.