OpenAI says it is developing a tool that will give creators better control over how their content is used in training generative AI.
The tool, called Media Manager, will let creators and content owners identify their works to OpenAI and specify how they want those works included in or excluded from AI research and training.
OpenAI says it aims to have the tool in place by 2025 and is working with “creators, content owners, and regulators” toward a standard, likely through the industry steering committee it recently joined.
“This will require cutting-edge machine learning research to build a first-of-its-kind tool that helps identify copyrighted text, images, audio, and video across multiple sources and reflects creators' preferences,” OpenAI said in a blog post. “Over time, we plan to introduce additional choices and features.”
Whatever form Media Manager ultimately takes, it appears to be a response to growing criticism of OpenAI's approach to AI development, which relies heavily on scraping publicly available data from the web. Most recently, eight prominent U.S. newspapers, including the Chicago Tribune, sued OpenAI for intellectual property infringement over the company's use of generative AI, accusing it of plagiarizing articles to train generative AI models that it then commercialized without compensating or crediting the source publications.
Generative AI models, including OpenAI's (the kind that can analyze and generate text, images, video, and more), are typically trained on enormous numbers of samples drawn from public sites and datasets. OpenAI and other generative AI vendors argue that fair use, the legal doctrine allowing the use of copyrighted works to create derivative works so long as they are transformative, shields their practice of scraping publicly available data to train models. Not everyone agrees.
In fact, OpenAI recently argued that it would be impossible to create useful AI models without copyrighted material.
But to appease critics and protect itself from future lawsuits, OpenAI has taken steps to reach a compromise with content creators.
Last year, OpenAI began allowing artists to “opt out” of and remove their work from the datasets it uses to train its image-generating models. The company also lets website owners use the robots.txt standard, which gives web-crawling bots instructions about a site, to indicate whether content on that site may be scraped to train AI models. And OpenAI continues to sign licensing agreements with large content owners, including news organizations, stock media libraries, and Q&A sites like Stack Overflow.
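To illustrate how the robots.txt mechanism works: a site owner publishes a plain-text file of per-crawler rules, and a compliant crawler consults those rules before fetching any page. The sketch below uses Python's standard-library parser; “GPTBot” is the user-agent token OpenAI documents for its crawler, while the site URL and rules here are hypothetical examples.

    # A minimal sketch of the robots.txt opt-out mechanism, using Python's
    # standard-library parser. "GPTBot" is the user-agent token OpenAI
    # documents for its crawler; the URL and rules are hypothetical.
    from urllib.robotparser import RobotFileParser

    # What a site owner might publish at https://example.com/robots.txt to
    # exclude OpenAI's crawler while leaving other bots unaffected:
    ROBOTS_TXT = """\
    User-agent: GPTBot
    Disallow: /
    """

    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())

    # A compliant crawler checks the rules before fetching a page.
    print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
    print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True

Note that robots.txt is purely advisory: nothing in the standard forces a crawler to honor it, which is one reason third-party blocking and watermarking tools (discussed below) exist.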
But some content creators say OpenAI doesn't go far enough.
Artists have described OpenAI's image opt-out workflow as cumbersome: it requires submitting a separate copy of each image to be removed, along with an explanation. OpenAI also reportedly pays relatively little to license content. And, as OpenAI itself acknowledged in Tuesday's blog post, the company's current solutions don't address scenarios in which creators' works are quoted, remixed, or reposted on platforms they don't control.
In addition to OpenAI, many third parties are working to build universal provenance and opt-out tools for generative AI.
Startup Spawning AI, whose partners include Stability AI and Hugging Face, offers an app that identifies and tracks bots' IP addresses to block scraping attempts, along with a database where artists can register their works to disallow training by vendors that choose to respect the request. Steg.AI and Imatag help creators establish ownership of their images by applying watermarks imperceptible to the human eye. And Nightshade, a project out of the University of Chicago, “poisons” image data to render it useless or disruptive to AI model training.
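Spawning AI hasn't published its implementation, so the following is a general illustration of the blocking technique only, not any vendor's actual product: a minimal WSGI middleware that rejects requests whose User-Agent contains a known AI-crawler token. GPTBot (OpenAI) and CCBot (Common Crawl) are documented crawler user agents; the token list is illustrative, and production systems like Spawning's also track IP addresses, since the User-Agent header alone can be spoofed.

    # A general sketch of scraper blocking, not any vendor's actual product:
    # a WSGI middleware that returns 403 for requests whose User-Agent
    # contains a known AI-crawler token. GPTBot (OpenAI) and CCBot (Common
    # Crawl) are documented tokens; the list here is illustrative only.
    BLOCKED_TOKENS = ("GPTBot", "CCBot")

    def block_ai_scrapers(app):
        """Wrap a WSGI app so that matching crawlers receive 403 Forbidden."""
        def middleware(environ, start_response):
            user_agent = environ.get("HTTP_USER_AGENT", "")
            if any(token in user_agent for token in BLOCKED_TOKENS):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"AI training crawlers are not permitted on this site.\n"]
            return app(environ, start_response)
        return middleware

    # Usage with any WSGI application, e.g. with Flask:
    #   app.wsgi_app = block_ai_scrapers(app.wsgi_app)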