OpenAI has reached an agreement with Reddit to use the social news site's data to train AI models.
In a blog post on OpenAI's press site, the company said its partnership with Reddit provides access to “real-time, structured, proprietary content” (such as posts and replies) from Reddit, allowing the company's tools and models to understand that content more deeply and to showcase it. Reddit content will be integrated into ChatGPT, OpenAI's popular conversational AI, and the companies will work together to bring unspecified new “AI-powered features” to both Reddit users and moderators.
OpenAI will also become Reddit's advertising partner.
“Reddit plans to build on OpenAI's platform of AI models to realize its powerful vision,” OpenAI wrote in the post. “By using LLM, ML, and AI, Reddit can improve the user experience for all users.”
OpenAI has several similar licensing agreements with content providers, ranging from stock media libraries to news publishers. What makes this deal unusual is that OpenAI CEO Sam Altman owns 8.7% of Reddit, making him its third-largest shareholder. He was also once a member of the company's board of directors.
Seemingly to head off scrutiny, OpenAI said in a press release that while Altman remains a shareholder in Reddit, the partnership was “led by OpenAI's COO [Brad Lightcap] and approved by [OpenAI's] independent board of directors.” (Note that Altman himself sits on OpenAI's board of directors.)
As Reddit navigates the market as a publicly traded company, it is increasingly emphasizing data licensing agreements as a central part of its growth strategy.
Reddit disclosed in its IPO prospectus that it has agreements to license its data to customers including Google for a total value of more than $200 million. And in its first earnings report as a public company, Reddit reported a 450% increase in non-advertising revenue year-over-year, largely thanks to these deals.
Reddit shares rose 11% in after-hours trading following the announcement of the deal with OpenAI.
“What I see is a paradox: as more content is written by machines on the internet, content written by real people is becoming more valuable,” Reddit CEO Steve Huffman said during an earnings call in March. “And we've been having real conversations for nearly 20 years.”
Reddit has over 1 billion posts and 16 billion comments on its platform, and that number is growing every day thanks to hundreds of millions of active users. The platform is a goldmine for generative AI companies, whose models learn from examples of text, images and other content to generate new, similar content.
But the company could face backlash from users concerned about how it monetizes their data.
Consider Stack Overflow, a Q&A forum for software developers. The forum recently signed an agreement with OpenAI to provide data for OpenAI's model training. In protest, some users deleted their highly rated answers to questions on the community. However, Stack Overflow restored the deleted posts and banned the users for violating its terms of service.
Reddit has already expressed displeasure at attempts to give Reddit users more control over their data.
Vana, a startup built on blockchain, is about to launch a data “DAO” (decentralized autonomous organization) that allows Reddit users to pool their data and decide together how that combined data will be used (or sold). Reddit banned a Vana subreddit dedicated to discussion of the DAO and, in a statement to TechCrunch, accused the company of “abusing” data export regulations.