TechBrunch

Spawning wants to build more ethical AI training datasets

By TechBrunch | June 11, 2024 | 8 Mins Read


Jordan Meyer and Matthew Dryhurst founded Spawning AI to create tools that give artists more control over how their work is used online, and their latest project, called Source.Plus, aims to curate “copyright-free” media for training AI models.

The Source.Plus project's initial effort is a dataset of about 40 million images that are either in the public domain or released under the Creative Commons CC0 license, which allows creators to waive nearly all legal rights to their work. Meyer claims that the Source.Plus dataset is “high quality” enough to train state-of-the-art image generation models, despite being significantly smaller than other generative AI training datasets.

“At Source.Plus, we're building a universal 'opt-in' platform,” says Meyer. “Our goal is to make it easy for rights holders to provide media for use in generative AI training on their own terms, and for developers to seamlessly incorporate that media into their training workflows.”

Rights Management

The debate over the ethics of training generative AI models, particularly art-generating models like Stable Diffusion and OpenAI’s DALL-E 3, continues unabated and, no matter how it ends, has major implications for artists.

Generative AI models “learn” to create an output (e.g., photorealistic art) by training on large amounts of relevant data (in this case, images). Some developers of these models argue that fair use entitles them to scrape data from public sources, regardless of the copyright status of the data. Other developers try to toe the line by compensating or at least crediting content owners for their contributions to the training set.

Meyer, the Spawning CEO, believes no one has yet determined the best approach.

“AI training often defaults to using the easiest data available, but that data doesn't necessarily come from the most impartial or responsible sources,” he told TechCrunch in an interview. “Artists and rights holders have had little control over how their data is used in AI training, and developers have had no high-quality alternatives that are more likely to respect data rights.”

Available in limited beta, Source.Plus builds on Spawning's existing tools for managing art provenance and usage rights.

In 2022, Spawning launched HaveIBeenTrained, a website where creators can opt out of training datasets used by vendors that Spawning partners with, such as Hugging Face and Stability AI. After raising $3 million in venture capital from investors including True Ventures and Seed Club Ventures, Spawning rolled out ai.txt, a way for websites to “set permissions” for AI, and Kudurru, a system to protect against data-scraping bots.

Source.Plus is Spawning's first effort to build a media library and manage it in-house. The initial image dataset, PD/CC0, can be used for commercial or research purposes, Meyer said.

The Source.Plus library. Image credit: Spawning

“Source.Plus is not just a repository of training data, it's an enrichment platform with tools to support training pipelines,” he continues. “Our goal is to provide high-quality, copyright-free CC0 datasets that can support powerful base AI models within a year.”

Organizations like Getty Images, Adobe, Shutterstock and AI startup Bria claim they use only fairly sourced data to train their models (Getty even calls its generative AI products “commercially safe”), but Meyer says Spawning aims to set a “higher bar” for what fairly sourced data means.

Source.Plus filters images for “opt-outs” and other artist training preferences, displays provenance information about how and where each image was acquired, and excludes images that aren't CC0-licensed, including those under Creative Commons BY 1.0 licenses, which require attribution. Spawning also says it monitors for copyright challenges to images from sources such as Wikimedia Commons, where someone other than the creator is responsible for indicating the copyright status of a work.
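To make the filtering step concrete, here is a minimal sketch of how a license and opt-out filter could work. It is not Spawning's implementation: the `ImageRecord` fields, the license identifiers, and the `filter_records` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical metadata record; the field names are assumptions."""
    url: str
    license: str     # e.g. "PD", "CC0-1.0", "CC-BY-1.0"
    source: str      # provenance: where the image was acquired
    opted_out: bool  # True if the rights holder opted out of AI training


# Assumed allow-list: public domain and CC0 only. Attribution licenses
# such as CC BY 1.0 are deliberately left out.
ALLOWED_LICENSES = {"PD", "CC0-1.0"}


def is_eligible(record: ImageRecord) -> bool:
    """An image qualifies only if its license is allowed and no opt-out exists."""
    return record.license in ALLOWED_LICENSES and not record.opted_out


def filter_records(records: list[ImageRecord]) -> list[ImageRecord]:
    """Return the eligible records, preserving their original order."""
    return [r for r in records if is_eligible(r)]


records = [
    ImageRecord("https://example.org/a.jpg", "CC0-1.0", "museum archive", False),
    ImageRecord("https://example.org/b.jpg", "CC-BY-1.0", "wiki upload", False),
    ImageRecord("https://example.org/c.jpg", "CC0-1.0", "wiki upload", True),
]
eligible = filter_records(records)
```

In this sketch only the first record survives: the second requires attribution and the third carries an opt-out. Keeping the `source` field on every record is what would let a platform display provenance alongside each image.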

“We meticulously verified the licenses reported for the images we collected and removed any that seemed questionable, a step that is not taken by many 'unbiased' datasets,” Meyer said.

Historically, problematic images such as violent, pornographic, and sensitive personal images have plagued both open and commercial training datasets.

The maintainers of the LAION dataset were forced to take one library offline after reports that it contained medical records and depictions of child sexual abuse. This week, a Human Rights Watch investigation found that one of LAION's repositories included the faces of Brazilian children without their consent or knowledge. And Adobe Stock, the stock media library that Adobe uses to train generative AI models such as its art-generating Firefly Image model, was found to contain AI-generated images from rivals such as Midjourney.

Artwork from the Source.Plus gallery. Image credit: Spawning

Spawning's solution is a classification model trained to detect nudity, gore, personally identifiable information, and other undesirable aspects in images. Recognizing that classifiers are not perfect, Meyer says, Spawning plans to give users “flexible” filtering of the Source.Plus dataset by adjusting the classifier's detection threshold.
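An adjustable detection threshold can be sketched in a few lines. The scores, category names, and the `filter_by_threshold` helper below are hypothetical and stand in for whatever Spawning's classifier actually outputs; the point is only to show how one threshold trades recall for caution.

```python
def filter_by_threshold(
    scored: dict[str, dict[str, float]], threshold: float = 0.5
) -> list[str]:
    """Keep image IDs whose highest 'undesirable' score is below the threshold.

    `scored` maps an image ID to per-category classifier scores in [0, 1].
    A lower threshold is stricter: it drops more images, including some
    false positives, which is the trade-off a user would be tuning.
    """
    return [
        image_id
        for image_id, scores in scored.items()
        if max(scores.values(), default=0.0) < threshold
    ]


# Made-up scores for illustration only.
scored = {
    "img1": {"nudity": 0.02, "gore": 0.01, "pii": 0.03},
    "img2": {"nudity": 0.40, "gore": 0.05, "pii": 0.10},
    "img3": {"nudity": 0.91, "gore": 0.02, "pii": 0.01},
}

default_set = filter_by_threshold(scored)                # threshold 0.5
strict_set = filter_by_threshold(scored, threshold=0.3)  # more cautious
```

With the default threshold the borderline `img2` stays in the dataset; at the stricter setting it is dropped along with `img3`.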

“We employ moderators to verify ownership of data,” Meyer added. “We also have built-in remediation capabilities that let users flag potentially violating or infringing works, and the trail of how that data has been used can be audited.”

Compensation

Most of the programs that pay creators for providing generative AI training data haven't been all that successful: Some programs calculate creator payments based on opaque criteria, while others pay out amounts that artists consider unfairly low.

Take Shutterstock, for example. The stock media library, which has signed deals with AI vendors worth tens of millions of dollars, pays into a “contributor fund” for artwork that it uses to train its generative AI models or licenses to third-party developers. But Shutterstock isn't transparent about how much artists can expect to earn, nor does it allow artists to set their own prices and terms. One third-party estimate puts earnings at $15 for 2,000 images, which is hardly an impressive amount.

When Source.Plus leaves beta later this year and expands to non-PD/CC0 datasets, it will take a different approach from other platforms by allowing artists and rights holders to set their own price per download. Spawning will charge a fee, but only a flat rate of “a tenth of a cent,” Meyer says.
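A rough calculation shows why a flat fee can favor artists over a percentage cut. The sketch below rests on assumptions the article does not confirm: that the tenth-of-a-cent fee is deducted from the artist's price on each download, and that the comparison platform keeps a hypothetical 30% share. The per-download price and download count are also invented for illustration.

```python
from decimal import Decimal

# "A tenth of a cent" per download, as quoted in the article.
FLAT_FEE = Decimal("0.001")


def payout_flat_fee(price: Decimal, downloads: int, fee: Decimal = FLAT_FEE) -> Decimal:
    """Artist revenue if the flat fee is deducted from each download (assumption)."""
    per_download = max(price - fee, Decimal("0"))
    return per_download * downloads


def payout_percentage(price: Decimal, downloads: int, platform_share: Decimal) -> Decimal:
    """Artist revenue when the platform keeps a fixed fraction of each sale."""
    return price * (Decimal("1") - platform_share) * downloads


# Hypothetical scenario: one cent per download, 10,000 downloads.
price = Decimal("0.01")
downloads = 10_000

flat = payout_flat_fee(price, downloads)                     # 90 dollars
split = payout_percentage(price, downloads, Decimal("0.3"))  # 70 dollars

# For scale, the third-party Shutterstock estimate cited in the article
# ($15 for 2,000 images) works out to $0.0075 per image.
shutterstock_per_image = Decimal("15") / Decimal("2000")
```

Under these assumptions the flat fee takes 10% of a one-cent download, and the platform's cut shrinks as a proportion of the sale as the artist raises the price, whereas a percentage share scales with it.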

Customers can also pay Spawning $10 per month (plus the usual image download fee) for Source.Plus Curation, a subscription plan that lets them privately manage their image collections, download their datasets up to 10,000 times per month, and get early access to new features like “premium” collections and data enrichment.

Image credit: Spawning

“While we provide guidance and recommendations based on current industry standards and internal metrics, it is ultimately up to the contributors to our dataset to decide what is valuable to them,” Meyer said. “We intentionally chose this pricing model to give artists a majority of the revenue and allow them to set their own terms of participation. We believe this revenue share is significantly more favorable to artists than the more common percentage revenue share, and will lead to higher payouts and transparency.”

If Source.Plus proves as popular as Spawning hopes, the company plans to expand it beyond images to other types of media, such as audio and video. Spawning is in talks with unnamed companies to make their data available on Source.Plus. Meyer said Spawning may also use data from the Source.Plus dataset to build its own generative AI models.

“We want to ensure that rights holders who want to participate in the generative AI economy have the opportunity to do so and are fairly compensated,” Meyer said, “and that artists and developers who have felt conflicted about engaging with AI have the opportunity to do so in a way that is respectful of other creators.”

Certainly, Spawning could carve out a niche for itself here, and Source.Plus seems like one of the more promising attempts to involve artists in the generative AI development process, allowing them to share in the profits of their work.

As my colleague Amanda Silberling recently wrote, the emergence of apps like Cara, an art-hosting community that saw a surge in usage after Meta announced it might train its generative AI on Instagram content, including that of artists, signals that the creative community has reached a breaking point. Artists are desperate for alternatives to the companies and platforms they see as thieves, and Source.Plus just might be a viable option.

But even if Spawning always acts in the best interest of artists (a big assumption, given that Spawning is a venture-capital-backed company), it's questionable whether Source.Plus can scale as successfully as Meyer envisions. If social media has taught us anything, it's that moderation, especially of millions of pieces of user-generated content, is an intractable problem.

We'll find out soon enough.



© 2025 TechBrunch. Designed by TechBrunch.