If you've ever used ChatGPT Search or Perplexity, you know that being able to search the web and pull citations inline greatly improves these AI chatbots. Answers tend to be better when they include timely information, and web search may reduce so-called hallucinations (i.e., when generative AI outputs false information).
That's why French startup Linkup is building an API that allows developers to access web content from premium, trusted sources and pass the results to large language models (LLMs) to enrich their answers. Many AI developers refer to this workflow as retrieval-augmented generation (RAG).
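As a rough illustration of that workflow, here is a minimal Python sketch. It is not Linkup's API: `search_sources` is a hypothetical stand-in that returns canned results, the `Source` fields and example URLs are invented for illustration, and the final call to an LLM is left out. The sketch only shows how retrieved snippets can be numbered and placed in a prompt so that a model can cite them inline.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    url: str
    snippet: str


def search_sources(query: str) -> list[Source]:
    """Hypothetical stand-in for a web-search call; returns canned results."""
    return [
        Source("Example Times", "https://example.com/a",
               "Acme Corp opened a Lyon office in March."),
        Source("Example Data", "https://example.com/b",
               "Acme Corp reported 40 employees in 2024."),
    ]


def build_prompt(question: str, sources: list[Source]) -> str:
    """Number each retrieved snippet so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] {s.title} ({s.url}): {s.snippet}"
        for i, s in enumerate(sources, start=1)
    )
    return (
        "Answer the question using only the numbered sources below, "
        "and cite them inline like [1].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\n"
    )


question = "Where did Acme Corp open a new office?"
prompt = build_prompt(question, search_sources(question))
print(prompt)  # this string would then be sent to an LLM of your choice
```

In a real pipeline, the stub would be replaced by an actual search call and the prompt would be passed to whichever model the application uses.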
More importantly, the future of bot scraping is uncertain. In the absence of a prior financial agreement between content publishers and the organizations scraping web pages, these bots take content from the open web without paying, and many people are unhappy with that arrangement. This has led to increased regulatory scrutiny of AI training.
It has also led to high-profile lawsuits, such as the ongoing one between ChatGPT developer OpenAI and The New York Times, which could change the landscape around web scraping in the near future. That's why OpenAI has signed multi-year content licensing agreements with major publishers including AP, Axel Springer, Condé Nast, El País, Financial Times, and Le Monde.
“We started the company around the time OpenAI was signing deals with news sources to augment the answers coming from OpenAI's models and products, for both training and inference purposes. And we thought, ‘OK, this is great, because we finally have AI companies that will pay their source content providers,’ hopefully for mutual benefit,” Linkup co-founder and CEO Philippe Mizrahi said.
Content publishers are now facing difficult decisions about how to address GenAI's thirst for data. They can block web scrapers using the (non-legally binding) robots.txt file, which tells crawlers, including those gathering data to train AI models, which parts of a website they may access. They can sue AI companies that they believe have infringed their copyright. They can let bots index their content at will (um, YOLO?). Or they may be able to license their content to AI developers and receive compensation for their intellectual property.
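For readers who have never looked at one, here is a small sketch of the robots.txt option, checked with Python's standard `urllib.robotparser` module. `GPTBot` is the user agent OpenAI documents for its crawler; the rules themselves, the `SomeOtherBot` name, and the example.com URL are purely illustrative. As noted above, compliance is voluntary, so this only shows what a well-behaved crawler would conclude.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block OpenAI's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/1"
gptbot_allowed = parser.can_fetch("GPTBot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("GPTBot may fetch:", gptbot_allowed)
print("SomeOtherBot may fetch:", other_allowed)
```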
But thousands of AI companies (or technology companies that use AI) lack the scale and reach of OpenAI. At the same time, the great thing about the web is its long tail of content publishers. That long tail cuts both ways: smaller publishers typically lack the financial resources to sue, and switching millions of websites from a scraping model to a licensing model will be difficult.
That's why Linkup is more than just a technical solution. It's a marketplace: an intermediary between content publishers and companies that want to enhance their LLM answers with web content.
Linkup signs content licensing agreements with publishers and integrates with their CMS, allowing it to retrieve content from publishers without scraping. Linkup then pays content partners based on how often their content is accessed by Linkup clients.
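The article does not describe how those payments are calculated. Purely as an illustration of what "paying based on access frequency" could mean, the sketch below assumes a simple pro-rata split of a fixed revenue pool. The function name, the publisher names, and the pool size are all hypothetical, and Linkup's actual terms may work quite differently.

```python
from collections import Counter


def split_payout(access_log: list[str], pool_cents: int) -> dict[str, int]:
    """Split a pool (in cents) across publishers in proportion to access counts.

    Hypothetical model: floor division is used, so a few cents may remain
    unallocated when the pool does not divide evenly.
    """
    counts = Counter(access_log)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pub: pool_cents * n // total for pub, n in counts.items()}


# Each entry records one client request that was served a publisher's content.
log = ["publisher_a", "publisher_b", "publisher_a", "publisher_a"]
payouts = split_payout(log, pool_cents=10_000)
print(payouts)
```

Working in integer cents avoids floating-point rounding surprises when money is involved.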
[Image: Linkup's founding team. Image credit: Linkup]
“We're really targeting applications that are implementing AI in their products,” Mizrahi says. “So a typical use case is someone creating an AI application using a model from Mistral or OpenAI. They are building their own pipeline, but they need to enrich this pipeline with external information.”
As a side note, ChatGPT can browse the web, but GPT models cannot. OpenAI offers both a very popular application (ChatGPT) and LLMs that developers can use through an API (GPT). Web search, however, is a feature of ChatGPT.
“My favorite example is one of our customers. They built an internal application for their sales reps,” Mizrahi said. “On the one hand, they list all the benefits of their product. And thanks to us, they get fresh, high-quality information about their prospects and feed it into a Mistral LLM. The Mistral LLM generates a kind of sales pitch for the sales reps and puts it in front of them when they are on the phone with a prospect.”
Linkup initially decided to focus on corporate and business information. In addition to news websites, the startup also covers knowledge databases. Think Statista, Xerfi, or similar resources.
It's not the only startup working behind the scenes to provide premium content to LLMs through licensing deals. The most visible competitor is ScalePost, a startup that works with Perplexity to accelerate licensing deals with publishers.
A few months ago, Linkup raised a seed round of 3 million euros ($3.2 million at current exchange rates) from Axeleo Capital, Motier Ventures, Seedcamp, and 100 business angels. The startup currently employs around 10 people and plans to hire another 10 staff over the next year.