Labeling and annotation platforms may not get as much attention as flashy new generative AI models. But they're essential: many models train on labeled data, and without those labels the models can't interpret the data during training.
Annotation is a massive task, since today's large, sophisticated datasets can require thousands or even millions of annotations. To ease that burden, Eric Landau and Ulrik Hansen founded Encord, which they call a "data development" platform for businesses to manage and prepare data for AI models.
Now the company has raised another $30 million in a Series C round led by Next47, bringing Encord's total funding to $50 million. The new funding will be used to double the size of Encord's product, engineering and AI research teams over the next six months and to expand the company's San Francisco office, Landau told TechCrunch.
“We plan to grow our headcount from 70 to 100 by the end of the year,” he added. “We're currently headquartered in London and San Francisco, with team members across the world.”
Landau conducted particle physics research and began working on big data systems as an undergraduate at Stanford University, while Hansen worked in global markets at JP Morgan, dealing in emerging market derivatives.
Hansen says the idea for Encord took root while he was working on a data-intensive AI project during his master's degree in computer science at Imperial College London. Frustrated by how long it took to curate and label data, he reached out to Landau, whom he knew from London's entrepreneurial scene, to discuss how they could solve data problems together.
"Combining Hansen's software development expertise with my insights from quantitative research to automate data development, we launched the first iteration of our Encord product at Y Combinator in the spring of 2021," Landau told TechCrunch. "The Encord platform gives companies the tools to prepare their data for AI and evaluate how effectively that data supports their models."
Encord is one of many vendors competing for business in the data annotation and labeling market, which is predicted to grow to $3.6 billion by 2027. Besides Scale AI, which is at the center of the conversation, other startups include Datasaur, which can automatically create models from a set of labels; Heartex, which is building an open source data labeling platform; and Dataloop, a data annotation tool provider.
Landau says what sets Encord apart is the versatility of its platform.
Encord allows teams to explore and visualize datasets (including image, video, and audio datasets) from private and public cloud storage, and to compare the performance of different models trained on the same datasets. The platform detects accuracy issues in a model and suggests additional training data to help fix them.
“Unlike piecemeal solutions that only address certain parts of the data stack, Encord enables companies to consolidate all their data workflows into one platform,” Landau said. “This consolidation gives companies traceability and sheds light into the often opaque 'black box' of AI, helping them understand why their models make certain decisions.”
Encord's strategy seems to be working so far: the company counts 120 customers, including Philips, the buzzy AI startup Synthesia, and healthcare providers Cedars-Sinai and Northwell Health, as well as undisclosed contracts with military and government agencies. Landau claims that Encord quadrupled its revenue last year and could be cash-flow positive by 2025 if it stops growing its headcount.
"We are seeing the opposite of an economic slowdown," Landau said. "That said, we are cognizant of broader market conditions and are taking a conservative approach to capital allocation."
Other participants in the new funding round included Y Combinator, CRV and Crane Venture Partners.