The key to high-quality AI may be high-quality data. Research shows that it is dataset curation, not size, that actually impacts an AI model's performance, so it is not surprising that dataset management practices are becoming increasingly important. According to several studies, today's AI researchers spend a lot of their time on data preparation and organization tasks.
Brothers Vahan and Tigran Petrosyan were frustrated by the amount of data they had to manage when training algorithms at university. Vahan even worked on creating data management tools during his doctoral research on image segmentation.
After a few years, Vahan realized that developers and even companies were willing to pay for similar tools. So the brothers founded a company called SuperAnnotate to build them.
“With the explosion of innovation around models and multimodal AI in 2023, the need for high-quality datasets will become even more acute as organizations have multiple use cases that require specialized data,” Vahan said in a statement. “We saw an opportunity to build an easy-to-use, low-code platform that is like a Swiss Army knife for modern AI training data.”
SuperAnnotate, whose clients include Databricks and Canva, helps users create and track large-scale AI training datasets. The startup initially focused on labeling software, but now offers tools for fine-tuning, iterating, and evaluating datasets.
Image credit: SuperAnnotate
SuperAnnotate's platform allows users to connect data from local sources and the cloud to create data projects on which they can collaborate with teammates. From the dashboard, users can compare the performance of models by the data used to train them, and when ready, deploy those models to different environments.
SuperAnnotate also provides businesses with access to a marketplace of crowdsourced workers for data annotation tasks. Annotations are typically text that labels the meaning or parts of the data that a model is trained on, acting as guideposts for the model and “teaching” it to differentiate between things, places, and ideas.
Frankly, there are several threads on Reddit about how SuperAnnotate treats the data annotators it uses, and they're not very flattering. Annotators complain of communication problems, unclear expectations, and low pay.
SuperAnnotate, on the other hand, claims that it pays fair market rates and that its demands on its annotators do not deviate from industry standards. We have asked the company to provide more information about its practices and will update this article if we receive a response.
There are several competitors in the AI data management space, including startups such as Scale AI, Weka, and Dataloop. But San Francisco-based SuperAnnotate has managed to hold its own, recently raising $36 million in a Series B round led by Socium Ventures with participation from Nvidia, Databricks Ventures, Play Time Ventures, and Defy.vc.
The new capital, which brings SuperAnnotate's total funding to just over $53 million, will be used to expand its current team of approximately 100 people, fund product research and development, and grow SuperAnnotate's customer base of approximately 100 companies.
“We aim to build a platform that can fully adapt to the evolving needs of enterprises and offer extensive customization in data fine-tuning,” Vahan said.