For years, Vyas Sekar has been calling his old friend from undergrad, Mackay Girish, to discuss potential startup ideas and get Girish's opinion. The two usually discussed the idea and left it there. When Sekar called Girish in early 2022 about his idea for synthetic data, the conversation didn't end when they hung up.
Sekar and his colleague Julia Fanti at Carnegie Mellon University were working on generating synthetic data to solve the reproducibility crisis in academia: the inability to reproduce data. While Sekar recognized the need for a solution in academia, Girish knew his customers at the time were facing the same problem. Conversations with several companies further validated this thesis.
“At the time, this felt very real and there was an opportunity,” CEO Girish told TechCrunch. “That was our starting point, and over the next few months we talked to some investors, people we know, and more importantly companies, and we realized that this was a significant issue. I realized that it was worth my whole life.”
The result is Rockfish, a startup that uses generative AI to create synthetic data for operational workflows and helps companies break down data silos. Rockfish integrates with database providers such as AWS and Azure, allowing users to choose the best configuration for their data based on company policies and data usage.
Synthetic data is gaining traction across the AI world, and it was already building momentum when the company launched in June 2022. Girish said Rockfish wanted to make sure it built a product that was differentiated from its peers: a solution that companies use on a daily basis, not just once in a while.
As such, the company's products are designed to ingest data continuously, with a focus on operational data such as financial transactions, cybersecurity and supply chain data. These areas generate data for businesses continuously and change constantly. Girish believes this focus will differentiate Rockfish from its competitors.
Girish said the company currently works with government departments such as the U.S. Army and the U.S. Department of Defense, as well as a handful of enterprise customers, including streaming analytics platform Conviva.
Rockfish has announced a $4 million seed round led by Emergent Ventures, with participation from Foster Ventures, TEN13, Dallas VC and others. This brings the company's total funding to approximately $6 million.
Anupam Rastogi, managing partner at Emergent Ventures, told TechCrunch that he had been tracking Sekar long before Rockfish was founded. He said the firm's motivation for making the investment was “team, market and product, in that order.” Additionally, Rockfish's focus on building for enterprises makes it a better fit for Emergent than other players in the space.
“The team is very high quality data scientists with multiple Ph.D.s,” Rastogi said. “This is a very technologically sophisticated space, and we think it’s very important to have that technical ability around the table. We did a lot of the basic work.”
Rockfish hopes its focus will help it build a moat against its competitors, but the fact remains that synthetic data is likely to become an increasingly crowded market. AI companies are turning to synthetic data because multiple players believe the market has exhausted other sources of AI training data.
A number of startups are already trying to take on the market, including Tonic AI, which has raised more than $45 million in venture funding; Mostly AI, which has raised $31 million in VC funding; and Hazy, which raised $14.5 million before being acquired by SAS in 2024, just to name a few.
Girish said the company is considering further enhancing its approach to synthetic data by incorporating other types of models, such as state-space models, which are mathematical models that use state variables. The company is also working on improving end-to-end functionality.
“It's not like taking random data and generating synthetic data for the Internet,” Girish said. “There's no guarantee it's going to work. But when you put all this together for an enterprise, it actually becomes very relevant and realistic. That's the key to this, being able to do it consistently. That’s what we feel helps.”