Data is almost everything when it comes to training AI systems, but accessing enough data to produce high-quality products that live up to their promises is a challenge for even the most well-funded companies.
This is the problem that Advex AI seeks to address by using generative AI and synthetic data to “solve data problems.” More specifically, Advex allows customers to train computer vision systems using small samples of images, from which Advex generates thousands of “fake” images.
Advex officially launches today on the Startup Battlefield stage at TechCrunch Disrupt 2024, but it has already secured a handful of customers while in stealth. These include what the company calls its “seven largest” corporate customers, which it is not at liberty to name publicly. TechCrunch can also reveal that the San Francisco-based startup has raised $3.6 million in funding, the majority of which came through a $3.1 million seed tranche last December, with backers including Construct Capital, Pear VC, and Laurene Powell Jobs' Emerson Collective.
CEO Pedro Pachuca founded Advex with CTO and co-founder Qasim Wani a little more than a year ago, and the company has six employees. It's remarkable that such a sophisticated startup is entering the industry with real paying customers already, something Pachuca attributes, at least in part, to his background, good old-fashioned network building, and cold outreach. Pachuca was previously a machine learning researcher at Berkeley and later joined the research team at Google Brain before it was merged with DeepMind.
“If the ROI [return on investment] is clear, they [customers] will have some faith in us,” Pachuca said. “I've done a lot of research in this field, and being at Google Brain before gave me a little bit of credibility. But it started with cold emails, and what gave us our first two big clients was conferences. So I go to conferences a lot!”
Immediately after his interview with TechCrunch, Pachuca was scheduled to head to Europe to attend various conferences, including the European Conference on Computer Vision (ECCV) in Milan, Italy, and Vision in Stuttgart, Germany.
“There are a lot of conferences happening in Europe,” Pachuca said. “Basically, we are going to ECCV to learn and hire. And Vision is more on the industrial side, so we're there to sell.”
Potential customers include traditional developers of machine vision systems, like Cognex and Keyence, who are looking to enhance their products with better AI. Advex may also sell directly to end-user companies, such as automakers and logistics companies, that are building their own in-house tools.
For example, an automaker may need to train computer vision systems to recognize defects in car seat materials. But even if a company has access to hundreds of individual images, the reality is that no two defects look the same. Instead, the manufacturer uploads a dozen or so photos of torn seats, from which Advex extrapolates to generate thousands of photos of “defective” seats, building a broader and more diverse pool of training data.
The same is true for almost every manufacturing sector, from oil and gas to wooden furniture. It's all about reducing data collection time and cost by artificially creating training images.
Synthetic image generation of resin defects in wood. Image credit: Advex
Synthetic data is, of course, not a new concept, but with the AI revolution in full swing, companies are looking to fill data gaps. This includes areas such as market research, where study samples may be too small, and computer vision, as we're seeing at companies like Advex, among other VC-backed startups like Synthesis AI and Parallel Domain.
Advex works with two main types of models. The models deployed on the customer's site, trained on the customer's own images, are just standard, off-the-shelf “open source stuff,” as Pachuca puts it. “That's because they need to be small. And I don't think the benefit comes from the architecture of the model. The benefit comes from training it on the right data,” he said.
But the real secret sauce lies in the company's proprietary diffusion model, similar to the likes of Midjourney and DALL-E, which is used to create synthetic data. “It's custom, it's very complex, and that's what we're all about,” Pachuca added.
While Advex's focus on manufacturing is one way it differentiates itself, the company believes it is actually its diffusion model approach that sets it apart.
Compared to other simulation and modeling techniques built on game and physics engines (such as Unity), Pachuca says diffusion requires no setup, takes just a few seconds to generate each image-label pair, and produces results that are much closer to real data.
“We're not just creating images, we're creating images that you don't have. Specifically, we're trying to understand what's missing and create it,” Pachuca said. “And this 'what's missing' part is very difficult and very invisible, but it's one of the biggest innovations we've made.”