AI models that understand video as well as text enable powerful new applications. At least that's what Jae Lee, co-founder of Twelve Labs, believes.
Granted, Lee is a bit biased: Twelve Labs trains video analytics models for a range of use cases. But there may be some truth to his argument.
Twelve Labs' models let users search videos for specific moments, summarize clips, and ask questions like “When did the person in the red shirt enter the restaurant?” It's a powerful feature set, and likely part of the reason the company has attracted big-name backers like Nvidia, Samsung, and Intel.
Video search
For Lee, a data scientist by training, conventional search never made sense for video. A keyword search can surface a clip's title, tags, and description, but not the actual content of the footage.
“Video is the fastest-growing and most data-intensive medium, yet most organizations aren't willing to commit the human resources to sifting through their entire video archives,” Lee told TechCrunch. “Attempts at manual tagging don't solve the problem. Finding a specific moment or angle in a video is like looking for a needle in a haystack.”
After failing to find a better solution, Lee recruited colleagues Aiden Lee, SJ Kim, Dave Chung, and Soyoung Lee to build one. This was the beginning of Twelve Labs, which trains models that map text to what's happening in a video, such as actions, objects, and background sounds.
Models such as Google's Gemini can search footage, and Microsoft and Amazon, among others, offer video analysis services that identify objects in clips. But Lee insists that Twelve Labs' products stand out with customization options that allow customers to use their own data to tweak the models.
Jae Lee, co-founder and CEO of Twelve Labs. Image credit: Twelve Labs
“Companies like OpenAI and Google are investing heavily in general-purpose multimodal models,” Lee says. “But these models are not optimized for video. Our differentiation is in being video-first from the start. We believe video deserves to be the sole focus, not an add-on.”
Developers can build apps on Twelve Labs models to search for video footage and more. The company's technology can power things like ad insertion, content moderation, and the automatic generation of highlight reels from clips.
When I spoke with Lee last year, I asked about the potential for bias in Twelve Labs' models. It's a real risk: a 2021 study found that training a video understanding model on clips from local news, which tends to cover crime in racist ways, can cause the model to learn racist patterns.
Lee said at the time that Twelve Labs planned to release benchmarks and datasets related to model ethics. The company hasn't done so yet. But in a recent chat, Lee assured me that these tools are in development, and that Twelve Labs conducts bias testing on every model before release.
“We have not yet published a formal bias benchmark because we want to make sure it is meaningful, practical, and actionable,” he said. “Our overall goal is to develop benchmarks that not only hold us accountable but also set standards for the industry … Until we fully achieve this goal, we and the team working on this are actively working to create AI that responsibly empowers organizations, respects people's civil liberties, and drives technological change.”
Lee added that Twelve Labs uses a combination of public domain and licensed data to train its models and does not source customer data for training.
Growth mode
Video analytics remains at the core of what Twelve Labs does. But to stay agile, the company is also branching out into areas like “any-to-any” search and multimodal embeddings.
Marengo, one of Twelve Labs' models, can search for images and audio in addition to video, and can accept reference audio recordings, images, or video clips to guide the search.
Additionally, the company offers an Embed API for creating multimodal embeddings of videos, text, images, and audio files. Embeddings are mathematical representations that capture the meaning of, and relationships between, data points, making them useful for applications such as anomaly detection.
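Embeddings from an API like that can be compared with simple vector math. As a minimal sketch (not Twelve Labs' actual SDK; the `flag_anomalies` helper and the random vectors standing in for real clip embeddings are illustrative assumptions), one common approach to anomaly detection is to flag items whose embeddings sit far from the centroid of the corpus:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_anomalies(embeddings: np.ndarray, threshold: float = 0.5) -> list[int]:
    """Return indices of rows whose similarity to the corpus centroid
    falls below `threshold`. `embeddings` is an (n_items, dim) array,
    e.g. one vector per video segment from an embedding service."""
    centroid = embeddings.mean(axis=0)
    return [
        i for i, vec in enumerate(embeddings)
        if cosine_similarity(vec, centroid) < threshold
    ]

# Toy data: 20 similar "clips" plus one outlier (hypothetical stand-ins
# for real multimodal embeddings).
rng = np.random.default_rng(0)
normal = rng.normal(loc=1.0, scale=0.1, size=(20, 512))
outlier = rng.normal(loc=-1.0, scale=0.1, size=(1, 512))
print(flag_anomalies(np.vstack([normal, outlier])))  # -> [20]
```

In practice, the vectors would come from the embedding service rather than a random generator, and the threshold would be tuned against footage known to be normal.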
Twelve Labs' growing product portfolio has helped the startup secure customers in the enterprise, media, and entertainment sectors. Two key partners are Databricks and Snowflake, both of which have incorporated Twelve Labs tools into their products.
Twelve Labs builds multimodal video understanding models. Some answer questions, others run searches. Image credit: Twelve Labs
Databricks has built an integration that lets customers call Twelve Labs' embedding service from their existing data pipelines. Snowflake, meanwhile, is creating a connector to Twelve Labs' models for Cortex AI, its fully managed AI service.
“More than 30,000 developers are currently using our platform, from individuals experimenting to large enterprises integrating our technology into their workflows,” Lee said. “For example, we're partnering with local governments on use cases such as real-time threat detection, improving emergency response times, and assisting with traffic management.”
As part of their strategic support, both Databricks and Snowflake invested in Twelve Labs this month through their respective venture arms. SK Telecom and HubSpot Ventures participated alongside In-Q-Tel, the Arlington, Virginia-based nonprofit VC firm that invests in startups supporting U.S. intelligence operations.
Total new investment is $30 million, bringing Twelve Labs' total funding to $107.1 million. Lee says the proceeds will go towards product development and hiring.
“While we are in a very strong financial position, we saw an opportunity to deepen important strategic relationships with leaders who deeply believe in Twelve Labs,” said Lee. “We currently have 73 full-time employees and plan to invest significantly in recruiting across engineering, research and customer-facing roles.”
New hire
Speaking of hiring, Twelve Labs announced Thursday that it is adding a president to its executive leadership. Yoon Kim, the former chief technology officer of SK Telecom and a key architect behind Apple's Siri, will also serve as Twelve Labs' chief strategy officer, spearheading the startup's aggressive expansion plans.
“It's unusual for a company of Twelve Labs' age to hire a president, but the move is a testament to the demand we've been experiencing,” Lee said, adding that Yoon will split his time between Twelve Labs' San Francisco headquarters and its office in Seoul. “Yoon is the right person to help us execute. He will help us drive future growth through significant acquisitions, expand our global presence, and align our team around our ambitious goals.”
Lee says the goal over the next few years is to grow into adjacent markets such as automotive and security. Given In-Q-Tel's involvement, security (and possibly defense) work seems all but certain, though Lee wouldn't explicitly confirm it.
“The investment from In-Q-Tel reflects the versatility and potential of our technology in many areas, including national security,” said Lee. “We are always open to exploring opportunities where our technology can have a positive, meaningful and responsible impact, consistent with our ethical guidelines.”