Generative AI has captured the public's imagination with breakthroughs in creating elaborate, plausible text and images from verbal prompts. But the catch, and there often is one, is that the results are frequently far from perfect when you look closely.
People point out malformed fingers or floor tiles that slide out of place, and math problems remain exactly that: problems, where sometimes the numbers don't add up.
Synthesia, one of the more ambitious AI startups working on video, specifically custom avatars that business users can employ to create promotional, training, and other enterprise video content, is releasing an update that it hopes will help it leap over some of the challenges in its particular field. Its latest version features avatars, built from real humans filmed in a studio, that offer more emotion, better lip tracking, and what the company says are more natural, humanlike movements when text is entered to generate videos.
This release follows the company's impressive progress to date. Other generative AI players such as OpenAI have built a two-pronged core strategy: raising public awareness with consumer-facing tools like ChatGPT while also building B2B services whose APIs are used by independent developers and large corporations. Synthesia, by contrast, is leaning into the approach taken by some other prominent AI startups.
Just as Perplexity is focused on getting generative AI search right, Synthesia is focused on building the most humanlike generative video avatars possible. More specifically, it aims to do this only for the business market and for use cases such as training and marketing.
This focus has helped Synthesia stand out in a very crowded AI market that risks becoming commoditized once the hype settles into longer-term concerns such as ARR, unit economics, and the operational costs of implementing AI.
Synthesia describes the new Expressive Avatars being released today as the first of their kind, calling them “the world's first avatar completely generated by AI.” The product is built on large pre-trained models, and Synthesia says its breakthrough lies in the way those models are combined to achieve multimodal distributions that more closely mimic how real humans speak.
According to Synthesia, the avatars are generated on the fly, which is intended to approximate what we experience when speaking and reacting in real life. That stands in contrast to how many of today's avatar-based AI video tools behave: in practice, they quickly stitch together many pieces of video to create facial reactions that more or less match the input script. The goal is to look less robotic and more authentic.
Previous version:
New version:
As you can see in the two examples here (an older version of Synthesia and the version released today), there's still a way to go, as CEO Victor Riparbelli himself admits.
“Of course, we're not 100% there yet, but we'll be there very soon, by the end of the year. That would be pretty amazing,” he told TechCrunch. “You can also see that the AI part of this problem is very subtle. In humans, there is so much information in very small details, such as facial muscle movements. I don't think you'll ever be able to sit down and explain, ‘Sure, I smile like this when I'm happy, but it's fake, right?’ That is very complicated for humans to describe, but it can be [captured in] deep learning networks. They can actually see patterns and reproduce them in a predictable way.” The next thing the company is working on, he added, is hands.
“My hands seem very stiff,” he added.
The B2B focus also helps Synthesia center its messaging and products on the use of “secure” AI. This is especially important given today's serious concerns about deepfakes and the use of AI for malicious purposes such as misinformation and fraud. Still, Synthesia hasn't completely avoided controversy on that front. As previously reported, Synthesia's technology has been misused to generate propaganda in Venezuela and misinformation promoted by pro-Beijing social media accounts.
The company said today it has taken further steps to limit such misuse. Last month, it updated its policies to “restrict the types of content people can create, invest in early detection of malicious actors, increase the team working on AI safety, and experiment with content authentication technologies such as C2PA.”
Despite these challenges, the company has continued to grow.
When Synthesia last raised funding, a $90 million round, it was valued at $1 billion. Notably, that round took place almost a year ago, in June 2023.
Riparbelli (pictured above, right, with other co-founders Steffen Tjerrild, Professor Lourdes Agapito and Professor Matthias Niessner) said in an interview earlier this month that there are no plans to raise further funding at this time, but that doesn't really answer the question of whether Synthesia is being actively approached. (Note: We're really looking forward to having the real-life Riparbelli speak at our event in London in May. We'll definitely be hearing about this again. Please come and visit us if you're in London.)
What we know for sure is that building and running AI costs a lot of money, and Synthesia has done a lot of the building and running.
The company says that before the release of today's version, approximately 200,000 people had created more than 18 million video presentations in approximately 130 languages using Synthesia's 225 legacy avatars. (It doesn't say how many users are on the paid tier, but it has a number of high-profile customers, including Zoom, the BBC, and DuPont, and such companies do pay.) The startup's hope, of course, is that this number will grow further with today's release of the new version.