There is a huge opportunity for generative AI in the world of translation, and a startup called Panjaya is taking the concept to the next level. It has built a hyper-realistic, generative AI-based video dubbing tool that recreates a person's original voice speaking a new language, while the video and the speaker's physical movements are automatically modified to match the new speech patterns naturally.
The startup, which has been operating in stealth for the past three years, is announcing BodyTalk, the first version of its product, along with its first outside funding of $9.5 million.
Panjaya is the brainchild of two deep learning specialists, Hirik Shani and Ariel Shalom. They spent most of their professional lives quietly working on deep learning technology for the Israeli government, and they now serve as the startup's general manager and CTO, respectively. They hung up their G-man hats in 2021, itching to build a startup, and Guy Piekarz joined as CEO a year and a half ago.
Although Piekarz is not a Panjaya founder, he is a notable name to bring into the company. Back in 2013, he sold a startup to Apple. That startup, Matcha, was an early, buzzy player in streaming video discovery and recommendations, and it was acquired in the early days of Apple's TV and streaming strategy, when that strategy was more talk than actual product. Matcha was bootstrapped and sold for a song: between $10 million and $15 million, a modest sum given Apple's eventual push into streaming media.
Piekarz spent nearly a decade at Apple, building Apple TV and then its sports division. He was then introduced to Panjaya through one of its backers, Viola Ventures (others include R-Squared Ventures, JFrog co-founder and CEO Shlomi Ben Haim, Chris Rice, Guy Schory, Storm Ventures' Ryan Floyd, Riviera Partners' Ali Behnam, and Oded Vardi).
“By that time I had left Apple and was planning to do something completely different,” Piekarz said. “But I was blown away when I saw a demonstration of this technology, and the rest is history.”
BodyTalk is interesting because it brings several technologies into the frame at once, each drawing on a different aspect of synthetic media.
It starts with audio-based translation, which is currently available in 29 languages. The translation is then spoken in a voice that mimics the original speaker, and that audio is set to a version of the original video in which the speaker's lips and other movements are altered to match the new words and phrasing. All of this is generated automatically once a user uploads a video to the platform, which also comes with a dashboard that includes further editing tools. Future plans include an API and moving closer to real-time processing. (For now, Piekarz says BodyTalk is "near real-time," meaning it takes several minutes to process a video.)
Piekarz said the company uses third-party large language models and other tools where good ones already exist. "We use the best of breed when it exists," he said. "And we're building our own AI models where a solution doesn't really exist in the market."
One example of this, he continued, is the company's lip-syncing. "Our entire lip-sync engine was developed in-house by our AI research team, because we haven't found anything that reaches that level and quality across multiple speakers, angles, and all the business use cases we want to support."
For now, Panjaya is focused only on B2B. Clients include JFrog and the media organization TED. The company plans to expand further into media, specifically sports, as well as education, marketing, healthcare, and medicine.
The resulting translated videos are quite uncanny, not unlike what you'd get with a deepfake, but Piekarz winced at the term. Over the years, "deepfake" has acquired negative connotations that are at odds with the market the startup is targeting.
“'Deepfakes' are not something we are interested in,” he said. “We try to avoid that whole name.” Instead, think of Panjaya as part of a “deep reality category,” he said.
By targeting only the B2B market and controlling who has access to its tools, the company is building "guardrails" around its technology to prevent abuse, he added. He also believes that, in the long term, more tools such as watermarking will be built to detect when a video has been altered into synthetic media, whether that content is legitimate or malicious. "We definitely want to be a part of that and not allow misinformation," he said.
The not-so-fine print
There are a number of startups competing with Panjaya in the broader field of AI-based translation of videos. These include big players like Vimeo and Celebrity Labs, as well as smaller companies like Speechify and Synthesis. For all of them, building ways to improve dubbing mechanics feels like swimming against a strong current. That's because captions have become a pretty standard part of how videos are consumed these days.
On TV, that is down to a variety of reasons, including poor speakers, background noise in busy households, mumbling actors, limited production budgets, and more sound effects. A CBS survey of American TV viewers found that more than half leave subtitles on "some of the time (21%) or all of the time (34%)."
But some people love captions just because they're fun to read, and entire cults have been built around them.
On social media and other apps, subtitles are just built into the experience. As an example, TikTok started turning on subtitles by default on all videos in November 2023.
Still, there is a huge market for dubbed content internationally. Even if English is considered the lingua franca of the internet, there is evidence from research groups like CSA that content delivered in native languages gets better engagement, especially in the B2B context. Panjaya's pitch is that more natural native-language content can do even better.
Some customers seem to support that theory. According to TED, views of dubbed talks using Panjaya's tools increased by 115%, and the completion rate of translated videos doubled.