Back in 2016, Hammad Syed and former WhatsApp engineer Mahmoud Felfel thought it would be a good idea to build a text-to-speech Chrome extension for Medium articles. The extension, which read Medium stories aloud, was featured on Product Hunt, and a year later it spawned an entire business.
“We saw a huge opportunity in helping individuals and organizations create realistic audio content for their applications,” Syed told TechCrunch. “We can now deploy human-quality voice experiences faster than ever before without having to build our own models.”
Syed and Felfel's company, PlayAI (formerly PlayHT), bills itself as the “voice interface for AI.” Customers can choose from a number of predefined voices or clone a voice and use PlayAI's API to integrate text-to-speech into their apps.
Toggles let users adjust the intonation, rhythm, and tenor of a voice.
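For a rough sense of what that kind of integration looks like in practice, here is a minimal sketch of a text-to-speech API call from an application. The endpoint, request fields, voice ID, and toggle parameters below are hypothetical placeholders, not PlayAI's documented API.

```python
# Hypothetical sketch of a text-to-speech integration.
# The endpoint, request fields, and voice ID are illustrative placeholders,
# not PlayAI's actual API.
import requests

API_KEY = "your-api-key"  # assumed: issued from the provider's dashboard
ENDPOINT = "https://api.example-tts.com/v1/speech"  # placeholder URL

payload = {
    "text": "Welcome back! Your order has shipped.",
    "voice": "cloned-voice-123",  # a predefined or cloned voice ID
    # The "toggles" described above, expressed here as hypothetical parameters:
    "speed": 1.0,      # rhythm/pacing
    "pitch": 0.0,      # tenor
    "emotion": "neutral",  # intonation/delivery
}

response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()

# Assume the service returns raw audio bytes (e.g., MP3) for playback or storage.
with open("greeting.mp3", "wb") as f:
    f.write(response.content)
```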
PlayAI also provides a “playground” where users can upload files and generate read-aloud versions, as well as a dashboard for creating more sophisticated audio narrations. The company recently got into the “AI agent” game with tools that businesses can use to automate tasks such as answering customer calls.
PlayAI's agent functionality lets businesses build automation tools around the company's text-to-speech engine. Image credit: PlayAI
One of PlayAI's more interesting experiments is PlayNote. It turns PDFs, videos, photos, songs, and other files into podcast-style shows, summary readings, one-on-one discussions, and even storybooks. Similar to Google's NotebookLM, PlayNote generates a script from an uploaded file or URL and feeds it into a collection of AI models, which together craft the final product.
I tried it, and the results weren't bad. PlayNote's “Podcast” setting produces clips roughly on par with NotebookLM's in terms of quality, and the tool's ability to ingest photos and videos makes for some compelling productions. From a photo of a chicken mole dish I recently had, PlayNote generated a five-minute podcast script. Truly, we are living in the future.
Granted, like any AI tool, it occasionally produces strange artifacts and hallucinations. PlayNote will do its best to adapt a file to the format of your choice, but don't expect dry legal filings, for example, to make for the best source material. See: the Musk v. OpenAI lawsuit reframed as a bedtime story:
PlayNote's podcast format is made possible by PlayAI's latest model, PlayDialog, which Syed says uses the “context and history” of a conversation to generate audio that reflects the conversation's flow. “By leveraging the historical context of a conversation to control prosody, emotion, and pacing, PlayDialog delivers natural-sounding speech with a tone appropriate to the conversation,” he continued.
PlayAI, a close competitor to Eleven Labs, has been criticized in the past for its laissez-faire approach to safety. The company's voice cloning tool requires users to check a box indicating they “have all necessary rights or consents” to clone a given voice, but there is no enforcement mechanism. I had no trouble cloning Kamala Harris' voice from a recording.
This is concerning given the potential for fraud and deepfakes.
PlayAI's PlayDialog model can generate a two-speaker conversation that sounds relatively natural. Image credit: PlayAI
PlayAI also claims to automatically detect and block “sexual, offensive, racist, or threatening content.” In my testing, however, that wasn't the case. I used the Harris clone to generate audio that I frankly can't embed here, and I never got a warning message.
Meanwhile, PlayNote's community portal is filled with publicly generated content, including files with explicit titles like “Women Performing Oral Sex.”
Syed says PlayAI responds to reports of voices cloned without consent, such as in cases like these, by blocking the responsible user and immediately deleting the cloned audio. He also notes that PlayAI's highest-fidelity voice clone, which requires a 20-minute audio sample, is priced above what most scammers would likely be willing to pay ($49 per month billed annually, or $99 per month).
“PlayAI has several ethical safeguards in place,” Syed said. “For example, we have implemented robust mechanisms to identify whether audio was synthesized using our technology. When abuse is reported, we promptly verify the origin of the content and review the situation, and we take decisive action to remedy it and prevent further ethical violations.”
I hope that's the case. I also hope PlayAI moves away from marketing campaigns featuring deceased tech celebrities. If it doesn't moderate properly, PlayAI could face legal problems in Tennessee, which has a law on the books prohibiting platforms from hosting AI that reproduces people's voices without their permission.
PlayAI's approach to training voice clone AI is also a bit opaque. The company won't reveal where it gets its data for its models, ostensibly for competitive reasons.
“PlayAI primarily uses open data sets, as well as licensed data and a proprietary dataset built in-house,” Syed said. “We do not use data from users of our products, or from creators, to train our models. Our models are trained on millions of hours of real human speech and deliver male and female voices across multiple languages and accents.”
Most AI models are trained on public web data, some of which may be copyrighted or subject to restrictive licenses. Many AI vendors claim that the fair use doctrine protects them from claims of copyright infringement. But that hasn't stopped data owners from filing class-action lawsuits alleging that vendors used their data without permission.
PlayAI has not been sued. However, its terms of service suggest the company doesn't intend to go to bat for users who find themselves under legal threat.
Voice cloning platforms like PlayAI have also faced criticism from actors, who worry that voice work will eventually be replaced by AI-generated vocals and that performers will have little control over how their digital doubles are used.
Hollywood actors union SAG-AFTRA has signed deals with several startups, including online talent marketplace Narrativ and Replica Studios, for “fair” and “ethical” voice cloning arrangements. But even these alliances have come under intense scrutiny, including from members of SAG-AFTRA.
In California, companies using digital replicas of performers (such as cloned voices) are required by law to describe the replica's intended use and to negotiate with the performers' legal counsel. State law also requires entertainment employers to obtain the consent of a deceased performer's estate before using a digital clone of that person.
Syed says PlayAI “guarantees” that all voice clones generated through its platform are exclusive to the creator. “This exclusivity is essential to protect users' creative rights,” he added.
Mounting legal risk is one headwind for PlayAI. Another is competition. In addition to Papercup, Deepdub, Acapela, Respeecher, and Voice.ai, established tech giants Amazon, Microsoft, and Google offer AI dubbing and voice cloning tools. The aforementioned Eleven Labs, one of the hottest voice cloning vendors, is said to have raised new funding at a valuation of more than $3 billion.
However, PlayAI isn't having trouble finding investors. This month, the Y Combinator-backed company closed a $20 million seed round co-led by 500 Startups and Kindred Ventures, bringing total capital raised to $21 million. Race Capital and 500 Global also participated.
“The new funding will be used to invest in our generative AI voice models and voice agent platform, and to accelerate how quickly companies can build human-quality voice experiences,” Syed said. He added that PlayAI, which currently has a staff of 40, plans to expand its workforce.