Alexis Conneau thinks deeply about the movie “Her.” In recent years, he has become obsessed with trying to bring Samantha, the film's fictional voice technology, to life.
Conneau also uses a photo of Joaquin Phoenix's character in the film as his Twitter banner.
Conneau's X banner (Image credit: X)
He arguably did just that with ChatGPT's Advanced Voice Mode, a project Conneau started at OpenAI after doing similar work at Meta: an AI system that processes audio natively and converses much like a human would.
Now, he's building something better with a new startup called WaveForms AI.
Conneau spends a lot of time thinking about ways to avoid the dystopias depicted in movies, he told TechCrunch in an interview. “Her” was a science fiction movie about a world where people form intimate relationships with AI systems rather than other humans.
“This movie is dystopian, right? That's not the future we want,” Conneau said. “We want the technology to exist, but we want to do the exact opposite of what the company in that movie did.”
Building the technology from a dystopian film, minus the dystopia, sounds contradictory. But Conneau plans to build it anyway, confident his new AI startup can help people “feel the AGI” with their ears.
On Monday, Conneau launched WaveForms AI, a new audio LLM company that trains its own foundational models. The company aims to release an AI audio product in 2025 that will compete with products from OpenAI and Google. The startup announced Monday that it has raised $40 million in seed funding led by Andreessen Horowitz.
Conneau says Marc Andreessen, who has previously written that AI should be integrated into every aspect of human life, has a personal interest in his efforts.
It's worth noting that Conneau's obsession with the movie “Her” may have gotten OpenAI into trouble at one point. Scarlett Johansson sent a legal threat to Sam Altman's startup earlier this year, ultimately forcing OpenAI to remove one of ChatGPT's voices, which closely resembled her character in the film. OpenAI denied ever attempting to recreate her voice.
But there's no denying how much of an influence this film had on Conneau. When released in 2013, “Her” was decidedly science fiction. At the time, Apple's Siri was very new and very limited in functionality. But today, this technology feels frighteningly within reach.
AI companionship platforms like Character.AI reach millions of users every week who just want to have a conversation with a chatbot. The field is emerging as a popular use case for generative AI, albeit with sometimes tragic and disturbing consequences. You can imagine how people who type to chatbots all day might appreciate the ability to talk to them instead, especially with technology as convincing as ChatGPT's Advanced Voice Mode.
WaveForms AI's CEO is wary of the AI companion space, which is not the core of his new company. Conneau believes people will use WaveForms' products in new ways (like learning about something by having a 20-minute conversation with an AI in your car), but he says he wants the company to be more “horizontal.”
“[WaveForms AI] could be an inspirational teacher, maybe a teacher you'll never meet in your life, at least not in your physical life,” the CEO said.
In the future, he thinks talking to generative AI will become a more common way to interact with all kinds of technology. This may include talking to your car or talking to your computer. WaveForms aims to provide “emotionally intelligent” AI that makes everything easier.
“I don't believe in a future where human-AI interaction replaces human-human interaction,” Conneau said. “If anything, it will be complementary.”
AI can learn from social media's mistakes, he says. For example, he doesn't think AI should optimize for “time on platform,” a common metric of success for social apps that can encourage unhealthy habits like doomscrolling. More broadly, he wants to ensure that WaveForms' AI acts in the best interests of humans, calling this “the most important work you can do.”
Conneau said OpenAI's project name, “Advanced Voice Mode,” doesn't accurately represent how different the technology is from ChatGPT's regular voice mode.
The old voice mode was really just transcribing speech to text, running that text through GPT-4, and converting the response back to speech, a somewhat hacky cascaded solution. In Advanced Voice Mode, however, GPT-4o decomposes the audio itself into tokens (apparently, each second of audio equates to roughly three tokens) and passes those tokens directly through an audio-specific transformer model, Conneau said. That, he explained, is what gives Advanced Voice Mode such low latency.
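The architectural difference Conneau describes can be sketched in a few lines. This is a minimal illustration, not OpenAI's actual implementation; the function names are invented, and the only number taken from the article is the roughly three-tokens-per-second rate.

```python
# Illustrative sketch of the two architectures described above.
# Not OpenAI's implementation; names are hypothetical.

def cascaded_voice_mode() -> dict:
    """Old pipeline: three separate models chained together.
    Each hop adds latency and discards acoustic detail like tone."""
    stages = [
        "ASR (speech -> text)",
        "GPT-4 (text -> text)",
        "TTS (text -> speech)",
    ]
    return {"stages": stages, "hops": len(stages)}

def native_audio_mode(audio_seconds: float, tokens_per_second: int = 3) -> dict:
    """Advanced Voice Mode, as the article describes it: audio is
    tokenized directly (~3 tokens/second) and a single audio
    transformer maps audio tokens to audio tokens."""
    return {
        "stages": ["audio transformer (audio tokens -> audio tokens)"],
        "hops": 1,
        "audio_tokens": int(audio_seconds * tokens_per_second),
    }

print(cascaded_voice_mode()["hops"])        # 3 hops of added latency
print(native_audio_mode(10.0))              # 1 hop; 10 s of speech -> ~30 tokens
```

A single end-to-end hop, rather than three chained models, is what makes the low latency (and the preservation of tone and emotion in the input audio) plausible.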
One claim often thrown around about AI audio models is that they can “understand emotions.” Just as text-based LLMs learn patterns from large corpora of text documents, audio LLMs learn them from audio clips of humans speaking. Because humans label those clips as “sad” or “excited,” the model can recognize similar acoustic patterns in what you say and respond with its own emotional intonation. So rather than “understanding emotions,” these models are statistically recognizing the qualities of audio that humans associate with those emotions.
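The idea of emotion recognition as pattern matching over labeled audio can be reduced to a toy example. Real audio models learn from raw waveforms with neural networks; the hand-picked features, values, and nearest-neighbor rule below are invented purely to illustrate the principle.

```python
# Toy sketch of "emotion understanding" as supervised pattern matching.
# Features (mean pitch in Hz, loudness 0-1) and labels are invented.

labeled_clips = [
    ((120.0, 0.2), "sad"),       # low pitch, quiet
    ((140.0, 0.3), "sad"),
    ((260.0, 0.8), "excited"),   # high pitch, loud
    ((240.0, 0.9), "excited"),
]

def classify(features: tuple[float, float]) -> str:
    """Return the label of the acoustically nearest labeled clip (1-NN)."""
    def sq_dist(a: tuple[float, float], b: tuple[float, float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labeled_clips, key=lambda clip: sq_dist(clip[0], features))[1]

print(classify((250.0, 0.85)))  # "excited": close to the excited examples
```

The model never “feels” anything; it maps acoustic similarity to whatever label humans attached to similar-sounding clips.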
Make AI more personal, not smarter
Conneau is betting that today's generative AI doesn't need to be significantly smarter than GPT-4o to produce better products. Rather than improving the underlying intelligence of these models, as OpenAI is doing with o1, WaveForms aims simply to make AI more pleasant to interact with.
“There will be a market of people [using generative AI] who will choose the interaction that is most enjoyable to them,” Conneau said.
That's why the startup is confident it can develop its own foundation model, ideally one that's cheaper, faster to run, and smaller. That isn't a bad bet, given recent evidence that the old AI scaling laws are slowing down.
Conneau says his former OpenAI colleague Ilya Sutskever often talked about trying to “feel the AGI,” that is, intuitively assessing whether humanity is on the path to superintelligent AI. The WaveForms CEO believes achieving AGI is a feeling rather than a benchmark to hit, and that audio LLMs are key to that feeling.
“I think you can feel the AGI more when you talk to it, when you hear the voice of the AGI and actually talk to the transformer itself,” Conneau said, echoing comments he made to Sutskever over dinner.
Clearly, the startup has a responsibility to make AI more conversational while also finding ways to keep people from becoming addicted to it. But Martin Casado, the general partner at Andreessen Horowitz who led the investment in WaveForms, says people having more conversations with AI isn't necessarily a bad thing.
“I can talk to random people on the internet, who can bully me and take advantage of me. I can play video games that can be arbitrarily violent, or I can talk to an AI,” Casado said in an interview with TechCrunch. “I think this is an important problem to study, and I wouldn't be surprised if it turns out that [talking to AI] is actually preferable.”
Some companies might view someone forming a loving relationship with an AI as an indicator of success. But from a societal perspective, it could also be seen as a sign of complete failure, exactly what the movie “Her” tried to portray. That's the tightrope WaveForms must now walk.