Google announced Gemini Live on Tuesday at its Made By Google event in Mountain View, Calif. The feature lets you have semi-natural spoken conversations, rather than typed ones, with an AI chatbot powered by Google's latest large language models. TechCrunch was there to test it out firsthand.
Gemini Live is Google's answer to OpenAI's Advanced Voice Mode, a near-identical feature for ChatGPT that's currently in limited alpha testing. OpenAI demoed its feature first, but Google is the first to launch the finished version publicly.
In my experience, these low-latency voice features feel much more natural than texting with ChatGPT or talking to Siri or Alexa. I found that Gemini Live answered questions in under two seconds and pivoted fairly quickly when interrupted. Gemini Live isn't perfect, but it's the best way to use your phone hands-free that I've seen to date.
Before you start speaking with Gemini Live, the feature lets you choose from 10 voices, compared with just three from OpenAI. Google worked with voice actors to create each one, and the variety is commendable; they all sound convincingly human.
In one example, a Google product manager verbally asked Gemini Live to find family-friendly wineries near Mountain View with outdoor areas and playgrounds nearby, so they could bring their kids along. That's a far more complicated task than anything you'd ask of Siri or Google Search, but Gemini successfully recommended a place that fit the criteria: Cooper-Garrod Vineyards in Saratoga.
That said, Gemini Live leaves something to be desired. It seemed to hallucinate a nearby playground, the Henry Elementary School playground, which it said was “10 minutes” from the vineyard. There are playgrounds near Saratoga, but the closest Henry Elementary School is more than a two-hour drive away. There is a Henry Ford Elementary School in Redwood City, but that's a half-hour drive.
Google was keen to show that if a user interrupts Gemini Live mid-conversation, the AI quickly changes direction, giving the user control of the conversation. In practice, the feature didn't work perfectly: at times, Google's product managers and Gemini Live talked over each other, and the AI didn't seem to pick up on what was being said.
Notably, product manager Leland Rechis said Google doesn't allow Gemini Live to sing or imitate any voices beyond the 10 it offers, likely to avoid copyright violations. Additionally, Rechis said Google isn't focused on getting Gemini Live to understand the emotional intonation in a user's voice, something OpenAI touted during its own demo.
Overall, the feature seems like a great way to dig deeper into a subject more naturally than you could with a simple Google search. Google says Gemini Live is a first step toward Project Astra, the fully multimodal AI model the company announced at Google I/O. For now, Gemini Live supports only voice conversations, but the company hopes to add real-time video understanding in the future.