Google is all in on AI, and it wants you to know it. During Tuesday's I/O developer conference keynote, Google mentioned “AI” more than 120 times. That's a lot!
But not all of Google's AI announcements were significant in and of themselves. Some were incremental; others were rehashes. So to help you sort the wheat from the chaff, we've rounded up the top new AI products and features announced at Google I/O 2024.
Generative AI in search
Google plans to use generative AI to organize the entire Google search results page.
What would an AI-composed page look like? That depends on the search query. But Google said it may include AI-generated summaries of reviews, discussions from social media sites like Reddit, and AI-generated lists of suggestions.
For now, Google plans to show AI-organized results pages when it detects that you're looking for inspiration, such as when you're planning a trip. It will soon expand these results to searches for meal options and recipes, and later to movies, books, hotels, e-commerce, and more.
Project Astra and Gemini Live
Image credit: Google
Google is improving its AI-powered chatbot Gemini to help it better understand the world around it.
The company previewed a new experience for Gemini called Gemini Live. This allows users to have “in-depth” voice chats with Gemini on their smartphones. Users can interrupt Gemini while the chatbot is speaking and ask clarifying questions, and the chatbot adapts to the user's speech patterns in real time. Gemini can also see and react to your surroundings through photos and videos taken with your smartphone's camera.
Gemini Live (scheduled to launch later this year) can answer questions about what's in (or recently was in) a smartphone camera's field of view, such as what neighborhood the user is in or the name of a broken bike part. Some of the innovation driving Live comes from Project Astra, a new initiative within DeepMind to create AI-powered apps and “agents” capable of real-time, multimodal understanding.
Google Veo
Image credit: Google
Google is taking aim at OpenAI's Sora with Veo, an AI model that can create 1080p video clips roughly a minute long from a text prompt.
Veo can capture a variety of visual and cinematic styles, including landscape and time-lapse shots, and can make edits and adjustments to footage it has already produced. The model understands camera movements and VFX reasonably well from prompts (think descriptors like “pan,” “zoom,” and “explosion”). And Veo has some grasp of physics, such as fluid dynamics and gravity, which contributes to the realism of the videos it generates.
Veo also supports masked editing to change specific areas of a video and, like generative models such as Stability AI's Stable Video, can generate videos from a still image. Perhaps most interestingly, Veo can produce longer videos (over a minute) when given a sequence of prompts that tell a story.
Ask Photos
Image credit: TechCrunch
Google Photos is getting an infusion of AI with the launch of an experimental feature called Ask Photos, which leverages Google's Gemini family of generative AI models.
Coming later this summer, Ask Photos will allow users to search their entire Google Photos collection using natural language queries that leverage Gemini's understanding of photo content and other metadata.
For example, instead of searching for something specific in a photo, such as “One World Trade,” users will be able to perform broader, more complex searches, such as “find the best photo from each national park I've visited.” In that example, Gemini uses signals such as lighting, blur, and background distortion to determine what makes a photo the “best” in a given set, then combines that with an understanding of geolocation and dates to return relevant images.
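Google hasn't said how this ranking works under the hood, but the general idea of fusing per-image quality signals with metadata can be sketched in a few lines. Everything below is hypothetical: the signal names, the weights, and the grouping by location are illustrative stand-ins, not Google's implementation.

```python
from dataclasses import dataclass

@dataclass
class Photo:
    # Hypothetical per-image quality signals, each normalized to 0..1
    sharpness: float  # inverse of blur
    lighting: float   # exposure quality
    location: str     # coarse geolocation label from metadata

def best_photo_per_location(photos):
    """Pick the highest-scoring photo for each location.

    Toy stand-in for the kind of signal fusion described above:
    a weighted sum of quality signals, grouped by geolocation.
    """
    best = {}
    for p in photos:
        score = 0.6 * p.sharpness + 0.4 * p.lighting  # hypothetical weights
        if p.location not in best or score > best[p.location][0]:
            best[p.location] = (score, p)
    return {loc: photo for loc, (_, photo) in best.items()}

photos = [
    Photo(0.9, 0.8, "Yosemite"),
    Photo(0.4, 0.9, "Yosemite"),
    Photo(0.7, 0.7, "Zion"),
]
winners = best_photo_per_location(photos)
print(winners["Yosemite"].sharpness)  # the sharper Yosemite shot wins
```

In practice, a system like Ask Photos would derive such signals from learned models rather than fixed weights; the sketch only shows the shape of the "score, group, pick the best" step.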
Gemini in Gmail
Image credit: TechCrunch
Thanks to Gemini, Gmail users will soon be able to search, summarize, and draft their emails. The assistant will also be able to take action on emails for more complex tasks, such as helping process a return.
In a demo at I/O, Google showed how a parent could keep up with what's going on at their child's school by asking Gemini to summarize all recent emails from the school. In addition to the email bodies, Gemini also analyzes attachments, such as PDFs, and spits out a summary with key points and action items.
From the Gmail sidebar, users can ask Gemini to organize their email receipts and save them to a Google Drive folder, or to extract information from receipts and paste it into a spreadsheet. If that's something a user does often (say, a business traveler tracking expenses), Gemini can also offer to automate the workflow for future use.
Detecting scams during calls
Image credit: Google
Google has previewed an AI-powered feature that warns users about potential scams during calls.
The feature, which will be built into a future version of Android, uses Gemini Nano, the smallest version of Google's generative AI offering, which runs entirely on-device, to listen in real time for “conversation patterns commonly associated with scams.”
No specific release date has been set for the feature. Like many of these announcements, Google is previewing how much Gemini Nano will be able to do in the future. We do know the feature will be opt-in, which is a good thing: while using Nano means the system doesn't automatically upload audio to the cloud, it is still effectively listening to users' conversations, a potential privacy risk.
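Google hasn't shared how the detection itself works, but the two properties it has confirmed, opt-in and on-device, can be illustrated with a deliberately simple sketch. The pattern list, function name, and string matching below are all hypothetical; a real system would run a language model over transcribed audio, not a keyword list.

```python
# Hypothetical on-device check: no audio leaves the device, and
# nothing runs unless the user has explicitly opted in.
SCAM_PATTERNS = [
    "gift card",            # pay-by-gift-card scams
    "wire transfer",
    "verify your account",  # credential-phishing language
]

def flag_scam(transcript: str, opted_in: bool) -> bool:
    """Return True only if the user opted in AND the transcript matches."""
    if not opted_in:  # feature is opt-in, per Google's preview
        return False
    text = transcript.lower()
    return any(pattern in text for pattern in SCAM_PATTERNS)

print(flag_scam("Please pay the fee with a gift card today", opted_in=True))   # True
print(flag_scam("Please pay the fee with a gift card today", opted_in=False))  # False
```

The point of the sketch is the control flow, not the matching: the opt-in gate runs before any analysis, mirroring the privacy posture Google described.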
AI for accessibility
Image credit: Google
Google is enhancing its TalkBack accessibility feature for Android with a bit of generative AI magic.
Soon, TalkBack will leverage Gemini Nano to create aural descriptions of objects for low-vision and blind users. For example, TalkBack might describe an article of clothing like this: “A close-up of a white and black gingham dress. The dress is short, with a collar and long sleeves. It is tied at the waist with a large bow.”
According to Google, TalkBack users encounter around 90 unlabeled images per day. Using Nano, the system can offer insight into an image's content, potentially eliminating the need for someone to enter that information manually.