In February, Google suspended the ability of its AI-powered chatbot Gemini to generate images of people after users complained about historical inaccuracies. When asked to depict a “Roman legion,” for example, Gemini drew an anachronistic, racially diverse group of soldiers, and it portrayed a “Zulu warrior” as stereotypically Black.
Google CEO Sundar Pichai apologized, and Demis Hassabis, co-founder of the company's AI research division DeepMind, said a fix would be in place “very quickly,” meaning within a few weeks. In the end, it took much longer than that (despite some Googlers working 120-hour weeks!). But in the coming days, Gemini will once again be able to create images with people in them.
Well… sort of.
Gemini's people generation feature will once again be available, but only to select users as part of an early access, English-only test: specifically, those who subscribe to one of Google's paid Gemini plans (Gemini Advanced, Business, or Enterprise). Google did not say when the test will expand to the free Gemini tier or to other languages.
“Gemini Advanced gives users priority access to our newest features,” a Google spokesperson told TechCrunch, “which allows us to gather valuable feedback while providing highly-anticipated features to our premium subscribers first.”
So what fixes did Google make to people generation? The company says that Imagen 3, the latest image generation model built into Gemini, includes mitigations to make the images of people it generates more “fair.” For example, Imagen 3 was trained on AI-generated captions designed to “improve the variety and diversity of concepts associated with images,” according to a technical paper shared with TechCrunch. The model's training data was also filtered for “safety” and “review[ed] … taking into account issues of fairness,” Google claims.
When asked for more details about Imagen 3's training data, a spokesperson would only say that the model was trained on “a large dataset containing images, text, and associated annotations.”
“We work with independent experts to ensure continuous improvement and have significantly reduced the likelihood of unwanted responses through extensive internal and external red team testing,” the spokesperson continued. “We have been focused on rigorously testing people generation before it goes live again.”
Imagen 3 and Gems
The good news is that all Gemini users will get Imagen 3 sometime this week, though without people generation for those who don't subscribe to one of the paid Gemini plans.
According to Google, Imagen 3 understands the text prompts it turns into images more accurately than its predecessor, Imagen 2, and is more “creative and detailed” in its generations. Google also claims that the model produces fewer artifacts and errors, and that it is the best Imagen model to date for rendering text.
To ease concerns about the possibility of creating deepfakes, Imagen 3 will use SynthID, a technique developed by DeepMind that applies an invisible cryptographic watermark to media. Google previously announced that Imagen 3 would use SynthID, so this isn't too surprising. But it's worth noting the contrast between how Google is handling image generation in Gemini and how it handles it in its other products: output from Google's Pixel Studio, for one, is not watermarked yet.
Alongside Imagen 3, Google is rolling out Gems for Gemini, but only for Gemini Advanced, Business, and Enterprise users. Similar to OpenAI's GPTs, Gems are custom versions of Gemini that can act as “experts” on a topic. To create a Gem, users write instructions for it and give it a name, and it's ready to go.
As Google explains in a blog post:
“With Gems, you can create a team of experts to help you think through tough projects, brainstorm ideas for an upcoming event, or craft the perfect caption for a social media post. Gems can also remember detailed sets of instructions, saving you time on tedious, repetitive or difficult tasks.”
Gems are available on desktop and mobile devices in 150 countries and “most languages,” according to Google, but they're not yet supported in Gemini Live. At launch, several premade Gems will be available, including a “learning coach,” a “career guide,” and a “coding partner.”
I asked Google whether it plans to let users share and use other people's Gems, as OpenAI does with GPTs in its GPT Store, and the answer was essentially “no.”
“Right now, we're focused on learning how people use Gems for creativity and productivity,” the spokesperson said. “We don't have anything more to say at this time.”