Google is making SynthID Text, a technology that lets developers watermark and detect text written by generative AI models, generally available.
SynthID Text can be downloaded from the AI platform Hugging Face and Google's updated Responsible GenAI Toolkit.
“We are open sourcing the SynthID Text watermarking tool,” the company said in a post on Ta.
So how does it work?
Given a prompt like “What's your favorite fruit?”, a text-generating model predicts which “token” is most likely to follow another, one token at a time. Tokens, which can be a single character or a word, are the building blocks a generative model uses to process information. The model assigns each possible token a score, which is the probability that the token is included in the output text. SynthID Text inserts additional information into this token distribution by “adjusting the likelihood that a token will be generated,” Google says.
“The final pattern of scores for both the model's word choices and the adjusted probability scores is considered the watermark,” the company wrote in a blog post. “This pattern of scores is compared to the expected pattern of scores for watermarked and non-watermarked text, helping SynthID determine whether an AI tool generated the text or whether it came from some other source.”
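To make the idea concrete, here is a minimal Python sketch of one simple way to nudge token scores with a secret key and then test for the resulting pattern. It is not Google's algorithm: the keyed "favored token" rule, the `KEY`, `BIAS` and `VOCAB_SIZE` values, and the uniform stand-in for a language model are all assumptions made for illustration.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000   # toy vocabulary of integer token ids (assumption)
KEY = "demo-key"    # hypothetical secret shared by generator and detector
BIAS = 2.0          # how strongly favored tokens are boosted (assumption)
START = -1          # placeholder "previous token" for the first position


def favored(prev_token: int, token: int) -> bool:
    """Keyed pseudorandom coin flip: roughly half the vocabulary is favored per context."""
    digest = hashlib.sha256(f"{KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0


def generate(length: int, seed: int, bias: float) -> list[int]:
    """Sample tokens from a uniform stand-in 'model', boosting favored tokens by `bias`."""
    rng = random.Random(seed)
    tokens, prev = [], START
    for _ in range(length):
        weights = [math.exp(bias) if favored(prev, t) else 1.0 for t in range(VOCAB_SIZE)]
        prev = rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]
        tokens.append(prev)
    return tokens


def z_score(tokens: list[int]) -> float:
    """Standard deviations by which the favored-token count exceeds the ~50% expected by chance."""
    hits, prev = 0, START
    for t in tokens:
        hits += favored(prev, t)
        prev = t
    n = len(tokens)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


marked = generate(200, seed=1, bias=BIAS)   # watermarked sample
plain = generate(200, seed=2, bias=0.0)     # same "model", no watermark
z_marked, z_plain = z_score(marked), z_score(plain)
z_short = z_score(marked[:10])              # short excerpts carry less evidence
print(f"watermarked z={z_marked:.1f}, plain z={z_plain:.1f}, 10-token excerpt z={z_short:.1f}")
```

In this toy setup the detector needs only the key, not the model. The score is normalized by the square root of the text length, so a short excerpt can show only a limited amount of evidence.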
Google claims that SynthID Text, which has been integrated into its Gemini models since this spring, will work even with trimmed, paraphrased, or altered text without compromising the quality, accuracy, or speed of text generation.
But the company also acknowledges that the watermarking approach has limitations.
For example, SynthID Text does not work well with short texts, texts that have been rewritten or translated from another language, or answers to fact-based questions. “In response to factual prompts, there is less opportunity to adjust token distribution without affecting factual accuracy,” the company explains. “This includes prompts such as ‘What is the capital of France?’ or queries where little or no variation is expected, such as ‘Recite a poem by William Wordsworth.’”
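A small worked example in Python shows why factual prompts leave so little room. It applies the same illustrative boost to an evenly spread token distribution and to one dominated by a single answer, then measures how far each distribution moves. The scores, the set of boosted tokens and the bias value are made-up numbers, not figures from Google.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shift(logits: list[float], boosted: set[int], bias: float) -> float:
    """Total variation distance between the original and the boosted distribution."""
    before = softmax(logits)
    after = softmax([x + bias if i in boosted else x for i, x in enumerate(logits)])
    return 0.5 * sum(abs(a - b) for a, b in zip(before, after))


open_ended = [1.0, 1.0, 1.0, 1.0]   # e.g. favorite fruit: many plausible next tokens
factual = [12.0, 0.0, 0.0, 0.0]     # e.g. capital of France: one token dominates
boosted = {1, 2}                    # tokens the watermark happens to favor (assumption)

shift_open = shift(open_ended, boosted, bias=2.0)
shift_factual = shift(factual, boosted, bias=2.0)
print(f"open-ended shift: {shift_open:.4f}, factual shift: {shift_factual:.6f}")
```

When one token already holds nearly all the probability, a modest boost to the others barely changes what gets generated, so the text carries almost no detectable signal. A boost large enough to matter would risk changing the answer.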
Google isn't the only company working on AI text watermarking technology. OpenAI has been researching watermarking techniques for years, but has delayed their release due to technical and commercial concerns.
If widely adopted, watermarking technology could help turn the tide against inaccurate but increasingly popular “AI detectors” that falsely flag essays written in a more generic voice. But the question is whether it will be widely adopted, and whether one standard or technology will win out over the others.
Legal mechanisms to force developers' hands may soon emerge. The Chinese government has introduced mandatory watermarking of AI-generated content, and the state of California is considering doing the same.
The situation is urgent. According to a European Union law enforcement report, 90% of online content could be synthetically generated by 2026, creating new law enforcement challenges around disinformation, propaganda, fraud and deception.