Hello everyone, welcome to TechCrunch's regular AI newsletter. If you'd like to have this sent to your inbox every Wednesday, sign up here.
It was surprisingly easy to create a convincing audio deepfake of Kamala Harris on Election Day. It cost $5 and took less than two minutes. This shows how cheap and ubiquitous generative AI has opened the floodgates of disinformation.
Creating a Harris deepfake was not my original goal. I was trying out Cartesia's Voice Changer. This is a model that transforms your voice into another while preserving the original prosody. That second voice could be a “clone” of another person. Cartesia creates digital audio dubs from 10-second recordings.
So I wondered if the voice changer would convert my voice to Harris' voice. I paid $5 to unlock Cartesia's voice cloning feature, cloned Harris' voice using a recent campaign speech, and selected that clone as the voice changer's output.
It worked like a charm:
I'm sure Cartesia didn't exactly intend for their tool to be used this way. To enable audio cloning, Cartesia requires you to check a box indicating that you will not produce anything harmful or illegal, and to consent to the cloning of your audio recordings.
But it's just an honor system. Without any real safeguards, there's nothing stopping you from creating as many “harmful or illegal” deepfakes as you like.
That's a problem, needless to say. So what's the solution? Do you have it? Cartesia, like any other platform, can implement voice authentication. But by the time that happens, there may be new, free audio duplication tools.
I spoke with experts about this very issue at TC's Disrupt conference last week. Some supported the idea of invisible watermarking to make it easier to tell if content was generated by AI. Others pointed to content moderation laws such as the UK's Online Safety Act, which they argued could help stem the tide of misinformation.
Call me a pessimist, but I think those ships have sailed. As Imran Ahmed, CEO of the Center to Combat Digital Hate, puts it, we are witnessing an “eternal bull machine.”
Disinformation is spreading at an alarming rate. High-profile examples over the past year include the X bot network targeting U.S. federal elections and President Joe Biden's voicemail deepfake that discouraged New Hampshire residents from voting. However, since US voters and tech-savvy people are not the target of most of this content, we tend to underestimate its presence in other regions, according to True Media.org's analysis.
According to data from the World Economic Forum, the amount of AI-generated deepfakes increased by 900% between 2019 and 2020.
On the other hand, there are relatively few laws targeting deepfakes. And deepfake detection is shaping up to be a never-ending arms race. Some tools inevitably choose not to use safety measures such as watermarks, or are deployed with clearly malicious applications in mind.
Barring major changes, I think the best thing we can do is be highly skeptical of what's out there, especially viral content. Telling truth from fiction online is not as easy as it once was. But we can still control what we share and what we don't share. And it's much more impactful than you might think.
news
ChatGPT Search Review: My colleague Max tried out ChatGPT Search, OpenAI's new search integration for ChatGPT. He found that while this is impressive in some ways, it is unreliable for short queries containing just a few words.
Amazon Drones in Phoenix: Months after the end of its Prime Air drone-based delivery program in California, Amazon announced that it has started making drone deliveries to some customers in Phoenix, Arizona.
Former Meta AR leader joins OpenAI: The former head of Meta's AR glasses efforts, including Orion, announced Monday that he is joining OpenAI to lead robotics and consumer hardware. The news comes after OpenAI hired X (formerly Twitter) co-founder and challenger Pebble.
Compute holds us back: In a Reddit AMA, OpenAI CEO Sam Altman acknowledged that a lack of computing power is one of the major factors preventing the company from shipping products as often as it would like.
AI-generated summaries: Amazon has launched “X-Ray Recaps,” an AI-powered generation feature that creates concise summaries of entire TV seasons, individual episodes, and even parts of episodes.
Anthropic's Haiku price increase: Anthropic's latest AI model, Claude 3.5 Haiku, is now available. However, it is more expensive than the previous generation, and unlike Anthropic's other models, it still cannot analyze images, graphs, and charts.
Apple acquires Pixelmator: AI-powered image editor Pixelmator announced Friday that it will be acquired by Apple. The partnership comes as Apple has become more aggressive in integrating AI into its imaging apps.
'Agent' Alexa: Amazon CEO Andy Jassy last week hinted at an improved 'agent' version of the company's Alexa assistant, one that can perform actions on your behalf. The revamped Alexa has reportedly faced delays and technical setbacks and may not be released until 2025.
This week's research paper
Pop-ups on the web can fool not only your grandparents but also AI.
In a new paper, researchers at the Georgia Institute of Technology, the University of Hong Kong, and Stanford University show that an AI “agent” (an AI model that can complete a task) can “hostile pop-ups” that instruct the model to do things like: ” indicates that there is a possibility of hijacking. Download malicious file extensions.
Image credit: Zhang et al.
Some of these pop-ups are clearly traps to the human eye, but AI is less insightful. Researchers say the image and text analysis models they tested were unable to ignore pop-ups 86% of the time, and were 47% less likely to complete the task as a result.
Basic defenses such as telling the model to ignore pop-ups had no effect. “Significant risks still exist with the deployment of computer-based agents, and more robust agent systems are needed to ensure secure agent workflows,” study co-authors said. .
this week's model
Meta announced yesterday that it is working with partners to make Llama's “open” AI models available for defense applications. Today, one of those partners, Scale AI, announced Defense Llama, a model built on Meta's Llama 3 and “customized and fine-tuned to support America's national security mission.” .
Defense Llama, available on Scale's Donavan chatbot platform for U.S. government customers, is optimized for military and intelligence planning, Scale said. Defense Llama can answer defense-related questions, such as how an enemy plans attacks on US military bases.
So what's the difference between a defense rama and a stock rama? According to Scales, the “defense rama'' is based on military doctrine, international humanitarian law, the capabilities of various weapons and defense systems, and other topics that may be relevant to military operations. It is said that it has been adjusted. It is also not restricted to answering questions about the war like civilian chatbots are.
Image credit:Scale.ai
However, it is unclear who is likely to use it.
The US military has been slow to adopt generative AI and is skeptical about its ROI. So far, the U.S. military is the only branch of the U.S. military to deploy generative AI. Military officials have expressed concerns about security vulnerabilities in commercial models, as well as legal challenges related to sharing intelligence data and the models' unpredictability in the face of edge cases.
grab bag
Spawning AI, a startup that creates tools that allow creators to opt out of generative AI training, has released an image dataset for training AI models that it claims is completely in the public domain.
Most generative AI models are trained on public web data, some of which may be copyrighted or subject to restrictive licenses. OpenAI and many other AI vendors claim that the fair use doctrine protects them from claims of copyright infringement. But that hasn't stopped data owners from filing lawsuits.
Spawning AI's training dataset of 12.4 million image/caption pairs includes only content with “known provenance” and “clear and unambiguous rights labeling” for AI training. says. Unlike other datasets, it can also be downloaded from a dedicated host, eliminating the need for web scraping.
“Importantly, the dataset's public domain status is essential to these larger goals,” Spawning wrote in a blog post. “Datasets containing copyrighted images will continue to rely on web scraping, as hosting the images will infringe the copyright.”
Spawning's dataset PD12M and a version selected for “aesthetically pleasing” images, PD3M, can be found at this link.