Hello everyone and welcome to TechCrunch's regular AI newsletter. If you'd like to receive this newsletter in your inbox every Wednesday, sign up here.
It's been just a few days since OpenAI unveiled its latest flagship generative model, o1, to the world. Marketed as a "reasoning" model, o1 essentially takes longer to "think" about a question before answering it, breaking the problem down and checking its own work.
There's a lot that o1 doesn't do well, and OpenAI itself acknowledges this. But on some tasks, like physics and math, o1 outperforms OpenAI's previous best-performing model, GPT-4o, despite not necessarily having many more parameters. (In AI and machine learning, parameters, usually numbering in the billions, roughly correspond to a model's problem-solving abilities.)
And this has implications for AI regulation.
For example, California’s bill, SB 1047, imposes safety requirements on AI models that cost more than $100 million to develop or that are trained using computational power above a certain threshold. However, models like o1 show that scaling up training compute is not the only way to improve model performance.
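To make the compute-threshold idea concrete, here's a rough back-of-the-envelope sketch, not drawn from the bill's text: it uses the widely cited approximation that training compute is about 6 × parameters × training tokens, compared against the 10^26-operation threshold commonly associated with SB 1047. The function names and example figures are illustrative. Notably, an estimate like this captures nothing about how much compute a model spends at inference time, which is exactly the gap a model like o1 exposes.

```python
# Back-of-the-envelope sketch (illustrative names and figures, not the bill's text):
# estimate training compute with the common ~6 * parameters * tokens approximation
# and compare it to a regulatory-style compute threshold.

def estimated_training_flops(num_parameters: float, num_training_tokens: float) -> float:
    """Rough training compute estimate: ~6 * N * D floating-point operations."""
    return 6.0 * num_parameters * num_training_tokens

def exceeds_compute_threshold(flops: float, threshold: float = 1e26) -> bool:
    """Check an estimate against a 1e26-operation threshold (as cited for SB 1047)."""
    return flops >= threshold

if __name__ == "__main__":
    # Example: a 405-billion-parameter model trained on roughly 15 trillion tokens.
    flops = estimated_training_flops(405e9, 15e12)
    print(f"Estimated training compute: {flops:.2e} FLOPs")
    print("Above 1e26 threshold?", exceeds_compute_threshold(flops))
```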
In a post on X, Nvidia research manager Jim Fan argued that future AI systems may rely on small, easier-to-train "reasoning cores" rather than the training-intensive architectures that have been all the rage recently (such as Meta's Llama 405B). He noted that recent academic studies have shown that small models like o1 can significantly outperform larger ones if they're given more time to think through a question.
So were policymakers shortsighted in tying AI regulatory measures to compute? Yes, according to Sara Hooker, head of AI startup Cohere's research lab, who told TechCrunch:
[o1] points to how incomplete this idea of using model size as a proxy for risk is. It doesn't take into account everything you can do with inference or running a model. To me, it's a combination of flawed science and policy that puts the emphasis on future risks rather than the risks we see in the world today.
So does that mean lawmakers should gut their AI bills and start over? No. Many bills are written to be easily amended, with the assumption that AI will continue to evolve long after the bill is enacted. For example, California's bill would give the state's Department of Government Operations the power to redefine the computing thresholds that trigger the law's safety requirements.
The challenge, rather, is identifying which metrics might be better proxies for risk than training compute. As with many other aspects of AI regulation, it's something to ponder as bills move toward passage in the U.S. and around the world.
News
First reactions to o1: Max got initial impressions of o1 from AI researchers, startup founders, and venture capitalists, and tested the model himself.
Altman steps down from safety committee: OpenAI CEO Sam Altman has stepped down from the company's committee that reviews the safety of models like o1, apparently in response to concerns that he would not act impartially.
Slack transforms into an agent hub: At parent company Salesforce's annual Dreamforce conference, Slack announced new features such as AI-generated meeting summaries and integrations with image generation and AI-powered web search tools.
Google will start flagging AI images: Google says it will be making changes to Google Search to make it clearer which images in search results were generated with AI or edited by AI tools.
Mistral launches a free plan: French AI startup Mistral has launched a new free tier to let developers fine-tune and test apps built with the company's AI models.
Snap unveils video generator: Snapchat announced at its annual Snap Partner Summit on Tuesday that it's introducing a new AI video generation tool that will let select creators generate AI videos from text prompts and, soon, from image prompts.
Intel signs major chip deal: Intel announced it would co-develop AI chips with AWS using its 18A chip manufacturing process. The companies described the deal as a “multi-year, multi-billion-dollar framework” that could include additional chip designs.
Oprah's AI special: Oprah Winfrey aired a special on AI with guests including OpenAI's Sam Altman, Microsoft's Bill Gates, tech influencer Marques Brownlee, and current FBI Director Christopher Wray.
Research Paper of the Week
We know AI can be persuasive, but can it pull someone out of a conspiracy theory rabbit hole? Not on its own. But a new model from Costello and colleagues at MIT and Cornell can make a dent in beliefs in false conspiracy theories, and the effect lasts at least a couple of months.
In their experiments, they had people who believed conspiracy-related statements (such as "9/11 was an inside job") converse with a chatbot that gently, patiently, and exhaustively presented evidence countering their claims. Two months later, the participants reported that the related beliefs had weakened by about 20%, at least insofar as such things can be measured.
While it’s unlikely that someone deeply involved in reptilian or deep state conspiracies would consult or believe such an AI, the approach could be more effective when used at critical times when people are first exposed to these theories. For example, a teenager searching for “can jet fuel melt steel” might experience a learning moment rather than a tragic one.
Model of the Week
This isn't a model, but it has something to do with models. Researchers at Microsoft this week published an AI benchmark called Eureka, which (in their words) aims to ensure that "[model] evaluation will be carried out in an open and transparent manner."
There are many AI benchmarks out there. So what makes Eureka different? The researchers say that Eureka (which is actually a collection of existing benchmarks) selects tasks that are challenging even for the "most competent models." Specifically, Eureka tests capabilities that are often overlooked in AI benchmarks, such as visuospatial navigation skills.
To show how challenging Eureka is for models, the researchers benchmarked systems including Anthropic's Claude, OpenAI's GPT-4o, and Meta's Llama. Not a single model scored well on all of Eureka's tests. The researchers said this highlights the importance of “continuous innovation” and “targeted improvements” to models.
Grab Bag
In a victory for professional actors, California passed two laws, AB 2602 and AB 1836, restricting the use of AI digital replicas.
The legislation, supported by the performers' union SAG-AFTRA, requires companies that use digital replicas of performers (such as voice or image clones) to describe the intended uses of the replicas with "reasonable specificity" and to negotiate those uses with the performers' legal counsel or labor union. It also requires entertainment industry employers to get the consent of a deceased performer's estate before using a digital replica of that performer.
According to a report from The Hollywood Reporter, the bills codify concepts that SAG-AFTRA fought for in its 118-day strike against studios and major streaming platforms last year. California is the second state, after Tennessee, to impose restrictions on the use of actors' digital likenesses; SAG-AFTRA also sponsored the Tennessee measure.