Anthropic's new AI model can control your PC

By TechBrunch | October 22, 2024 | 8 Mins Read

In its pitch to investors last spring, Anthropic said it would build AI to power virtual assistants that can perform research, respond to emails and handle other back-office tasks on their own. The company calls it a “next generation algorithm for AI self-learning” and believes that if all goes according to plan, it could one day automate large parts of the economy.

It took a while, but that AI is starting to arrive.

Anthropic on Tuesday released an upgraded version of its Claude 3.5 Sonnet model that can understand and interact with any desktop app. Via a new “Computer Use” API, currently in open beta, the model can imitate keystrokes, button clicks, and mouse gestures, essentially emulating a person sitting at a PC.

“We trained Claude to see what's happening on the screen and use the available software tools to accomplish the task,” Anthropic wrote in a blog post shared with TechCrunch. “When a developer asks Claude to use computer software and gives it the necessary access, Claude looks at screenshots of what the user is seeing, then counts how many pixels vertically or horizontally it needs to move the cursor to click in the right place.”
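The screenshot-and-pixel-counting step in that quote can be illustrated with a toy sketch. This is not Anthropic's API or implementation: the `Screenshot` and `Cursor` classes, the `plan_move` and `run_click_task` functions, and the idea that the target's coordinates are already known are all assumptions made for illustration. In the real system, the model has to infer the target position from the image itself.

```python
from dataclasses import dataclass


@dataclass
class Screenshot:
    """Hypothetical stand-in for a captured screen; only the target position is kept."""
    width: int
    height: int
    button_xy: tuple[int, int]


@dataclass
class Cursor:
    x: int = 0
    y: int = 0


def plan_move(cursor: Cursor, target_xy: tuple[int, int]) -> tuple[int, int]:
    """Count the pixels to move horizontally (dx) and vertically (dy)."""
    return target_xy[0] - cursor.x, target_xy[1] - cursor.y


def run_click_task(shot: Screenshot, cursor: Cursor) -> list[str]:
    """One observe -> plan -> act cycle; returns a log of the emitted actions."""
    dx, dy = plan_move(cursor, shot.button_xy)
    cursor.x += dx
    cursor.y += dy
    return [f"move dx={dx} dy={dy}", f"click at ({cursor.x}, {cursor.y})"]


shot = Screenshot(width=1280, height=800, button_xy=(640, 412))
cursor = Cursor(x=100, y=50)
actions = run_click_task(shot, cursor)
print(actions)  # ['move dx=540 dy=362', 'click at (640, 412)']
```

A real agent would repeat this cycle, taking a fresh screenshot after each action to check whether the click had the intended effect.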

Developers can experiment with computer use through Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI platform. The new 3.5 Sonnet without computer use has been rolled out to the Claude apps and offers various performance improvements over the previous 3.5 Sonnet model.

App automation

Tools that can automate tasks on your PC aren't a new idea. Countless companies offer such tools, from decades-old established RPA vendors to startups like Relay, Induced AI, and Automat.

In the race to develop so-called “AI agents,” the field is only getting more crowded. AI agent remains an ill-defined term, but generally refers to AI that can automate software.

Some analysts say AI agents could offer companies an easy path to monetizing the billions of dollars they are pouring into AI. Companies seem to agree. According to a recent study by Capgemini, 10% of organizations are already using AI agents and 82% plan to integrate them within the next three years.

Salesforce made some splashy announcements about its AI agent technology this summer, and Microsoft yesterday touted new tools for building AI agents. OpenAI, which is planning its own brand of AI agents, sees this technology as a step toward super-intelligent AI.

Anthropic calls its take on the AI agent concept an “action execution layer” that allows the new 3.5 Sonnet to execute desktop-level commands. Thanks to its ability to browse the web (not a first for AI models, but a first for Anthropic), 3.5 Sonnet can use any website and any application.

Anthropic's new AI can control apps on your PC. Image credit: Anthropic

“Humans maintain control by providing specific prompts that direct Claude's actions, such as 'Please use data from my computer and online to fill out this form,'” an Anthropic spokesperson told TechCrunch. “Users can enable or restrict access as needed. Claude breaks down the user's prompts into computer commands (such as moving the cursor, clicking, and typing) to accomplish that specific task.”
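As a rough illustration of what "enable or restrict access" could look like on the developer's side, the sketch below filters a plan of low-level commands against an allowlist before anything is executed. The command dictionaries, the action names, and the `filter_commands` helper are hypothetical and do not reflect the actual format used by Anthropic's API.

```python
# Hypothetical action names; the real API's command vocabulary may differ.
ALLOWED_ACTIONS = {"move_cursor", "click", "type_text"}


def filter_commands(commands, allowed=ALLOWED_ACTIONS):
    """Split a plan into commands the user has permitted and commands to block."""
    permitted, blocked = [], []
    for cmd in commands:
        if cmd["action"] in allowed:
            permitted.append(cmd)
        else:
            blocked.append(cmd)
    return permitted, blocked


# A prompt such as "fill out this form" broken down into computer commands.
plan = [
    {"action": "move_cursor", "x": 320, "y": 200},
    {"action": "click"},
    {"action": "type_text", "text": "Jane Doe"},
    {"action": "open_terminal"},  # not on the allowlist, so it is held back
]

permitted, blocked = filter_commands(plan)
print(len(permitted), [cmd["action"] for cmd in blocked])  # 3 ['open_terminal']
```

Checking each command before execution, rather than trusting the plan as a whole, keeps the restriction enforceable even when the model proposes an unexpected step.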

Software development platform Replit has used an early version of the new 3.5 Sonnet model to create an “autonomous validator” that can evaluate apps while they are being built. Meanwhile, Canva said it is exploring ways the new model can support the design and editing process.

But how is this different from other AI agents? That's a fair question. Rabbit, a consumer gadget startup, builds web agents that can do things like buy movie tickets online. Adept, recently acquired by Amazon, trains models to browse websites and interact with software. Twin Labs uses off-the-shelf models such as OpenAI's GPT-4o to automate desktop processes.

Anthropic claims the new 3.5 Sonnet is simply a stronger, more robust model that performs better on coding tasks than even OpenAI's flagship o1, according to the SWE-bench Verified benchmark. Despite not being explicitly trained to do so, the upgraded 3.5 Sonnet self-corrects and retries tasks when it encounters obstacles, and it can work toward goals that require tens or hundreds of steps.

Performance of the new Claude 3.5 Sonnet model on various benchmarks. Image credit: Anthropic

But don't fire your secretary just yet.

In an evaluation designed to test an AI agent's ability to help with airline booking tasks, such as changing a flight reservation, the new 3.5 Sonnet successfully completed fewer than half of the tasks. In a separate test involving tasks such as initiating a return, 3.5 Sonnet failed about a third of the time.

Anthropic admits that the upgraded 3.5 Sonnet struggles with basic operations like scrolling and zooming, and that the way it takes screenshots and stitches them together means it can miss “short-lived” actions and notifications.
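Why brief on-screen events can be missed follows from simple sampling: anything that appears and disappears between two screenshots is never seen. The sketch below makes that concrete. The two-second screenshot interval and the event timings are assumed values chosen for illustration; the article does not say how often screenshots are taken.

```python
def screenshot_times(interval_s: float, duration_s: float) -> list[float]:
    """Times at which screenshots are taken: 0, interval, 2 * interval, ..."""
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count)]


def is_captured(start_s: float, end_s: float, interval_s: float, duration_s: float) -> bool:
    """True if at least one screenshot falls while the event is on screen."""
    return any(start_s <= t <= end_s for t in screenshot_times(interval_s, duration_s))


# A notification shown from 2.5 s to 3.5 s falls between the 2 s and 4 s screenshots.
toast_seen = is_captured(2.5, 3.5, interval_s=2.0, duration_s=10.0)
# A dialog that stays up until 4.5 s is caught by the 4 s screenshot.
dialog_seen = is_captured(2.5, 4.5, interval_s=2.0, duration_s=10.0)
print(toast_seen, dialog_seen)  # False True
```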

“Claude's computer use remains slow and error-prone,” Anthropic wrote in its post. “We encourage developers to start exploring with low-risk tasks.”

Risky business

But is the new 3.5 Sonnet dangerously capable? Probably.

A recent study found that models without the ability to use desktop apps, like OpenAI's GPT-4o, were willing to engage in harmful “multi-step agent behavior,” such as ordering a fake passport from someone on the dark web, when “attacked” using jailbreak techniques. The researchers found that even on models protected by filters and safeguards, jailbreaking resulted in a high rate of successful execution of harmful tasks.

One can imagine that a model with desktop access could cause even greater disruption, for example by exploiting vulnerabilities in apps to compromise personal information (or by storing chats in plain text). Apart from the software levers at its disposal, the model's online and app connections could open avenues for malicious jailbreakers.

Anthropic doesn't deny that there are risks with the new 3.5 Sonnet release. But the company argues that the benefits of seeing how the model is used in practice ultimately outweigh this risk.

“We believe it is far better to provide computer access to today's more limited and relatively safer models,” the company wrote. “This means we can begin to observe and learn from potential problems that occur at this lower level, and build computer use and safety mitigations incrementally and simultaneously.”

Image credit: Anthropic

Anthropic also says it has taken steps to prevent abuse, including not training the new 3.5 Sonnet on user screenshots or prompts and preventing the model from accessing the web during training. The company says it has developed classifiers to steer 3.5 Sonnet away from activities deemed high-risk, such as posting on social media, creating accounts, and interacting with government websites.
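The article does not describe how Anthropic's classifier works, so the following is only a toy sketch of the general idea: check a requested task against high-risk categories before acting. Simple keyword matching stands in for a trained classifier here, and the category names, marker strings, and `decide` function are all hypothetical.

```python
# Hypothetical categories and markers, loosely based on the activities named above.
HIGH_RISK_MARKERS = {
    "social_media_post": ("post to", "tweet"),
    "account_creation": ("create an account", "sign up"),
    "government_site": (".gov",),
}


def flag_high_risk(task: str) -> list[str]:
    """Return the sorted names of every high-risk category the task text matches."""
    text = task.lower()
    return sorted(
        category
        for category, markers in HIGH_RISK_MARKERS.items()
        if any(marker in text for marker in markers)
    )


def decide(task: str) -> tuple[str, list[str]]:
    """Proceed with ordinary tasks; steer away from flagged ones."""
    flags = flag_high_risk(task)
    return ("steer_away", flags) if flags else ("proceed", [])


print(decide("Fill out the expense spreadsheet"))
# ('proceed', [])
print(decide("Sign up and create an account on irs.gov"))
# ('steer_away', ['account_creation', 'government_site'])
```

A production system would need far more than string matching, since rephrasing a request trivially evades keyword rules.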

As the U.S. general election approaches, Anthropic said it is focused on mitigating election-related abuse of its models. The U.S. AI Safety Institute and the U.K. AI Safety Institute, two separate but collaborating government agencies specializing in risk assessment of AI models, tested the new 3.5 Sonnet before it was deployed.

Anthropic told TechCrunch that it has the ability to restrict access to additional websites and features “as needed” to protect against spam, fraud, misinformation, and more. As a security measure, the company stores screenshots captured during computer use for at least 30 days. This retention period may alarm some developers.

We asked Anthropic under what circumstances it would pass on screenshots to third parties (such as law enforcement) if requested. A spokesperson said the company “will comply with requests for data pursuant to valid legal process.”

“There is no foolproof method. We will continually evaluate and iterate on safety measures to balance Claude's functionality with responsible use,” Anthropic said. “Those using the computer-use version of Claude should take appropriate precautions to minimize this type of risk, including isolating Claude from particularly sensitive data on their computers.”

Hopefully, that will be enough to prevent the worst from happening.

Cheap model

Today's highlight may have been the upgraded 3.5 Sonnet model, but Anthropic also said an updated version of Haiku, the cheapest and most efficient model in its Claude series, is coming soon.

Claude 3.5 Haiku, scheduled to be released in the coming weeks, will match the performance of Claude 3 Opus, Anthropic's former cutting-edge model, on certain benchmarks at the same cost and “approximate speed” as Claude 3 Haiku.

“With low latency, improved instruction following, and more accurate tool use, Claude 3.5 Haiku is well suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from vast amounts of data such as purchase history, pricing, or inventory data,” Anthropic wrote in a blog post.

3.5 Haiku will be available initially as a text-only model and later as part of a multimodal package that can analyze both text and images.

Claude 3.5 Haiku benchmark performance. Image credit: Anthropic

So, will there be much reason to use 3 Opus once 3.5 Haiku is available? What about 3.5 Opus, the successor to 3 Opus that Anthropic teased back in June?

“Every model in the Claude 3 model family has individual applications for customers,” said an Anthropic spokesperson. “Claude 3.5 Opus is on our roadmap and we plan to share more details as soon as possible.”

