TechBrunch

Anthropic's new AI model can control your PC

By TechBrunch | October 22, 2024 | 8 Mins Read

In a pitch to investors last spring, Anthropic said it intended to build AI to power virtual assistants that could perform research, respond to emails and handle other back-office tasks on their own. The company called this a “next generation algorithm for AI self-learning” and said it believed that, if all goes according to plan, it could one day automate large parts of the economy.

It took a while, but that AI is starting to arrive.

Anthropic on Tuesday released an upgraded version of its Claude 3.5 Sonnet model that can understand and interact with any desktop app. Via a new “Computer Use” API, currently in open beta, the model can imitate keystrokes, button clicks, and mouse gestures, essentially emulating a person sitting at a PC.

“We trained Claude to see what's happening on the screen and use the available software tools to accomplish the task,” Anthropic wrote in a blog post shared with TechCrunch. “When a developer asks Claude to use computer software and gives it the necessary access, Claude looks at screenshots of what the user is seeing, then counts how many pixels vertically or horizontally it needs to move the cursor in order to click in the right place.”
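The pixel-counting step in that description can be illustrated with a small sketch. This is not Anthropic's API or implementation: the `Point` type, the `plan_click` helper, and the `move_by`/`click` action names are all hypothetical, chosen only to show how a cursor position and a target found in a screenshot turn into a horizontal and vertical offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A screenshot coordinate (hypothetical type, for illustration only)."""
    x: int  # pixels from the left edge
    y: int  # pixels from the top edge


def cursor_offset(cursor: Point, target: Point) -> tuple[int, int]:
    """Return the (horizontal, vertical) pixel distance from cursor to target."""
    return target.x - cursor.x, target.y - cursor.y


def plan_click(cursor: Point, target: Point) -> list[dict]:
    """Build a hypothetical action list: move by the offset, then click."""
    dx, dy = cursor_offset(cursor, target)
    return [
        {"action": "move_by", "dx": dx, "dy": dy},
        {"action": "click", "x": target.x, "y": target.y},
    ]


# Example: the cursor sits at (100, 200) and the button is at (640, 360).
for action in plan_click(Point(100, 200), Point(640, 360)):
    print(action)
```

A negative offset simply means moving left or up; a real system would also have to locate the target in the screenshot, which this sketch takes as given.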

Developers can try out computer use via Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI platform. The new 3.5 Sonnet without computer use is rolling out to the Claude apps and brings various performance improvements over the previous 3.5 Sonnet model.

App automation

Tools that can automate tasks on your PC aren't a new idea. Countless companies offer such tools, from decades-old established RPA vendors to startups like Relay, Induced AI, and Automat.

In the race to develop so-called “AI agents,” the field is only getting more crowded. AI agent remains an ill-defined term, but generally refers to AI that can automate software.

Some analysts say AI agents could offer companies an easy path to monetizing the billions of dollars they are pouring into AI. Companies seem to agree. According to a recent study by Capgemini, 10% of organizations are already using AI agents and 82% plan to integrate them within the next three years.

Salesforce made some splashy announcements about its AI agent technology this summer, and Microsoft yesterday touted new tools for building AI agents. OpenAI, which plans its own branded AI agent, sees this technology as a step toward super-intelligent AI.

Anthropic calls its take on the AI agent concept an “action execution layer” that allows the new 3.5 Sonnet to execute desktop-level commands. Thanks to its ability to browse the web (not a first for AI models, but a first for Anthropic), 3.5 Sonnet can use any website and any application.

Anthropic's new AI can control apps on your PC. Image credit: Anthropic

“Humans maintain control by providing specific prompts that direct Claude's actions, such as 'Please use your computer and online data to fill out this form,'” an Anthropic spokesperson told TechCrunch. “Users can enable or restrict access as needed. Claude breaks down the user's prompts into computer commands (such as moving the cursor, clicking, typing, etc.) to accomplish that specific task.”
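As a rough illustration of the control model the spokesperson describes, the sketch below runs a list of low-level commands and skips any kind the user has not enabled. It is a toy under stated assumptions: the `Command` and `Session` classes and the `move`/`click`/`type` command kinds are hypothetical, not part of Anthropic's API, and nothing here touches a real desktop.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    """One low-level step (hypothetical shape, for illustration only)."""
    kind: str         # "move", "click", or "type"
    detail: str = ""  # free-text description of the step


@dataclass
class Session:
    """Tracks which command kinds the user has enabled."""
    allowed: set[str] = field(default_factory=lambda: {"move", "click", "type"})
    log: list[str] = field(default_factory=list)

    def run(self, commands: list[Command]) -> list[str]:
        """Record each command as run, or as blocked if its kind is not enabled."""
        for command in commands:
            if command.kind not in self.allowed:
                self.log.append(f"blocked {command.kind}")
                continue
            self.log.append(f"ran {command.kind} {command.detail}".strip())
        return self.log


# Example: a form-filling task split into steps, with typing disabled by the user.
steps = [Command("move", "to name field"), Command("click"), Command("type", "Ada")]
session = Session(allowed={"move", "click"})
print(session.run(steps))
```

The point of the design is that the permission check sits between the model's plan and its execution, so restricting access does not depend on the model choosing to comply.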

Software development platform Replit has used an early version of the new 3.5 Sonnet model to create an “autonomous validator” that can evaluate apps while they are being built. Meanwhile, Canva said it is exploring ways the new model can support the design and editing process.

But how is this different from other AI agents? That's a fair question. Rabbit, a consumer gadget startup, builds web agents that can do things like buy movie tickets online. Adept, recently acquired by Amazon, trains models to browse websites and interact with software. Twin Labs uses off-the-shelf models such as OpenAI's GPT-4o to automate desktop processes.

Anthropic claims the new 3.5 Sonnet is simply a stronger, more robust model that performs better on coding tasks than even OpenAI's flagship o1, according to the SWE-bench Verified benchmark. Despite not being explicitly trained to do so, the upgraded 3.5 Sonnet self-corrects and retries tasks when it encounters obstacles, and it can work toward goals that require dozens or hundreds of steps.

Performance of the new Claude 3.5 Sonnet model on various benchmarks. Image credit: Anthropic

But don't fire your secretary just yet.

In an evaluation designed to test an AI agent's ability to help with airline booking tasks, such as modifying a flight reservation, the new 3.5 Sonnet successfully completed less than half of the tasks. In a separate test involving tasks such as initiating a return, 3.5 Sonnet failed about a third of the time.

Anthropic admits that the upgraded 3.5 Sonnet struggles with basic operations like scrolling and zooming, and that it can miss “short-term” actions and notifications because of the way it takes screenshots and stitches them together.

“Claude's computer usage remains slow and error-prone,” Anthropic wrote in its post. “We encourage developers to start exploring with low-risk tasks.”

Risky business

But is the new 3.5 Sonnet dangerously capable? Probably.

A recent study found that models without the ability to use desktop apps, like OpenAI's GPT-4o, were willing to engage in harmful “multi-step agent behavior,” such as ordering a fake passport from someone on the dark web, when “attacked” using jailbreak techniques. The researchers found that even on models protected by filters and safeguards, jailbreaking resulted in a high rate of successful execution of harmful tasks.

One can imagine that a model with desktop access could cause even greater harm, for example by exploiting vulnerabilities in apps to compromise personal information (or by storing chats in plaintext). Aside from the software levers at its disposal, the model's online and app connectivity could open avenues for malicious jailbreakers.

Anthropic doesn't deny that there are risks with the new 3.5 Sonnet release. But the company argues that the benefits of seeing how the model is used in practice ultimately outweigh this risk.

“We believe it is far better to provide computer access to today's more limited and relatively secure models,” the company wrote. “This means we can begin to observe and learn from potential problems that occur at this lower level, and build computer usage and safety mitigations incrementally and simultaneously.”

Image credit: Anthropic

Anthropic also says it has taken steps to prevent abuse, including not training the new 3.5 Sonnet on users' screenshots or prompts and preventing the model from accessing the web during training. The company says it has developed classifiers to “guide” 3.5 Sonnet away from actions deemed high-risk, such as posting on social media, creating accounts, and interacting with government websites.

As the U.S. general election approaches, Anthropic said it is focused on mitigating election-related abuse of its models. The U.S. AI Safety Institute and the U.K. AI Safety Institute, two separate but collaborating government agencies specializing in risk assessment of AI models, tested the new 3.5 Sonnet before it was deployed.

Anthropic told TechCrunch that it has the ability to restrict access to additional websites and features “as needed” to protect against spam, fraud, misinformation, and more. As a security measure, the company retains screenshots captured through computer use for at least 30 days, a retention period that may alarm some developers.

We asked Anthropic under what circumstances it would pass on screenshots to third parties (such as law enforcement) if requested. A spokesperson said the company “will comply with requests for data pursuant to valid legal process.”

“There is no foolproof method. We will continually evaluate and iterate on safety measures to balance Claude's functionality with responsible use,” Anthropic said. “Those using the computer-use version of Claude should take appropriate precautions to minimize this type of risk, including isolating Claude from particularly sensitive data on their computers.”

Hopefully, that will be enough to prevent the worst from happening.

A cheaper model

Today's highlight may have been the upgraded 3.5 Sonnet model, but Anthropic also said an updated version of Haiku, the cheapest and most efficient model in its Claude series, is coming soon.

Claude 3.5 Haiku, scheduled to be released in the coming weeks, will match the performance of Claude 3 Opus, Anthropic's former cutting-edge model, on certain benchmarks at the same cost and “approximate speed” as Claude 3 Haiku.

“With low latency, improved instruction following, and more accurate tool use, Claude 3.5 Haiku is well suited to user-facing products, specialized subagent tasks, and generating personalized experiences from vast amounts of data such as purchase history, pricing, or inventory data,” Anthropic wrote in a blog post.

3.5 Haiku will be available initially as a text-only model and later as part of a multimodal package that can analyze both text and images.

3.5 Haiku benchmark performance. Image credit: Anthropic

So, will there be much reason to use 3 Opus once 3.5 Haiku is available? What about 3.5 Opus, the successor to 3 Opus that Anthropic teased back in June?

“Every model in the Claude 3 model family has individual applications for customers,” said an Anthropic spokesperson. “Claude 3.5 Opus is on our roadmap and we plan to share more details as soon as possible.”



