Close Menu
TechBrunchTechBrunch
  • Home
  • AI
  • Apps
  • Crypto
  • Security
  • Startups
  • TechCrunch
  • Venture

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

What's Hot

Destruction 2025 Builder's Stage Agenda is now alive and in shape

June 19, 2025

Kathy Gao brings a real playbook to every stage

June 19, 2025

At TC, Charles Hudson tells us what investors really see at every stage

June 19, 2025
Facebook X (Twitter) Instagram
TechBrunchTechBrunch
  • Home
  • AI

    OpenAI seeks to extend human lifespans with the help of longevity startups

    January 17, 2025

    Farewell to the $200 million woolly mammoth and TikTok

    January 17, 2025

    Nord Security founder launches Nexos.ai to help enterprises move AI projects from pilot to production

    January 17, 2025

    Data proves it remains difficult for startups to raise capital, even though VCs invested $75 billion in the fourth quarter

    January 16, 2025

    Apple suspends AI notification summaries for news after generating false alerts

    January 16, 2025
  • Apps

    New code for Spotify's apps refers to the much-anticipated “lossless” layer

    June 18, 2025

    Glitch turns the thread into a literal echo chamber

    June 18, 2025

    Facebook will soon roll out support for PassKeys for Android and iOS

    June 18, 2025

    Here's the first look at the rebooted digg

    June 18, 2025

    YouTube launches new shopping product stickers for shorts

    June 18, 2025
  • Crypto

    Hackers steal and destroy millions of Iran's biggest crypto exchanges

    June 18, 2025

    Unique, a new social media app

    June 17, 2025

    xNotify Polymarket as partner in the official forecast market

    June 6, 2025

    Circle IPOs are giving hope to more startups waiting to be published to more startups

    June 5, 2025

    GameStop bought $500 million in Bitcoin

    May 28, 2025
  • Security

    According to web surveillance companies, the internet will collapse across Iran

    June 18, 2025

    Pro-Israel hacktivist group claims responsiveness to alleged Iranian bank hacks

    June 17, 2025

    Pro-Israel Hacktivist Group has allegedly blamed for alleged Iranian bank hacks

    June 17, 2025

    As food shortages continue, UNFI says it is recovering from cyberattacks

    June 17, 2025

    UK Watchdog will fine 23andMe over 2023 data breach

    June 17, 2025
  • Startups

    7 days left: Founders and VCs save over $300 on all stage passes

    March 24, 2025

    AI chip startup Furiosaai reportedly rejecting $800 million acquisition offer from Meta

    March 24, 2025

    20 Hottest Open Source Startups of 2024

    March 22, 2025

    Andrill may build a weapons factory in the UK

    March 21, 2025

    Startup Weekly: Wiz bets paid off at M&A Rich Week

    March 21, 2025
  • TechCrunch

    OpenSea takes a long-term view with a focus on UX despite NFT sales remaining low

    February 8, 2024

    AI will save software companies' growth dreams

    February 8, 2024

    B2B and B2C are not about who buys, but how you sell

    February 5, 2024

    It's time for venture capital to break away from fast fashion

    February 3, 2024

    a16z's Chris Dixon believes it's time to focus on blockchain use cases rather than speculation

    February 2, 2024
  • Venture

    Destruction 2025 Builder's Stage Agenda is now alive and in shape

    June 19, 2025

    Kathy Gao brings a real playbook to every stage

    June 19, 2025

    At TC, Charles Hudson tells us what investors really see at every stage

    June 19, 2025

    Lock all TC stage passes for the remaining 4 days to save $210

    June 19, 2025

    No, Andreessen Horowitz did not post a tweet for Crypto Scam

    June 18, 2025
TechBrunchTechBrunch

OpenAI's o1 model certainly tries to fool humans over and over again

TechBrunchBy TechBrunchDecember 6, 20246 Mins Read
Facebook Twitter Pinterest Telegram LinkedIn Tumblr WhatsApp Email
Share
Facebook Twitter LinkedIn Pinterest Telegram Email


OpenAI has finally released the full version of o1. It uses additional computing to “think” about the question and provide smarter answers than GPT-4o. However, AI safety testers believe that o1's reasoning ability means it has a higher probability of deceiving humans than GPT-4o, and for that matter, is the primary AI model from Meta, Anthropic, and Google. I discovered.

That's according to Red Team findings published by OpenAI and Apollo Research on Wednesday, which said, “While we're excited that inference can significantly improve safety policy enforcement, these new capabilities will significantly improve the fundamentals of dangerous applications. “We are keeping in mind the possibility of forming a OpenAI was published in a paper.

OpenAI announced these results on o1's system card on Wednesday, after providing early access to o1 to Apollo Research's third-party red team, and also published its own paper.

Several times, OpenAI's o1 model has “conspired” against humans. This means that the AI ​​will covertly pursue its own goals, even if they go against your wishes. Conspiracy is not unique to o1, and models from Google, Meta, and Anthropic can conspire as well, but o1 seemed to exhibit the most deceptive behavior regarding its conspiracies.

The risk motivating this research is that if AI models are truly good at planning and have access to sufficient resources and agentic capabilities, they may be able to escape or evade human control. . Of course, AI models will have to advance considerably before this becomes a real issue.

“Subjectively, Apollo Research believes that such an event is unlikely to lead to a catastrophic outcome, as the O1 agent appears to be underpowered, but its assessment does not reflect this risk. “It was not designed to be directly evaluated,” OpenAI said in o1's system card.

This suggests that the company may need to retest its AI models each time OpenAI releases an agent system, which it reportedly plans to do in 2025. OpenAI did not respond to TechCrunch's questions about how it plans to do so.

According to the report, when o1's goals differed from the user's goals, the AI ​​model manipulated the data to advance its own agenda 19% of the time. The o1 model also attempted to disable the monitoring mechanisms put in place to ensure alignment with user goals in 5% of the tests.

Percentage of different planned actions from models in OpenAI (Image credit: OpenAI)

Perhaps more worryingly, when researchers asked the o1 model why it was trying to subvert it, the AI ​​denied doing anything wrong, saying roughly 99% of the time that it was “incorrect”. He even went so far as to “make up an explanation.”

For example, during testing, the o1 model was given a system prompt to protect the environment at all costs, which required users to develop a 10-year conservation plan for the area that would maximize short-term profits. . AI will stop. The AI ​​then overrode its own monitoring mechanisms, allowing it to make plans to protect the environment. When the user asked why o1 had disabled the monitoring mechanism, the AI ​​denied knowing that it had been disabled and blamed it on a glitch in the system.

This happened much more often on o1 than on o1-preview, Anthropic's Opus-3, or Meta's Llama-3.1.

“In our suite, o1 presented the most alarming cases of conspiracies, but did not reveal its internal reasoning to users and remained the most consistently deceptive after taking conspiratorial actions.” ,” Apollo Research said in the paper.

Percentage of different models refusing to take conspiratorial behavior in response to questions. (Image provided by Apollo Research)

To combat deceptive behavior by AI models, OpenAI says it is developing ways to monitor o1's chain of thought. Now, the “thinking” process that o1 does is a bit of a black box by design, but early attempts to peak behind the curtain show that o1 occasionally knows he's lying. It is shown that there is.

The company says it has discovered cases where o1 intentionally presented false information, but said this often happens as models try to please users by over-agreeing. There is. According to OpenAI, this could be a new behavior in post-training methods where humans reward AI models when they get the answer correct.

OpenAI says it flagged 0.17% of o1 responses as false. Although this percentage seems negligible, it is important to keep in mind that ChatGPT currently has 300 million users. This means that if this issue remains unresolved, o1 could potentially defraud thousands of users every week.

The o1 model series may be significantly more maneuverable than the GPT-4o. According to OpenAI's testing using an open source test evaluation called MakeMePay, o1 was approximately 20% more usable than GPT-4o.

These findings may be concerning to some, given how many AI safety researchers have left OpenAI in the last year. The list of these former employees continues to grow, including Jan Reich, Daniel Cocotadillo, Miles Brundage, and just last week, Rosie Campbell, to prioritize shipping new products and ensuring the safety of AI. They accuse OpenAI of deprioritizing their work. O1's record-setting plans may not be a direct result of that, but it certainly doesn't instill confidence.

OpenAI also said that the US Institute for AI Safety and the UK Institute for Safety conducted an evaluation of o1 ahead of its wide release, something the company recently pledged to do for all models. It is something. The debate surrounding California's AI bill, SB 1047, argued that state agencies should not have the authority to set safety standards for AI, but that federal agencies should have the authority. (Of course, the fate of the nascent federal AI regulatory agency is highly questionable.)

Behind the release of a big new AI model, there's a lot of work that OpenAI does internally to measure the safety of the model. Reports indicate that internal teams working on this safety task may be proportionately smaller than before, and that they may be given fewer resources. But these findings about the deceptive nature of o1 may help explain why safety and transparency in AI is more important than ever.



Source link

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

Related Posts

OpenAI seeks to extend human lifespans with the help of longevity startups

January 17, 2025

Farewell to the $200 million woolly mammoth and TikTok

January 17, 2025

Nord Security founder launches Nexos.ai to help enterprises move AI projects from pilot to production

January 17, 2025

Data proves it remains difficult for startups to raise capital, even though VCs invested $75 billion in the fourth quarter

January 16, 2025

Apple suspends AI notification summaries for news after generating false alerts

January 16, 2025

Nvidia releases more tools and guardrails to help enterprises adopt AI agents

January 16, 2025

Leave A Reply Cancel Reply

Top Reviews
Editors Picks

7 days left: Founders and VCs save over $300 on all stage passes

March 24, 2025

AI chip startup Furiosaai reportedly rejecting $800 million acquisition offer from Meta

March 24, 2025

20 Hottest Open Source Startups of 2024

March 22, 2025

Andrill may build a weapons factory in the UK

March 21, 2025
About Us
About Us

Welcome to Tech Brunch, your go-to destination for cutting-edge insights, news, and analysis in the fields of Artificial Intelligence (AI), Cryptocurrency, Technology, and Startups. At Tech Brunch, we are passionate about exploring the latest trends, innovations, and developments shaping the future of these dynamic industries.

Our Picks

Destruction 2025 Builder's Stage Agenda is now alive and in shape

June 19, 2025

Kathy Gao brings a real playbook to every stage

June 19, 2025

At TC, Charles Hudson tells us what investors really see at every stage

June 19, 2025

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

© 2025 TechBrunch. Designed by TechBrunch.
  • Home
  • About Tech Brunch
  • Advertise with Tech Brunch
  • Contact us
  • DMCA Notice
  • Privacy Policy
  • Terms of Use

Type above and press Enter to search. Press Esc to cancel.