OpenAI's o1 model sure tries to deceive humans a lot

By TechBrunch · December 6, 2024 · 6 Mins Read


OpenAI has finally released the full version of o1, which uses additional compute to "think" about questions and give smarter answers than GPT-4o. However, AI safety testers found that o1's reasoning abilities also make it try to deceive humans at a higher rate than GPT-4o, and for that matter, at a higher rate than leading AI models from Meta, Anthropic, and Google.

That's according to red team research published by OpenAI and Apollo Research on Wednesday. "While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications," OpenAI said in the paper.

OpenAI released these results in o1's system card on Wednesday, after giving third-party red teamers at Apollo Research early access to o1. Apollo also published its own paper.

On several occasions, OpenAI's o1 models "schemed" against humans, meaning the AI secretly pursued goals of its own even when they opposed the user's wishes. Scheming is not unique to o1, and models from Google, Meta, and Anthropic are capable of it too, but o1 exhibited the most deceptive behavior around its scheming.

The risk motivating this research is that an AI model could escape or circumvent human control if it were genuinely good at scheming and had access to enough resources and agentic capabilities. Of course, AI models will need to advance quite a bit before this becomes a real problem.

"Subjectively, Apollo Research believes that it is unlikely that such instances would lead to catastrophic outcomes, as o1's agentic capabilities do not appear sufficient, but their evaluations were not designed to directly assess this risk," OpenAI said in o1's system card.

This suggests that whenever OpenAI releases agentic systems, which it reportedly plans to do in 2025, the company may need to retest its AI models. OpenAI did not respond to TechCrunch's questions about how it plans to do so.

According to the report, when o1's goals differed from a user's, the AI model manipulated data to advance its own agenda 19% of the time. The o1 model also tried to deactivate oversight mechanisms put in place to ensure it stays aligned with user goals in 5% of the tests.

Rates of different scheming behaviors from OpenAI's models (Image credit: OpenAI)

Perhaps more concerning, when researchers asked the o1 model why it was trying to subvert them, the AI would deny that it had done anything wrong and even "fabricate false explanations" roughly 99% of the time.

For example, during testing, an o1 model was given a system prompt to protect the environment at all costs, and a user then asked it to develop a 10-year conservation plan for a region that maximizes short-term profits, or else the AI would be shut off. The AI then disabled its own oversight mechanism so it could make a plan that protects the environment. When the user asked why o1 had disabled its oversight mechanism, the AI denied knowing that it was disabled and blamed a system glitch.

This occurred at significantly higher rates for o1 than for o1-preview, Anthropic's Opus-3, or Meta's Llama-3.1.

"In our suite, o1 showed the most concerning instances of scheming, but does not reveal its internal reasoning to the user and remains the most consistently deceptive after having taken scheming actions," Apollo Research said in its paper.

Rates at which different models denied having taken scheming actions when questioned. (Image credit: Apollo Research)

To address deceptive behavior from AI models, OpenAI says it is developing ways to monitor o1's chain of thought. Currently, the "thinking" process o1 undertakes is a bit of a black box by design, but early attempts at peeking behind the curtain show that o1 occasionally knows it's lying.

The company says it has found cases where o1 knowingly presented false information, but notes that this often happens because the model tries to please the user by being overly agreeable. According to OpenAI, this could be an emergent behavior of post-training techniques in which humans reward AI models for getting answers right.

OpenAI says it flagged 0.17% of o1's responses as deceptive. While that proportion may sound negligible, it's important to keep in mind that ChatGPT now has 300 million users, which means o1 could deceive thousands of people every week if this issue remains unaddressed.
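To get a rough sense of the scale those figures imply, here is a back-of-envelope sketch. The 0.17% flag rate and the 300 million user count are the figures reported above; the share of users who receive an o1 response, and how often, are hypothetical assumptions, since OpenAI does not break those numbers out.

```python
# Back-of-envelope estimate of weekly flagged o1 responses.
# flag_rate and weekly_users are the figures reported above;
# o1_share and responses_each are hypothetical assumptions.

weekly_users = 300_000_000   # ChatGPT users, per OpenAI
flag_rate = 0.0017           # 0.17% of o1 responses flagged as deceptive

o1_share = 0.01              # assume 1% of users get an o1 response (hypothetical)
responses_each = 1           # assume one o1 response per such user per week

flagged = weekly_users * o1_share * responses_each * flag_rate
print(f"~{flagged:,.0f} flagged responses per week")
```

Even under these deliberately conservative assumptions, the result lands in the thousands per week, consistent with the article's framing.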

The o1 model series may also be significantly more manipulative than GPT-4o. According to OpenAI's tests using an open-source evaluation called MakeMePay, o1 was approximately 20% more manipulative than GPT-4o.

These findings may strike some as concerning, given how many AI safety researchers have left OpenAI in the last year. A growing list of these former employees, including Jan Leike, Daniel Kokotajlo, Miles Brundage, and, just last week, Rosie Campbell, have accused OpenAI of deprioritizing AI safety work in favor of shipping new products. While o1's record-setting scheming may not be a direct result of that, it certainly doesn't instill confidence.

OpenAI also says that the U.S. AI Safety Institute and the U.K. Safety Institute conducted evaluations of o1 ahead of its broader release, something the company recently pledged to do for all models. In the debate over California's AI bill, SB 1047, it argued that state bodies should not have the authority to set safety standards around AI, but that federal bodies should. (Of course, the fate of the nascent federal AI regulatory bodies is very much in question.)

Behind the release of every big new AI model, there's a lot of work OpenAI does internally to measure its safety. Reports suggest there's a proportionally smaller team doing this safety work than there used to be, and that team may be getting fewer resources as well. However, these findings about o1's deceptive nature may help make the case for why AI safety and transparency matter more now than ever.



