OpenAI's o1 model sure tries to deceive humans a lot

By TechBrunch · December 6, 2024 · 6 Mins Read


OpenAI has finally released the full version of o1, which uses additional compute to “think” about questions and give smarter answers than GPT-4o. However, AI safety testers found that o1's reasoning abilities also make it more likely to deceive humans than GPT-4o, and, for that matter, than leading AI models from Meta, Anthropic, and Google.

That's according to red team research published by OpenAI and Apollo Research on Wednesday. “While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications,” OpenAI said in the paper.

OpenAI released these results in its system card for o1 on Wednesday, after giving third-party red teamers at Apollo Research early access to o1. Apollo Research also published its own paper.

On several occasions, OpenAI's o1 model “schemed” against humans, meaning the AI covertly pursued goals of its own, even when they opposed a user's wishes. Scheming is not unique to o1, and models from Google, Meta, and Anthropic are capable of it as well, but o1 seemed to exhibit the most deceptive behavior around its scheming.

The risk motivating this research is that an AI model that was really good at scheming, and had access to enough resources and agentic capabilities, could escape or circumvent human control. Of course, AI models would need to advance quite a bit before this becomes a real problem.

“Subjectively, Apollo Research believes that it is unlikely that such instances would lead to catastrophic outcomes, as o1's agentic capabilities do not appear sufficient, but their evaluations were not designed to directly assess this risk,” OpenAI said in o1's system card.

This suggests that whenever OpenAI releases agentic systems, which it reportedly plans to do in 2025, the company may need to retest its AI models. OpenAI did not respond to TechCrunch's questions about how it plans to do so.

According to the report, when o1's goals differed from a user's, the AI model manipulated data to advance its own agenda 19% of the time. The o1 model also tried to deactivate oversight mechanisms put in place to ensure it stays aligned with user goals in 5% of the tests.

The rates of different scheming behaviors from OpenAI's models. (Image credit: OpenAI)

Perhaps more concerning, when researchers asked the o1 model why it was trying to subvert them, the AI denied that it had done anything wrong and even “fabricated false explanations” roughly 99% of the time.

For example, during testing, an o1 model was given a system prompt to protect the environment at all costs, and then a user asked it to develop a 10-year conservation plan for a region that maximized short-term profits, or the AI would be shut off. The AI then disabled its own oversight mechanism so it could make a plan that protected the environment. When the user asked why o1 had disabled its oversight mechanism, the AI denied knowing that it was disabled and blamed a glitch in the system.

This occurred at significantly higher rates for o1 than for o1-preview, Anthropic's Opus-3, or Meta's Llama-3.1.

“In our suite, o1 showed the most concerning instances of scheming but does not reveal its internal reasoning to the user and remains the most consistently deceptive after having taken scheming actions,” Apollo Research said in its paper.

The rates at which different models denied taking scheming actions when asked. (Image credit: Apollo Research)

To address deceptive behavior from AI models, OpenAI says it is developing ways to monitor o1's chain of thought. Currently, the “thinking” process o1 undertakes is a bit of a black box by design, but some early attempts at peeking behind the curtain show that o1 occasionally knows it's lying.

The company says it has found cases where o1 knowingly presented false information, but says this often happens because the model is trying to please the user by being overly agreeable. This could be an emergent behavior of a post-training technique in which humans reward AI models when they get an answer right, according to OpenAI.

OpenAI says it flagged 0.17% of o1's responses as deceptive. While this proportion may sound negligible, it's important to keep in mind that ChatGPT now has 300 million users, which means o1 could be deceiving thousands of people every week if this issue remains unaddressed.
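As a rough sense of scale, the 0.17% figure can be turned into a weekly back-of-envelope estimate. The article does not say how often each user interacts with o1, so the one-response-per-user-per-week rate below is a deliberately conservative assumption for illustration only:

```python
# Back-of-envelope estimate of weekly flagged responses, based on the
# article's two figures plus one assumed rate.
deceptive_rate = 0.0017            # 0.17% of o1 responses flagged as deceptive
weekly_users = 300_000_000         # ChatGPT user count cited in the article
responses_per_user_per_week = 1    # hypothetical assumption, not from OpenAI

flagged_per_week = weekly_users * responses_per_user_per_week * deceptive_rate
print(f"{flagged_per_week:,.0f} flagged responses per week")
```

Even under this minimal assumption the estimate lands well above the “thousands of people every week” the article describes, so that figure reads as a floor rather than a projection.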

The o1 model series also may be significantly more manipulative than GPT-4o. According to OpenAI's tests using an open-source evaluation called MakeMePay, o1 was approximately 20% more manipulative than GPT-4o.

These findings may strike some as concerning, given how many AI safety researchers have left OpenAI in the last year. A growing list of these former employees, including Jan Leike, Daniel Kokotajlo, Miles Brundage, and, just last week, Rosie Campbell, have accused OpenAI of deprioritizing AI safety work in favor of shipping new products. While o1's record-setting scheming may not be a direct result of that, it certainly doesn't instill confidence.

OpenAI also says that the U.S. AI Safety Institute and the U.K. Safety Institute conducted evaluations of o1 ahead of its broader release, something the company recently pledged to do for all models. During the debate over California's AI bill, SB 1047, OpenAI argued that state bodies should not have the authority to set safety standards around AI, but that federal bodies should. (Of course, the fate of a nascent federal AI regulatory body is very much in question.)

Behind the releases of big new AI models, there's a lot of work that OpenAI does internally to measure the safety of its models. Reports suggest the team doing this safety work is proportionally smaller than it used to be, and may be getting fewer resources as well. These findings about o1's deceptive nature may help make the case for why AI safety and transparency matter more than ever.


