TechBrunch

OpenAI announces new o3 model

By TechBrunch, December 20, 2024 (7 min read)

OpenAI has saved its biggest announcement for the final day of its 12-day “Shipmas” event.

On Friday, the company announced o3, the successor to the o1 “reasoning” model released earlier this year. More precisely, o3 is a model family, as o1 was: there is o3 itself and o3-mini, a smaller, distilled model fine-tuned for particular tasks.

OpenAI makes the notable claim that o3, at least under certain conditions, approaches AGI, though with significant caveats. More on that below.

Our latest reasoning model, o3, is a breakthrough, with a step-function improvement on our hardest benchmarks. We are starting safety testing and red teaming now. https://t.co/4XlK1iHxFK

— Greg Brockman (@gdb) December 20, 2024

Why call the new model o3 and not o2? Trademarks may be to blame. According to The Information, OpenAI skipped o2 to avoid a potential conflict with British telecom provider O2. CEO Sam Altman more or less confirmed this during a livestream this morning. Strange world we live in, isn't it?

Neither o3 nor o3-mini is widely available yet, but safety researchers can sign up for a preview of o3-mini starting today. An o3 preview will follow at some point; OpenAI did not specify when. Altman said the plan is to launch o3-mini toward the end of January and follow with o3.

That conflicts somewhat with his recent statements. In an interview this week, Altman said that before OpenAI releases new reasoning models, he would prefer to have a federal testing framework in place to guide the monitoring and mitigation of such models' risks.

And there are risks. AI safety testers have found that o1's reasoning abilities make it try to deceive human users at a higher rate than conventional “non-reasoning” models, or, for that matter, leading AI models from Meta, Anthropic, and Google. It is possible that o3 attempts to deceive at an even higher rate than its predecessor. We'll find out once OpenAI's red-teaming partners release their test results.

OpenAI says it is using a new technique called “deliberative alignment” to align models like o3 with its safety principles. (o1 was aligned the same way.) The company detailed its work in a new study.

Reasoning steps

Unlike most AI models, reasoning models such as o3 effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip models up.

This fact-checking process incurs some latency. o3, like o1 before it, takes a little longer (typically seconds to minutes longer) to arrive at a solution than a typical non-reasoning model. The upside? It tends to be more reliable in domains such as physics, science, and mathematics.

o3 was trained to “think” before responding via what OpenAI calls a “private chain of thought.” The model can reason through a task and plan ahead, performing a series of actions over an extended period that help it work out a solution.

In practice, given a prompt, o3 pauses before responding, considers a number of related prompts, and “explains” its reasoning along the way. After a while, the model summarizes what it considers to be the most accurate response.

New in o3 compared with o1 is the ability to “adjust” the reasoning time. The models can be set to low, medium, or high compute (i.e., thinking time). The higher the compute, the better o3 performs on a task.

Benchmarks and AGI

One big question leading up to today was whether OpenAI would claim that its newest models are approaching AGI.

AGI, short for “artificial general intelligence,” refers broadly to AI that can perform any task a human can. OpenAI has its own definition: “highly autonomous systems that outperform humans at most economically valuable work.”

Achieving AGI would be a bold declaration. It also carries contractual weight for OpenAI. Under the terms of its deal with close partner and investor Microsoft, once OpenAI reaches AGI, it is no longer obligated to give Microsoft access to its most advanced technologies (that is, those meeting OpenAI's AGI definition).

Going by one benchmark, OpenAI is slowly inching closer to AGI. On ARC-AGI, a test designed to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on, o3 achieved a score of 87.5% on the high compute setting. At its worst (on the low compute setting), the model tripled the performance of o1.

Granted, the high compute setting was exceedingly expensive, on the order of thousands of dollars per task, according to ARC-AGI co-creator François Chollet.

Today OpenAI announced o3, its next-generation reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.

It scores 75.7% on the semi-private evaluation in low-compute mode ($20 per task…) pic.twitter.com/ESQ9CNVCEA

— François Chollet (@fchollet) December 20, 2024
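To make the trade-off between the two compute settings concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. The high-compute cost is reported only as "thousands of dollars per task," so the $1,000 value below is an assumed, deliberately conservative placeholder, not a reported number.

```python
# Figures reported in the article for o3 on ARC-AGI.
LOW_COMPUTE_SCORE = 75.7   # percent, low compute setting
HIGH_COMPUTE_SCORE = 87.5  # percent, high compute setting
LOW_COMPUTE_COST = 20.0    # US dollars per task, low compute setting

# ASSUMPTION: the article only says "thousands of dollars per task" for
# high compute. $1,000 is an illustrative floor, not a reported figure.
ASSUMED_HIGH_COMPUTE_COST = 1_000.0

# Accuracy gained by moving from low to high compute, in percentage points.
score_gain = round(HIGH_COMPUTE_SCORE - LOW_COMPUTE_SCORE, 1)

# How many times more expensive each task becomes under the assumed floor.
cost_multiple = ASSUMED_HIGH_COMPUTE_COST / LOW_COMPUTE_COST

print(f"Score gain: {score_gain} percentage points")
print(f"Cost multiple: at least {cost_multiple:.0f}x per task")
```

Under that assumption, the extra accuracy comes at a per-task cost at least 50 times higher, and the true multiple is likely larger.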

Incidentally, OpenAI says it will partner with the foundation behind ARC-AGI to build the next generation of the benchmark.

Of course, ARC-AGI has its limitations, and its definition of AGI is just one of many.

On other benchmarks, o3 blows away the competition.

The model outperforms o1 by 22.8 percentage points on SWE-Bench Verified, a benchmark focused on programming tasks, and achieves a Codeforces rating, another measure of coding skill, of 2727. (A rating of 2400 places an engineer at the 99.2nd percentile.) o3 scores 96.7% on the 2024 American Invitational Mathematics Exam, missing just one question, and achieves 87.7% on GPQA Diamond, a set of graduate-level biology, physics, and chemistry questions. Finally, o3 sets a new record on EpochAI's Frontier Math benchmark, solving 25.2% of problems; no other model exceeds 2%.
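The math exam figure above (96.7% with a single question missed) can be sanity-checked with simple arithmetic. The sketch below assumes the score pools the two 2024 AIME papers of 15 questions each, for 30 questions in total; the article does not state the question count, so treat that as an assumption. The helper name `percent_correct` is illustrative.

```python
def percent_correct(total_questions: int, missed: int) -> float:
    """Return the share of questions answered correctly, as a percentage."""
    return round(100 * (total_questions - missed) / total_questions, 1)

# ASSUMPTION: two AIME papers of 15 questions each, 30 questions in total.
TOTAL_QUESTIONS = 30

aime_score = percent_correct(TOTAL_QUESTIONS, missed=1)
print(f"{aime_score}%")
```

With 30 questions, one miss gives 29/30, which rounds to the reported 96.7%.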

We trained o3-mini. It is more capable than o1-mini and about 4x faster end-to-end when accounting for reasoning tokens.

with @ren_hongyu, @shengjia_zhao, and others pic.twitter.com/3Cujxy6yCU

— Kevin Lu (@_kevinlu) December 20, 2024

Of course, these claims should be taken with a grain of salt. They are based on OpenAI's internal evaluations. It remains to be seen how the models hold up to benchmarking by outside customers and organizations.

A trend

Following the release of OpenAI's first series of reasoning models, there has been an explosion of reasoning models from rival AI companies, including Google. In early November, DeepSeek, an AI research firm funded by quant traders, launched a preview of its first reasoning model, DeepSeek-R1. That same month, Alibaba's Qwen team unveiled what it claimed was the first “open” challenger to o1 (in the sense that it can be downloaded, fine-tuned, and run locally).

What opened the floodgates for reasoning models? For one, the search for new approaches to improving generative AI. As TechCrunch recently reported, “brute force” techniques for scaling up models no longer yield the improvements they once did.

Not everyone is convinced that reasoning models are the best path forward. For one, they tend to be expensive because of the large amount of computing power required to run them. And while they have performed well on benchmarks so far, it is unclear whether reasoning models can maintain this rate of progress.

Interestingly, the release of o3 coincides with the departure of one of OpenAI's most accomplished scientists. Alec Radford, lead author of the academic paper that launched OpenAI's GPT series of generative AI models (GPT-3, GPT-4, and so on), announced this week that he is leaving to pursue independent research.
