
AI of the Week: Big tech companies embrace synthetic data

By TechBrunch | October 9, 2024 | 6 Mins Read


Hello everyone, welcome to TechCrunch's regular AI newsletter. If you'd like to have this sent to your inbox every Wednesday, sign up here.

Synthetic data took center stage in AI this week.

Last Thursday, OpenAI introduced Canvas, a new way to interact with ChatGPT, its AI-powered chatbot platform. Canvas opens a window with a workspace for writing and coding projects. Users can generate text or code in Canvas and then, if necessary, highlight sections for ChatGPT to edit.

From a user's perspective, Canvas is a big quality-of-life improvement. But what's most interesting about the feature, to us, is the fine-tuned model that powers it. OpenAI said it used synthetic data to fine-tune its GPT-4o model and “enable new user interactions” in Canvas.

“We used novel synthetic data generation techniques, such as distilling outputs from OpenAI’s o1-preview, to fine-tune GPT-4o to open canvas, make targeted edits, and leave high-quality comments inline,” said Nick Turley, ChatGPT's head of product. “This approach allowed us to rapidly improve the model and enable new user interactions, all without relying on human-generated data.”

OpenAI isn't the only Big Tech company increasingly relying on synthetic data to train its models.

In developing Movie Gen, a suite of AI-powered tools for creating and editing video clips, Meta relied in part on synthetic captions generated by an offshoot of its Llama 3 models. The company hired a team of human annotators to fix errors and add detail to these captions, but the bulk of the groundwork was automated.

OpenAI CEO Sam Altman has argued that AI will one day generate enough synthetic data to effectively train itself. This could be an advantage for companies like OpenAI, which spend a lot of money on human annotators and data licenses.

Meta has also fine-tuned the Llama 3 models themselves using synthetic data. And OpenAI is said to be sourcing synthetic training data from o1 for its next-generation model, code-named Orion.

But taking a synthetic-data-first approach comes with risks. As one researcher recently pointed out to me, the models used to generate synthetic data inevitably hallucinate (i.e., make things up) and contain biases and limitations. These flaws show up in the data the models generate.

Using synthetic data safely therefore requires thoroughly curating and filtering it, as is standard practice with human-generated data. Failing to do so can lead to model collapse, in which a model's output becomes less “creative” and more biased, eventually severely impairing its functionality.

This is not easy to do at scale. But as real-world training data becomes more costly (not to mention harder to obtain), AI vendors may see synthetic data as the only viable path forward. I hope they are cautious in adopting it.
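The curation and filtering step described above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Sample` type, the word-count thresholds, and the exact-match deduplication are hypothetical stand-ins, not the pipeline any vendor mentioned in this article is known to use. Real pipelines typically add quality classifiers, bias checks, and human review on top of checks like these.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One synthetic training example (hypothetical structure)."""
    prompt: str
    response: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def filter_synthetic(samples, min_words=3, max_words=200):
    """Keep samples that pass simple length checks and are not duplicates.

    The thresholds are illustrative assumptions, not recommended values.
    """
    seen = set()
    kept = []
    for sample in samples:
        n_words = len(sample.response.split())
        if not (min_words <= n_words <= max_words):
            continue  # too short to be useful, or suspiciously long
        key = normalize(sample.response)
        if key in seen:
            continue  # repeated outputs reduce the diversity of the dataset
        seen.add(key)
        kept.append(sample)
    return kept


raw = [
    Sample("Define distillation.", "Training a small model on a larger model's outputs."),
    Sample("Define distillation again.", "training a small model on a  larger model's outputs."),
    Sample("Say hi.", "Hi."),
]
clean = filter_synthetic(raw)
print(len(clean))  # 1: the duplicate and the too-short sample are dropped
```

Deduplication matters here because repeated generator outputs narrow the training distribution, which is one of the ways the loss of “creativity” described above can creep in.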

News

Ads in AI Overviews: Google has announced that it will soon start showing ads in AI Overviews, the AI-generated summaries it provides for certain Google Search queries.

Google Lens now has video: Lens, Google's visual search app, has been upgraded to answer questions about your surroundings in near real-time. You can capture video through Lens and ask questions about objects of interest in the video. (There will probably be ads for this as well.)

From Sora to DeepMind: Tim Brooks, one of the leaders of OpenAI's video generator Sora, has left for rival Google DeepMind. In a post on X, Brooks announced that he would be working on video generation technology and a “world simulator.”

Fluidization: Black Forest Labs, the Andreessen Horowitz-backed startup that develops the image generation component for xAI's Grok assistant, has opened its API to the public in beta and released a new model.

Not so transparent: California's recently passed AB-2013 bill requires companies that develop generative AI systems to publicly provide a summary of the data used to train their systems. So far, few companies have said whether they will comply. The law gives a deadline of January 2026.

This week's research paper

Apple researchers have been hard at work on computational photography for years, and a key aspect of that process is depth mapping. Originally, this was done with dedicated depth sensors such as stereoscopic or LIDAR units, but these tend to be expensive, complex, and take up valuable internal space. Doing it strictly in software is preferable in many ways. That's what this paper, “Depth Pro,” is all about.

Aleksei Bochkovskii et al. share a method for high-detail, zero-shot monocular depth estimation. That means it uses a single camera, doesn't need to be trained on specific things (it works on a camel even though it has never seen one), and captures even difficult details such as tufts of hair. It's almost certainly in use on the iPhone right now (though probably in an improved, custom-built version), but if you want to do a little depth estimation of your own, you can give it a try using the code on this GitHub page.

This week's model

Google has released a new model in its Gemini family, Gemini 1.5 Flash-8B, which the company claims is among its best-performing models.

A “distilled” version of Gemini 1.5 Flash, which was already optimized for speed and efficiency, Gemini 1.5 Flash-8B costs 50% less to use, has lower latency, and comes with 2x higher rate limits in AI Studio, Google's AI-focused developer environment.

“Flash-8B nearly matches the performance of the 1.5 Flash model launched in May across many benchmarks,” Google wrote in a blog post. “Our models [continue] to be informed by developer feedback and our own testing of what is possible.”

Google says Gemini 1.5 Flash-8B is well suited to chat, transcription, translation, and other “simple,” “high-volume” tasks. In addition to AI Studio, the model is available for free through Google's Gemini API, rate-limited at 4,000 requests per minute.
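To make the 4,000-requests-per-minute figure concrete, here is a minimal Python sketch of a client-side sliding-window limiter. It is an illustration only: the `SlidingWindowLimiter` class is hypothetical, it makes no calls to the Gemini API, and nothing here reflects how Google enforces the limit on its side. Timestamps are passed in explicitly so the logic can be followed without real waiting.

```python
from collections import deque

REQUESTS_PER_MINUTE = 4_000  # rate limit cited in the article


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any span of `window` seconds."""

    def __init__(self, limit: int = REQUESTS_PER_MINUTE, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self, now: float) -> float:
        """Return how many seconds to wait before a request at `now` is allowed."""
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # Otherwise wait until the oldest tracked request leaves the window.
        return self.window - (now - self.sent[0])

    def record(self, now: float) -> None:
        """Register that a request was sent at time `now`."""
        self.sent.append(now)


limiter = SlidingWindowLimiter(limit=2, window=60.0)  # tiny limit for the demo
limiter.record(0.0)
limiter.record(1.0)
print(limiter.wait_time(10.0))  # 50.0: the request from t=0 expires at t=60
print(limiter.wait_time(60.0))  # 0.0: one slot has freed up
```

In real use, a caller would pass `time.monotonic()` as `now`, sleep for the returned duration, and then call `record` just before sending each request.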

Grab bag

Speaking of cheap AI, Anthropic released the Message Batches API, a new feature that allows developers to process large numbers of AI model queries asynchronously at low cost.

Similar to Google's request batching for the Gemini API, Anthropic's Message Batches API lets developers submit batches of up to 10,000 queries each. Each batch is processed within 24 hours and costs 50% less than standard API calls.

Anthropic says the Message Batches API is ideal for “large-scale” tasks such as dataset analysis, classification of large datasets, and model evaluation. “For example, analyzing entire corporate document repositories, which might involve millions of files, becomes more economically viable by leveraging [this] batching discount,” the company wrote.

The Message Batches API is available in public beta with support for Anthropic's Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku models.
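The batch limits described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: it splits a workload into batches of at most 10,000 queries and applies the 50% discount, and it does not call Anthropic's API. The helper names and the per-query price used in the example are hypothetical.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 10_000  # per-batch query cap cited in the article
BATCH_DISCOUNT = 0.5     # batches cost 50% less than standard API calls


def chunk_queries(queries: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `queries`, each holding at most `size` items."""
    for start in range(0, len(queries), size):
        yield list(queries[start:start + size])


def batched_cost(n_queries: int, standard_cost_per_query: float) -> float:
    """Estimate total cost when every query receives the batch discount."""
    return n_queries * standard_cost_per_query * (1 - BATCH_DISCOUNT)


queries = [f"Classify document {i}" for i in range(25_000)]
batches = list(chunk_queries(queries))
print([len(b) for b in batches])  # [10000, 10000, 5000]

# 0.002 is a made-up per-query price, used only to show the calculation.
print(batched_cost(len(queries), standard_cost_per_query=0.002))
```

A workload of 25,000 queries therefore needs three batches, and under the assumed price it costs half of what the same queries would cost as standard calls.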


