Hello everyone, and welcome to TechCrunch's regular AI newsletter. If you'd like it delivered to your inbox every Wednesday, sign up here.
Last week, OpenAI launched Advanced Voice Mode with Vision, which feeds real-time video to ChatGPT, allowing the chatbot to “see” beyond the confines of its app layer. The premise is that by giving ChatGPT greater context awareness, the bot can respond in a more natural and intuitive way.
But the first time I tried it, it lied to me.
“That couch looks comfy!” ChatGPT said when I held up my phone and asked the bot to describe our living room. It had mistaken the ottoman for a couch.
“My mistake!” ChatGPT said when I corrected it. “Well, it still seems like a comfortable space.”
It's been nearly a year since OpenAI first demonstrated Advanced Voice Mode with Vision, which the company pitched as a step toward the AI depicted in the Spike Jonze movie “Her.” The way OpenAI sold it, Advanced Voice Mode with Vision would give ChatGPT superpowers, allowing the bot to solve sketched math problems, read emotions, and respond to affectionate letters.
Has it achieved all that? More or less. But Advanced Voice Mode with Vision hasn't solved ChatGPT's biggest problem: reliability. If anything, the feature makes the bot's hallucinations more obvious.
At one point, curious whether Advanced Voice Mode with Vision could help ChatGPT offer fashion tips, I enabled it and asked ChatGPT to rate my outfit. It happily obliged. But while the bot commented on my jeans-and-olive-shirt combo, it consistently missed the brown jacket I was wearing.
I'm not the only one who has encountered gaffes.
When OpenAI President Greg Brockman showed off Advanced Voice Mode with Vision on “60 Minutes” earlier this month, ChatGPT made a mistake on a geometry problem: while calculating the area of a triangle, it misidentified the triangle's height.
So my question is: what good is a “Her”-like AI if you can't trust it?
With each ChatGPT misfire, I felt a little less inclined to reach into my pocket, unlock my phone, launch ChatGPT, open Advanced Voice Mode, and enable Vision, a tedious series of steps even in the best of circumstances. With its bright and cheerful demeanor, Advanced Voice Mode is designed to engender trust. When it doesn't deliver on that implicit promise, it's jarring and disappointing.
Perhaps OpenAI will one day solve the hallucination problem for good. Until then, we're stuck with a bot that views the world through crossed wires. And frankly, I'm not sure who would want that.
News
Image credit: Olly Curtis/Future/Getty Images
OpenAI's 12-day “Shipmas” continues: OpenAI is releasing new products daily through December 20. Here's a roundup of all the announcements, which we're updating regularly.
YouTube gives creators a choice: YouTube is giving creators more choice over how third parties can use their content to train AI models. Creators and rights holders will be able to flag to YouTube whether they permit a specific company to train models on their clips.
Meta's smart glasses get an upgrade: Meta's Ray-Ban Meta smart glasses are getting several new AI-powered updates, including the ability to hold an ongoing conversation with Meta's AI and translate between languages.
DeepMind's answer to Sora: Google DeepMind, Google's flagship AI research lab, wants to beat OpenAI at the video generation game. On Monday, DeepMind announced Veo 2, a next-generation video-generating AI that can create clips longer than two minutes at resolutions up to 4K (4,096 x 2,160 pixels).
OpenAI whistleblower found dead: Former OpenAI employee Suchir Balaji was recently found dead in his San Francisco apartment, according to the San Francisco Office of the Chief Medical Examiner. In October, the 26-year-old AI researcher had voiced concerns about OpenAI violating copyright law in an interview with The New York Times.
Grammarly acquires Coda: Grammarly, best known for its style and spell-checking tools, has acquired productivity startup Coda for an undisclosed amount. As part of the deal, Coda CEO and co-founder Shishir Mehrotra will become Grammarly's new CEO.
Cohere is working with Palantir: TechCrunch exclusively reported that Cohere, an enterprise AI startup valued at $5.5 billion, is partnering with data analytics company Palantir. Palantir has been vocal about its close, sometimes controversial, cooperation with U.S. defense and intelligence agencies.
This week's research paper
Anthropic has pulled back the curtain on Clio (“Claude Insights and Observations”), the system it uses to understand how its customers are using its various AI models. Clio, which Anthropic compares to analytics tools like Google Trends, provides “valuable insights” to improve the safety of Anthropic's AI, the company claims.
Anthropic uses Clio to collect anonymized usage data, some of which the company made public last week. So what are customers using Anthropic's AI for? Tasks run the gamut, but web and mobile app development, content creation, and academic research top the list. As you'd expect, use cases vary by language; for example, Japanese speakers are more likely than Spanish speakers to ask Anthropic's AI to analyze anime.
Image credit: Anthropic
This week's model
AI startup Pika has released Pika 2, a next-generation video generation model that can create clips from user-specified characters, objects, and locations. Through Pika's platform, users can upload multiple references (such as images of boardrooms and office workers), and Pika 2 will “intuit” the role of each reference before combining them into a single scene.
Of course, no model is perfect. Check out the “anime” below, created with Pika 2. The consistency is impressive, but the clip suffers from the aesthetic oddities present in all AI-generated footage.
As I said earlier, anime will be the first genre to be 100% AI-generated. It's amazing to see what's already possible with Pika 2.0 pic.twitter.com/3jWCy4659o
— Chubby♨️ (@kimmonismus) December 16, 2024
Still, tools in the video space are advancing rapidly, at once tantalizing and angering creators.
Grab bag
The Future of Life Institute (FLI), a nonprofit co-founded by MIT cosmologist Max Tegmark, has released an “AI Safety Index” designed to evaluate the safety practices of leading AI companies across five key areas: current harms, safety frameworks, existential safety strategy, governance and accountability, and transparency and communication.
Image credit: Future of Life Institute
Meta received the worst marks of the group graded on the index, with an overall grade of F. (The index uses a points- and GPA-based scoring system.) Anthropic was the best, but it managed no better than a C, suggesting there's considerable room for improvement.