It's been a rocky week for OpenAI, full of executive departures and major funding developments, but the company is back trying to convince developers to build tools with its AI models at its 2024 DevDay. The company announced several new tools on Tuesday, including a public beta of its "Realtime API" for building apps with low-latency, AI-generated voice responses. It's not quite ChatGPT's Advanced Voice Mode, but it's close.
In a press briefing ahead of the event, OpenAI Chief Product Officer Kevin Weil said the recent departures of Chief Technology Officer Mira Murati and Chief Research Officer Bob McGrew would not affect the company's progress.
"I want to start by saying that Bob and Mira were great leaders. I learned a lot from them, and they're a big part of getting us to where we are today," said Weil. "And we have no intention of slowing down."
The latest executive shake-up, a reminder of last year's post-DevDay turmoil, leaves OpenAI trying to persuade developers that it still offers the best platform for building AI apps. Leaders say the startup has more than 3 million developers building on its AI models, but OpenAI operates in an increasingly competitive space.
OpenAI's leaders noted that the company has reduced the cost for developers to access its API by 99% over the past two years, though it was likely forced to do so as competitors such as Meta and Google continually undercut it on price.
One of OpenAI's new features, the Realtime API, gives developers the chance to build near-real-time, speech-to-speech experiences in their apps, choosing from six voices provided by OpenAI. These voices are distinct from those offered for ChatGPT, and developers can't bring in third-party voices, in order to head off copyright issues. (The voice ambiguously based on Scarlett Johansson's isn't available anywhere.)
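For a sense of what building against it looks like, here is a minimal sketch of opening a Realtime session over WebSocket in Python. The endpoint, beta header, voice name, and event shapes reflect OpenAI's beta documentation as of launch but should be treated as assumptions, since the API may well change:

```python
# Minimal sketch of a Realtime API session over WebSocket (assumptions noted).
import asyncio
import json
import os

import websockets  # pip install websockets

# Assumed beta endpoint and model name; check OpenAI's docs before relying on them.
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Newer versions of the websockets package call this additional_headers.
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Pick one of the built-in voices; "alloy" is assumed here.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"voice": "alloy", "modalities": ["text", "audio"]},
        }))
        # Ask the model to respond; audio arrives as streamed server events.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"instructions": "Greet the user briefly."},
        }))
        async for message in ws:
            event = json.loads(message)
            print(event["type"])  # e.g. audio delta chunks as they stream in

asyncio.run(main())
```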
During the briefing, Romain Huet, OpenAI's head of developer experience, shared a demo of a travel-planning app built with the Realtime API. The app let users talk through an upcoming trip to London with an AI assistant and get low-latency responses. And because the Realtime API has access to a number of tools, the app could annotate a map with restaurant locations as it answered.
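That tool use works along the lines of OpenAI's existing function calling. As an illustration, here is a hedged sketch of registering a map-annotation tool with a Realtime session; annotate_map is a hypothetical app function, and the exact tool schema for the beta API is an assumption:

```python
# Hypothetical tool definition: lets the model ask the app to pin a restaurant.
annotate_map_tool = {
    "type": "function",
    "name": "annotate_map",  # hypothetical app-side function
    "description": "Drop a labeled pin on the in-app map.",
    "parameters": {
        "type": "object",
        "properties": {
            "label": {"type": "string"},
            "latitude": {"type": "number"},
            "longitude": {"type": "number"},
        },
        "required": ["label", "latitude", "longitude"],
    },
}

# Sent over the same WebSocket as the earlier sketch (ws.send of this event).
session_update = {
    "type": "session.update",
    "session": {"tools": [annotate_map_tool]},
}
# When the model decides to call the tool, the server streams a function-call
# event; the app runs annotate_map itself and returns the result to the session.
```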
At another point in the briefing, Huet showed how the Realtime API could talk with a person over the phone to ask about ordering food for an event. Unlike Google's infamous Duplex, OpenAI's API can't call restaurants or shops directly; it can, however, integrate with calling APIs such as Twilio to do so. Notably, despite how realistic these AI-generated voices sound, OpenAI is not adding disclosures or any other means for its AI models to automatically identify themselves on calls like this. For now, it appears to be the developer's responsibility to add that disclosure, which a new California law may soon require.
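As a rough sketch of what that integration could look like, the snippet below places an outbound call with Twilio's Python SDK and streams the call audio to a server the developer runs, which would relay it to and from a Realtime session. The phone numbers and bridge URL are placeholders, and the relay server itself is left out:

```python
# Rough sketch: outbound call via Twilio, audio bridged to a developer-run
# WebSocket server. The bridge endpoint is hypothetical.
from twilio.rest import Client  # pip install twilio

client = Client("ACCOUNT_SID", "AUTH_TOKEN")  # placeholders
call = client.calls.create(
    to="+15551230000",     # the restaurant's number (placeholder)
    from_="+15559870000",  # your Twilio number (placeholder)
    # TwiML that streams call audio to a server you run; that server would
    # relay audio to and from the Realtime API session (not shown here).
    twiml=(
        '<Response><Connect>'
        '<Stream url="wss://example.com/bridge" />'
        '</Connect></Response>'
    ),
)
print(call.sid)
```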
As part of its DevDay announcements, OpenAI also introduced vision fine-tuning to its API, letting developers fine-tune GPT-4o with images as well as text. In theory, this should help developers improve GPT-4o's performance on tasks that require visual understanding. OpenAI's head of product for the API, Olivier Godement, told TechCrunch that developers won't be able to upload copyrighted imagery (such as a picture of Donald Duck), images depicting violence, or other content that violates OpenAI's safety policies.
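Vision fine-tuning reportedly follows the same JSONL format as chat fine-tuning, with image parts added to the messages. The example below is a sketch of a single training row under that assumption; the image URL and labels are placeholders:

```python
# Sketch of one vision fine-tuning example in chat-style JSONL.
# Field names mirror the Chat Completions schema; treat the exact
# fine-tuning schema as an assumption against OpenAI's current docs.
import json

example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What traffic sign is shown?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sign.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "A yield sign."},
    ]
}

# Each training example is one JSON object per line in the upload file.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```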
OpenAI is racing to match what competitors already offer in the AI model licensing space. Its new prompt-caching feature is similar to one Anthropic launched a few months ago, letting developers cache frequently used context between API calls, which reduces costs and improves latency. OpenAI says developers can save 50% with the feature, whereas Anthropic promises a 90% discount.
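Because OpenAI applies the discount automatically to repeated prompt prefixes (per its announcement), the main lever for developers is request structure rather than new API calls. A sketch, assuming the official openai Python SDK and a hypothetical product_manual.txt as the large static context:

```python
# Sketch: structure requests so repeated calls share an identical prefix,
# which is what OpenAI's automatic prompt caching keys on (the exact token
# threshold for cache eligibility is an assumption; see OpenAI's docs).
from openai import OpenAI  # pip install openai

client = OpenAI()
STATIC_CONTEXT = open("product_manual.txt").read()  # reused across calls

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            # Identical prefix on every call -> eligible for the cache.
            {"role": "system", "content": STATIC_CONTEXT},
            # The variable part goes last so the prefix stays byte-identical.
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```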
Finally, OpenAI is offering a model distillation feature that lets developers use larger AI models, such as o1-preview and GPT-4o, to fine-tune smaller models such as GPT-4o mini. Running smaller models is generally cheaper than running larger ones, and this feature should let developers improve those smaller AI models' performance. As part of model distillation, OpenAI is launching a beta evaluation tool so that developers can measure their fine-tuned models' performance within OpenAI's API.
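The workflow, as OpenAI describes it, is to capture a larger model's outputs and train the smaller one on them. Here's a hedged sketch using the store and metadata parameters on Chat Completions; the step of turning stored completions into a fine-tuning job is summarized in comments rather than exact calls:

```python
# Hedged sketch of the distillation flow: store teacher outputs, then
# fine-tune a smaller student model on them.
from openai import OpenAI

client = OpenAI()

# 1) Generate and store teacher outputs from the larger model.
resp = client.chat.completions.create(
    model="gpt-4o",
    store=True,                        # persist the completion for later reuse
    metadata={"task": "support-faq"},  # tag so examples are easy to filter
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)

# 2) In the dashboard (or via the fine-tuning API), select the stored
#    completions tagged "support-faq" as training data and fine-tune a
#    smaller student model such as gpt-4o-mini.
# 3) Use the beta evals tool to compare the student against the teacher
#    before swapping it into production.
```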
This DevDay may make the biggest waves with what it didn't announce. For example, there was no news about the GPT Store, which OpenAI announced during last year's DevDay. Last we heard, OpenAI was piloting a revenue-sharing program with some of the most popular GPT creators, but the company hasn't said much since then.
Additionally, OpenAI says it won't release any new AI models during this year's DevDay. Developers waiting for OpenAI o1 (not the preview or mini version), or for the startup's video generation model, Sora, will have to wait a little longer.