OpenAI has released a new flagship generative AI model called GPT-4o that it plans to roll out “iteratively” across the company's developer and consumer products over the coming weeks.
Mira Murati, chief technology officer at OpenAI, said GPT-4o provides "GPT-4-level" intelligence but improves on GPT-4's capabilities across audio as well as text and vision.
"GPT-4o reasons across voice, text, and vision," Murati said during a keynote address at OpenAI's offices.
OpenAI's previous flagship model, GPT-4, was trained on a combination of images and text, and could analyze images and text to perform tasks such as extracting text from images or describing their contents. But GPT-4o adds audio to the mix.
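For a rough sense of what that multimodal interface looks like in practice, here is a minimal sketch using OpenAI's Python SDK. The model name, prompt, and image URL are illustrative assumptions, not code from OpenAI's announcement.

```python
# Minimal sketch: sending text plus an image to GPT-4o via the
# chat completions API. Assumes the `openai` Python package and an
# OPENAI_API_KEY in the environment; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image and read out any text in it."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/receipt.jpg"}},
            ],
        }
    ],
)

# Print the model's text reply.
print(response.choices[0].message.content)
```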
What exactly does this enable? A lot of things.
GPT-4o greatly improves the experience in ChatGPT, OpenAI's viral AI-powered chatbot. ChatGPT has long offered a voice mode that converts the chatbot's responses to speech using a text-to-speech model, but GPT-4o supercharges this, letting users interact with ChatGPT more like an assistant.
For example, users can ask the GPT-4o-powered ChatGPT a question and interrupt it mid-response. OpenAI says the model delivers "real-time" responsiveness and can also pick up on the emotion in a user's voice, allowing it to generate audio in "a variety of emotional styles."
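OpenAI hasn't detailed the engineering behind that responsiveness, but developers can already get incremental output from the model as it generates. The hedged sketch below streams a GPT-4o text response with OpenAI's Python SDK; the prompt is invented, and this only streams text, not the voice and interruption features described above, which live inside ChatGPT itself.

```python
# Hedged sketch: streaming a GPT-4o text response chunk by chunk.
# This approximates the "answer as it's generated" feel; it does not
# demonstrate ChatGPT's voice mode or interruption handling.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Give me a quick summary of today's weather."}],
    stream=True,
)

# Print each text delta as soon as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```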
In other ChatGPT news, OpenAI is releasing a desktop version of ChatGPT along with a refreshed UI.
"These models [are getting] more and more complex, but we actually want to make the experience of interaction more natural and easy, so that users don't focus on the UI at all and instead focus solely on the collaboration with [GPTs]," Murati said.