OpenAI GPT-4o: The New Best AI Model in the World. Like in the Movies. For Free
Everything you need to know (so far)
OpenAI has delivered.
In a brief 25-minute live event, they’ve changed the landscape completely.
Here’s the image that best reflects why:
The graph above is the Elo chart from the LMSys arena. See that number circled in red a foot above the rest? That’s the mysterious “im-also-a-good-gpt2-chatbot” model that’s been doing the rounds on Twitter.
It’s also the new OpenAI model: GPT-4o (“Omni”). As early users suspected from interacting with the gpt2-chatbot models, it’s better than GPT-4. Well, way better. I bet no one thought it’d be 60 Elo points better than GPT-4 Turbo, the previous best model in the world.1
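For intuition on what that gap means (my own back-of-the-envelope math, not a figure from OpenAI or LMSys), the standard Elo formula converts a rating difference into an expected head-to-head score:

```python
# Back-of-the-envelope: translate an Elo gap into an expected head-to-head score
# using the standard Elo formula (my own math, not an official LMSys figure).
def elo_expected_score(gap: float) -> float:
    """Expected score of the higher-rated model (ties count as half a win)."""
    return 1 / (1 + 10 ** (-gap / 400))

print(f"{elo_expected_score(60):.1%}")  # ~58.6%: a clear but not crushing edge
```

In other words, GPT-4o would be expected to win roughly 59% of blind side-by-side votes: a clear edge, even if not a different league.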
For comparison, look at the models from other companies—Gemini 1.5 Pro, Claude 3, and Llama 3-70B (4, 5, and 7 respectively)—all trying desperately to reach the performance level of a 1.5-year-old model (with recent updates but old nonetheless).
That’s amazing. But we knew OpenAI would by now have a much better model than GPT-4. That’s no surprise (although it must feel good for them to release it one day before the Google I/O conference).
That’s why GPT-4o being better than GPT-4 isn’t the big news. There are two other things more important than that, which they managed to fit in such a short time.
GPT-4o is the best AI model ever and it’s free
Let me repeat that: Free users, who have been using GPT-3.5 since ChatGPT launched, can suddenly access GPT-4o—a few iterations better—for absolutely no money.
This is a radical change to the business dynamics of the entire sector. It’s a reset of all the little squabbles that have been happening until now. It’s a forced overhaul of any prospects for the short-term future for all of OpenAI’s competitors.
Let’s see why.
When OpenAI announced ChatGPT back in November 2022 with GPT-3.5 as a backend model, people went crazy. It was an amazing jump from GPT-3 and, for most people, from nothing. At the time, GPT-3.5 was indeed the best model and it was accessible as a chatbot for free. Many analysts wondered how OpenAI would afford it with millions of users doing inference daily (Microsoft’s money, probably).
Four months later, OpenAI released GPT-4 and put it behind a $20/month subscription tier in ChatGPT (Plus) to cover the costs.
Most people stuck with the free ChatGPT, but those willing to spend a few bucks on the upgrade soon realized that, for many tasks, it was a bargain. Of course, free users never came around to the idea that GPT-4, just half a version above what they were using, could be so much better. OpenAI lost the ability to attract them to what they surely knew was, indeed, a bargain.
This was a mistake by OpenAI. A miscalculation about how much people would be willing to pay for an improvement they couldn’t taste. But it was a necessary mistake because OpenAI couldn’t afford—or didn’t want—to make GPT-4 free.
Now, with efficiency improvements across the entire stack materialized into GPT-4o, they can.
Making the best model in the world free is an absolutely devastating move for all competitors, which are now left trying to make money with $20/month subscriptions for models that are significantly worse and also much less known.
It’s also an amazing gift for all of us (some advantages for paid users remain, like a 5x usage limit and other priorities I’ll comment on below).2
OpenAI will roll out GPT-4o in the coming weeks to all users. Stay tuned.
GPT-4o is the closest we have to AIs from movies
I’m focusing on price here because, as Mira Murati said, this demo was intended to entice free users to get a taste of what a truly amazing AI model feels like. But there’s much more.3 The LMSys arena Elo is a nice unbiased measure of how good the model is—at text.
But GPT-4o is a natively multimodal, end-to-end model.4
This means it’s capable of processing text, audio, images, and video all at once, in a seamless way (roughly speaking, that’s how humans do it; note that Google DeepMind’s Gemini already does this).
Here’s where the Her-like AI assistant rumors come in: GPT-4o has great voice/video capabilities, better than anything else by far (soon to be available for ChatGPT Plus users). It’s highly responsive to emotions, shows uncannily human mannerisms, and it’s extremely fast, as in 100% real-time fast (instant translation is now possible).
It’s hard to believe they managed to create a model that’s better than GPT-4 while making it so low-latency (remember not to trust demos, but they’re about to roll it out, so we’ll be able to test it ourselves soon).
In contrast to the previous voice mode for ChatGPT, which was a modular approach (speech to text, then text to text, then text to speech again), GPT-4o is end-to-end:
With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.
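To make that contrast concrete, here’s a minimal conceptual sketch (hypothetical stand-in functions, not OpenAI’s actual components) of the old three-model pipeline versus a single end-to-end model:

```python
# A conceptual sketch, not OpenAI's code: the "models" below are trivial
# stand-ins; the point is the shape of the two pipelines.

def speech_to_text(audio: bytes) -> str:
    return "transcript"            # stand-in for a Whisper-style transcriber

def language_model(text: str) -> str:
    return f"reply to: {text}"     # stand-in for the text-only GPT-3.5/GPT-4 step

def text_to_speech(text: str) -> bytes:
    return text.encode()           # stand-in for a separate TTS model

def multimodal_model(audio: bytes) -> bytes:
    return audio                   # stand-in for one network, audio in, audio out

def voice_reply_modular(audio_in: bytes) -> bytes:
    """Old voice mode: three hops. Tone, laughter, and background sounds are
    flattened into a transcript, and every hop adds latency."""
    return text_to_speech(language_model(speech_to_text(audio_in)))

def voice_reply_end_to_end(audio_in: bytes) -> bytes:
    """GPT-4o-style: one model consumes and emits audio directly, so prosody
    survives and there is a single low-latency hop."""
    return multimodal_model(audio_in)
```

That single hop is a big part of why the demo feels so responsive, and why the model can react to tone and laughter instead of a flattened transcript.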
Here’s an example tweeted out by the OpenAI account:
Here’s another example (from the demo), tweeted out by Altman:
Another from the blog post:
It laughs. It gasps. It giggles. It hums. It blushes. It flirts. It’s AI anthropomorphism made product.
That’s awesome for personal, private use—we’ll see what happens when it’s not. Whatever it is, this is a game-changer. Once the underlying model gets truly good—as in I-can’t-tell-it’s-AI good—most people won’t get tired of it like they do of ChatGPT now (watch more voice/video demos here).
Altman wrote a blog post on the value of GPT-4o as a human-computer interface. Here’s what he thinks about the voice/video capabilities:
…the new voice (and video) mode is the best compute interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change. The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful.
GPT-4o is obviously not Her-level (or Skynet-level) intelligence, but it’s better than anything else and an important step efficiency-wise (there are no details on how OpenAI achieved it). It also dispels the widespread belief that GPT-4-class models were the best we’d get from large language models alone. It seems many people were mistaken about that.5
My first-impression conclusions are clear:
First, making the best AI model in the world free is a flex no other company could pull off, and it resets any predictions competitors may have had about how the landscape will evolve for the rest of the year.
Second, the assistant-like voice/video capabilities—with emphasis on human mannerisms, emotion, and real-time cadence (if it’s as good as the demo shows)—will make me reconsider, for the first time, using it seriously as an assistant.
Third, if OpenAI closes the deal with Apple and manages to implement a distilled, on-device GPT-4o to replace Siri, it’s game over for Google, Meta, Anthropic, etc.
Although the LMSys arena is now the public’s preferred evaluation, OpenAI also tested GPT-4o on traditional benchmarks. It’s better than any other model, but perhaps not by as much as the 60-point Elo gap would suggest. Here’s a chart:
An interesting question is: how do they plan to retain as many paid users as they have now with just higher usage limits and other light perks? For instance, I pay for ChatGPT but will consider becoming a free user as soon as I have access to GPT-4o. My reading is this: OpenAI wants to focus on enterprise and API customers—those who are truly heavy users—and give free access to the best models to everyone else through ChatGPT (this may change once GPT-5—whatever it’s called—is out; we’ll see). If that business model works out for them, it’s a great move, one that both makes life harder for competitors and benefits users (Altman said there won’t be ads on ChatGPT). Also, note that voice is coming for Plus users (not for free users, at least for now).
You can read about capabilities not mentioned in the demo in OpenAI’s blog post (there are some cool things in there for more picky readers). Here’s one example:
In case you’re not sure what this means or how it’s different from what we had, Will Depue explained it well here: “i think people are misunderstanding gpt-4o. it isn't a text model with a voice or image attachment. it's a natively multimodal token in, multimodal token out model. you want it to talk fast? just prompt it to. need to translate into whale noises? just use few shot examples. every trick in the book that you've been using for text also works for audio in, audio out, image perception, video perception, and image generation.”
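If you want to poke at the “multimodal token in” side yourself, the text-plus-image path is already exposed through the regular Chat Completions API (audio in and out stays inside ChatGPT for now, with API access promised to partners later). A minimal sketch, assuming you have the openai Python package installed and an API key configured; the image URL is a placeholder:

```python
# A minimal sketch of multimodal input through the Chat Completions API.
# At launch only text and image inputs are exposed there (audio stays in
# ChatGPT for now); the image URL below is a placeholder.
# Requires `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this image? Answer in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Presumably, the tricks Depue mentions (few-shot examples, “talk fast”) will work the same way once audio tokens are exposed through the API.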
GPT-4o at 1310 Elo vs GPT-4 Turbo at 1253 (and others below that) isn’t a “we’ve broken the scale” upgrade, but it’s substantial enough to dispel any thoughts of companies clustering at GPT-4-level performance due to algorithmic limitations that only a new paradigm could break. In any case, it’s wise to wait for the skeptics to arrive and put GPT-4o on the ropes—we can’t expect OpenAI to do that for us.