May 13, 2024

OpenAI has delivered.

In a 25-minute live event, they've changed the landscape completely.

Here’s the image that best reflects why:

The graph above is the Elo chart from the LMSys arena. See that number circled in red a foot above the rest? That’s the mysterious “im-also-a-good-gpt2-chatbot” model that’s been doing the rounds on Twitter.

It's also the new OpenAI model: GPT-4o ("Omni"). As early users suspected after interacting with the gpt2-chatbot models, it's better than GPT-4. Well, way better. I bet no one thought it'd be 60 Elo points ahead of GPT-4 Turbo, the previous best model in the world.
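To put a 60-point gap in perspective, here is a minimal sketch assuming the standard Elo expected-score formula on a 400-point scale, the scale on which the arena reports its ratings (the function name is illustrative, not part of any arena tooling). Under that assumption, a 60-point lead means the higher-rated model would be expected to win about 58.5% of head-to-head votes against the other.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Expected score of the higher-rated player under the standard Elo
    logistic formula (base 10, 400-point scale). Assumes ratings are on
    the usual Elo scale; rating_gap = higher rating minus lower rating."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))


if __name__ == "__main__":
    # A 60-point gap, roughly the lead described above
    print(f"Expected win rate for a 60-point lead: {expected_win_rate(60):.1%}")
```

A 60-point lead is therefore a clear but not overwhelming edge in any single matchup; what makes it notable is how rarely the top of the leaderboard moves by that much at once.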

For comparison, look at the models from other companies (Gemini 1.5 Pro, Claude 3, and Llama 3-70B, at positions 4, 5, and 7, respectively), all trying desperately to reach the performance level of a 1.5-year-old model (with recent updates, but old nonetheless).

That's amazing. But we knew OpenAI would have a much better model than GPT-4 by now. That's no surprise (although it must feel good for them to release it one day before the Google I/O conference).

That's why GPT-4o being better than GPT-4 isn't the big news. Two other announcements, which they managed to fit into such a short event, matter more.

GPT-4o is the best AI model ever and it’s free

Let me repeat that: free users, who have been stuck with GPT-3.5 since late 2022, can suddenly access GPT-4o, a few iterations better, for absolutely no money.

This is a radical change to the business dynamics of the entire sector. It’s a reset of all the little squabbles that have been happening until now. It’s a forced overhaul of any prospects for the short-term future for all of OpenAI’s competitors.

Let’s see why.

When OpenAI announced ChatGPT back in November 2022 with GPT-3.5 as a backend model, people went crazy. It was an amazing jump from GPT-3 and, for most people, from nothing. At the time, GPT-3.5 was indeed the best model and it was accessible as a chatbot for free. Many analysts wondered how OpenAI would afford it with millions of users doing inference daily (Microsoft’s money, probably).

Four months later, OpenAI released GPT-4 and put it behind the $20/month ChatGPT Plus subscription to cover the costs.

Most people stuck with the free ChatGPT, but those willing to spend a few bucks on the upgrade soon realized that, for many tasks, it was a bargain. Free users, of course, never quite believed that GPT-4, just half a version above what they were using, could be so much better. OpenAI couldn't win them over to what it surely knew was, indeed, a bargain.

This was a mistake by OpenAI: a miscalculation about how much people would be willing to pay for an improvement they couldn't taste. But it was a necessary mistake, because OpenAI couldn't afford (or didn't want) to make GPT-4 free.

Now, with efficiency gains across the entire stack embodied in GPT-4o, it can.

Making the best model in the world free is an absolutely devastating move for all competitors, which are now left trying to make money with $20/month subscriptions for models that are significantly worse and much less known.

It's also an amazing gift for all of us (some advantages for paid users remain, like a 5x higher usage limit and other perks I'll comment on below).

OpenAI will roll out GPT-4o to all users over the coming weeks. Stay tuned.

GPT-4o is the closest we have to AIs from movies

I'm focusing on price here because, as Mira Murati said, this demo was intended to give free users a taste of what a truly amazing AI model feels like. But there's much more. The LMSys arena Elo is a good, relatively unbiased measure of how good the model is, but only at text.

But GPT-4o is a natively multimodal, end-to-end model.

This means it can process text, audio (including voice), video, and images all at once, seamlessly (allowing for the obvious differences, that's how humans do it; note that Google DeepMind's Gemini already does this).

Here’s where the Her-like AI assistant rumors come in: GPT-4o has great voice/video capabilities, better than anything else by far (soon to be available for ChatGPT Plus users). It’s highly responsive to emotions, shows uncannily human mannerisms, and it’s extremely fast, as in 100% real-time fast (instant translation is now possible).

It's hard to believe they managed to create a model that's better than GPT-4 while making it so low-latency (remember not to trust demos, but they're about to roll it out, so we'll be able to test it ourselves soon).

In contrast to the previous voice mode for ChatGPT, which was a modular approach (speech to text, then text to text, then text to speech again), GPT-4o is end-to-end:

With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.

Here’s an example tweeted out by the OpenAI account:

Here’s another example (from the demo), tweeted out by Altman:

Another from the blog post:

It laughs. It gasps. It giggles. It hums. It blushes. It flirts. It’s AI anthropomorphism made product.

That's awesome for personal, private use; we'll see what happens when it's not. Whatever it is, this is a game-changer. Once the underlying model gets truly good (as in I-can't-tell-it's-AI good), most people won't get tired of it the way they do of ChatGPT now (watch more voice/video demos here).

Altman wrote a blog post on the value of GPT-4o as a human-computer interface. Here’s what he thinks about the voice/video capabilities:

…the new voice (and video) mode is the best compute interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change. The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful.

GPT-4o is obviously not Her-level (or Skynet-level) intelligence, but it's better than anything else and an important step efficiency-wise (there are no details on how OpenAI achieved it). It also undercuts the widespread belief that GPT-4-class models were the best we'd get from large language models alone. It seems many people were mistaken about that.

My first-impression conclusions are clear: