There’s been a change of plans:
(Here’s the original plan from February 2025.)
Some random thoughts on what we should expect.
Deep Research is really good but expensive and slow (it uses o3). o4-mini will likely be in the ballpark of o3 performance-wise while fixing those other limitations, making the best AI research assistant in the world widely available, even to free users at a limited rate. (OpenAI has learned that making its products more available pays dividends down the line, even if it loses money in the short term, e.g., the Ghibli frenzy.)
The iteration cycles have shrunk dramatically. o1-preview was released in September 2024. Full o1 (and o1-pro) were released in December 2024. o3 will be released within two weeks (April 2025), together with o4-mini (o4 is probably ready but too costly to serve; I assume it will be integrated into GPT-5 eventually). The turnaround time between reasoning model generations (o1, o3, o4) is not 18-24 months, as between base models (GPT-2, GPT-3, GPT-4, GPT-5), but 3-4 months. That’s a 6x speed-up.
Google Gemini 2.5—probably the top AI model in many important areas (accessible for free and readily available)—has forced OpenAI to drop the o3/o4-GPT-5 merge and launch them separately in two stages. I think they’re afraid that staying in second place for too long—especially while Google’s best model is free—could start bleeding users. They have ChatGPT brand inertia, and Google is terrible at marketing—too old school—but that’s not necessarily an ironclad edge.
Altman says it’s harder to integrate reasoning and base models than they thought. If this remains unsolved—no one knows how to train a model to accurately decide when to reason and when not to—they have two options: either they simplify the offering, Anthropic-style (each model is one big unified thing), or they keep them separate until it stops making business sense (why serve models no one uses?). In any case, the current offering is so confusing that the only reason it works is that people don’t care beyond “ChatGPT” and cute features like Ghibli mode.
There’s another possibility: The reason they’re not merging o3/o4 and GPT-5 after all could be that the latter shines so much by itself that it doesn’t need to be enhanced with reasoning abilities from the get-go. This means either that OpenAI has only recently found a way to make it this good (thus the change of plans) or that Ghibli Day reminded them in very strong terms that the general public doesn’t care that much about reasoning.
If it’s true that GPT-5 will be “much better than we originally thought”—which I believe, by the way—that will punch a hole in skeptics’ arguments of the kind “the scaling plateau won’t be solved with test-time compute” or “if OpenAI hasn’t launched GPT-5 yet, it’s because it’s not meeting expectations.”
Given that the GPT-4.5 release was focused on world knowledge, intuition, and, surprisingly, creative writing, I assume GPT-5 will follow the same recipe, except better. The math, logic, and code capabilities will be introduced through the o-series, but I don’t think GPT-5 will surpass top reasoning models on most benchmarks.
The reason GPT-4.5 felt underwhelming to most people (I had a different reading) is twofold: First, the benchmark results were not good. But of course! It’s a base model, not intended to crush popular evals, which are tailored for reasoning models—vibes and taste are the new metrics. Second, it’s slow and damn expensive ($75/1M input tokens and $150/1M output tokens). I believe the pricing partly reflects how costly the model is to serve and is partly a tactic to prevent competitors from using it to generate high-quality synthetic data. That said, I believe OpenAI needs to fix this situation for GPT-5. They can’t afford for it to feel underwhelming at all. So they will likely make it not just better but also faster and/or much cheaper.
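To make the sticker shock concrete, here is a back-of-the-envelope sketch in Python. The $75/$150 per-million-token rates are the ones quoted above; the request and response sizes are hypothetical, picked only for illustration.

```python
# Rough cost of a single GPT-4.5 API call at the listed rates.
# Rates come from the prices quoted above; the token counts are assumed examples.
INPUT_RATE = 75 / 1_000_000    # dollars per input token
OUTPUT_RATE = 150 / 1_000_000  # dollars per output token

prompt_tokens = 2_000       # hypothetical: a prompt with some context attached
completion_tokens = 1_000   # hypothetical: a fairly detailed answer

cost = prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE
print(f"${cost:.2f} per call")  # ~$0.30 for this single call; at scale, that adds up fast
```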
It seems I was wrong to think OpenAI would keep GPT-5 internal, using it to improve smaller, cheaper models through distillation. I have no doubt they’ve already used it internally for several things (there’s no greater edge for a model-building AI company than using its AIs to make better AIs), but it’s clear they still need to fight Google in the business wars and will keep making better products.