GPT-4 was the most anticipated AI model in history.
Yet when OpenAI released it in March 2023, they didn’t tell us anything about its size, data, internal structure, or how they trained and built it. A true black box.
As it turns out, they didn’t conceal those critical details because the model was too innovative or the architecture too moat-y to share. The opposite seems to be true if we’re to believe the latest rumors:
GPT-4 is, technically and scientifically speaking, hardly a breakthrough.
That’s not necessarily bad—GPT-4 is, after all, the best language model in existence—just… somewhat underwhelming. Not what people were expecting after a 3-year wait.
This news, yet to be officially confirmed, reveals key insights about GPT-4 and OpenAI and raises questions about AI’s true state of the art, and about its future.
GPT-4: A mixture of smaller models
On June 20th, George Hotz, founder of self-driving startup Comma.ai, leaked that GPT-4 isn’t a single monolithic dense model (like GPT-3 and GPT-3.5) but a mixture of 8 x 220-billion-parameter models. Later that day, Soumith Chintala, co-founder of PyTorch at Meta, reaffirmed the leak. Just the day before, Mikhail Parakhin, Microsoft Bing AI lead, had also hinted at this.
If the rumor holds, GPT-4 is not one big model with more than a trillion parameters but eight smaller ones cleverly put together, for a combined total of roughly 1.76 trillion parameters (8 x 220 billion). The mixture of experts paradigm OpenAI supposedly used for this “hydra” model is neither new nor their invention. In this article, I’ll explain why this is very relevant for the field and how OpenAI masterfully executed its plan to achieve three key goals.
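To make the "eight smaller models" idea concrete, here is a toy sketch of how a mixture of experts can combine several models. It is not GPT-4's actual design, which remains undisclosed. The expert count and size come from the rumor; everything else (the function names, the softmax gate, and routing each input to the top 2 experts) is an assumption borrowed from the general mixture-of-experts literature.

```python
import math
import random

# Rumored figures from the leak (unconfirmed by OpenAI).
NUM_EXPERTS = 8
PARAMS_PER_EXPERT = 220e9


def total_parameters(num_experts=NUM_EXPERTS, params_per_expert=PARAMS_PER_EXPERT):
    """Naive total: ignores any parameters the experts might share."""
    return num_experts * params_per_expert


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_scores, top_k=2):
    """Pick the top_k experts and renormalize their gate weights.

    top_k=2 is an assumption for illustration, not a known GPT-4 setting.
    """
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_output(x, experts, gate_scores, top_k=2):
    """Weighted sum of the outputs of the selected experts only."""
    weights = route(gate_scores, top_k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy experts: each "model" is just a scalar function here.
experts = [lambda x, k=k: (k + 1) * x for k in range(NUM_EXPERTS)]

# Random scores stand in for the output of a learned router network.
random.seed(0)
gate_scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

weights = route(gate_scores)
print(f"naive total parameters: {total_parameters():.2e}")
print(f"experts used for this input: {sorted(weights)}")
print(f"output: {moe_output(1.0, experts, gate_scores):.3f}")
```

The point of the design is that only the selected experts run for a given input. Under the top-2 assumption used here, about 440 billion of the rumored 1.76 trillion parameters would be active at a time, which is why a mixture can be cheaper to run than a single dense model of the same total size.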
Two caveats.
First, this is a rumor. The explicit sources (Hotz and Chintala) are credible but not OpenAI staff. Parakhin holds an executive position at Microsoft, but he never confirmed it explicitly. For these reasons, it’s worth taking this with a grain of salt. The story is nevertheless very plausible.
Second, let’s give credit where credit’s due. GPT-4 is exactly as impressive as users say. The details of the internal architecture can’t change that. If it works, it works. It doesn’t matter whether it’s one model or eight tied together. Its performance and ability on writing and coding tasks are legit. This article is not a dunk on GPT-4—just a warning that we may want to update our priors.