GPT-4's Secret Has Been Revealed
Unraveling OpenAI's masterful ploy
GPT-4 was the most anticipated AI model in history.
Yet when OpenAI released it in March, they didn’t tell us anything about its size, data, internal structure, or how they trained and built it. A true black box.
As it turns out, they didn’t conceal those critical details because the model was too innovative or the architecture too moat-y to share. The opposite seems to be true if we’re to believe the latest rumors:
GPT-4 is, technically and scientifically speaking, hardly a breakthrough.
That’s not necessarily bad—GPT-4 is, after all, the best language model in existence—just… somewhat underwhelming. Not what people were expecting after a 3-year wait.
This news, yet to be officially confirmed, reveals key insights about GPT-4 and OpenAI and raises questions about AI’s true state of the art and its future.
GPT-4: A mixture of smaller models
On June 20th, George Hotz, founder of self-driving startup Comma.ai, leaked that GPT-4 isn’t a single monolithic dense model (like GPT-3 and GPT-3.5) but a mixture of 8 x 220-billion-parameter models. Later that day, Soumith Chintala, co-founder of PyTorch at Meta, echoed the leak. Just the day before, Mikhail Parakhin, Microsoft Bing AI lead, had also hinted at this.
GPT-4 is not one big >1T-parameter model but eight smaller ones cleverly put together. The mixture of experts paradigm OpenAI supposedly used for this “hydra” model is neither new nor invented by them. In this article, I’ll explain why this is very relevant for the field and how OpenAI masterfully executed its plan to achieve three key goals.
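For readers who haven't met the mixture of experts idea before, here is a toy sketch of how such a model can work in general. To be clear, this is not GPT-4's architecture: the leak says nothing about how the experts are wired or routed, so the class name `ToyMixtureOfExperts`, the tiny linear "experts", and the choice of sending each input to the top 2 experts are all my assumptions for illustration. Only the 8 x 220B figures come from the rumor.

```python
import math
import random

# Rumored figures from the leak (unconfirmed by OpenAI).
NUM_EXPERTS = 8
PARAMS_PER_EXPERT = 220_000_000_000
# Upper bound on total size, assuming the experts share no parameters.
TOTAL_PARAMS = NUM_EXPERTS * PARAMS_PER_EXPERT  # 1.76 trillion


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyMixtureOfExperts:
    """Hypothetical toy model: a router picks a few experts per input."""

    def __init__(self, num_experts=NUM_EXPERTS, dim=4, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k  # assumption: 2 experts per input
        # Each "expert" is just a dim x dim matrix standing in for a full network.
        self.experts = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # The router holds one scoring vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

    def route(self, x):
        """Score every expert, keep the top_k, and normalize their weights."""
        scores = [dot(w, x) for w in self.router]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([scores[i] for i in chosen])
        return chosen, weights

    def forward(self, x):
        """Run only the chosen experts and blend their outputs."""
        chosen, weights = self.route(x)
        output = [0.0] * len(x)
        for idx, weight in zip(chosen, weights):
            expert_out = [dot(row, x) for row in self.experts[idx]]
            output = [o + weight * e for o, e in zip(output, expert_out)]
        return output, chosen


model = ToyMixtureOfExperts()
output, chosen = model.forward([1.0, 0.5, -0.3, 2.0])
print(f"experts used: {chosen} of {NUM_EXPERTS}")
print(f"rumored total parameters: {TOTAL_PARAMS:,}")
```

The point of the design is that only a fraction of the experts run for any given input, so a model can hold far more parameters than it uses on each step. Whether GPT-4 routes this way is, again, unconfirmed.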