The GPU Shortage Has Forced AI Companies to Take a Break That Could Last Until 2024
AI faces three major hurdles right now. The first is the shortage of cutting-edge Nvidia H100s, which AI companies need to train their best models and run inference on them.
Although it may not seem like it, AI progress is slowing down.
The three main elements of every modern AI system—GPU hardware, machine learning algorithms, and high-quality data—no longer yield the results we’ve grown accustomed to. It may seem coincidental that all three are faltering at the same time, but I think there’s a causal relationship.
As companies explored the possibilities that AlexNet and ImageNet first sparked in the early 2010s, and that the Transformer turbocharged from 2017 onward, they followed three paths in parallel: they searched for new algorithms and optimized existing ones with engineering ingenuity; scraped and curated the immense global database that is the internet to feed them; and used increasingly powerful hardware to train them.
This triple-front approach has produced an impressive wave of achievements over the years. Leading AI companies like Google, OpenAI, and Meta know the main reason it’s worked so well is that progress on any one front can compensate for stagnation on the others. That is, if companies couldn’t find better algorithms, or solve the main architectural limitations of current ones (like the quadratic transformer bottleneck, sketched below), they could throw in more GPUs. Or, if GPUs became unaffordable, they could scrape new data and do more passes over the training datasets. And so on.
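To make the “quadratic bottleneck” concrete: self-attention compares every token with every other token, so the cost of the attention step grows with the square of the context length. Here’s a rough back-of-the-envelope sketch; the hidden size and sequence lengths are illustrative numbers I’ve chosen, not any company’s actual model specs.

```python
# Illustrative sketch of why self-attention cost grows quadratically
# with sequence length. All numbers are hypothetical, not real model specs.

def attention_score_ops(seq_len: int, d_model: int) -> int:
    """Multiply-adds to build the (seq_len x seq_len) attention score matrix."""
    # Every token attends to every other token, hence seq_len * seq_len.
    return seq_len * seq_len * d_model

d_model = 4096  # hypothetical hidden size
for seq_len in (1_000, 10_000, 100_000):
    ops = attention_score_ops(seq_len, d_model)
    print(f"{seq_len:>7} tokens -> {ops:.2e} multiply-adds per layer")
```

A 10x longer context means roughly 100x more work for this step alone, which is part of why longer contexts and bigger models translate so directly into demand for more GPUs.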
The inevitable consequence—accelerated dramatically by the generative AI boom—was that at some point the three paths would dry up, one after another. It seems that point is now. Don’t worry; this isn’t some unforeseen event that caught companies by surprise. They knew it would happen and have been planning and acting accordingly for months. I’m confident they can work around or break through these obstacles, but I’ll also dare say this is something of a heart-stopping moment for AI.
I want this resource to be a broad overview of the situation and the approaches companies are taking. Doing that in one article would make it too long, so I’ll turn this into a three-part series, published over the coming days on the usual schedule.
Let’s begin with the first hurdle, the global GPU shortage: why companies can’t get the GPUs they need, how that’s affecting AI progress, and how they can solve it.