I. A black box we can't seem to open
There’s a fascinating research area in AI the press doesn’t talk about: mechanistic interpretability. A more marketable name would be “how AI works.” Or, to be rigorous, “how neural networks work.”
I took a peek at recent discoveries from the leading labs (Anthropic and OpenAI). What I’ve found intrigues and unsettles me.
To answer how neural nets work, we first need to know what they are. Here’s my boring definition: a brain-inspired algorithm that learns by itself from data. Its synapses (parameters) change their values during training to model the data and adapt the network to a target task. One typical task is next-word prediction (language models like GPT-4). Another is recognizing cat breeds.
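If you want to see what “parameters changing during training” looks like in code, here’s a minimal sketch of my own (a toy task of learning y = 2x, written in PyTorch; none of this comes from the labs mentioned above): a tiny network starts with random numbers and gets nudged, step by step, until they fit the data.

```python
# A toy illustration (not any production model): a tiny network whose
# parameters start random and drift toward values that fit the data.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fake "task": learn y = 2x from a handful of examples.
x = torch.linspace(-1, 1, 32).unsqueeze(1)
y = 2 * x

model = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

print("one parameter before training:", model[0].weight[0, 0].item())

for _ in range(200):                # the training loop
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)     # how wrong is the model right now?
    loss.backward()                 # in which direction should each parameter move?
    optimizer.step()                # nudge the parameters a little

print("the same parameter after training:", model[0].weight[0, 0].item())
```

Run it and you’ll see the same number printed twice with different values. That’s all “learning” is at this level: decimal numbers drifting.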
A neural net isn’t magic, just a program stored as files inside your PC (or the cloud, which is slightly magical). You can go and look inside the files. You’ll find decimal numbers (the parameters). Millions of them. But how do they recognize cats? The answer is hiding in plain sight, in numeric patterns you can’t comprehend. Humans can’t decode how they cause behavior. Not even our best tools can.
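To make that concrete, here’s another small sketch (again my own example, assuming a PyTorch checkpoint, though the same point holds for any framework): save a network to disk, load the file back, and all you find is named arrays of decimal numbers.

```python
# Peeking inside a saved neural network: nothing but named arrays of floats.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
torch.save(model.state_dict(), "model.pt")    # the "file on your PC"

state = torch.load("model.pt")
total = sum(t.numel() for t in state.values())
print(f"{total:,} parameters")                # ~200,000 here; billions in GPT-4-class models
print(state["0.weight"].flatten()[:5])        # the first few decimal numbers
# Nowhere in these numbers is there a variable called "cat".
```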
That’s why neural networks are called “black boxes.”
You witness in awe how your little program names one hundred cat breeds just from pictures, but when you go to the files and look inside—where’s the cat?
It’s in the box, obviously.
You’re a dog person anyway so you wonder: “Where are the useful neural networks?”
Well, everywhere. ChatGPT has one. Google Translate, DeepMind’s chess player AlphaZero, and TikTok’s For You algorithm do, too. Also Tesla, Waymo, and Cruise’s attempts at self-driving cars. Top-tier media apps—Spotify, Netflix, and YouTube—use them to show you stuff you may like. They’re applied in medical diagnosis (now, a few years ago, and way, way earlier than you imagine), biological research, weather forecasting (been a while), space exploration (now and then), and military purposes.
They are not new and they are not niche.1
Neural nets remain black boxes despite continuous research effort through AI summers and winters, and despite being present in hundreds of scientific fields and phone apps—some you use as a consumer and some you’re subject to as a citizen. You use neural nets daily, and neural nets are used on you daily.
“I see ancient black boxes everywhere” sounds like the perfect horror/science-fiction crossover. But we’re no match for this mash-up.
II. Millions to move fast, billions to break things
Thankfully, interpretability researchers are solving this question, right?
They receive millions of dollars from funders like Open Philanthropy. Add to that Anthropic’s budget (not OpenAI’s; they’re not serious). That’s a lot compared to almost anything except, ironically, the billions of dollars companies and investors are pouring into making their work harder. Want to understand these black boxes? Ha! I’ll make them bigger, more complex, opaque—and self-improving! Want to study their flaws? Sure, after I productize and integrate them into every service.
Why is the last bastion against our ignorance in AI so underfunded, while the forces pushing it into obscurity receive vast sums? One can only wonder what interests lie behind the gaps in our knowledge. On the road to answering the most important question of the most important invention of our times, we’re bound hand and foot by the golden chains of profitability.
The only progress we’ve made under these twisted conditions is that we now know that we know nothing.
If you ask the experts at the forefront of interpretability research, they readily admit it. Dario Amodei, CEO of Anthropic, says “Maybe we . . . understand 3% of how [neural nets] work.” Leo Gao, researcher at OpenAI, says it plainly: “We don’t understand how neural networks work,” a statement “strongly seconded” by Neel Nanda, lead of interpretability at Google DeepMind.
They’re top people, at the top labs, at the top of their game.
They don’t know; no one does.
I’m not sure if this unsettles or excites me more.
III. AI research is no longer about curiosity
What I do feel—as vibrantly as when I first discovered AI in 2015—is a deep curiosity. I’m not interested in knowing how AI works because I’m afraid of it. Mine is sheer scientific curiosity.
AI isn’t an invention like a computer or a calculator. Nor is it a discovery like the theory of Relativity. It’s a discovered invention, like a forgotten artifact an ancient alien species left behind (except we designed it). Nothing sparks my curiosity as much.
This truth-seeking curiosity is long gone from the scientific arena. AI grew complex, inciting our inquiry, but it also grew useful, shutting it down. The trade-off was in the hands of money—utility over scrutability. Researchers shifted the focus from explanatory theories to predictive tools and statistical models. The result? We’re engineering an intelligence our intelligence can’t reverse-engineer.
About this, Noam Chomsky said that “Statistical models . . . provide no insight.” Peter Norvig responded with a long essay (recommended reading). Here’s a relevant excerpt:
. . . it can be difficult to make sense of a model containing billions of parameters. Certainly a human can’t understand such a model by inspecting the values of each parameter individually. But one can gain insight by examining the properties of the model—where it succeeds and fails, how well it learns as a function of data, etc.
I agree. We can inspect AI’s behavior from the outside. But we don’t know what causes such behavior. We’re only beginning to create tools to steer it at will.
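Here’s roughly what Norvig means by examining a model’s properties from the outside, sketched with scikit-learn on synthetic data (my illustration, not his): vary the training set, measure accuracy, and never read a single parameter.

```python
# Behavioral inspection from the outside, in the spirit of Norvig's point:
# grow the training data and watch accuracy, without opening the black box.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (100, 500, 2000):                      # growing slices of the data
    clf = MLPClassifier(max_iter=500, random_state=0)
    clf.fit(X_train[:n], y_train[:n])
    print(n, "examples ->", round(clf.score(X_test, y_test), 3))

# This tells us how well it learns, not why; the parameters stay a black box.
```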
Norvig wrote this 13 years ago, when “a model containing billions of parameters” was the frontier. Ten billion is small now. If we were already hopeless at such a “tiny” scale, imagine where we stand as we race “through the OOMs [orders of magnitude].” Data centers are the size of airports, with the energy requirements of a city. The largest AI systems count their parameters in the trillions, within a couple of orders of magnitude of the number of synapses in a human brain.
Their performance and intelligence grow fast. Our knowledge is slow.
Perhaps it’s time we ask how we got here.
Or, more importantly, why we keep going forward.
We are incurring a huge intellectual debt in the form of unintelligible tech that we use without wisdom or restraint. A debt that, with each passing day, we are further away from paying off.
1. The ubiquity of neural nets is why “AI = neural networks,” albeit strictly wrong, is an accepted approximation (except among scientists; they’d kill me for writing down such a gross equivalence… although they’re the ones who called it “AI,” so perhaps they don’t have much of a case).