I. DeepMind’s big scientific project
DeepMind wants to solve science. They've been trying for a decade. The first steps—those of us paying attention in the mid-to-late 2010s watched them with a mix of surprise and enthusiasm—were AlphaGo, AlphaGo Zero, AlphaZero, and MuZero. What caught the attention of the media and the industry was their ability to outperform humans at games like Go and Chess, whose difficulty comes from the combinatorial explosion that follows each move.
What caught my attention was something else: DeepMind wasn’t moving toward more complex games (they started with Go and then moved to Chess, considered “easier” in strictly mathematical terms) but toward greater independence. DeepMind didn’t want AI to beat us at those games—that was an instrumental goal, a press-friendly by-product—but to discover tactical and strategic secrets we couldn’t find ourselves. It wasn’t enough to make AI the greatest player at those games; the algorithms were created to find new knowledge.
At first, the human scaffolding was massive, but gradually, DeepMind would push the AI models to bootstrap themselves to superhuman mastery: no human-played games, no heuristics, no rules. DeepMind co-founder, CEO, and Nobel Prize winner Demis Hassabis used games as the ideal testing ground for his ambitions—a controlled domain, with perfect information, explicit and verifiable rewards, a finite action space—but they were effectively just a prototype of what was to come.
To Hassabis, a chess prodigy, the chessboard was a scale model of the universe. Like physicist Richard Feynman, he imagined the cosmos as a game with hidden rules we could unveil through science. If AI could master those simpler games without our help, how far could it go? Could it become the ultimate tool for scientific discovery?
Over the years, the ambition of Hassabis and DeepMind—fueled by Google’s resources and by a growing stack of successes—became more explicit.
In 2020, AlphaFold learned to predict protein structures from their sequences, a classic problem in biology in which even small advances took years of experimentation. In 2024, AlphaFold 3 expanded the scope to modeling interactions between proteins, DNA, RNA, and small molecules, bringing AI closer to pharmacology and the expensive process of drug discovery. With AlphaTensor (2022), they tackled computational efficiency, discovering better algorithms for matrix multiplication. AlphaDev (2023) did something similar with sorting algorithms. They used AlphaGeometry (2024) to solve complex Euclidean geometry problems. With AlphaChip (2024), they improved the design of Google’s in-house hardware.
When you put them all together, you see that each individual effort is not as relevant as the pattern they form together. DeepMind has been testing different frameworks, environments, and forms of representation: games, biology, math, hardware, software. Each of these Alpha-something systems is a piece of a larger project; they outline the great ambition that DeepMind has been pursuing for a decade: the full standardization (domain-agnostic), independence (without humans), and automation (with a closed feedback loop) of scientific progress.
The latest piece of the puzzle has been placed: AlphaEvolve. Let’s see what it’s about.
II. AlphaEvolve: the culmination of a decade
From the blog post (paper here):
[AlphaEvolve is] an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization. AlphaEvolve pairs the creative problem-solving capabilities of our Gemini models with automated evaluators that verify answers, and uses an evolutionary framework to improve upon the most promising ideas.
Let’s unpack this paragraph to strip it of jargon.
Evolutionary algorithm: Evolutionary algorithms aren’t talked about much nowadays, so you may not know what they are, but they’re a fundamental part of AI history. The idea is that you start with a bunch of random candidates (e.g., algorithm ideas to improve matrix multiplication). Each one is a little different (like humans are to one another). The best-performing ones get to “reproduce,” combining their traits or slightly mutating into new versions. Over generations, bad solutions die off, and good ones get better. You don’t have to tell the algorithm how to solve the problem, only how to recognize a good solution (an artificial version of survival of the fittest).
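The loop described above fits in a few lines of Python. This is a toy sketch of the generic recipe (random candidates, selection, mutation), not anything DeepMind uses; the fitness function and all parameters are made up for illustration:

```python
import random

random.seed(0)  # reproducible runs

def evolve(fitness, pop_size=50, generations=100, mutation_scale=0.5):
    """Minimal evolutionary loop: keep the fittest half each
    generation, refill the population with mutated copies."""
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by fitness, keep the best half ("survival of the fittest").
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction: each survivor spawns a slightly mutated child.
        children = [s + random.gauss(0, mutation_scale) for s in survivors]
        population = survivors + children
    return max(population, key=fitness)

# We never tell the algorithm *how* to solve the problem, only how
# to score a candidate: this toy fitness peaks at x = 3.
best = evolve(lambda x: -(x - 3) ** 2)
```

Note that nothing in `evolve` knows anything about the problem; swap the fitness function and the same loop optimizes something else entirely.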
This is novel in two ways. First, modern LLMs had not been paired with evolutionary algorithms at scale until now; second, the evolutionary approach replaces reinforcement learning. Let’s use a chess analogy, since we like them so much. RL is like training a chess prodigy: it plays millions of games, learns from each win and loss through human-defined reward functions, and slowly gets better. Evolutionary algorithms are more like a vast gladiator-style tournament: thousands of players battle it out, and you breed the next generation from the winners. There’s no learning in the RL sense; it’s mutation and selection. Importantly, there’s absolute opacity about what works and why. In a way, this is the natural extension of the Bitter Lesson; what role are human engineers left to play if AI can do it all?
GPT-4: The Bitterer Lesson
Richard Sutton’s Bitter Lesson is, simplifying, that in the long term it’s always better to let computation carry the burden of producing better AI systems (through learning and search algorithms) than to build them ourselves by leveraging our own knowledge.
Coding agent: You know this one. An agent is basically an LLM that autonomously performs actions of some kind for some time (examples are OpenAI’s Deep Research and Operator and Google’s Project Astra and Project Mariner). AlphaEvolve, through Gemini 2.5, writes code to create the algorithms (i.e., it doesn’t do research or control a computer). One interesting bit is that AlphaEvolve isn’t restricted to writing a proposal for the final algorithm (say, for matrix multiplication); it can also work at the meta level, i.e., writing a search algorithm that explores the space of better matrix multiplication algorithms. This meta behavior makes it much more powerful than its predecessors (e.g., FunSearch) and further frees the process from dependence on human ingenuity.
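The object-level vs. meta-level distinction can be made concrete with a toy sketch. This is purely illustrative (a made-up scoring function with its optimum at 42, and a hand-written hill climber standing in for code the system might evolve), not AlphaEvolve’s actual output:

```python
import random

random.seed(0)  # reproducible runs

def quality(x):
    """Toy score for a candidate 'algorithm'; peaks at x = 42."""
    return -(x - 42) ** 2

# Object level: propose an answer directly, in one shot.
direct_guess = random.uniform(0, 100)

# Meta level: propose a *search procedure* that hunts for the answer,
# the kind of code the system could evolve instead of the answer itself.
def proposed_search(score, start, steps=2000, step_size=1.0):
    best = start
    for _ in range(steps):
        candidate = best + random.uniform(-step_size, step_size)
        if score(candidate) > score(best):  # keep only improving moves
            best = candidate
    return best

found = proposed_search(quality, random.uniform(0, 100))
```

A single direct guess rarely lands near the optimum; the evolved search routine reliably does. That asymmetry is why working at the meta level pays off.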
General-purpose algorithm discovery: AlphaEvolve finds better algorithms across domains (so far: datacenter design, chip design, AI training, GPU optimization, and math; probably in the short-term future: drug discovery). The numbers are ok—Google has been using AlphaEvolve for a year now—but so far not good enough to be relevant for a high-level analysis like this one. In case you’re curious: 1) Datacenter optimizations save “0.7% of Google’s worldwide compute resources.” 2) A better matrix multiplication algorithm (beyond AlphaTensor’s solution) achieves a “1% reduction in Gemini's training time.” 3) In open math problems, it improved over the best-known solutions “in 20% of the cases.”
Automated evaluators: No human judges the proposed solutions. The system has an internal feedback loop: the best solutions, as scored by the evaluators and weighted by the evolutionary algorithm, are fed back to the prompting module, which then activates the LLM again, and the cycle repeats.
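Skeleton-wise, the closed loop looks like this. Everything here is a stand-in: `mock_llm` plays the role of Gemini (it perturbs a number instead of editing real code), and `evaluate` is a made-up verifiable score with its optimum at 7—the point is the architecture, not the math:

```python
import random

random.seed(0)  # reproducible runs

def mock_llm(parents):
    """Stand-in for Gemini: propose variations of the best programs."""
    return [p + random.gauss(0, 0.3) for p in parents for _ in range(4)]

def evaluate(program):
    """Automated evaluator: a verifiable score, no human judging."""
    return -(program - 7) ** 2  # toy optimum at 7

# The "program database" starts with random candidates.
database = [random.uniform(0, 1) for _ in range(8)]

for _ in range(200):
    proposals = mock_llm(database)          # prompting module -> LLM
    ranked = sorted(database + proposals, key=evaluate, reverse=True)
    database = ranked[:8]                   # winners seed the next prompt
```

No step in the loop requires a person: proposing, scoring, selecting, and re-prompting all close on themselves.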
The blog post includes a diagram of the whole loop: prompting module → LLM → automated evaluators → evolutionary selection → back to the prompting module.

III. So, in simple terms, what’s the big deal?
Did that work for you? Do you already understand why AlphaEvolve matters? Probably not. The main problem I have with company blog posts, papers, and even most commentary is that they rarely give enough de-jargonized context for anyone not obsessively following the literature—99.9999% of humanity—to understand.
Is this a big deal or not? What did Google have before, and how is this different? How does it compare to what competitors are doing? What’s the actual idea, concept, insight, or result that AlphaEvolve brings to the table that nobody else has? My answer is that AlphaEvolve brings to the table, in one package, four key aspects that make it apt for scientific discovery: it’s generalizable, it’s evolutionary, it’s agentic, and it’s recursively self-improving.
Generalizable: It can be applied to any domain that benefits from improved algorithms, whether the algorithms themselves are the goal (e.g., math problems) or a means to find the solution (e.g., drug discovery). It can’t be applied to domains where algorithms are mostly irrelevant (e.g., international politics), so don’t expect AlphaEvolve to solve the world’s problems.
Evolutionary: Because AlphaEvolve is evolutionary, it doesn’t need the carefully designed rewards that reinforcement learning systems do. Instead of optimizing against a specific goal, Gemini proposes a variety of programs that might work. The evolutionary process—selection followed by mutation—takes it from there, gradually surfacing the best-performing ones. Evolution made humans intelligent; it’s not such a stretch to think it could also be the key to machine intelligence.
(When I started learning AI, I thought evolutionary algorithms were the obvious answer to “how do we make AI smart,” but they turned out to be impossibly expensive and were relegated to a secondary research avenue. Just think about it: biological neural networks are the product of millions of years of evolution; shouldn’t we focus on the meta level instead? Enact an evolutionary process, accelerate it as much as you can, and eventually it’ll yield neural networks and then whatever evolution would have created if it were 1,000,000x faster!)
Agentic: The whole process is automated from start to finish (no human in the loop). There are no efficiency bottlenecks that AlphaEvolve itself can’t tackle. It performs every task in the pipeline—prompting Gemini, program synthesis, the evolutionary process, evaluating proposals, and feeding the results back—without humans meddling. (Google engineers can still debug, interpret, verify, and improve the process, so don’t worry too much about a potential loss of control, at least until AlphaEvolve itself does all that better than any of them can.)
Recursive self-improvement: This, together with the evolutionary reframing, is the biggest deal. The other traits are novel insofar as they’re combined with recursive self-improvement. Google is using AlphaEvolve to attack efficiency pain points in datacenters, chips, model training and inference, and GPU software, but it won’t stop there. AlphaEvolve is improving both the chips that train it and the LLMs that propose better programs (Gemini).
In a way, this is the beginning of Gemini improving itself (no wonder Google improved the models so much lately). Or rather, this is Google improving itself. Or rather, this is… I can go all the way to the ultimate meta-level. Actually, let’s do that.