2024-07-13

Strawberries in a Basket, by Joseph Decker, 1887

Do you remember the infamous Q* project that Reuters reported on in November 2023?

It was reportedly an AI model enhanced to solve math problems at the level of “grade-school students,” above the level of any consumer AI available at the time. Some people at OpenAI called it a “breakthrough”. Analysts and bloggers came up with hypotheses about what it could be from the name alone, some wild, others less so.

OpenAI remained silent.

What most people agreed on at the time, speculation aside, was that Q* embodied a new step in one of the fundamental frontiers of current AI technology: Reasoning.

I wrote at length about the importance and state of the art of reasoning research in AI in my in-depth article on GPT-5: Everything You Need to Know. Let me give you a preview here of what you’d find there:

Evidence leads me to think OpenAI has been focused particularly on leaving behind pure imitation learning by integrating the power of search and RL with LLMs. That’s what the speculation about Q* suggests and what public clues from leading researchers quietly scream.

I also quoted Andrej Karpathy, ex-OpenAI, from a talk he gave at Sequoia on the topic:

I think people still haven’t really seen what’s possible in the space . . . The model needs to practice itself . . . it needs to figure out what works for it and what does not work for it.

I concluded:

So to go beyond imitation learning you have to integrate [large language models like GPT-4] with search, self-play, reinforcement learning, etc. That’s what people believe Q* is. That’s what I believe Q* is.

Well, ladies and gentlemen, fasten your seatbelts because we finally know what Q* is and what OpenAI wants to use it for.

By the way, the project has since taken on a new name: Strawberry.