Here's the fundamental problem: LRMs are forced to present their "reasoning" in human-readable traces that mimic human thought patterns. We're not seeing how these systems actually think—we're seeing them pretend to think like us.
And that's precisely what Apple's study measures: the quality of this performance, not the underlying cognition. Then they extrapolate from performance breakdowns to conclude there's no real reasoning happening at all. It's like denying human intelligence because our performance collapses beyond certain complexity thresholds. Try calculating 847,293 × 652,847 in your head—does your inevitable failure mean you can't think?
Apple's methodology is solid, but their conclusion reveals a deeper confusion. They're measuring machine intelligence by human standards, then acting surprised when it doesn't match up perfectly.
But there's a broader point: LRMs manifest emergent cognitive processing that corresponds neither to classical algorithms nor human cognition. We're exploring uncharted epistemological territory with the same old maps.
What if these systems are developing radically different forms of intelligence? Novel cognitive processing with their own coherencies and limitations that have no human equivalent?
Maybe it's time to stop asking whether AI "really thinks" and start asking what kinds of thinking we're actually witnessing.
Once problems reach a certain level of complexity, people can't solve them without tools like pen and paper. For a fair comparison, I think both sides have to be tested the same way, either with tools or without.
By the way, when I tried it, o3 solved it by writing a program. Under the same conditions, I think the results would be similar.
Agreed.
I wonder what you think of this “paper”: https://arxiv.org/html/2506.09250v1
Saw it on Twitter but didn't read it as I imagine the points are similar (although surely better explained). Anything that stood out to you?
It is actually a disgrace. See section 5.
Why is it a disgrace? I've read section 5
Don’t you think using a script to solve Hanoi is cheating? This has nothing to do with reasoning.
No, that's using a tool to solve a problem. It's out of the scope of the original paper but why is that cheating? Humans use tools all the time (including the tools that our brain provides, like memory or visual processing)
But here we are not talking about problem solving; we are talking about whether LLMs can reason through complex problems. It's like trying to prove an LLM can do arithmetic: a calculator should never be in the picture.
At best, the script thing is irrelevant to his conclusion. And I don't believe he doesn't see that himself, so he is just being dishonest.
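For context, the "script" at issue is the kind of short recursive solver a model can emit instead of listing every move one by one; here is a minimal sketch in Python (illustrative only, not the exact program from the arXiv comment):

    # Minimal recursive Tower of Hanoi solver: prints the full move sequence.
    # Illustrative sketch; not the exact script discussed above.
    def hanoi(n, source="A", target="C", spare="B"):
        if n == 0:
            return
        hanoi(n - 1, source, spare, target)            # move n-1 disks out of the way
        print(f"move disk {n}: {source} -> {target}")  # move the largest disk
        hanoi(n - 1, spare, target, source)            # stack the n-1 disks back on top

    hanoi(8)  # 2**8 - 1 = 255 moves, far more than fits in a readable reasoning trace

Whether emitting such a program counts as reasoning about Hanoi or as delegating to a tool is exactly the disagreement in this exchange.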
I'm late reading this, here are my thoughts:
1. Good post, insightful. And the top comment was especially great, something I'd never even considered before (and I suspect many others hadn't either, judging by the response to it).
2. Completely agree re the hacks/grifters just pushing anything and everything for clicks.
3. When I first read that this was an AI breakdown, I had the same concerns as @Mark. Without knowing the input, it's hard to fully trust the output. For example, if you're going to use AI to break down a topic, I think it would be better to link the chat and also, in the spirit of fairness, prompt it to argue both sides and then perhaps ask it to come to an unbiased conclusion.
4. I echo Mark's concerns re the information provided to the model, and I'm not sure I agree with (or understand) your point that the model already has all the context it needs in its training data and thus providing the additional link for search is irrelevant. If that were the case, why provide the link at all? Because it's a known technique for improving output. The same goes for not providing the cited references, although that ironically would probably have just caused collapse (and hallucinations) anyway, so it would have been tricky to implement.
5. I tried doing the both-sides/unbiased approach here with G2.5P: https://g.co/gemini/share/60d94db84ea6 -- it would be interesting to see if you had any thoughts or rebuttals to what it said.
Thank you for the comment!
What Gemini says there is pretty much what I think (I acknowledge the value of Apple's paper and put the blame on the influencers who exaggerated the claims, not on the paper or its authors). By no means does Apple's paper invite "AI models can't reason" as a takeaway, even if the headline is clearly clickbait.
Re your other points: I agree that sharing the chat is useful, unless there's a much longer conversation going on haha. Not sharing it doesn't invalidate the response it gave me, though. I also more or less spelled out the prompt in the two points I say I gave o3.
This was a helpful walkthrough!
I don't have enough knowledge to challenge the details, but "charitable" seems like the perfect one-word description of o3's review. It amounts to saying that the authors didn't prove that LLM reasoning isn't possible, merely that it didn't happen in the cases they tested, and that reasoning might still happen if a few changes are made. It is an answer that human AI aficionados might write. Hope springs eternal!
Actually, it goes a little beyond that. It challenges the premises, arguing that reasoning traces are not faithful representations of the actual reasoning inside the AI model, and thus that no research of this kind, which never goes inside the model to see what it's doing, can provide useful insights (I agree, and it's why I hold Anthropic's interpretability research in such high esteem).
But will we ever be able to prove that LLMs aren't reasoning by "going inside the model"? Seems like whenever interpretability researchers fail to find reasoning, they are always going to conclude that they just aren't looking at the model the right way.
Yes, that's a possibility, but they've actually found instances of faulty reasoning, bullshitting, and similar behaviors, so they seem to be taking this seriously.
Back in Apple's glory days I would have trusted the company's instincts on this without a second thought, but with current Apple, as lost and creatively bankrupt as it is, this looks more like an attempt to stall, buy time, or throw shade at its competitors.
Yeah, I agree with your take on Apple's current state. But I'd say the paper is an honest attempt at showing flaws in AI models. I think influencers are at fault here for taking a little piece of truth (AI models indeed rely on pattern-matching and memorization, but they are capable of more) and exaggerating it way too much. This is, after all, a little research team within Apple, which is a giant company. And they made sure to qualify the study. (They chose the headline, which is clearly clickbait but who wouldn't!)
It seems to me that this was a highly biased exercise. You gave o3 only two sources of information: the Apple paper (which you instructed it to critique) and the Anthropic paper, which you assume to be relevant to the critique. But Haiku is a smaller model, and is not a reasoning model, so the relevance is questionable. Furthermore, there is a trove of other relevant research that you did not give to o3 - starting with the union of the 46 references in the Apple paper, and the 95 references in Anthropic's paper. In short, o3 was not properly equipped to provide a valid critique, and was heavily skewed in the limited basis that it was given for its analysis. By your own admission, you had already determined for yourself that Anthropic's paper provided a basis for rebuttal of the Apple researchers, and you set o3 up to reproduce that rebuttal.
If you know how LLMs work, you know this isn't true. I merely limited the *search* to those two papers. o3 has more than enough knowledge to refute the paper without searching at all. It doesn't need 100 references, just like I don't need 100 references. (In the post I explain that I limited the search to these two sources because otherwise o3 finds many more flaws, and I wanted to keep the post short.)
Besides, Anthropic's work is not "about Haiku" but about the biology of an LLM. All LRMs are LLMs. That Apple chose to separate them into different categories is a taxonomic choice of little importance. All the results from Anthropic's interpretability work apply to Apple's study, namely that the authors took the visible reasoning traces to be the actual reasoning, and that is a serious mistake.
Great point!
One thing I find fascinating in the discussion around AI and self-expressed reasoning is that there's pretty good evidence that humans themselves don't reason in the way that we express we do.
The Enigma of Reason makes the case that reason is primarily a social mechanism that evolved to help us explain our choices to others. We don't make decisions based on reason; we make them with a mix of intuition and feeling, and then need reason to help others understand our actions.
Perhaps AI is more similar to us than we may think.
" is that there's pretty good evidence that humans themselves don't reason in the way that we express we do."
In certain contexts this is true; however, with regard to methodical logical reasoning it is not. We can accurately and definitively articulate our reasoning process.
The key proof is that we can describe it and others can reproduce it. So the reasoning process is transferable.
I describe this in a bit more detail here: https://www.mindprison.cc/i/162270771/are-we-unable-to-describe-our-reasoning
I think the o3 review is great. We should expect more from the authors: they didn't even think to upload their own draft paper to Gemini Pro 2.5, o3, or Opus 4 and ask it to "roast this paper." I would have thought folks working on AI research would do this as a matter of course by now.