Discussion about this post

Pascal Montjovent:

Here's the fundamental problem: LRMs are forced to present their "reasoning" in human-readable traces that mimic human thought patterns. We're not seeing how these systems actually think—we're seeing them pretend to think like us.

And that's precisely what Apple's study measures: the quality of this performance, not the underlying cognition. Then they extrapolate from performance breakdowns to conclude there's no real reasoning happening at all. It's like denying human intelligence because our performance collapses beyond certain complexity thresholds. Try calculating 847,293 × 652,847 in your head—does your inevitable failure mean you can't think?

Apple's methodology is solid, but their conclusion reveals a deeper confusion. They're measuring machine intelligence by human standards, then acting surprised when it doesn't match up perfectly.

But there's a broader point: LRMs manifest emergent cognitive processing that corresponds neither to classical algorithms nor to human cognition. We're exploring uncharted epistemological territory with the same old maps.

What if these systems are developing radically different forms of intelligence? Novel cognitive processing with their own coherencies and limitations that have no human equivalent?

Maybe it's time to stop asking whether AI "really thinks" and start asking what kinds of thinking we're actually witnessing.

Atsushi Ito:

Once problems reach a certain level of complexity, people can't solve them without tools like pen and paper. For a fair comparison, I think both sides need to be tested under the same conditions, either with tools or without.

Incidentally, when I tried it, o3 solved the puzzle by writing a program. Under the same conditions, I think the results would be comparable.
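For illustration, here is a minimal sketch of the kind of program a model could write for this task, assuming the puzzle in question is the Tower of Hanoi used in Apple's study (an assumption; the comment above doesn't name the specific puzzle):

```python
# Minimal sketch, assuming the puzzle is Tower of Hanoi (not named in the comment above).
# The classic recursive solution emits the complete move list, which is trivial for a
# program but tedious to produce by hand.

def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the list of moves (disk, from_peg, to_peg) solving n-disk Tower of Hanoi."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # clear the smaller disks out of the way
    moves.append((n, source, target))           # move the largest remaining disk directly
    hanoi(n - 1, spare, target, source, moves)  # re-stack the smaller disks on top
    return moves

if __name__ == "__main__":
    solution = hanoi(10)
    print(len(solution), "moves")  # 2**10 - 1 = 1023 moves
```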

