21 Comments
Daniel Nest:

this was a vey fun aticle, thanks fo shaing.

Down with the 's, we don't need these kind of lettes in ou alphabet!

Alberto Romero:

LMAO

Jurgen Gravestein:

this eally cacked me up 😂😂

Rich James:

Grade A satire!

Alberto Romero:

Thank you, Rich!!

Richard:

This strawberry thing is pretty funny. I just asked 4o, and it said there were three. Disappointed, I said: "You sure?" and it said: "You're right to double-check! Upon reviewing, the word 'strawberry' contains two 'r's'—one in 'straw' and one in 'berry.' Thank you for catching that!" I love it.

Alberto Romero:

Lmao, incredible.

Mike:

Loved it! Now if the damn thing could just count the r's in "strawberrry" too. o1 literally counts it as 4 r's and then says 3. Follow up and ask it how many r's "strawberrry" should have, and it will say 2. I think this goes to the core of AI search as well. Likely a different problem, but it all leads back to trust.
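
For reference, an exact count is trivial for ordinary code, which sees characters rather than tokens. A minimal Python sketch:

```python
# Deterministic letter counting: no tokenizer, no ambiguity.
for word in ("strawberry", "strawberrry"):
    print(f"{word}: {word.count('r')} r's")

# strawberry: 3 r's
# strawberrry: 4 r's
```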

Alberto Romero:

Yikes, didn't know that... Very interesting actually that o1 fails that way!

Mike:

You may or may not get the correct answer to the original puzzle question, because I believe they correct for that... See the Apple research, which showed that if you change the prompt slightly, failure rates increase. The "how many r's should there be" variant works on almost every model I've tried it on.
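
For anyone who wants to try the perturbation test themselves, here is a rough sketch using OpenAI's Python SDK. The model name and prompt wordings are placeholders, not the exact setup from the Apple paper:

```python
# Hypothetical probe: ask the same question phrased several ways.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

prompts = [
    "How many r's are in 'strawberry'?",              # the original puzzle
    "How many r's should there be in 'strawberry'?",  # Mike's variant
    "Count the occurrences of the letter r in the word strawberry.",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; swap in whichever model you're testing
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt, "->", response.choices[0].message.content)
```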

James:

Funny. I thought they had that fixed a while back. At any rate, you can ask GPT to retokenize the word and then count, and it comes up with the correct answer. Claude, on the other hand, does this automatically and comes up with the correct answer. I will also note that GPT still has the problem of thinking 9.11 is greater than 9.9. I thought this was fixed as well.
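
James's retokenize trick makes sense once you see what the model actually receives. A quick sketch with the tiktoken library (the exact split depends on the encoding, so treat the output comment as illustrative):

```python
# Show how a BPE tokenizer chunks the word; the model reasons over
# these chunks, not individual letters. Assumes tiktoken is installed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding
tokens = enc.encode("strawberry")
print([enc.decode([t]) for t in tokens])    # e.g. ['str', 'aw', 'berry']

# The 9.11 vs 9.9 mix-up is likewise not an arithmetic problem:
print(9.11 > 9.9)  # False -- 9.9 is the larger number
```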

Leo K:

I didn't know miscounting the r's in "strawberry" was a thing, but I just confirmed ChatGPT really can't count them.

Alberto Romero:

Amazing, isn't it? (It's a meme in the AI community, especially on Twitter, because companies haven't yet managed to make an AI model that counts them correctly all the time. It's hilarious in its own right, even without the satire)

Leo K:

Yes it is amazing. I'm still trying to wrap my head around what to make of these tools.

Geoffe:

A consonant-only society sounds more like a utopia to me. Think of all the time we'd save! Just one more way that AI allows us to be more productive.

Alberto Romero:

The SMS era! We were much faster at typing, right?

Philippe Delanghe:

You caught me for a while!

Alberto Romero:

That's because I didn't really have to stretch the truth too much lol

PT Lambert:

People have been focusing on this strawberry thing, but the more general and pressing question involves the extent and nature of other LLM failure modes. If AI can't solve trivial problems like this, why should we expect it to work anywhere else? Do all the anecdotal "mind blown" and "we are cooked" results somehow more than make up for this, as seems to be implied? That is, don't worry if AI can't do 2+2, because it'll happily solve thorny quantum physics problems for you. But what goes unsaid is that you'd better check the answers, whose correctness there's no good reason ever to trust.

Alberto Romero:

But this is lighthearted satire. Just for enjoyment and laughter!

PT Lambert:

Certainly, I'm just curious about this whenever it comes up in any context.
