SAN FRANCISCO, CA—OpenAI, the most talked-about tech start-up of the decade, convened an emergency company-wide meeting Tuesday to address what executives are calling “the single greatest existential challenge facing artificial intelligence today”: Why can’t their models count the R’s in “strawberry”?
this was a vey fun aticle, thanks fo shaing.
Down with the 's, we don't need these kind of lettes in ou alphabet!
LMAO
this eally cacked me up 😂😂
Grade A satire!
Thank you Rich!!
This strawberry thing is pretty funny. I just asked 4o, and it said there were three. Disappointed, I said, "You sure?" and it said: "You're right to double-check! Upon reviewing, the word 'strawberry' contains two 'r's': one in 'straw' and one in 'berry.' Thank you for catching that!" I love it.
Lmao, Incredible
Loved it! Now if the damn thing could just count the r's in "strawberrry" too. o1 literally counts it as 4 r's and then says 3. Follow up and ask it how many r's "strawberrry" should have, and it will say 2. I think this goes to the core of AI search as well. Likely a different problem, but all leading to trust.
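For what it's worth, the character-level ground truth is easy to check outside any model; a quick Python check (plain standard library, no AI involved) confirms the misspelling really does have 4 r's:

    # Character-level ground truth, no model involved.
    for word in ("strawberry", "strawberrry"):
        print(word, "->", word.count("r"))
    # strawberry -> 3
    # strawberrry -> 4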
Yikes, didn't know that... Very interesting actually that o1 fails that way!
While you may or may not get the correct answer to the original puzzle question (because I believe they correct for that), the Apple research showed that if you change the prompt slightly, failure rates increase. The "how many R's should there be" variant works on almost every model I've tried it on.
Funny. I thought they had that fixed a while back. At any rate, you can ask GPT to retokenize the word and then count, and it comes up with the correct answer. On the other hand, Claude does this automatically and comes up with the correct answer. I will also note that GPT still has the problem of thinking 9.11 is greater than 9.9; I thought this was fixed as well.
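The retokenize trick makes sense once you see what the model actually receives: tokens, not letters. Here's a minimal sketch using the tiktoken library (assuming it's installed; cl100k_base is just one example encoding, and the exact split varies by model):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # The model sees a few multi-character chunks, never individual letters.
    tokens = enc.encode("strawberry")
    print([enc.decode([t]) for t in tokens])  # e.g. ['str', 'aw', 'berry']

    # Spelling the word out ("retokenizing") yields roughly one letter per token,
    # which is why asking the model to spell first helps it count.
    spelled = enc.encode("s t r a w b e r r y")
    print([enc.decode([t]) for t in spelled])

    # The 9.11 vs 9.9 confusion, for reference: as decimals, 9.11 is smaller.
    print(9.11 > 9.9)  # False

That's at least part of the story, anyway; the 9.11 comparison suggests tokenization isn't the whole explanation.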
I didn't know miscounting r's in "strawberry" was a thing, but I just confirmed ChatGPT really can't count them.
Amazing, isn't it? (It's a meme in the AI community, especially on Twitter, because companies haven't yet managed to make an AI model that counts them correctly all the time. It's hilarious in its own right, even without the satire.)
Yes it is amazing. I'm still trying to wrap my head around what to make of these tools.
A consonant-only society sounds more like a utopia to me. Think of all the time we’d save! Just one more way that AI allows us to be more productive.
The SMS era! We were much faster at typing, right?
You caught me for a while!
That's because I didn't really have to stretch the truth too much lol
People have been focusing on this strawberry thing, but the more general and pressing question involves the extent and nature of other LLM failure modes. If AI can't solve trivial problems like this, why should we expect it to work anywhere else? Do all the anecdotal "mind blown" and "we are cooked" results somehow more than make up for this, as seems to be implied? That is, don't worry if AI can't do 2+2, because it'll happily solve thorny quantum physics problems for you. But what goes unsaid is that you better check the answers, whose correctness there's no good reason ever to trust.
But this is lighthearted satire. Just for enjoyment and laughter!
Certainly, I'm just curious about this whenever it comes up in any context.