9 Comments
Jack Pierce's avatar

Mostly, I have a significant amount of source content, and I’m asking it to pull together relationships that I wouldn’t normally see myself or rewrite to a format with particular attention to numbers of sentences and how they’re treated. It used to do a superb job of this, but now it stumbles on the easiest requests.

Alberto Romero's avatar

Very interesting. I wonder if what they did to GPT-4o is the same thing they've done with GPT-4.5, so that it works worse on those kinds of reasoning tasks in exchange for world knowledge, intuition, emotionality, style, etc.

imthinkingthethoughts's avatar

Great writing, Alberto. You are particularly prophetic here, and I haven’t seen anyone come close so far.

Do note that it is my understanding that current LLMs do occasionally choose less-than-optimal next-token predictions, so there is some form of randomness. The problem is that randomly selecting a less likely word does not equate to a writer subtly changing the topic, as that typically happens on a more meta-conceptual level.

The 2023 cutoff is quite astounding. I do wonder just how much of the internet has become slop since then, and whether GPT-678 will still have a knowledge cutoff of 2023 or so, supplemented with other premium, vetted sources to sort the wheat from the chaff.

Shawn Fumo's avatar

Yeah I think we just don’t know the implications of 4.5 yet. It is interesting to note that the lab heads were wrong that just going up a size or so would fix hallucinations, etc. But the real question now is how much better it works for RL.

So I consider GPT-5 to be the actual test. That will probably tell us a lot about where everything stands in terms of scaling. And especially how it compares to smaller models with more refined RL that may have come out by then.

James's avatar

I only recently learned that ChatGPT is making itself known in the NSFW community. Its personalization features are key, and jailbreak techniques are available. It creates much more robust characters, with more rounded personalities, than character.ai. Some have invested in the Pro account in order to have a context window large enough for 2 weeks of play. I would suppose that improvements in emotional intelligence may have been the goal. It is said that OpenAI doesn’t encourage it but is aware of how people use it.

Kenneth E. Harrell's avatar

Benchmark scores and real world usage seem so different. Given the way I use AI it’s all about practical use.

Jack Pierce's avatar

I haven’t looked at 4.5 yet, but I’m terribly disappointed in 4o! It’s gotten really stupid… Can’t follow a simple prompt. Over and over and over it makes the same mistakes. Lately, I’ve had much superior results simply using Perplexity AI! OK, it’s not apples to apples, but at least Perplexity is reliable where GPT-4o is not!

Alberto Romero's avatar

What kinds of tasks do you use it for?

Kenneth E. Harrell's avatar

What are your thoughts on Grok? I have been using it for a few weeks now and it seems pretty useful. I get straight, no-BS answers that don’t sound like I am talking to an HR representative.
