15 Comments

I mean, the expectations were beyond sky high. Imagine the headlines if Google had shipped a model weaker than GPT-4.

Both the demo and the benchmark methodology suggest that what they really wanted was to make a big splash, even though in actuality it looks like they managed to build something that is on par with the current state of the art.

Don’t get me wrong, that’s fucking awesome. I have no dog in this fight. But it does raise the question: was Google so far behind that it took them a year to catch up with OpenAI, or are large language models hitting a plateau?

Very good question to which I don't know the answer! We will see what OpenAI releases next (rumors say it'll be very soon!)

Agree with the points presented here. It was a self-own on Google's part to release that video.

The thing is, they do this frequently. A couple of years back they had a demo of the Google Assistant that could make appointments on your behalf. You could tell it what you wanted, and in the demo the assistant made a voice call to a real human, had a conversation, navigated your Google Calendar and made the appointment, all without any additional human interaction. I remember being stunned. But Google Assistant still cannot work this way.

Lame and unnecessary at best. Dishonest/duplicitous at worst.

Yes, and people no longer let them get away with this because their competitors are doing it better (just compare that with Mistral's magnet links lol)

A relevant anecdote here is that the first Apple iPhone event was staged as well. If Gemini was just a tweet to the API I doubt it would have had the same 'effect' as Sama's first ChatGPT tweet.

But I agree, such an aggressive PR video was unnecessary in retrospect. I personally found the other, smaller demo videos more impressive (where it seemed they were using some internal tool to do those).

The principles of Shiny Object Syndrome might suggest that interest in incremental improvements to AI tools may decline over time. Not shiny enough. Increasingly, it may take breakthroughs in the user experience to maintain fascination. People have come to expect a historic development every other Thursday, and I doubt the developers can keep up with that insatiable demand. Or maybe it's just me, it's hard to tell.

In other news, now that you have 1,000 paid subs Alberto, this is officially a big time operation. Seems time for your own domain name. Technically, you don't actually own this blog, Substack does. Well, you own the content, but not the traffic.

At this point, it would be wise to have a disaster plan in place, even though disaster is not likely. If Substack were to go poof overnight, what would it take for you to get back up and running? Huge investment here, be sure to protect it.

I agree with both things, Phil. I have to set up my own website (I have been wanting to do it for some time now). I do periodic exports of everything, though. Just in case!

So long as you own the domain name, and have backups of all content, you're safe. You don't necessarily need any other site than this one.

It could be helpful to know how to move, and where to move to, should that ever become necessary. You could, for example, mirror this blog on Ghost, just to ensure you're comfortable with what's involved in moving.

Probably nothing will happen. That's good enough for one of my blogs. Not for yours.

I agree with this article for the most part. It is just so amazing, almost comical, how Google keeps making all these unforced errors.

Why use different comparisons when they still beat GPT-4 on CoT@32? All it did was make people look at the 5-shot numbers where GPT-4 DID do better. I doubt it would have gotten nearly as much attention if it wasn’t apples to oranges. Also, why so much emphasis on MMLU when people have found 2-3% of the questions on that benchmark have errors in them? When they basically beat all the multi-modal benchmarks out there against dedicated models?

And the demo was just disappointing, after reading the blog. I assumed the demo was a bit fluffed up, but they didn’t use audio at all? It almost invites conspiracy because why wouldn’t they demo audio even once unless there is something wrong?

Like if they had given the model audio files and sequences of images from a video and it came back with audio files and images, I would have been ok with that having been manual instead of live. You could piece that together in a UI later. But it was all text, and even the music part was passing a query back to something else to search for a song, whereas it was very easy to assume it created some music on its own.

Even more egregious in some ways is that the prompts weren’t just hand-holding a bit more than shown but full on few-shotted. Giving it the answers to the first 2 cup swaps and a way to represent the ball location and it just figured out the third?

As was said, the fact that it is GPT-4 level or a bit better is a great achievement. It’s just so strange that the only thing the demo did differently than GPT-4 was to generate some images without an external model.

They could have saved themselves sooo many problems by just being a tiny bit more direct about things.

"They could have saved themselves sooo many problems by just being a tiny bit more direct about things." 100% agree Shawn!

I agree with Domingo as regards Gary Marcus. I’ve unsubscribed. There’s a gleeful ambulance-chasing quality to his writing, a tawdry delight in someone else’s suffering. It also reads as unattached to any serious content: the breathless reporting in a National Enquirer story about someone’s private life, turning it into a self-serving public soap opera. “Inquiring minds want to know!”

Grady Booch (whom I admire a lot for his '80s work on OOP) has joined the Gary Marcus cult. They have become boring, always repeating the same songs.

Well, I don't agree with everything Marcus or Booch says, but they provide us with important perspectives for a space that would be heavily one-sided otherwise.

They are stuck. I prefer people who are intellectually open to challenging their own beliefs: http://rodneybrooks.com/three-things-that-llms-have-made-us-rethink/

Agreed! And thanks for the link, I like Brooks more as well.
