I mean, the expectations were beyond sky high. Imagine the headlines if Google had shipped a model weaker than GPT-4.
Both the demo and the benchmark methodology suggest that what they really wanted was to make a big splash, even though in actuality it looks like they managed to build something that is on par with the current state of the art.
Don’t get me wrong, that’s fucking awesome. I have no dog in this fight. But it does raise the question: was Google so far behind that it took them a year to catch up with OpenAI, or are large language models hitting a plateau?
Very good question to which I don't know the answer! We will see what OpenAI releases next (rumors say it'll be very soon!)
Agree with the points presented here. It was a self-own on Google's part to release that video.
The thing is, they do this frequently. A couple of years back they had a demo of the Google Assistant that could make appointments on your behalf. You could tell it what you wanted and in the demo the assistant made a voice call to a real human, had a conversation, navigated your Google Calendar and made the appointment. All without any additional human interaction. I remember being stunned. But Google Assistant still cannot work this way.
Lame and unnecessary at best. Dishonest/duplicitous at worst.
Yes, and people no longer let them get away with this because their competitors are doing it better (just compare that with Mistral's magnet links lol)
A relevant anecdote here is that the first Apple iPhone event was staged as well. If Gemini was just a tweet to the API I doubt it would have had the same 'effect' as Sama's first ChatGPT tweet.
But I agree, such an aggressive PR video was unnecessary in retrospect. I personally found the other smaller demo videos more impressive (where it seemed they were using some internal tool to do those).
I agree with this article for the most part. It is just so amazing, almost comical, how Google keeps making all these unforced errors.
Why use different comparisons when they still beat GPT-4 on CoT@32? All it did was make people look at the 5-shot numbers where GPT-4 DID do better. I doubt it would have gotten nearly as much attention if it wasn’t apples to oranges. Also, why so much emphasis on MMLU when people have found 2-3% of the questions on that benchmark have errors in them? When they basically beat all the multi-modal benchmarks out there against dedicated models?
And the demo was just disappointing, after reading the blog. I assumed the demo was a bit fluffed up, but they didn’t use audio at all? It almost invites conspiracy because why wouldn’t they demo audio even once unless there is something wrong?
Like if they had given the model audio files and sequences of images from a video and it came back with audio files and images, I would have been ok with that having been manual instead of live. You could piece that together in a UI later. But it was all text, and even the music part was passing a query back to something else to search for a song, when it was very easy to assume it created some music on its own.
Even more egregious in some ways is that the prompts weren’t just hand-holding a bit more than shown but full-on few-shot. Giving it the answers to the first 2 cup swaps and a way to represent the ball location, and it just figured out the third?
As was said, the fact that it is GPT-4 level or a bit better is a great achievement. It’s just so strange that the only thing the demo did differently than GPT-4 was to generate some images without an external model.
They could have saved themselves sooo many problems by just being a tiny bit more direct about things.
"They could have saved themselves sooo many problems by just being a tiny bit more direct about things." 100% agree Shawn!
I agree with Domingo as regards Gary Marcus. I’ve unsubscribed. There’s a gleeful ambulance-chasing quality to his writing, a tawdry delight in someone else’s suffering. It also reads as unattached to any serious content: the breathless reporting in a National Enquirer story about someone’s private life, turning it into a self-serving public soap opera. “Inquiring minds want to know!”
Grady Booch (whom I admire a lot for his '80s work on OOP) has joined the Gary Marcus cult. They have become boring, always repeating the same songs.
Well, I don't agree with everything Marcus or Booch say but they provide us with important perspectives for a space that would be heavily one-sided otherwise.
They are stuck. I prefer people who are intellectually open to challenging their own beliefs: http://rodneybrooks.com/three-things-that-llms-have-made-us-rethink/
Agreed! And thanks for the link, I like Brooks more as well.
I agree with both things, Phil. I have to set up my own website (I have been wanting to do it for some time now). I do periodic exports of everything, though. Just in case!