8 Comments

Lately, people treat new AI chatbots like kids who torture their new Christmas toys. They poke and stretch them to see if, and when, they’ll break. We endlessly pick at language models to uncover flaws. Come on. Do we gain anything meaningful from this constant prodding? Or does it only briefly satisfy our petty urges?

We’ve become spoiled children surrounded by AI gifts. Obsessing over their imperfections serves little purpose, at least in the short term. We would benefit more from marveling at their superpowers before investigating their failures.

Let's wait for rigorous benchmarks to decide if AI Santa truly delivered.

This is a good writeup. My very early thoughts are that the majority of the early reactions are from power users, who are likely to ask it to do the logic tricks or deep iterative tasks (the areas where Ultra seems weakest). Google is playing the long game and a different style/flavor of GPT-4 level ability + integration with Google data may be good enough for most.

From https://gemini.google.com/faq: "What is Gemini Advanced?

Gemini Advanced gives you access to Google’s most capable AI model, Ultra 1.0. With Ultra 1.0, it’s far more capable at highly complex tasks like coding, logical reasoning, following nuanced instructions, and creative collaboration. Plus, it will continue to expand with new and exclusive features. Gemini Advanced gives you access to Ultra 1.0, though we might occasionally route certain prompts to other models." Do you think it is routing some prompts to other models, like Pro?

As of February 13, 2024, ChatGPT 4 is far better than Gemini Ultra 1.0 at answering knowledge questions. Ultra 1.0 is weak in that area.

Nice summary of all of the cognitive biases impacting the general public's evaluation of Google Gemini!

At least we can be sure no one will fall for automation bias with Gemini lol. I expect everything it says to be wrong, so I proofread it a lot more than Perplexity or Claude.

A few questions I am wondering about:

First, does Gemini have access to the internet to find and verify answers? Scanning their 84-page technical report, I don’t see anything about this. (https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf)

Second, how many parameters does Ultra use? How long did it take to train Ultra, and at what cost?

Third: Is its architecture a single unified transformer model or a “Mixture of Experts”, as GPT-4 is rumored to be and as Mixtral says it is? I see in Table 1 in their technical report they just show a box labeled “transformer”, but that may just be a black box hiding what is going on inside. Section 2 says “Gemini models build on top of Transformer decoders that are enhanced with improvements in architecture and model optimization … employing efficient attention mechanisms”. The plural “Transformer decoders” suggests multiple Transformers are inside the black box. The Transformer described in the Dec 2017 “Attention is All You Need” has an encoder stack and a decoder stack, not multiple decoders.


Fourth: Demis Hassabis said last June when announcing Gemini: “At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models.”


The technical report says in 5.1.7: “Gemini models can also be combined with additional techniques such as search and tool-use to create powerful reasoning systems that can tackle more complex multi-step problems. One example of such a system is AlphaCode 2, a new state-of-the-art agent that excels at solving competitive programming problems. AlphaCode 2 uses a specialized version of Gemini Pro – tuned on competitive programming data … – to conduct a massive search over the space of possible programs. This is followed by a tailored filtering, clustering and reranking mechanism.”
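The pipeline in that quote (sample many programs, filter, cluster, rerank) can be illustrated with a toy sketch. This is not AlphaCode 2's actual implementation: `sample_candidates` is a hypothetical stand-in for sampling from a tuned model, the candidate programs are hard-coded, and ranking clusters by size is an assumed substitute for whatever reranking mechanism the real system uses.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Program = Callable[[int], int]


def sample_candidates() -> List[Tuple[str, Program]]:
    # Hypothetical stand-in for "a massive search over the space of
    # possible programs"; a real system would sample these from a model.
    return [
        ("double_a", lambda x: x * 2),
        ("double_b", lambda x: x + x),
        ("square", lambda x: x * x),
        ("plus_two", lambda x: x + 2),
        ("broken", lambda x: x // 0),
    ]


def run_safely(program: Program, x: int) -> Optional[int]:
    # A crashing candidate yields None instead of stopping the pipeline.
    try:
        return program(x)
    except Exception:
        return None


def passes_examples(program: Program, examples: List[Tuple[int, int]]) -> bool:
    # Filtering step: keep only candidates that match the known examples.
    return all(run_safely(program, x) == y for x, y in examples)


def cluster_by_behavior(
    candidates: List[Tuple[str, Program]], probe_inputs: List[int]
) -> Dict[tuple, List[str]]:
    # Clustering step: group candidates that give identical outputs
    # on a set of extra probe inputs.
    clusters: Dict[tuple, List[str]] = defaultdict(list)
    for name, program in candidates:
        signature = tuple(run_safely(program, x) for x in probe_inputs)
        clusters[signature].append(name)
    return clusters


def select(examples: List[Tuple[int, int]], probe_inputs: List[int]) -> List[str]:
    survivors = [(n, p) for n, p in sample_candidates() if passes_examples(p, examples)]
    clusters = cluster_by_behavior(survivors, probe_inputs)
    # Reranking step (assumed heuristic): larger clusters first,
    # one representative per cluster.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [names[0] for names in ranked]


if __name__ == "__main__":
    print(select(examples=[(2, 4)], probe_inputs=[3, 5]))
```

With the single example (2, 4), every non-crashing candidate survives filtering, so the clustering and reranking steps do the work of deciding which candidate is submitted first.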



This all suggests that Gemini Ultra by itself is NOT using the “strengths of AlphaGo-type systems”. Instead, an “agent” must be built around it, like they did when making AlphaCode 2!

This is NOT like GPT-4, which appears to be itself an “agent” that invokes experts, either internal or external via plugins.



How well will Gemini Advanced compete with GPT-4 then?

What do you think?

Great write-up. There's a good chance that reality is a mix of those things. People who're more likely to post negatively might also be the ones with a bag of tricky questions at the ready, throwing the toughest puzzles at "unprepared" Gemini to trip it up. (As Chase points out in a separate comment.)

If you benchmark a model only on questions at the most difficult extreme of the spectrum, you won't be giving it a fair trial.

Having said that, I'd like to test drive it for myself soon.
