9 Comments

Regarding embodied robotics, I'm not clear on how to apply LLMs to them. I was particularly interested in the recent presentation of RT-2 by Google DeepMind. However, it was focused on human-machine interaction.

In terms of autonomous navigation, there's a clear trend toward fields like "curiosity-based exploration", "lifelong learning", and "motivational learning", which help robots carry out long tasks and missions on their own and overcome obstacles that weren't forecast in the scenario.

Some promising lines of research are presented here: https://www.sciencedirect.com/science/article/abs/pii/S0925231220317537 and here: http://proceedings.mlr.press/v97/jaques19a/jaques19a.pdf
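
For readers unfamiliar with the term, "curiosity-based exploration" usually means giving the agent an intrinsic reward for visiting states it can't yet predict. Here is a minimal sketch in that spirit (in the style of intrinsic curiosity modules, not taken from the linked papers; the names `ForwardModel` and `intrinsic_reward` are illustrative):

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state embedding from the current state plus the action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def intrinsic_reward(model, state, action, next_state, scale: float = 0.1):
    """Curiosity bonus: the forward model's prediction error.
    States the agent cannot yet predict are 'interesting' and get explored."""
    with torch.no_grad():
        pred = model(state, action)
        return scale * (pred - next_state).pow(2).mean().item()

# During training, this bonus would be added to the environment reward:
# r_total = r_extrinsic + intrinsic_reward(model, s, a, s_next)
```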

Dec 10, 2023 · Liked by Alberto Romero

I happen to work with a humanoid robot manufacturer (they prefer the term “Android”) that intends to add an LLM to their product. We asked “why?” We like the idea, but does it add functional or experiential value? As an integrator, we determined that it does. Natural-language exchanges with the ’bot in the field, or between the ’bot and a control operator while the unit is working, are a substantial “upgrade” to performance and experience. We also have a preliminary finding that if some people are weirded out by a humanoid robot, the addition of conversation only makes it worse. So far that is a low percentage, but if they are high influencers, well….

author

Thanks for the links, Julian!


Well written. Thank you.


The consensus view among people who have read the technical paper closely appears to be that the points of comparison were chosen very judiciously to favour Gemini, and that Gemini may be only an incremental improvement, or perhaps no improvement at all. For example, comparing Gemini to GPT-4 under the same conditions (five-shot) on the MMLU (a massive test of knowledge and, to a degree, reasoning across various intellectual domains) results in GPT-4 outperforming Gemini by almost 3%, certainly nothing to sneeze at.
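
For readers unfamiliar with the jargon, "five-shot" just means the model sees five worked examples before the test question. A minimal sketch of how such an MMLU-style prompt could be assembled (the field names are illustrative, not from any particular eval harness):

```python
def build_five_shot_prompt(examples, question):
    """Assemble a five-shot multiple-choice prompt: five solved items
    followed by the unsolved test item the model must answer."""
    parts = []
    for ex in examples[:5]:
        parts.append(
            f"Question: {ex['question']}\n"
            + "\n".join(f"{letter}. {choice}"
                        for letter, choice in zip("ABCD", ex["choices"]))
            + f"\nAnswer: {ex['answer']}\n"
        )
    parts.append(
        f"Question: {question['question']}\n"
        + "\n".join(f"{letter}. {choice}"
                    for letter, choice in zip("ABCD", question["choices"]))
        + "\nAnswer:"
    )
    return "\n".join(parts)
```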

If it does turn out that Gemini is "better" in some sense, it won't be because of its one-shot capacity for answering linguistic questions straight up, at least not vis-à-vis this initial offering. It may be that its multimodal capabilities are a significant advance; I don't have a lot of knowledge of that domain.

Of course, who knows: maybe the proof of the pudding will be in the eating once they release the model. And certainly, a little competition rarely hurt any developing science (which from my point of view, as an AI Apocalypse Doomer and also a person concerned about potential job-loss effects, may be a terrible thing).

author
Dec 7, 2023 · edited Dec 7, 2023

That's true, but only when they're compared in the 5-shot setting. When Gemini and GPT-4 are *both* compared under the CoT@32 setting, Gemini comes out on top with a gap of ~2.7%. (I realized that the blog post images show an apples-to-oranges comparison, i.e., GPT-4 on 5-shot vs. Gemini on CoT@32. I will swap them tomorrow for the ones from the technical report, which are correct.)

This reveals that which one is better depends heavily on the prompt setting, which is an interesting observation in itself. However, it's worth noting that both GPT-4 and Gemini do better under CoT@32 than under 5-shot, so it makes sense that it's the preferred setting for the comparison.
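
For the curious: CoT@32 roughly means sampling 32 chain-of-thought answers and taking the consensus, with a fallback to greedy decoding when the votes don't agree strongly enough (the report calls this "uncertainty-routed"). A minimal sketch of the majority-vote part, where the hypothetical `sample_cot_answer` stands in for a real model call with sampling enabled:

```python
import random
from collections import Counter

def sample_cot_answer(question: str) -> str:
    """Hypothetical stand-in for one chain-of-thought sample from the model.
    A real implementation would call the model API with temperature > 0."""
    return random.choice(["A", "B", "C", "D"])

def cot_at_k(question: str, k: int = 32, threshold: float = 0.5) -> str:
    """Sample k chain-of-thought answers and return the majority vote.
    If no answer clears the confidence threshold, fall back to a single
    answer (a placeholder for the greedy 'uncertainty-routed' path)."""
    votes = Counter(sample_cot_answer(question) for _ in range(k))
    answer, count = votes.most_common(1)[0]
    if count / k >= threshold:
        return answer
    return sample_cot_answer(question)  # placeholder for greedy fallback
```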


You are right, so we have to wait and see. The last try also went quite badly: https://bard.google.com/chat/498fde1a8b048ba0


I couldn't believe how stupid the answers were to simple bike acceleration problems: https://bard.google.com/chat/4ac2be7feb8ce8c2 and https://bard.google.com/chat/7010325b753fc56b

author

Remember, that's Gemini Pro (~GPT-3.5), not Ultra (>GPT-4).
