Discussion about this post

Sol:

As shown on the ScreenSpot-Pro benchmark, Gemini seems to be better than the other LLMs at image understanding. I think this is why it’s good at ARC. Even though ARC-1 is fed to the LLMs as text, visualising the grids helps. When I see the puzzles drawn out they become a lot easier, so maybe giving Gemini a better ‘visual brain’ helps it do better too.
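A minimal sketch of that idea, with purely illustrative assumptions (the example grid, the colour palette, and the cell size are made up, not how any lab actually feeds ARC to a model): a text-serialized ARC grid can be re-rendered as an image that a vision-capable model could be shown alongside, or instead of, the digit grid.

```python
from PIL import Image, ImageDraw

# Hypothetical ARC-style grid: each integer is a colour index.
grid = [
    [0, 0, 3, 0],
    [0, 3, 3, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
]

# Assumed colour mapping (not the official ARC palette).
PALETTE = {0: (0, 0, 0), 3: (0, 200, 0)}

def render_grid(grid, cell=32):
    """Render an integer grid as an RGB image, one coloured square per cell."""
    h, w = len(grid), len(grid[0])
    img = Image.new("RGB", (w * cell, h * cell))
    draw = ImageDraw.Draw(img)
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            colour = PALETTE.get(value, (128, 128, 128))  # grey for unknown values
            draw.rectangle(
                [x * cell, y * cell, (x + 1) * cell - 1, (y + 1) * cell - 1],
                fill=colour,
            )
    return img

# Image that could be passed to a multimodal model instead of raw text.
render_grid(grid).save("arc_grid.png")
```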

As far as I know, nano-banana is the best AI image generator/editor, AND Google is doing interesting video-to-playable-world stuff. My theory is that they have figured out a way to transfer knowledge between these models, rather than just bolting image models and LLMs together.

One other question that stands out: how is Anthropic still so good at coding? Even with all these advances, Anthropic stays ahead on coding benchmarks and often seems to be the preferred choice for developers.
