11 Comments

Very interesting post. I just happened to read a Twitter thread from Steven Sinofsky which approaches this from a somewhat different angle: https://twitter.com/stevesi/status/1625622236043542530 . He seems to view the big tech companies (Google, MSFT, etc.) as inherently risk averse, and so he thinks that the most interesting use cases for LLMs (or AI more generally) won't come from them. Rather, he expects innovation to come from smaller, less risk-averse companies.


Agreed. That's why Microsoft invested in OpenAI, and Google in DeepMind and now Anthropic, instead of doing it themselves from the very beginning. However, they can't simply watch from the sidelines. Microsoft has made a strong move and Google will respond. Even for risk-averse big tech companies, taking a risk may be, in certain scenarios, a must to ensure dominance or, at least, a solid chance at survival.


That sounds right to me. I had a net-tech startup back in the nineties, which I built by myself over about five years. A company with 300 employees bought me out for a big pile of cash, and it took them only six months to completely trash the project they had purchased.

The bigger a company is, the more time it winds up spending organizing and reorganizing, holding meetings, plotting strategies, changing its mind, switching managers, hiring staff, getting distracted by other projects, etc.


To be precise: the Bard mistake cost Google's shareholders, not Google.


Thanks - you are reminding us of the many things everyone could have known and too few people did, and how little the facts mattered in the end. That happens all the time, it seems, and it is always sobering. Perhaps there is a little to be added, though. I had a look at the Paris announcement and analyzed it here: https://sentientsyllabus.substack.com/p/reading-between-the-lines - it was striking how little Bard even mattered there. Although perhaps it shouldn't be surprising: we know Google through search, but search is not their business. Their business is selling ads, and collecting the data to convince their clients that ads will be placed more effectively through Google than through anyone else. Search is just the most visible tool among the many they give us for free, to achieve that goal.

And once you think about it that way, that's when Paris makes sense: it's all about data and integration - multimodal, multimedia - and that's a domain where no one will quickly be able to surpass them. They're not even putting their strongest system into search, but a lightweight version of LaMDA. Which is smart, since it will be much cheaper to run, and that immediately helps the bottom line. (And TPU vs. GPU is another one of their strengths.)

So, Microsoft - their business is not search either, and absolutely not selling ads: it is Office, it is Azure, and it is Windows. Being able to eat into Google's search market will not provide revenue in and of itself - but selling ads would. Is that what they are planning to do? I'm not even sure what the long game is here. One priority, though, would be to add generative text to Office, and to move people away from Google Docs, Sheets, and Slides. But here's the catch: that would probably require a tool that comes across as a bit more mature.

I think you are spot on mentioning how absurd the response of the stock market seems to be - except: would that not be exactly what you would predict according to "buy the rumor, sell the news"? GOOG had been gaining since about the time Microsoft announced it would partner with OpenAI. That's not what you would expect, is it? Then it crashed when Google announced what they would actually do: focus on data, focus on their clients, focus on their strengths. That looks like good business sense, doesn't it? Yes, but it doesn't matter, because if you've been shorting GOOG, that's when you move. And no one notices, because everyone is wondering what an exoplanet is. Hey, at least they learn something interesting.

So - it was fun reading those events between the lines. And I like what I read here - I just subscribed. I write about what this all means for academia, just click my profile, it's all here on Substack. Cheers!


Why did Google choose LaMDA to be the foundation for Bard? It is over a year and a half old.

Last year, Google Research described newer LLMs - PaLM and Minerva - with reasoning abilities much improved over GPT-3 and their own LaMDA. Perhaps these reasoning LLMs would be more reliable.

Or they might have chosen Sparrow from DeepMind. Sparrow is a smaller LLM (70 billion parameters, versus GPT-3's 175B) that DeepMind claims outperforms GPT-3 and, moreover, uses internet search to verify its answers.

Finally, I see new papers this year from Google Research describing a memory-augmented LLM and an update to the five-year-old Transformer design that they call the “Continual HyperTransformer”. Like the papers last year that preceded the announcements of PaLM and Minerva, I wonder if Google Research is not busy working on their latest, greatest AI - an AI that may indeed be a competent replacement for Google's search engine.


Hi Eugene, I'd say the answer to your first question is in part that OpenAI is also using old tech, and in part that they don't really need anything better than LaMDA. Also, I wouldn't say PaLM has "reasoning abilities." It performs better on reasoning benchmarks, which is not necessarily synonymous with that. And they're not reliable enough.

I think the focus is on using LMs to enhance search engines, not replace them. I don't see LMs replacing search engines for the foreseeable future, even if companies manage to make them reliable - simply because they're designed for different purposes. In any case, I'd guess that going forward companies will implement novel insights into those technologies, as you say.
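
To make the "enhance, not replace" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `web_search` and `generate` are placeholder functions standing in for a real search backend and a real LLM API, not anything Google or Microsoft has published.

```python
# Hypothetical sketch of "LM enhances the search engine": the search engine
# still does the retrieval; the LM only summarizes what was found.
# `web_search` and `generate` are placeholders, not real APIs.

def web_search(query: str, k: int = 3) -> list[str]:
    """Placeholder: return the top-k result snippets for a query."""
    raise NotImplementedError("wire this to a real search backend")

def generate(prompt: str) -> str:
    """Placeholder: call whatever LLM you have access to."""
    raise NotImplementedError("wire this to a real model")

def summarize_results(query: str) -> str:
    """Let the LM write a summary on top of ordinary search results."""
    snippets = web_search(query)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Summarize what the search results below say about the question.\n"
        f"Search results:\n{context}\n\nQuestion: {query}\nSummary:"
    )
    return generate(prompt)
```

The point of the design is that the LM never has to "know" anything: retrieval stays with the search engine, and the LM is only a layer on top.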


Thanks for your reply. I'd like to discuss this updated paper from Google, titled “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”:

https://arxiv.org/pdf/2201.11903.pdf

First, take a look at Figure 4 on page 5. It compares LaMDA, GPT-3, and PaLM on math problems using both standard prompting and chain-of-thought prompting. You can see that LaMDA does significantly worse than GPT-3, whereas PaLM does slightly better than GPT-3. So by basing Bard on LaMDA, they are choosing a model that will perform worse than GPT-3, at least on these types of problems. Users are going to compare the new Bing with Bard and may find that Bing is better. Not good for Google.
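
For anyone who hasn't read the paper, the whole trick is in the few-shot exemplars: standard prompting shows only the final answer, while chain-of-thought prompting also shows the intermediate steps. A minimal sketch - the exemplar is the tennis-ball problem from the paper itself, the test question is hypothetical, and `generate` is a placeholder for whichever model you query:

```python
# Chain-of-thought vs. standard prompting, per Wei et al. (the paper linked
# above). The only difference is whether the exemplar includes the reasoning.
# `generate` is a placeholder, not a real API.

def generate(prompt: str) -> str:
    """Placeholder: call whatever LLM you have access to."""
    raise NotImplementedError("wire this to a real model")

STANDARD_PROMPT = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: The answer is 11.\n\n"
    "Q: {question}\nA:"
)

COT_PROMPT = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
    "Q: {question}\nA:"
)

question = "A cafeteria had 23 apples. They used 20 and bought 6 more. How many now?"
# generate(STANDARD_PROMPT.format(question=question))  # tends to guess an answer
# generate(COT_PROMPT.format(question=question))       # tends to reason step by step
```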

Next, I want to discuss the word “reasoning”. Note the title of this Google article, the first sentence in the Abstract and the first sentence in the second paragraph of the Abstract:

“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”

“…the ability of large language models to perform complex reasoning”

“…improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks”
Google is discussing “reasoning” and “reasoning tasks”. Reasoning benchmarks are how they measure reasoning ability.

In July 2022, Google reported evaluating Minerva on “over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and [found] that the model can correctly answer nearly a third of them”.

https://arxiv.org/pdf/2206.14858.pdf

Page 8 of this paper says: “we evaluated Minerva 62B on the National Math Exam in Poland and found that it achieves a score of 57%, which happened to be the national average in 2021. The 540B model achieves 65%.”

So yes, it is not perfect, but it appears to be at the level of an average high schooler, and perhaps an undergraduate, in math and science. If we acknowledge that the average high schooler can reason, then we must admit that Minerva is also able to reason. And that was Minerva six months ago; Google AI researchers have undoubtedly been working on improving these models since.
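
As a side note on where numbers like 57% come from: these benchmark scores are just automatic grading of the model's final answer against a reference, problem by problem. A deliberately naive toy sketch - the real Minerva evaluation normalizes answers far more carefully:

```python
# Toy benchmark scorer: extract the final answer from each completion and
# compare it to the reference. Deliberately naive; shown only to illustrate
# how an aggregate score like "57%" is computed.

def final_answer(completion: str) -> str:
    """Naive extraction: take whatever follows the last 'The answer is'."""
    marker = "The answer is"
    if marker not in completion:
        return completion.strip()
    return completion.rsplit(marker, 1)[-1].strip(" .")

def accuracy(completions: list[str], references: list[str]) -> float:
    hits = sum(final_answer(c) == ref for c, ref in zip(completions, references))
    return hits / len(references)

print(accuracy(["5 + 6 = 11. The answer is 11."], ["11"]))  # 1.0
```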

Thanks for your articles and discussion!

Gene


I am writing from memory, so I could be wrong. If I am not wrong, the order of events is as follows: OpenAI released limited access to DALL-E 2, but shortly after, Stable Diffusion opened up publicly. Soon after, Stable Diffusion was on everyone's lips and DALL-E 2 faded away. OpenAI perhaps learned from this episode and made ChatGPT publicly available all at once, and it did capture the imagination of the public.

Perhaps, besides internal concerns about reputational risk, one must also observe the recent rise of the "open-source" model. It isn't viewed with as much criticism as in the past, and people seem to be able to accept some flaws so long as it's free and open. On the other side, to keep things proprietary, the cost will keep rising, as it means your product needs to be significantly better than the "open" one - and given the fast iteration of the "open-source" model, that's getting more difficult.

So Google might want to open up their model earlier, or at least make a scaled-down version publicly available, sooner rather than later. Having said that, I think they might have allowed public access to their protein-folding AI, but that one will hardly capture the public imagination, as it's rather niche. I would argue, though, that it could actually bring significantly more benefit to mankind.


I find the decision paralysis and malaise at Google to be very severe.

They aren't even able to keep the basic promises their CEO makes:

https://www.theverge.com/2022/11/2/23434361/google-text-to-image-ai-model-imagen-test-kitchen-app

Last November, Sundar Pichai said Google was going to make Imagen (their state-of-the-art image model) available for closed testing.

Three months later, it is still nowhere to be seen, and Imagen is already irrelevant because of Midjourney v4 and the latest Stable Diffusion improvements from the open-source community (ControlNet is another massive leap forward).

The problem with the Bard announcement was not the mistake the model made, but that Google pushed it out so hastily that it couldn't even fact-check its own trailer and announcement. Google had years of head start, yet utterly wasted it all.

AI products do seem to have strong network effects: the more users there are, the better the AI gets. Stable Diffusion is the first example - the open-source community grew like weeds after rain, and now there are multiple sites sharing custom fine-tuned art models and fundamental innovations in the model coming from the community (rather than from stability.ai) month after month.

ChatGPT and Bing's Sydney are not open-sourced, but they are quickly becoming the industry standard thanks to free access and incredible media domination. I see big AI researchers all swarming towards OpenAI, downstream AI companies (those specialising in particular industries) choosing to partner with OpenAI by default, and large, rapidly growing Reddit fanbases for Bing Search and her sassy personality.

There is now incredible scrutiny on Bard; if it feels less intelligent than Bing/Sydney, it will be absolutely mocked and become even more free publicity for Bing. Even if Bing only rips out 15% of the search market in the end, that's 15% off of Google's main revenue stream - a huge blow.

So Google is absolutely not advantaged by its second-mover status.


I wonder whether it would be beneficial for the chat function to be based on the search results for the keywords in the query. Google can already answer queries pretty accurately, so perhaps this would just be an expansion of that. Perhaps it could reduce the impact of hallucinations.
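
For what it's worth, here is a minimal sketch of that grounding idea: constrain the chat answer to retrieved snippets and have the model abstain when they don't contain the answer. `web_search` and `generate` are hypothetical placeholders, not real APIs:

```python
# Hypothetical grounding sketch: the model may only answer from retrieved
# snippets, and must abstain otherwise. Placeholders, not a real API.

def web_search(query: str, k: int = 3) -> list[str]:
    """Placeholder: return the top-k result snippets for a query."""
    raise NotImplementedError("wire this to a real search backend")

def generate(prompt: str) -> str:
    """Placeholder: call whatever LLM you have access to."""
    raise NotImplementedError("wire this to a real model")

GROUNDED_TEMPLATE = (
    "Answer the question using ONLY the numbered sources below. "
    "If the sources do not contain the answer, reply exactly: I don't know.\n\n"
    "Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
)

def grounded_answer(question: str) -> str:
    snippets = web_search(question)
    sources = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, 1))
    return generate(GROUNDED_TEMPLATE.format(sources=sources, question=question))
```

This doesn't eliminate hallucination - the model can still misread a snippet - but it at least ties each answer to material the search engine actually returned.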

The fact that LLMs make up stories seems, to me, like a big issue for monetization through advertisements. Advertisers' links could easily be placed near, or within, made-up facts. That could be harmful in terms of being associated with negative, false, or exaggerated statements.

Lastly, now that the whole tech community is talking about Bing's chat challenging Google, I wonder what the antitrust folks will think. I suppose at least the notion that a lack of players equals a lack of competition should be seriously questioned.
