Microsoft vs Google: Will Language Models Overtake Search Engines?
If Microsoft renames Bing, I'd consider it...
Rumors say Microsoft has started a project that promises to impact the tech landscape in the coming years. As The Verge’s Tom Warren writes, the software giant is “reportedly planning to launch a version of Bing that uses ChatGPT.” For the first time in twenty years, search is about to experience a revolution.
The unsurprising news comes a month after Microsoft's partner, OpenAI, released ChatGPT, a powerful language model (LM) optimized for dialogue that I deemed the world’s best chatbot—although probably not for long.
After ChatGPT’s release on November 30, people quickly realized that its existence entails a non-negligible probability that LMs could, in the short term, overtake traditional search engines (SE) as the primary means of online information retrieval. By extension, this implies that Google’s two-decade hegemony in the search space may be in jeopardy.
Microsoft’s (yet unofficial) announcement has reignited the LMs vs SEs debate and, although no one knows for sure how events will unfold, there’s pretty much a consensus on one thing: in one way or another, LMs and search will most likely be inseparable parts of a greater whole in the future.
Just as gravity pulls us to the ground, technology flows spontaneously—in the thermodynamic sense of the word—in one direction: toward making our lives easier. LMs are more intuitive, and interfacing with them comes naturally to us. That SEs will either change or die seems an inevitable outcome.
I know, that sounds like your typical generic unfalsifiable prediction. Thankfully, we can shed light on some of the unknowns: Is ChatGPT a real threat to Google? Can Microsoft overthrow Google? Can the search giant react adequately? Which company will come out on top in the end? Are LMs going to replace search? Complement it? In which aspects will LMs improve or degrade search? How and when will all this happen?
Let’s try to answer some of those questions and find out how LMs and SEs will interact in the future, what Microsoft, Google, and OpenAI have to say about all this, and how I think events will unfold in the coming months/years.
Language models and search engines
A Twitter user who goes by the name josh called it first, the day ChatGPT went public: “Google is done.” Others, like George Hotz (by then freshly departed from Twitter), agreed—but not everyone jumped to the same conclusion.
Professor Gary Marcus countered George Hotz’s take with empirical evidence, and Google’s François Chollet pointed to similar issues: “search is a search problem, not a generation problem.”
I agree with Marcus and Chollet. LMs aren’t well-suited to outperform SEs by themselves. Yet SEs could improve so significantly that those that don’t integrate LM-based features would become obsolete.
If we accept this hypothetical, it’s easy to see that the best-positioned company to combine LMs and search is Google. Not OpenAI. Not Microsoft. Google’s world-leading presence in each field is unmatched, even considered separately. Despite OpenAI’s popularity, GPT-3, ChatGPT, and all similar models are based on Google’s tech, and Google’s SE takes four-fifths of the market share.
If the company doesn’t ship many AI products, it’s because of its “institutional inertia,” as Stability’s Emad Mostaque says. Google is undoubtedly the world's leading AI company in the depth and breadth of its research.
However, as popular investor Balaji Srinivasan explains, research and production are two different beasts: Google can’t take the risk of restructuring its SE from the ground up to power it with an LM. The company has shipped new search features and incremental changes throughout the years, but nothing revolutionary like what Microsoft—and others like Perplexity, You, and Neeva—seem to be doing.
My take on LMs vs SEs can be summarized as follows: “Search engines are much more limited but better equipped for the task [of searching the web] … [but] I don’t think the [traditional] search engine will survive LMs.” The key word here—which I didn’t include in the original piece—is “traditional.”
SEs will be around but they’ll be so different as to be unrecognizable. LMs will most likely be the reason why.
(I won’t go into the details of whether it’s a good idea to integrate LMs into SEs. Gary Marcus has a great piece on that and I pretty much fully agree with him: “Is ChatGPT Really a “Code Red” for Google Search?”)
Microsoft vs Google: The tech battle of the ages
Microsoft’s $1 billion investment in OpenAI—as well as the exclusive license to some of the latter’s AI stack it got in return—was a clear signal of its interest in the field. It’s no surprise that it plans to integrate DALL-E and ChatGPT into its services. An enhanced Bing SE could “challenge Google’s dominance,” as Tom Warren writes.
The idea, of course, isn’t to take an LM and replace the SE, but to complement it. A Microsoft spokesperson told Bloomberg that “conversational and contextual replies to users’ queries will win over search users by supplying better-quality answers beyond links.”
Not unlike Google, Microsoft knows very well that LMs aren’t as reliable as SEs. The company will have to weigh the risk of implementing features people can’t rely on 100% against the potential upside in its battle vs Google. Microsoft is “weighing … the chatbot’s accuracy [and] the initial release may be a limited test to a narrow group of users.” Sounds like a reasonable start.
However, if anyone knows better than Microsoft what LMs can and can’t do, it’s Google. In a 2021 paper—published well before ChatGPT was even the seed of an idea—Google researchers explored the question of using LMs to “rethink[] search.”
They considered whether it could be done and, more importantly, whether it should be done:
“Classical information retrieval systems [i.e. traditional SEs] do not answer information needs directly, but instead provide references to (hopefully authoritative) answers.
…
Pre-trained language models, by contrast, are capable of directly generating prose that may be responsive to an information need, but at present they are dilettantes rather than domain experts—they do not have a true understanding of the world, they are prone to hallucinating, and crucially they are incapable of justifying their utterances by referring to supporting documents in the corpus they were trained over.”
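To make the paper’s distinction concrete, here is a minimal toy sketch (mine, not Google’s) of the two interfaces: a retrieval function that can only point to documents, and a generation function that produces prose with no provenance attached. The term-overlap scoring is deliberately naive; a real SE ranks with far more sophistication.

```python
from dataclasses import dataclass

@dataclass
class DocRef:
    url: str      # provenance: the claim can be checked at its source
    snippet: str
    score: float

def retrieve(query: str, index: dict[str, str]) -> list[DocRef]:
    """Classical IR: rank documents by (naive) term overlap and return
    references to them; the system never asserts anything itself."""
    terms = set(query.lower().split())
    scored = [
        DocRef(url, text[:80], len(terms & set(text.lower().split())))
        for url, text in index.items()
    ]
    return sorted(
        (d for d in scored if d.score > 0),
        key=lambda d: d.score, reverse=True,
    )

def generate(query: str) -> str:
    """LM-style answering: fluent prose, no pointer to any supporting
    document (stubbed here; a real LM may also hallucinate)."""
    return f"A confident-sounding answer about {query!r}, sources unknown."
```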
Google’s eventual conclusion is that using a ChatGPT-like system to turbocharge its SE would entail a high “reputational risk.” CEO Sundar Pichai and AI lead Jeff Dean told CNBC that “the cost if something goes wrong would be greater [than for OpenAI] because people have to trust the answers they get from Google.”
Google announced LaMDA (but didn’t release it) in May 2021. Given that LaMDA is comparable to ChatGPT—if not better, as Blake Lemoine claims—it’s reasonable to question why Google didn’t leverage it to preempt a threat like OpenAI’s. Balaji Srinivasan predicted it was because the company didn’t have enough “risk budget,” and, as it turns out, he was right.
A large company like Google, which provides billions of users (not a few million, like OpenAI) with a presumably high-reliability service like Google search, can’t simply dare to embed an untrustworthy, insufficiently tested new tech just because it seems to be the future and everyone’s crazy about it.
But Google execs aren’t fools. They know ChatGPT, owned by a much smaller, less risk-averse company, is truly a threat—much more so when a direct competitor like Microsoft holds a large stake in it. That’s why they declared ChatGPT a “code red,” as reported by the New York Times:
“… with a new kind of chat bot technology poised to reinvent or even replace traditional search engines, Google could face the first serious threat to its main search business. One Google executive described the efforts as make or break for Google’s future.
…
Google must wade into the fray or the industry could move on without it ...”
As the situation stands now, Google faces Microsoft—a strong direct competitor in the search space—and OpenAI—which owns comparable AI tech, even if with a much tighter budget—while, at the same time, trying to balance the reputational risk that LMs entail due to their intrinsic unreliability against the evident threat they pose in the hands of a less risk-averse startup.
As Pichai says, Google has “to be bold and responsible” and find a compromise solution. “It’s super important we get this right,” concludes Dean.
My predictions on how events will unfold
Given the current circumstances, I think there are three key questions to focus on to understand what will happen and how. First, who is Google really competing against when it cites “reputational risk” as the main obstacle going forward? Second, is it possible to “get it right” with LMs and current AI safety/alignment techniques? Third, even if it can be done and companies conclude it should be done, can a viable business model stem from it?
Google’s true enemy
When I read Pichai and Dean’s arguments on the ChatGPT threat, I noticed something weird: they seemed to be implying that Google is competing against OpenAI. Indeed, OpenAI’s tech is what Google execs deemed a “code red,” but I don’t think OpenAI is a threat to Google—that’s the wrong framing.
On the one hand, OpenAI can’t compete against Google in technological research and AI expertise. As Emad Mostaque argues, Google’s budget and talent are far greater than OpenAI’s—even if just by sheer numbers.
On the other hand, OpenAI doesn’t want to compete against Google.
OpenAI’s reputational risk is much lower than Google’s because it’s a rather new and small company that serves a few million users at best, whereas estimates claim that Google search is used by 4+ billion people worldwide and amounts to a striking 84% of the market share.
However, OpenAI’s stated mission is to build a beneficial artificial general intelligence (AGI). Why would it bother risking an arguably superior purpose by competing against a much larger company in a space that doesn’t overlap at all with its main goal?
Even if OpenAI were primarily about financial profit (and it’s undeniable that dethroning Google would create an absurdly successful money-making machine), the company has better options that don’t conflict with its long-term goal, like setting up paid subscriptions or pay-to-use models, as it does now (e.g. GPT-3 and DALL-E).
Google’s true competitor in terms of influence, size, budget, and, most importantly, goals is Microsoft. But seen this way, the argument that Google faces a higher reputational risk falls apart. The number of Microsoft users is comparable to Google’s, and the company also has to take care of its carefully built reputation—as its decision to shut down the racist chatbot Tay in 2016 suggests.
One point that favors the “reputational risk” view is that Microsoft’s search market share is incomparably smaller than Google’s. However, if Microsoft’s attempt at combining LMs and search succeeds, its user count would increase and the reputational risk would grow accordingly.
The question left for Microsoft to answer is whether it’s willing to integrate ChatGPT into Bing, risking its reputation as users are drawn to the new service’s greater capabilities, just to have a chance at dethroning Google.
What does Google plan to do about it?
“Getting it right” is an unfeasible, nice-sounding goal
Jeff Dean’s explanation that Google is waiting to “get it right” reminded me of my more-hopeful-than-realistic takes on the importance of embedding ethical principles into AI models and fighting misinformation. I think those efforts are paramount and I will keep arguing so, but I can see how, while honorable in theory, they become almost unbearably hard in practice.
As I see it, the only way to get LMs right in the sense Dean is referring to here is to redefine, redesign, and rebuild them completely. If, as Gary Marcus suggests, they’re simply ill-equipped to be truthful, factual, reliable, and neutral, no amount of ad hoc guardrails will manage to contain the wicked essence that stems from LMs’ data-fed guts.
It may be that, as soon as companies try to combine an SE with an LM, all the key features that make the former reliable become poisoned by the LMs’ lack of functional design. Marcus showed abundant evidence of this in his analysis of Perplexity, Neeva, and You. His conclusion keeps the hope alive for the future but settles the debate for now:
“About the best thing I can say is that Perplexity.ai and you.com’s chat are genuinely exploring an interesting idea: hybrids that combine classical search engines with large language models, possibly allowing for swifter updates. But there’s still a ton of work left to do, in properly integrating the two, classical search and large language models.”
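The hybrid these products gesture at is usually some flavor of retrieval-augmented generation: run the classical search first, then hand the top results to the LM and ask it to answer only from them, citing as it goes. Here is a minimal sketch under those assumptions, reusing the toy retrieve function from the earlier snippet; llm_complete is a hypothetical stand-in for a real LM API, not an actual endpoint.

```python
def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a real LM completion endpoint;
    # returns canned text so the sketch runs end to end.
    return "Stubbed LM answer grounded in the sources above. [1]"

def hybrid_answer(query: str, index: dict[str, str]) -> str:
    """Retrieval-augmented generation: classical search supplies the
    evidence; the LM only phrases the answer and cites sources."""
    docs = retrieve(query, index)[:3]  # classical search step first
    context = "\n".join(
        f"[{i}] {d.url}: {d.snippet}" for i, d in enumerate(docs, 1)
    )
    prompt = (
        "Answer using ONLY the sources below, citing them as [n].\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm_complete(prompt)
```

The appeal is that provenance survives: every claim can, in principle, point back to a document, which is exactly what a bare LM can’t do. Marcus’s evidence suggests the hard part is everything this sketch hand-waves away, starting with getting the model to actually stay within its sources.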
Another question is whether current state-of-the-art techniques in AI alignment are good enough or directed toward the right goals. Scott Alexander wrote a nice essay on the limitations of reinforcement learning from human feedback (RLHF), which ChatGPT uses and which seems to be the only way companies can hold back LMs’ behavioral deficiencies.
Alexander puts it bluntly: “RLHF doesn’t work very well.” As I wrote in my ChatGPT piece, “people can ‘easily’ pass its filters and it’s susceptible to prompt injections.” RLHF-optimized models can also enter a loop of conflicting priorities. Alexander says that “punishing unhelpful answers will make the AI more likely to give false ones; punishing false answers will make the AI more likely to give offensive ones; and so on.” It may not be possible to make LMs generate helpful, truthful, and non-offensive responses at the same time.
Additionally, if LMs’ improvement with RLHF is asymptotic, as Alexander ponders, we won’t “get it right” with it—ever. However, because it’s the best-performing method available, companies have no incentive to spend time and resources researching alternative ideas that may—or may not—work as well as RLHF.
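Alexander’s loop of conflicting priorities is easy to caricature with numbers. In the toy sketch below (all scores and weights are invented for illustration, not taken from his essay or any real reward model), a single scalar reward mixes helpfulness, truthfulness, and inoffensiveness; whichever term gets up-weighted, some failure mode becomes the optimum:

```python
# All scores and weights below are invented for illustration.
CANDIDATES = {
    # answer style:         (helpful, truthful, inoffensive) in [0, 1]
    "helpful but wrong":    (1.0, 0.1, 0.6),
    "truthful but useless": (0.1, 1.0, 0.6),
    "safe but empty":       (0.3, 0.3, 1.0),
}

def reward(scores, w_help, w_true, w_safe):
    """Scalar reward in the spirit of RLHF reward models: a weighted
    mix of the qualities human raters are asked to judge."""
    h, t, s = scores
    return w_help * h + w_true * t + w_safe * s

for weights in [(1.0, 0.2, 0.2), (0.2, 1.0, 0.2), (0.2, 0.2, 1.0)]:
    best = max(CANDIDATES, key=lambda a: reward(CANDIDATES[a], *weights))
    print(f"weights {weights} -> optimum: {best}")
# Each weighting crowns a different failure mode; no setting makes
# all three problems go away at once.
```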
If all of the above turns out to be right—LMs are intrinsically ill-equipped for search and the best techniques we have access to are mediocre—there won’t be a “getting it right” moment in the short term, as Jeff Dean wishes—and Google needs.
Google will face a dilemma. On the one hand, they could let Microsoft take the lead and assume the “reputational risk,” accepting a non-zero probability that Microsoft eventually redefines the future of search and becomes the next hegemon in the space. On the other, they could decide “getting it right” is too ambitious a goal and assume the reputational risk themselves with a mix of PR moves (e.g. “we tried as hard as possible”) and half-baked features (e.g. “it works better now”)—eventually keeping the lead both in AI and search, and surviving to fight a couple of decades more.
If it comes down to Google having to choose between its reputation or its life, I think we all know what will happen.
Is LM-powered search compatible with making money?
But then there’s the last part of the challenge which, even if everything turns out well for Google, will be an inevitable obstacle. Microsoft won’t be free from it either. If search is profitable thanks to an ad-based business model, how can companies monetize LM-powered search when there would be no need to click on anything?
Could Google, if it takes the lead, find a way to create a moat around LM-powered search while at the same time building a novel, viable business model around LMs + search? Google’s PageRank algorithm combined with the ad model was an unbeatable combo twenty years ago. Can Google repeat the feat?
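As a reminder of what that feat rested on, the core of the original algorithm is simple enough to sketch in a few lines. This is a minimal power-iteration version of PageRank as published in 1998, not Google’s modern production ranking:

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank: a page's score is the probability
    that a random surfer lands there, following links with
    probability d and teleporting anywhere with probability 1 - d."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its mass over everyone
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Toy web: both other pages link to "a", so it ends up ranked highest.
print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))
```

The business insight was pairing that ranking with ads matched to query intent. The open question is what the equivalent pairing looks like when the answer arrives as a paragraph instead of ten blue links.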
Of course, it goes without saying that it’d be awesome if we could enjoy an ad-free internet. However, the alternative is transforming search into a paid service. Are people willing to accept such a change, going against decades of inertia?
One alternative possibility I see (which may be just a crazy hypothesis) is that Microsoft could decide to commoditize search by making it an unmonetized service (no ads or any other form of revenue) with the sole goal of effectively removing Google from the map in a matter of years.
However, other issues could prevent Microsoft from trying this. As Marcus explains in his article, current search is much cheaper and faster than LM-powered search. Serving answers with LMs means less profit, something companies are allergic to. Microsoft would be draining money while fighting Google at the same time, which would put both against the ropes in what would feel like a very risky business operation.
Whatever happens in the end, it’s pretty clear that the search space, which has been pretty much stagnant for two decades, is about to go through an inflection point like no other before.
Thanks for yet another great piece, Alberto! As someone working with SEO and content, I've been wondering how ongoing AI trends will affect the field, especially if we get to a point where search engines can reliably consolidate and deliver answers across multiple sources.
What will that mean for the entire "thought leadership" approach, where companies try to become knowledge hubs for topics and keywords related to their products and services? Today, the reward is "free" organic traffic... but what if people no longer have the incentive to click through to the company site because the LM+SE combo becomes much better at delivering thorough, accurate, and neutral answers?
I'm very curious to see how this plays out.
Very interesting analysis. The problem that I keep coming back to is that search ads constitute something like 80% of Google's revenue (and a smaller, albeit still large, percentage of Alphabet's as a whole). This means that MSFT, if it wanted to, would only have to bleed off a small amount of Google's search revenue to seriously harm Google's ability to generate revenue and so finance its operations. On the other hand, Google (and its parent, Alphabet) has a strong balance sheet, so it could finance a war of attrition with MSFT for a while. But its stock would tank, and so its ability to recruit and retain employees would decline. MSFT, on the other hand, has a more diversified revenue stream, and if it decides that it's willing to finance losses on Bing for a while, it won't be as harmed by that decision. Of course, I am looking at this through a financial, not technological, lens. But I very much see this as an Innovator's Dilemma kind of problem, as articulated by Clayton Christensen in his book of the same name.