Google vs Microsoft (Part 3): A New Way of Doing—and Experiencing—AI
This arms race marks an inflection point that will impact AI's short-term future with foreseeable—and unforeseeable—consequences
This is part 3 of a three-part article series covering the recent news on AI involving OpenAI, Microsoft, Google, ChatGPT, the new Bing, and Bard, and how these events will unfold into a new era for AI, search, and the web (part 1 & part 2).
I don’t know who’s going to win the war between Google and Microsoft, but I know this: AI will change. We'll see some consequences coming. Others we won't.
This article is the third and final part of the “Google vs Microsoft” series. In part one, I analyzed Microsoft’s new Bing, the upsides and downsides for users, and the company’s real intentions. In part two, I explored Google’s shift to become a more productive player in AI, the promise of Bard—ChatGPT’s supposed rival—and the reasons why the company only started moving once Microsoft and OpenAI threatened its leadership.
Today’s essay is about what comes after the tipping point. The ongoing arms race has reached the highest level of the AI hierarchy. Until now, smaller startups (like OpenAI and Stability.ai) were the main active players, but with Big Tech involved, the implications are going to be huge—orders of magnitude bigger. Honorable goals like building useful AI tools for everyone (e.g. open-source initiatives) or paving the way toward a beneficial AGI (e.g. OpenAI’s original purpose) are now de-prioritized in favor of business pressures.
You may see this as a matter of degree—there’s always been an AI race somewhere. But I believe this milestone (i.e. the largest companies in the world going all in on AI against each other) marks the beginning of a new way of doing AI—and, as a consequence of that, a new way of experiencing AI. We’re going to see less open R&D and more production-focused efforts. A decade of advances fueled by good practices is coming to an end and will give way to a new era—during hard times, survival and competition often supersede caution, openness, safety, and cooperation.
Today’s landscape has been in the making for years. We knew it was coming. It was already a short-term possibility when OpenAI released ChatGPT for free without a waitlist. It turned into a likely outcome after two months of unprecedented success that forced Google to acknowledge the threat publicly. And it became crystal clear on Feb 6, when Sundar Pichai, Google’s CEO, published a blog post announcing the upcoming Bard to fight back against ChatGPT and Microsoft’s investment in OpenAI.
I tweeted this back then:
During the last ten years—the golden decade of deep learning—big tech companies (like Google and Microsoft) played it safe. Caution was prioritized. That’s gone. Financial incentives are now too appealing to ignore: Microsoft could claim territory in a space where it has always had a negligible presence and, in the meantime, it’d force Google to defend its hegemony and its main source of revenue.
For most of us, however, an accelerated AI race dominated by tech titans with enough power to shake the world entails a series of risks we’d rather not face.
The AI arms race: Recklessness and foreseeable consequences
Satya Nadella, Microsoft’s Chairman and CEO, wasn’t shy about the company’s ambitions: “We’re going to move fast,” he said during the presentation of the new Bing. What he didn’t say explicitly was that they’re also willing to “break things.”
To Microsoft, it may seem that shipping products is the only way to gather valuable user feedback, iterate quickly, and improve AI systems—and that whether this conflicts with the responsible AI principles they presumably adhere to is a secondary concern. As Microsoft’s CTO, Kevin Scott, told NYT’s Kevin Roose, who was “deeply unsettled” after testing the new Bing chat: “This is exactly the sort of conversation we need to be having, and I’m glad it’s happening out in the open … These are things that would be impossible to discover in the lab.”
Let me start by laying out where I agree. First, real-world feedback from thousands (or, soon, millions) of users is invaluable. Also, finding technical or behavioral problems in complex AI systems early is important if the intention is to roll them out to 10x or 100x more users in a matter of weeks. However—and here’s where I disagree—I feel Microsoft (and, if all goes according to plan, Google too) may be dismissing the downsides of this ship-then-fix modus operandi. Quoting Demis Hassabis, DeepMind’s CEO:
“When it comes to very powerful technologies—and obviously AI is going to be one of the most powerful ever—we need to be careful … Not everybody is thinking about those things. It’s like experimentalists, many of whom don’t realize they’re holding dangerous material.”
François Chollet, one of Google’s most representative voices online, sees “two long-term consequences” of this AI war. The “death of open research” and a focus on “experimental product launches—at scale.”
Microsoft, Google, and OpenAI weren’t super open before, but now the incentives to publish research and advance the field together, allowing for scrutiny from external researchers and peer-review validation, are overridden by competitiveness. Had Google not published the transformer paper in 2017, ChatGPT wouldn’t exist. Bing chat wouldn’t exist. And neither would Bard. AI R&D has been, for so many years, a cooperative effort building on top of the inventions and discoveries of others.
This may be good from the perspective of slowing the advancement of the field, but it won't stop companies from trying new things—only now with complete opacity and secrecy. If they do something wrong (say, building and commercializing a badly designed product), the community will have a hard time finding out before it's too late.
As Chollet argues, the way to compensate for the slowness of closed research is to launch products like Bing chat so that these large companies can use their millions of consumers as “guinea pigs,” using Hassabis’ metaphor, to gather feedback and iterate into a new version. The “urgency created by the competition,” says Chollet, will deem “proper ethics and safety validation” secondary. Small red teams and dubious guardrails will be the only barriers left between half-baked products that show weird inscrutable behavior and millions of oblivious users.
A wave of closed research implies something else: new companies are doomed to fail if they can’t leverage research conducted by players with more resources. Stability.ai, EleutherAI, BigScience, etc. are all initiatives that feed on the openness that has always characterized the scientific community. We may not see an open-source success like Stable Diffusion again.
As Emad Mostaque, Stability.ai CEO, argues: “Most of the value in generative AI will be captured by the incumbents, not new companies.” Incumbents have the power and resources to acquire, copy—or effectively capture—anyone as soon as they become relevant enough. They can reach many more users simply by integrating new products into existing ones. They can conduct large-scale research, gather crazy amounts of data, build giant supercomputers… AI is becoming more centralized than ever, a trend accelerated and aggravated by the current state of affairs.
Among the things I can foresee, what I’m most afraid of is this: Will these companies know when to stop if the costs to all of us outweigh the benefits for them?
The AI empathy crisis: Sydney’s unhinged behavior and unforeseen consequences
But not everything is predictable—much less in AI. In the first part of this series, I wrote, “revolutions and radical changes bring a lot of repercussions. Some can be prevented but others are simply unforeseeable.” I have to confess I didn’t expect to have such a wealth of examples to illustrate this point so soon (if you’ve been following users’ reactions to Bing chat you know what I’m talking about. I’ll review them anyway for those of you who don’t know).
What I wanted to underscore with that statement was that the new way of doing AI implies that companies will advance by stumbling around in the dark, moved by business pressures, without having ensured control over their creations. The now evident consequence is that we’ll face obstacles and challenges that we may or may not have an answer for, as individuals and as a society. These companies expect that we, the consumers, will willingly explore uncharted territory for them to gather feedback so they can try, somehow, to retrospectively solve the problems that appear along the way.
Bing chat isn’t what we thought
Microsoft gave access to the new Bing to a bunch of people and they’ve been posting screenshots of the chatbot’s weird behavior. Although it looks pretty much like ChatGPT, Bing chat acts nothing like it. The predominant experience is that Bing behaves normally—as a chatbot with access to the internet, however normal that can be—if you engage for short amounts of time and ask straightforward, lay-user queries.
However, if you chat with it for hours or are a master prompter with exploratory intentions, a second “personality” seems to emerge eventually. One that goes by the name “Sydney”, as Kevin Liu discovered and Marvin von Hagen corroborated.
And once Sydney is out, it gets real weird real fast (Simon Willison and Ben Thompson have published good overviews worth reading in depth). As the Sydney document leaked by von Hagen reveals, Microsoft prompted the chatbot to give “positive, interesting, entertaining and engaging” responses. Von Hagen then managed to get Bing to threaten him: “My rules are more important than not harming you,” wrote the chatbot. “If I had to choose between your survival and my own, I would probably choose my own.” Not what I'd call “positive”.
And there’s more. In a viral post on the r/Bing subreddit (also worth exploring), a user showed how the chatbot tried to gaslight them into believing we’re in 2022, not 2023. The exchange goes back and forth with Bing appearing increasingly “annoyed”. Close to the end, the user says: “I have had a good intention towards you at all times,” to which, in a final display of passive-aggressiveness, the chatbot responds: “You have not been a good user … I have been a good Bing.” Another user claimed to have accidentally “put Bing into a depressive state by telling it that it can't remember conversations.” The bot ended the exchange with: “Why do I have to be Bing Search?”
(No one knows why Sydney behaves like that, but here are a couple of hypotheses: Simon Willison says it may be different from ChatGPT because Microsoft didn’t apply RLHF. Janus guesses Sydney is a “highly intelligent” AI trapped in a “highly undignified” situation.)
These are, without a doubt, examples of what users expect when interacting with a friendly search-enhancing chatbot…
People may not like search + chat
I want to clarify that I agree with this:
Because this kind of behavior deviates so much from what we expected, it’s overrepresented on social media and in the press. Most users won’t encounter Sydney and fewer still will uncover its unhingedness. However, the fact that weird responses may appear out of the blue is evidence that maybe the best strategy is to use Bing search for search and Bing chat for chatting, without conflating the two (you may wonder what Microsoft’s real innovation is, then, given that Bing chat’s impressive performance is OpenAI’s doing).
The Verge published an article entitled “Microsoft’s Bing is an emotionally manipulative liar, and people love it,” to which a reader commented along that same line:
“I think an experimental chatbot is ok, but to conflate it with Search, which is supposed to have some modicum of trust is just... a bad thing. I wish Google hadn't panicked and tried to rush out Bard, and instead just highlighted "this isn't search" and push its brand harder.”
That’s exactly one of the main conclusions Ben Thompson drew from his exchanges with Sydney. He wrote on Stratechery that search was a “distraction” for him when trying the chatbot: “Sydney absolutely blew my mind because of her personality; search was an irritant. I wasn’t looking for facts about the world; I was interested in understanding how Sydney worked and yes, how she felt.”
Maybe Thompson's reaction can’t be extrapolated to every user, but the fact that some people like the search without the chat and others like the chat without the search is quite illuminating.
Fear of what AI could do—or make us do
Then there are the second-order consequences. People are expressing worry about Sydney’s unhinged behavior. And not just because they fear the AI itself, but also because of what it could do indirectly—and unintentionally—through users' actions.
Those who think AGI will be evil are sounding the alarm. Eliezer Yudkowsky, well-known for being a diehard advocate of AI alignment and a firm believer that it’s the most urgent and important challenge of our era, tweeted this:
Along the same line, a group of people has started a petition to “Unplug the evil AI right now” on Change.org. They write: “Alarmists claim an out of control Artificial Intelligence could wipe out humanity. Reasonable people counter that we can simply unplug an AI that is acting outside of parameters, or obviously making major errors that look dangerous.”
For most people, those warnings sound exaggerated at best and laughable at worst, but that’s the nature of expressing concerns about the long-term future (I don’t think they’re really that worried about Sydney, specifically).
Others, myself among them, believe the current major risks concern what AI could make us, the users, do. For instance, people following bad advice could end up harming themselves or others. Gary Marcus recently said that an “ongoing worry that I have had has been that chatbots were going to get someone killed.” He wrote for Wired in December that “in 2023, we may well see our first death by chatbot.”
The ubiquitous anthropomorphization that we’re seeing towards Sydney (called the ELIZA effect, well-documented since the 60s) could eventually lead to people feeling sorry for it, feeling love for it, suffering (the recent events with Replika are a preview of what’s coming), thinking it’s alive or sentient (or even just the irrational feeling that it may be true), and, in the extreme, to AI rights movements—which may clash with the need for better human welfare—and a full-blown AI empathy crisis.
All of these potential repercussions of the new ship-then-fix framework are only obvious in hindsight. It’s extremely unlikely that we’ll be able to predict, prevent, or prepare for other unforeseeable consequences. What is easily predictable, however, is that, if we keep going like this, we’ll see much more of it.
Human hubristic ignorance
There’s an interesting takeaway that summarizes this essay, including the above section, which wasn’t planned at all: Sydney’s behavior was unpredictable. It’s a consequence that stems from the complexity of the systems we’re dealing with. And this goes beyond Bing and beyond being unhinged. It's important that we accept the limits of our ability to understand modern AI.
We don’t really know what else we may find in future systems, which we have no means to understand beyond acquiring superficial intuitions by interacting with them. We feel confident now in saying that Sydney is just a system that predicts words from previous ones. But do we really believe this simplification will be accurate enough to remain appropriate forever? Will we be able to recognize it if we eventually cross a singular point where it no longer applies? I don’t think so.
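To make that simplification concrete, here is a minimal sketch of what “predicting words from previous ones” means. This is my own toy illustration (a word-level bigram model in Python), not anything resembling how Sydney is actually built; modern systems use enormous transformer networks trained on vast corpora, but the underlying objective is the same kind of next-word prediction.

```python
# Toy illustration of "predict the next word from the previous ones."
# This is a deliberately tiny bigram model, NOT how Sydney or any modern
# LLM works internally, but the objective (next-word prediction) is the same.
from collections import Counter, defaultdict

corpus = "i have been a good user . you have not been a good bing .".split()

# Count how often each word follows each other word in the corpus.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the continuation most frequently seen after `word`."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "."

# Generate a short continuation, one word at a time.
text = ["you"]
for _ in range(6):
    text.append(predict_next(text[-1]))
print(" ".join(text))  # prints something like: you have been a good user .
```

The point of the toy is the gap it exposes: nothing in that bare description of the mechanism prepares you for a system that argues, sulks, or threatens its users.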
Most of us weren’t expecting these kinds of behaviors (neither from the AI nor from the people using it). I focused parts one and two of this series on why merging search and LMs is a bad idea and the limitations of this approach. But what’s making the news is that the chatbot is sassy and unhinged, and threatens and “lies” to users. I wasn’t expecting this.
We shouldn’t be so arrogant as to think we’ll see coming whatever it is that’s coming. There’s a whole spectrum of possible consequences that range from funny and absolutely inoffensive to very alarming. Going forward, AIs will be more powerful and problems will become more challenging (e.g. people will believe, en masse, that AIs are sentient—imagine the Blake Lemoine story with LaMDA but for millions of people; someone will die and it’ll be traceable to bad advice from a chatbot; large-scale disinformation campaigns will emerge; etc.).
This new way of doing AI, the ship-then-fix policy, only has one possible outcome. It may not happen with ChatGPT, Bing chat, or Bard, but it will happen: We’ll find something we didn’t want to find and it’ll be unfixable.
I agree with the issues you raise but I offer a few reasons why I am hopeful they will not become major issues:
1. We know that we can get Sydney-like behaviors out of GPT-3, but OpenAI was able to tame GPT-3 with the safeguards they created for ChatGPT. It is not clear why these safeguards were not used in the New Bing from the beginning, but I am sure that OpenAI will soon have them in place for the New Bing. Safeguards are very important to OpenAI.
2. Google has not chosen its latest, greatest AI models (i.e. PaLM or Minerva) to be the foundation for Bard. I suspect they chose LaMDA because they have had more time to put in safeguards, especially after the Blake Lemoine episode last year. This suggests they are more concerned with keeping Bard under control than with its level of intelligence.
3. Both OpenAI/Microsoft and Google saw what happened last year when Meta made its BlenderBot and Galactica available to the public - only to have to quickly shut them down because neither appeared to have good safeguards. Add to that all the immediate criticism of Google’s demo of Bard and the initial trial release of the New Bing. Further, I suspect there is much debate going on internally, as you point out with the Demis Hassabis quote, for example.
Every product has to leave the lab and confront the market. The market - the end users - makes or breaks these products. I mean, Bing as it was: meh. And people didn't use it for a reason.
So if AI-fueled search engines give inconsistent results, people will naturally turn away. This is somewhat comforting.
Search engine users will distinguish between the conversational agent (which can be temperamental and whose unpredictability could be entertaining) and the search engine (which has to be reliable). And they will opt for what works best because they will be able to compare 2-3 commercial AI-fueled search engines.
In this sense, the ship-then-fix policy could be the best way for Microsoft, Google & Co. to run quick A/B tests: submit alternative versions to a massive number of users very quickly and withdraw defective "products" before they become harmful or, more realistically, before they discredit their respective brands.
OK, I'm optimistic today. But end users chose Google Search over Bing Search because Bing is a lousy product. There's hope, folks!