Great read. The morality vs progress battle is going to be fascinating to see play out.
Very well stated. Worth thinking.
Great piece, Alberto!
I'm fully on board with your message of "progress, but not at any cost."
We will need to find a way to navigate the inevitable AI future in a way that's morally sound and beneficial to all sides, to the extent that that's at all possible.
But as for the technical argument, I'm not so sure the NYT's case for "verbatim replication" is that strong. There's tellingly little detail about how they got GPT-4 to generate the text. In the lawsuit itself, the NYT basically says it achieved the output with "minimal prompting," without explaining the method.
They go into a bit more detail in Exhibit J (https://storage.courtlistener.com/recap/gov.uscourts.nysd.612697/gov.uscourts.nysd.612697.1.68.pdf), which contains 100 instances of what the NYT claims are verbatim responses. They prompt GPT-4 (presumably in a clean Playground state) with the beginning of an NYT article, which GPT-4 completes with its "verbatim" responses. However, looking at the prompts as stated, there must be more to them than the NYT is showing us. For instance, here's one "prompt" (page 22) that GPT-4 completes with a long paragraph: "If the United States had begun imposing social"
That's it. That's allegedly the entire prompt. I find it exceedingly unlikely that GPT-4 would complete such a vague prompt with a specific NYT article without extensive prompting that the NYT is not showing. If the team, say, shared the article URL and then asked GPT-4 to read it and complete the text, they don't have a case at all. It would be less "GPT-4 memorizing and reciting articles from its training data" and more "GPT-4 reading the requested URL out loud."
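Setting aside how the completions were elicited, the "verbatim" part of the claim is at least mechanically checkable: given a model completion and the original article text, one can measure the longest contiguous span the two share. Here's a minimal sketch using Python's standard library; the texts are invented stand-ins for illustration, not material from Exhibit J:

```python
from difflib import SequenceMatcher

def longest_verbatim_span(completion: str, article: str) -> str:
    """Return the longest contiguous substring shared by a model
    completion and the original article text."""
    m = SequenceMatcher(None, completion, article, autojunk=False)
    match = m.find_longest_match(0, len(completion), 0, len(article))
    return completion[match.a:match.a + match.size]

# Invented stand-in texts, for illustration only.
article = "the quick brown fox jumps over the lazy dog near the river bank"
completion = "the quick brown fox jumps over the lazy dog by the river"

span = longest_verbatim_span(completion, article)
print(len(span), repr(span))
```

A court would presumably care about spans of hundreds of characters, as Exhibit J alleges; the point is only that the overlap claim itself is easy to verify once you know the exact prompt.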
None of this takes away from your main message in this article, with which I agree. I do also recognize that you were very explicit about the irrelevance of the technical and legal nuances to your argumentation.
Still, I'm not convinced the NYT case is quite the watershed moment that it appears to be.
If OpenAI believes it's fair game to use copyrighted material created by others without suitable compensation, then surely they would be equally fine with releasing the code for GPT-4 to the world as open source...
But of course they don't believe that. The word "progress," as used by many AI evangelists, really means the ability of corporations to increase profits regardless of any social cost. I'm certain that AI will bring many benefits (although a lot will be trivial or aimed at merely improving efficiency within other corporations), but equally it's already feeding the dark side of the Internet. The proliferation of misinformation is already having real-world consequences, and with the realism that AI brings, it's only going to get worse. That's before you even consider the actual issues with generative AI: its biases and output errors.
Interesting times ahead!
I think morality vs progress is the wrong dichotomy here. It's about innovation vs morals, and progress (applied innovation for the betterment of the whole of society) is the outcome when that conflict is settled. But that's just semantics, maybe.
I think anything that slows down AI is a good thing. I have zero sympathy for the goal of replacing humanity with machines.
If you want to imagine the future, imagine yourself watching the machines do what you wanted. All your dreams will exist only to be broken, unless we break this first.
Great article!
I'd simply add that for me the issue isn't technological progress but the flawed concept of privatising wellbeing, which enriches some at the cost of making others worse off.
In other words... the NYT is going to be worse off if it can't sell its private information to the public because an AI company can retrieve it free of charge or for much less, so ultimately the argument is about profits.
If we are all here to help human civilisation, then AI as a concept should be a community tool, not something in the hands of a private company. The private company that built it should be remunerated for the work, but if a private company has profit as its main focus, then AI risks being used for the wrong reasons.
Let me give a bad comparative example:
Water is a need, so the supply of water should be community-led and available to all. If you privatise the supply of water everywhere, then those who control it could literally choose who gets water and at what cost, which is why a 100% privately led water-supply system for a community would be a dangerous idea.
AI has the same risk.
The divide between rich and poor is at its highest level since the start of civilisation (i.e. ever fewer people own an ever larger share of the wealth on the planet), so we're clearly going the wrong way in that sense.
I'm not saying that clever or business-savvy people should lose their accumulated wealth so everybody can get some, but I am saying we should gradually move away from the concept of privatisation, along with the "more for me, so less for you" concept of finance, and towards a circular economy where "there's enough for everybody and a balance can be created, still championing meritocracy but also not leaving anybody behind starving."
On LLMs like ChatGPT storing copies: this is possible through overfitting on common phrases or speeches during training. LLMs, like humans, rely on associative memory most of the time, but humans also use rote memory, as with the multiplication table. Bill Benzon's insightful September 2023 LessWrong article, "Notes on ChatGPT's 'memory' for strings and for events," shows ChatGPT outputting Hamlet's entire "To be, or not to be" soliloquy:
https://www.lesswrong.com/posts/N9QQoxk6pGy5DG9sx/notes-on-chatgpt-s-memory-for-strings-and-for-events
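The overfitting mechanism described above can be illustrated with a toy model. This is a minimal sketch, emphatically not how ChatGPT works internally: a tiny n-gram lookup "model" decoded greedily. Because the famous phrase appears in the training text with exactly one continuation per context (the overfit case), decoding recites it verbatim.

```python
from collections import Counter, defaultdict

def train(corpus, n=3):
    """Count n-word-context continuations: a toy stand-in for an
    LLM's next-token distribution."""
    model = defaultdict(Counter)
    toks = corpus.split()
    for i in range(len(toks) - n):
        model[tuple(toks[i:i + n])][toks[i + n]] += 1
    return model

def complete(model, prompt, n=3, max_tokens=20):
    """Greedy decoding: always emit the most frequent continuation.
    When every context was seen with only one continuation (the
    overfit case), the output is a verbatim recitation."""
    out = prompt.split()
    for _ in range(max_tokens):
        nxt = model.get(tuple(out[-n:]))
        if not nxt:
            break
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

corpus = "to be or not to be that is the question"
model = train(corpus)
print(complete(model, "to be or not"))
# -> to be or not to be that is the question
```

A real LLM's "memory" is distributed across weights rather than stored in a lookup table, but the failure mode is analogous: when the training data offers a context only one plausible continuation, greedy decoding recites it.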
Over the long sweep of history we can see a process which might be described as the building of a single unified global mind. Small tribes became small towns, which became bigger towns, which became cities, which became the Internet and globalization etc.
I can't defend this theory, but I suspect that copyright will eventually be pushed aside as part of this unifying process. Even now, the notion that any of us exclusively owns any idea is pretty suspect. Human writers essentially do what GenAI does, just less efficiently. We gather ideas from many different sources, rearrange them a bit, and then claim ownership.
As one speculative example: if AI makes society as wealthy as some predict, all of our needs may be met by society, so there would be no need to argue over who owns what. I, for instance, am on social security, which is at least in part why I don't need to demand that Alberto pay me for clogging up his comment section with my supposedly sage burpings. I can just contribute what I am able, without concern as to who owns those contributions.
I've been reading your emails for months because you bring not only solid information but usually a view very different from my own, and one I find interesting. I signed up for Substack because of this article. This is an issue I'd been thinking about personally as I watched artists and pro-AI people verbally fist-fight each other on X. It's been there for a while, and I knew it would get very loud, but 2024 seems to be the year it's really gonna pop off.
It's now yet another thing dividing us. It's as you say. I personally don't think copyright theft is right. I don't think what OpenAI or Midjourney did was right, even if I love the technology, have already benefited from it, and can see all the things it'll help me do.
But the cost was very high, it was irresponsible, and now it has tainted people's perception of AI. I understand they needed huge amounts of data. I understand that, as start-ups, they didn't have the money to license things. I knew they were thinking: just get the tech out, we'll deal with the consequences later.
The consequences are very high: for users of AI, who are now viewed negatively by some, and for those whose jobs were lost and whose work was stolen. I have thought it over and told myself, "Well, this is just how it always is. Just like factories came and changed everything, it's going to happen again."
But it doesn't sit right with me. There must be balance. Yet it's also true that without what they did, nobody would likely be paying real attention to generative AI, or be all that willing to participate, since they wouldn't have seen the value and, let's face it, the money in it. It's an ugly truth.
In your final lines, you offer a challenge to humanity I think we need to take. To truly have progress in all areas, we should be trying to be better than our predecessors. We have history to learn from, so how can we mitigate the damage going forward so as many people as possible can benefit, and not just the few?
The outcome of this lawsuit will set a major precedent that will help define the future of tech, creativity, and the internet. Thanks for the article. However, I found it too wordy and summarised it using ChatGPT to make it shorter and clearer on the key points/arguments.
Feels a bit ironic.