I suspect that the vast difference between Sora and Veo 2 comes from the video library used for pretraining. Google's unfettered access to YouTube probably has much to do with it. Also, given that OpenAI should now have access to iPhone cameras, their library should grow by leaps and bounds and significantly improve Sora. This suggests that the physics is wrapped up in pre- and post-training on video content. On a different note, now that people are desensitized to AI videos on social media, they will become more skeptical of what they see, less enthralled by it, and hopefully disengage. This should be very damaging to TikTok and Instagram.
Interesting. My two cents:
1. I doubt this is just about more training data. DeepMind is known to do a lot of ingenious tweaking beyond pure deep learning. I'm not sure whether it's just more data, different data, better post-training, or something else entirely.
2. I assume you mean you "hope" people will disengage. But do you believe they will? I don't. First, most people don't know what's going on! Like, almost nobody. Second, people don't really care. Social Media is way too addictive. But I hope I'm wrong!
Thanks for the thoughtful write-up! I’m on the team that built Imagen 3 and Veo, and this new Veo 2 model is the most exciting one I’ve had the opportunity to explore and evaluate since we built the Parti image generation model a couple years back.
An important component of our release of these models is that every image and video is tagged with SynthID so that they can be verified as AI generated.
https://deepmind.google/technologies/synthid/
We are also part of C2PA, a consortium that defines a standard for attaching provenance metadata to generated content.
https://c2pa.org/
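For readers curious what verification looks like in practice, here's a rough sketch of reading a C2PA manifest from a downloaded file, assuming the contentauth c2pa-python package and its Reader API (exact signatures have changed between versions, so check the project docs). SynthID itself is an invisible watermark that requires Google's own detector, so it isn't something you can check with a local script.

```python
# Rough sketch: inspect C2PA provenance metadata on a downloaded file.
# Assumes the contentauth `c2pa-python` package (pip install c2pa-python);
# the Reader API has changed between versions, so check the project docs.
import json

from c2pa import Reader


def print_provenance(path: str) -> None:
    """Print who/what generated a file, if a C2PA manifest is attached."""
    try:
        reader = Reader.from_file(path)
        store = json.loads(reader.json())
    except Exception as err:  # no manifest, or unsupported file format
        print(f"No C2PA manifest found: {err}")
        return

    manifest = store.get("manifests", {}).get(store.get("active_manifest"), {})
    print("Claim generator:", manifest.get("claim_generator"))
    for assertion in manifest.get("assertions", []):
        print("Assertion:", assertion.get("label"))


print_provenance("generated_image.jpg")
```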
These are part of a broader approach to how the benefits of these technologies can be brought to the world while mitigating some of the risks you mention.
Amazing work Jason, congratulations to you and the team! Both Imagen 3 and Veo 2 are something else. Also, thank you for sharing these resources and for taking these concerns seriously.
I agree with your point about generations. But I think you underestimate the importance of deepfakes as deception. Forget the Pope in Balenciaga -- that's really trivial. Think about the targeted use of deepfakes to attack political enemies. It isn't necessary that the deepfakes be viral in such cases, only that they be plausible to finders of fact in a court case, impeachment, or other proceeding.
Your post appeared in my email on the same day that the House Republicans announced that they would recommend the FBI investigate Liz Cheney for her role on the committee investigating Jan 06. One can't call this an extreme example: it's actually happening, as ridiculous as it sounds. My spouse is an opposition politician here in Japan, where one party has been in power for 64 of the last 69 years. These videos sent chills down my spine, thinking about how they could be used against her and her colleagues to suppress democratic alternation of power.
Your reference to "trading trust in visuals for boundless creativity" illustrates part of the social problem. "Boundless creativity" incorporates two very favorable words in our culture -- and boundaries are particularly regarded negatively when it comes to economic and intellectual activity.
One of the underpinnings of economic freedom is the observation by 19th C. neoclassical economist Léon Walras that a vial of medicine may have more utility to a man planning to use it to murder his family than to a doctor seeking to use it to cure a sick child. The neoclassicals (whose fallacious and inconsistent theory is taught today in every Econ 101 class around the globe) were proud that their theory was completely divorced from ethical considerations. This is in contrast to an older strain of economic theory going back to Aristotle's Politics, that economic activity must be rooted in the ethics of the community.
The deepfake problem and the forgetting/generational problem you highlight are (further) evidence that GenAI technology, like economic theory, weakens the ethical bonds that can strengthen civic communities. That's going to be a major problem going forward. Boosterish phrases like "boundless creativity" hide the problem from us.
BTW, in stressing the necessity of ethical constraints on AI development (among other forms of innovation), I don't mean to suggest that finding the bounds will be easy, or that this be left to some sort of morality police. We need to find a way to have some sort of community-wide deliberation and attempts at consensus before technology deployment. Leaving AI development and deployment decisions to the whims of private corporations is a great way to destroy any chance of public deliberation, and possibly to destroy what little is left of our communities in toto.
The thing about deception is that it doesn't happen as easily as we fear because people believe what they want to believe, independent of the provenance of information or the underlying truth.
For instance, we worried AI would be catastrophic for the US election but it mostly didn't happen: (https://theconversation.com/the-apocalypse-that-wasnt-ai-was-everywhere-in-2024s-elections-but-deepfakes-and-misinformation-were-only-part-of-the-picture-244225, https://time.com/7131271/ai-2024-elections/, https://www.washingtonpost.com/technology/2024/11/09/ai-deepfakes-us-election/).
It's not that deception can't happen, but that the real power of deepfakes is expression instead. There's both, but the real effect comes through the latter. The consequences, however, are often indistinguishable. The problem is that our worries are misplaced. (I have an article ready for January where I go much more in-depth on this topic.)
I think you underestimate the potential of this technology to distort burdens of proof in individual cases, and in countries outside the US (for now). Consider, e.g., fabricated “hidden camera” footage of several purported incidents of a person receiving money from agents of a foreign adversary. If there aren’t any tells of AI generation, it will be harder to find anyone — NGOs, maybe even family members — willing to stand up for that person.
Consider families that may be disrupted in divorce and child custody cases by fabricated evidence, if generation of false evidence by AI becomes harder to detect. Those cases are decided by judges, based on the evidence proffered, not on the judge’s prejudices.
So far it seems you’re being very utilitarian, focused on deception writ large, rather than on non-viral applications of this tech.
Your remark about people believing what they want seems to discount that this can be affected by the quality of the evidence before them, as if we’ve all made up our minds about everything in advance (another interesting echo, BTW, no doubt fortuitous, of modern economic theory).
And the techno-optimist discourse of “we’ve always adapted…,” even if we accept it, for the sake of argument, as accurate, ignores a more fundamental principle: past performance is not indicative of future results, especially as access to the tech becomes wider. E.g., LLMs may have been astonishing in the lab, but they now cause havoc in education (I speak as someone who teaches college students). Students have adapted to using LLMs, but with depressingly crappy results.
Again, I’m not disagreeing with you about the downsides of expressive use of this technology, only about your apparent downplaying of the deception risks. I look forward to your piece next month.
Google cooked on this one, wow!! The blueberries in the cup of water and the slicing of the tomato really stood out to me. I wouldn’t be able to distinguish them as AI-generated.
Agreed. It's one thing not to be able to distinguish AI images, but now AI videos as well? This was unexpectedly quick.
Yeah, it's way too quick for me to keep up, and I'm deeply into this stuff, researching it every day. Good luck to the average Joe.
We need personalised AIs to help people emotionally cope. Unfortunately there’s nowhere near enough psychologists, and the psychs are way behind on the tech-advancement side of things.
First, I learn something important every time I read your posts. Thank you for tracking all this for the rest of us. I'm a retired philosopher whose focus has been philosophy of technology for three or four decades. I wrote a long essay in 2008 on Kurzweil's "The Singularity Is Near." His law of accelerating returns is being demonstrated every day. As you show, DeepMind's Veo 2 is stunning, although the other video generators are already pretty awesome. And all this in a few short years, as the developmental curve is nearly vertical. What are we to see in the next 18 months? Toffler's "Future Shock" is upon us. Your remarks about the gap between adults and kids are spot on. Moreover, there will be growing gaps between some kids and others, perhaps on a weekly or even daily basis, thus raising the disturbing question of human identity: Who are we in lives and communities without continuity and without agreed-upon "reality" conditions? I'm going to share your post as widely as I can. Michael Z
Thank you Michael!! "What are we to see in the next 18 months?" That's the question we're all trying to answer, and every time, we miss the mark by being too conservative.
All this AI video blows my mind.
I'm completely ignorant in this field, and I cannot imagine the computing power needed for all those videos.
Thanks for this article Alberto.
It is mind-blowing! The quality and the physical consistency: just amazing, even for people closely following the space.
Though keep in mind that running one of these models is a lot less intensive than the original training. The Hunyuan one, for example, was open-sourced and can run on a higher-end consumer graphics card if you use a quantized version (and are willing to wait a while for each generation).
That’s actually one of the scarier aspects of all this, IMO. If giant companies like Google were the only ones able to run this stuff, it would be more controllable. But you can already run many still-image models directly on an iPhone. Not really with video yet, but what about in 2 years?
In the end, these things are just a file that you download (6 GB for an image model, 42 GB for Hunyuan) and run on GPUs.
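To make that concrete, here's a minimal sketch of running HunyuanVideo locally with the Hugging Face diffusers HunyuanVideoPipeline. The repo id, resolution, step count, and memory-saving tricks are illustrative assumptions and may need tuning for your GPU.

```python
# Minimal sketch: local HunyuanVideo generation via Hugging Face diffusers.
# Assumes a recent diffusers release with HunyuanVideoPipeline and the
# community model mirror; all parameters here are illustrative.
import torch
from diffusers import (
    BitsAndBytesConfig,
    HunyuanVideoPipeline,
    HunyuanVideoTransformer3DModel,
)
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"

# 4-bit quantization (the "quantized version" mentioned above) shrinks the
# ~42 GB transformer enough to fit on a high-end consumer card.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.bfloat16,
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()          # decode the video in tiles to save VRAM
pipe.enable_model_cpu_offload()   # shuttle idle submodules to CPU

frames = pipe(
    prompt="A tomato being sliced on a wooden cutting board, close up",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```

The slow part is exactly the tradeoff mentioned above: offloading and quantization trade generation speed for fitting in consumer VRAM.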
Sadness is my overriding feeling around this, too. I have a couple of questions: why is there still so little discussion about the catastrophic environmental impact of this tech? It seems to be the number one problem. And secondly, why are you against degrowth as an approach? Is it just that you think it’s unrealistic? Because we may not have a choice, if climate collapse has anything to do with it.
I think rather than degrowing the economy, we can work to transition all energy sources to clean renewables, e.g. solar and wind (perhaps using nuclear as a bridge). The thing about AI and climate change is that AI affects very little compared to other things that have existed for decades and decades. Why focus on the tiny thing? Because it's the new thing. All efforts to fight climate change are better spent elsewhere (e.g. better energy sources, less fossil fuel use, fewer cars and more sustainable transport, less meat and more veggies, etc.). I don't deny that AI adds to the problem (and I won't defend it on the grounds that it might be helpful down the line, e.g. by discovering better methods to eliminate CO2), but it's also not the most pressing of our problems in this regard.
I agree that we have to concentrate on other areas too, particularly the ones you mention, and we desperately needed to do this without the addition of this tech… but
https://www.ft.com/content/533a031a-29d0-434a-b6e0-daadf2ff3add
If our governments are backing down from decarbonisation plans as a direct response to the surge in demand even in this early phase, that has to be a red flag, no? Are the reports that ‘generating an image or two evaporates a bottle of water, or is the equivalent of charging your phone from empty’ false?
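For scale, here's a back-of-envelope on the phone-charge comparison, using figures reported in Luccioni et al. (2023), "Power Hungry Processing". All numbers are model-dependent estimates from that study, not universals:

```python
# Back-of-envelope check of the "one image = one phone charge" claim,
# using figures reported in Luccioni et al. (2023): the least efficient
# image model they measured used ~11.49 kWh per 1,000 generations, and
# charging an average smartphone takes ~0.012 kWh.
KWH_PER_1000_IMAGES_WORST_CASE = 11.49  # least efficient model in the study
KWH_PER_PHONE_CHARGE = 0.012            # average smartphone charge

charges_per_image = (KWH_PER_1000_IMAGES_WORST_CASE / 1000) / KWH_PER_PHONE_CHARGE
print(f"~{charges_per_image:.2f} phone charges per image")  # ~0.96
```

So the comparison roughly holds for the least efficient model in that study, while efficient models reportedly come in orders of magnitude lower, which is part of why these reports are hard to adjudicate.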
If cloud computing accounts for more emissions than the global commercial aviation industry, it’s not a tiny thing if the addition of Gen AI to the system is taken into consideration. At least from my understanding. Plus given the tech needs the resources so badly, and we know that the resources are going to be increasingly rare as the climate continues to collapse, isn’t it foolish to transition our systems and economies to be reliant on it?
I keep coming back to the ‘why?’ - I still don’t understand why we need this. People say ‘it’s just a tool’, but what demand does it fulfill? The same demand the creative industries already fulfill? If it doesn’t offer something genuinely useful, I’d argue it’s, by definition, not a tool. At least, not for the majority of people. To me this makes the environmental impact that much more grotesque.
I wholeheartedly agree your last question needs to be asked; that's the last section of the essay. A question we unfortunately rarely ask. But I'd go beyond climate change on why this is important. Gen AI adds little to the carbon footprint of cloud computing, which is mostly centered on other stuff. But I agree, we gotta keep an eye on that (major companies are restarting nuclear plants to power datacenters, which is better than burning fossil fuels).