27 Comments

This genuinely feels like a major paradigm shift. Even if the image quality isn't quite at the same level as Midjourney, opening the flood gates to any average person being able to conjure up whatever they can think of is massive.

I'm also happy to see that OpenAI are taking steps to address some of the ethical issues with having these models trained on the work of artists who aren't compensated. (It's in the "Creative Control" section of the "A focus on safety" chapter.) They now claim ChatGPT will refuse to generate images in the style of a living artist, and they let artists proactively opt out of models training on their work. Whether this goes far enough is of course another discussion.

But I'm probably not as fatalistic as you seem to be about us losing humanity when it comes to this specific development. I see many wonderful use cases where passionate authors can play around with visualizing the scenes they describe in an extremely nuanced way. I see kids exploring magical worlds of their imagination (the DALL-E 3 demo video about the hedgehog is along those lines). I see the average person no longer constrained by their technical / artistic ability giving outlet to amazing creations lying dormant in their minds. And so on.

Sure, as with most generative AI, we'll see battles over ethics, copyright, etc., and we'll want some regulations in place to prevent the worst abuses. But I definitely see the potential for this to be a huge booster of creativity, where AI and people work in tandem to create something new.


As the resident Doomerist :-) here's a perspective on the "losing humanity" issue.

You correctly point to "wonderful use cases". Ok, so where do such wonderful uses lead, what comes next, where is this headed?

We can imagine this kind of image control precision making its way into video at some point. From there such control may leap off the screen into 3D space, much as we see Apple trying to do with its Vision Pro system. Over time the imagery in 3D space becomes more realistic, more controllable, more immersive, more interactive, etc. That is, we're on a path towards the creation of ever more compelling imaginary realities.

Should I obtain the ability to create compelling imaginary 3D realities to my own personal preference, the question would seem to become... What need do I have of you?

The characters in the imaginary reality give me exactly whatever I want, all day long, every day. They're always willing, they never get tired, they are my happy slaves.

With you my fellow human I have to negotiate and compromise, and even then I rarely if ever get exactly what I want, but rather some fraction of what I want. You're not always willing, you do get tired, and you certainly have no interest in being my happy slave.

We're celebrating each new step in this direction, as we march steadily towards a future where we will increasingly lose interest in each other. As I've written here and elsewhere a number of times, this process of dehumanizing detachment is already well underway, as so many of us (me included) choose to spend so much of our time with disembodied strangers on the Net instead of face-to-face, flesh-and-blood humans in the real world.

Today I choose you disembodied strangers over my old friends because you will feed my interest in AI, and they will not. Tomorrow I'll choose the machines over you, because the machines will do whatever I tell them, and you will not.

Power corrupts, and absolute power corrupts absolutely.


So well said Phil. So well said.


Thank you. But too wordy, as usual. So we could try this...

What bonds human beings together is mutual need.


Phil "Gloom-n-Doom" Tanny strikes again!

I think that's a fair alternative take in terms of where society might broadly be headed.

I'm not convinced that "having a text-to-image model that understands words better" is necessarily the most blatant example of it, though.

We already have the technology for people who might wish to isolate themselves from the rest of humanity and live inside a sick fantasy world. Deepfakes, text-to-image models without NSFW filters, VR, etc. So there's plenty to satisfy any "happy slave" tendencies already.

But I'm not as cynical about humanity as a whole. There'll always be those who use technology to satisfy base urges (Stable Diffusion porn, ChatGPT scams and spam, etc.), and there will be others who use it as an outlet for creativity and finding a sense of belonging (I encourage you to visit some of the Midjourney forums and hear stories of people sharing the power of this tech to heal and create genuine connections with others).

Technology in itself is neutral. It's us who imbue it with either negative or positive traits, depending on who we are.

If I'm to buy the argument that all of us are slowly heading to a future where we'll want nothing to do with each other and live in make-believe worlds of our own that satisfy our most depraved urges, then we're doomed either way. AI or not.


Hi Daniel!

Your honor, speaking in defense of the Gloomy Doomerist, I'm only gloomy, negative, a downer, etc. to the degree my analysis is inaccurate. And that degree is unknown at this time, given that none of us can know with any certainty what the future will bring.

I agree that a search for a sense of belonging is likely an eternal human desire. The question may be, to belong with what? Our fellow humans are not the only thing we can bond with.

Today we choose our friends based on how well they meet our needs. People who are inconvenient to our needs tend to get left behind, and those that best meet our needs are more closely embraced. Point being, we don't bond with people just because they are human, but because of how well they serve our interests. All I'm doing is projecting this well-established principle into a future when there are additional sources of satisfaction competing for our attention.

Isolating oneself from other humans is not automatically "living in a sick fantasy world". As an example, I'll be spending almost all of the next six months in the North Florida woods, by myself, and I won't miss you guys at all :-). Over the last 20+ years I've learned how to bond with the woods, and it feels way healthier than being on the Internet. It's the bonding itself that matters more than what one bonds with.

Point being, some people will use the digital fantasy machine to have uplifting positive experiences. But they may be experiences that are less human-centric than is commonly held to be "normal".

Are we doomed? The answer to such a question may depend on at what level we are asking. We're all going to die, so at that level we are doomed. Every human civilization ever created has eventually collapsed, so we are probably doomed at that level too.

There is yet another level to consider. Where does being a human alive on this Earth for a period of time fit in to the much larger picture of reality? We know nothing of where we came from, or where we are going. This is not really the place for that conversation, but if it were, you would find me way more optimistic and positive than the typical group consensus assumption that death=doom.


If we look at it from that perspective, then one can argue that there's nothing fundamentally different between enjoying alone time in the woods and enjoying a virtual life inside a fantasy world of your own creation. As long as both give you an uplifting experience that doesn't negatively affect others.

Personally, I'd choose the woods. But if someone prefers the alternative? I'm just happy that AI might give them that option.

Have a fantastic time in North Florida. It actually sounds incredible.


To continue to serve as attorney for the doomer case...

The movement into the AI fantasy world negatively affects others in that it weakens the bonds between human beings, thus undermining marriages, friendships, society at large, etc. You know, in a couple of years your kids won't talk to you at the dinner table because they'll be too busy looking at their phones. Future AI-based fantasy realities will just amplify and extend what's already happening. What's been happening for thousands of years, really.

It's a very complicated picture obviously, and not all one thing or another.

Sep 20, 2023 · Liked by Alberto Romero

I doubt anyone outside of OpenAI knows the answer to this, but what’s your take on the phrase “available to GPT Plus consumers”? Does that mean included for their subscription price or does it mean the ability to add on this feature for a surcharge?

author

I interpret it as meaning "included in the price for ChatGPT Plus." And I hope that's the case. If DALL-E 3 is as good as they say I'll probably subscribe to Plus again (I tested GPT-4 for a while but after realizing I had no use case worth $20/month, I unsubscribed).


For what it's worth, prior communication about "ChatGPT Plus customers" meant exactly this: included in the price, like with the Code Interpreter (now "Advanced Data Analysis").


Opting out is a cop-out. If OpenAI really wants to be ethical and do the right thing by artists, they would use the opt-in model. In other words, no scraping the Internet for images of art or photography without explicit permission of the artists.

author

Right. The thing is, they want to be ethical as long as it doesn't clash with the business. Which can be criticized but what do we expect exactly? That cliche "blame the game, not the player" applies here (FWIW I think we should all strive to be slightly more ethical than the minimum that "the game" forces us to be - the world would be a better place).


Very interesting. Do you know if it allows training on your own photos?

author

I don't think OpenAI has said anything about fine-tuning options yet. If they allow that with DALL-E 2 (I don't know), they will likely allow it with DALL-E 3, and vice versa.


Except we haven't had a chance to actually try DALL-E 3 yet. I'll judge it when I can actually run tests.

author

Yep, we'll have to check its limitations - where it fails and where it succeeds. Even if it has increased language capabilities those capabilities will have limits somewhere. Our job will be finding them.

Sep 21, 2023 · Liked by Alberto Romero

> ... I can't help but feel that we—we, humanity—are losing something every time we take a step in this direction.
>
> What do you think?

Since you ask, my own feeling is that we would be losing something way more important if we *didn't* "take a step in this direction"!

author

Also true, definitely. These things tend to be more complex than black and white.


I'm kind of shocked. It happened pretty fast.

author

Actually, DALL-E 2 came out almost a year and a half ago, a rather long time by AI standards. I think this is a sign that OpenAI is definitely going all in on multimodality (probably to fight Google's Gemini).


I'll definitely dive in as soon as I can and let you know what I think.


Creating scenes and images to very exacting specifications is something that we should have been conditioned to thinking about ever since seeing it fictionally accomplished on the holodeck as it was seen in the late 1980s on Star Trek: TNG. I'm not altogether sure that I can manufacture indignation for having "lost something" for being able to do it now and not having been able to do it before in reality. Perhaps I'm missing something. Could someone explain what the big deal is?


I think artists would benefit from including their art in the training dataset if they made some royalty agreements.


OpenAI is planning to release DALL-E 3 with ChatGPT in early October. I feel this is going to be similar to the image generation within the Bing AI chat, but better. I observed that text got better in DALL-E 3 (for instance, the 21 in your cover image and of course this -> https://images.openai.com/blob/0303dc78-1b1c-4bbe-a24f-cb5f0ac95565/avocado-square.png?trim=0,0,0,0&width=2000).

That's all cool, but one thing puzzles me. On their website, they said: "As with DALL·E 2, the images you create with DALL·E 3 ... don't need our permission to reprint, sell or merchandise them."

The monetization part of AI-generated images, which by nature are made by drawing creative elements from thousands of other images, is still complicated.

Sure, you can disallow GPTBot from crawling your work. However, that only works when the bot hasn't already crawled your work. Even the site says, "Creators can now also opt their images out from training of our *future image generation* models."
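For reference, the opt-out OpenAI documents for GPTBot is an ordinary robots.txt rule placed at the site root. As noted, this only blocks future crawling; it does nothing about images already collected:

```
User-agent: GPTBot
Disallow: /
```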

What should be the way of managing the monetization of AI-generated artworks? In the worst cases, it could feel like reselling Picasso's work by distorting the canvas and sprinkling it with filters and Van Gogh's colors.


Well, currently there's nothing stopping you from selling, but I believe the copyright office denied copyright for a particular image that was generated just from prompting. So that is already definitely a restriction compared to creating an image yourself.

I feel like it is hard to get a handle on what this kind of training really means though in terms of the original creators. Obviously it couldn't do what it does without training on the original works, and in some sense is competing with those same people, but I also feel like people tend to get the wrong idea of how it works. Like saying it collages a bunch of images together or filters/distorts images. While we can't say it is learning in a fully human sense, to me it feels more like that than it does a mashup.

No idea how DALL-E 3 works internally, but a common way image generators are trained (from my understanding) is a two-step process. First an image classifier is trained (or an existing one like CLIP is used). When you give that model an image, it spits out a bunch of labels with confidence percentages. So if you gave it an oil painting of a red car, maybe it outputs "painting: 85%, red: 95%, car: 90%, truck: 40%, lizard: 0.1%, etc."

Now the image model is trained. An image is given to the classifier, which spits out the labels and confidence levels. Then some amount of noise is added to the image. Now that noisy image and the labels are given to the image model. It then tries to reduce the noise according to the labels. So in our above example, it tries to make it more "painting" and more "red", but not more "lizard". To do this, it needs to look for those concepts in the existing image (like how a person sees animals in clouds) and add additional details. A resulting image is created and now that new image is given to the classifier. The labels are compared to the original labels to see how good of a job it did. If the new image is now 60% "red", then it gets feedback that it went in the wrong direction.

You can see from this process that the AI never actually sees the original image before it was made noisy, and its result isn't compared directly to the original either, but to the combination of labels involved. And it is forced to really lean into visual concepts in general, because there is no room to store all the input images.
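To make the feedback idea above concrete, here's a toy Python sketch. Everything in it is made up for illustration: the "classifier" just reads the mean of the red channel, and the "denoiser" simply blends the noisy image back toward the clean one by a strength factor. It's not how DALL-E 3 is actually trained, but it shows the loop: label the clean image, add noise, denoise, re-label, and score how far the new labels drifted from the targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(image):
    # Toy stand-in for a CLIP-like classifier: the "red" confidence is
    # just the mean of the red channel. A real classifier is a trained network.
    return {"red": float(image[..., 0].mean())}

def label_error(image, noise, denoiser_strength):
    """One pass of the label-guided denoising loop described above."""
    target = classify(image)                # 1. label the clean image
    noisy = image + noise                   # 2. add noise
    # 3. the "model" removes some of the noise; here we fake learning by
    #    blending back toward the clean image by denoiser_strength
    denoised = noisy + denoiser_strength * (image - noisy)
    # 4. feedback: re-classify and compare against the target labels
    return abs(classify(denoised)["red"] - target["red"])

image = rng.random((8, 8, 3))               # an 8x8 RGB "image"
noise = rng.normal(0.0, 0.2, image.shape)   # the same noise for both passes

weak = label_error(image, noise, denoiser_strength=0.1)
strong = label_error(image, noise, denoiser_strength=0.9)

# A denoiser that removes more noise leaves labels closer to the targets,
# so it receives a smaller error signal
assert strong < weak
```

The key point the sketch captures is the last one in the paragraph above: the error is computed from labels, never from a pixel-by-pixel comparison with the original image.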

So internally it is going to end up with some model of what "banana" means in terms of shape, texture, colors. And a style like "drawing" is likely built up in terms of what that means when applied to various kinds of low level shapes and lighting. There wouldn't be space for every permutation of every object (all angles, all photo/oil/watercolor/drawing/pastel styles, all variations of size and color) to be stored separately. Similarly, it is much smaller and more general to define "spiral" as a particular mathematical pattern and then apply colors and textures (or even objects) to that. Custom Stable Diffusion models on places like Civitai are only 2GB in size and produce very high quality. Think about how small that actually is, considering the flexibility.

And these abstractions are what let it combine different concepts together. Like Midjourney can make a vacuum cleaner with a combination of the styles of HR Giger and Lisa Frank. As far as I know, neither actually painted a vacuum, and there wouldn't be a lot of references of their styles combined. But if it is trying to progressively find shapes in noise and add different kinds of details at the same time, it works out.
