DALL·E Mini Won—But Now OpenAI Is About to Open DALL·E 2
3 keys that led to the success of the copycat and why it matters to you that it's coming to an end.
DALL·E 2 was released on April 6 and, in a matter of days, it captured the world’s attention. DALL·E 2 wasn’t just the highest-quality text-to-image AI at the time, but a significant qualitative leap from previous ones. It was a phenomenon and people’s excitement grew accordingly.
But OpenAI didn’t want to give people public access due to safety concerns, allowing only a very restricted group of people to use it. On top of that—and in line with the company’s ironic closed-source tendencies—they didn’t open-source DALL·E 2.
That’s right. The company announced it, gave access to its—and Microsoft’s—employees, refused to let university researchers study it, and promoted the impressive generations non-stop on Twitter. An amazing PR stunt that captured our interest only to signal that they’re still at the forefront of AI research.
For days, Twitter was flooded with unbelievable paintings, drawings, and photorealistic pics. But people wanted access. They wanted to play with it and not just wait for OpenAI employees to share a new breathtaking creation.
(In defense of OpenAI, I’ll say that restricting access when the downstream harm isn’t well-analyzed is better than carelessly opening the model to everyone. Still, that’s insufficient. See BigScience’s approach).
What could people do? The only option was signing up for a never-ending waitlist (one that seems to show undisclosed preferences towards specific groups of people). A few weeks ago, OpenAI accelerated the pace, allowing access to 10K people per week (as of July 1, more than 50K people have access).
But it was too late.
DALL·E mini had arrived.
A new text-to-image model inspired by OpenAI’s DALL·E family. Despite its apparent lower quality and limited skill, it captured the collective attention even faster than DALL·E 2 had. To give you some numbers, DALL·E 2 generated more than 3 million images in its first month and a half, whereas DALL·E mini was generating up to 50 million a day just a couple of weeks ago. That’s a daily rate more than 700 times higher.
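If you want to check that comparison yourself, here is a minimal back-of-the-envelope sketch. It assumes "a month and a half" means 45 days and compares DALL·E mini's peak day against DALL·E 2's average day, so treat the result as a rough order of magnitude rather than a precise measurement. With those assumptions it lands at about 750x, in the same ballpark as the figure quoted above.

```python
# Figures quoted in the article; the 45-day window is an assumption
# standing in for "a month and a half".
DALLE2_TOTAL_IMAGES = 3_000_000
DALLE2_WINDOW_DAYS = 45
MINI_PEAK_IMAGES_PER_DAY = 50_000_000


def daily_rate_ratio(total_images: int, window_days: int, peak_per_day: int) -> float:
    """Compare a peak daily rate against an average daily rate."""
    average_per_day = total_images / window_days
    return peak_per_day / average_per_day


ratio = daily_rate_ratio(
    DALLE2_TOTAL_IMAGES, DALLE2_WINDOW_DAYS, MINI_PEAK_IMAGES_PER_DAY
)
print(f"DALL-E 2 average: ~{DALLE2_TOTAL_IMAGES / DALLE2_WINDOW_DAYS:,.0f} images/day")
print(f"DALL-E mini peak vs DALL-E 2 average: ~{ratio:,.0f}x")
```

Note that this compares a peak with an average, which flatters DALL·E mini; the gap is still enormous either way.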
The immediate and overwhelming success of DALL·E mini against its much more qualified older brother tells us a story that we’re going to see more of in the future.
Let’s see what happened, why it may be coming to an end, and why it matters to you.
How did DALL·E mini go viral overnight?
Three key factors made DALL·E mini the star, to the detriment of DALL·E 2, at least for the general, non-technical public. This was a warning for OpenAI and a hint at the path the AI community is choosing against big tech’s privately held AI models.
1. Open-source AI will win eventually
Just a week ago I published what I consider the most important article I’ve written this year: “BLOOM Is the Most Important AI Model of the Decade.” The headline is bold, even risky, but I stand by what I wrote. We’re living through the beginning of a socio-political shift in AI.
Initiatives like BigScience, Hugging Face, and EleutherAI, among many others, are balancing the power dynamics between big tech companies like Google, OpenAI, or Meta, and the rest of us. Open-source AI is the future.
DALL·E mini is another example of the same trend. OpenAI published a great text-to-image model and gave us a glimpse, but gave us neither access nor a way to study it further. Boris Dayma, an ML researcher based in Texas, decided to leverage the few technical hints OpenAI disclosed in the paper and created something similar. But this time, he open-sourced it through Hugging Face.
DALL·E mini went viral once people realized they could use it too, through a straightforward inference API. The sensation is quite different when you are the one playing and creating the images. You suddenly feel an active part of the movement. That’s significantly more attractive than waiting for OpenAI employees to share a new image.
What strikes me most, and what shows most clearly that open-source will win, is that DALL·E mini was getting 200K users a day despite its glaringly obvious lower-quality results.
Companies like OpenAI may not be able to adapt their strategies before it’s too late and these technologies become commoditized through open-source copycats. If the current monopoly/oligopoly shifts towards healthier competition, that’s good for us.
2. A catchy name
There are many, many open-source alternatives to DALL·E 2, but none have taken off like DALL·E mini. To power the spreadability of the model, Boris Dayma took advantage of the already built-in popularity of OpenAI’s DALL·E family and named it after the original.
He claims he used the “DALL·E” name as a tribute to OpenAI, the source of his inspiration. That’s why he added the suffix “mini”: to make it clear the model was different, and lower quality.
However, these details often go over our heads. After searching a little on Twitter’s #dalle hashtag and on Reddit’s r/dalle and r/dalle2, I found that many people had been confusing the two models. They didn’t know DALL·E 2 and DALL·E mini had barely anything to do with one another, technically speaking. DALL·E mini uses a different architecture, different techniques, and a different dataset than DALL·E 2, and it’s notably smaller: 0.4B vs ~5.5B parameters.
People who aren’t fully aware of what’s cooking in the kitchens of these research labs only see what reaches the surface. Who wouldn’t, at the very least, think DALL·E 2 and DALL·E Mini belong to the same people?
Now, after a request from OpenAI “to avoid confusion”, DALL·E mini has been renamed Craiyon. But not before the whole world knew about it.
3. Humor rules the Internet
One of the key features of DALL·E mini’s creations is that they’re nowhere near as realistic as DALL·E 2’s. It’s impossible to mistake a DALL·E mini image for a human-made one, which Dayma considers a feature and not a bug because it makes the model safer to use and deploy.
This isn’t just about DALL·E mini’s lower quality, but about its idiosyncratic style. Its creations are solid meme material. The Weird DALL·E Mini Generations Twitter account (see the cover image) is the foremost proof of that. In a few months, it amassed over 1 million followers and its Tweets easily reach 50K likes.
DALL·E 2 appeals to our sense of beauty and awe; we’re amazed by its creations. But DALL·E mini tends to fall on the absurdly humorous side.
We like interesting, but we love funny.
DALL·E 2 can create funny images, but it feels like it’s trying too hard because it’s just too good. DALL·E mini’s humor stems in part from its inability to do better. Let’s see some examples from the Weird DALL·E account (Source).
What will happen once OpenAI opens DALL·E 2?
But why does the clash between DALL·E 2 and DALL·E mini matter to you?
It seems DALL·E mini may have its days numbered.
A week ago, OpenAI sent a survey to some of the people who have access to DALL·E 2 asking them about pricing. In particular, they wanted to know, on a price-per-prompt basis, what would be too expensive, reasonable, or a bargain.
Sam Altman, OpenAI’s CEO, said on the day of the DALL·E 2 announcement that “hopefully this summer, we’ll do a product launch and people will be able to use it for all sorts of things.” It’s a matter of days before anyone (willing to pay) can play with DALL·E 2, the significantly better model that DALL·E mini set out to imitate.
Which service will succeed this time? A high-quality non-free DALL·E 2 or a lower-quality free DALL·E mini?
I’m quite sure OpenAI won’t set a “bargain” price, so it’ll be around the mean of the “reasonable price” responses in the survey. Let’s imagine OpenAI chooses $0.10/prompt. This price may be low enough for OpenAI to keep its hegemony in the text-to-image generators market.
In this case, playing with the model a few times a day is virtually costless. However, artists, illustrators, professional designers, and also researchers, may find themselves reaching the dozens-of-dollars-per-day zone.
It’s common for AI generations not to come out the way we want on the first attempts. Precise prompt engineering and experimentation are often required to polish the images. A word change here, a new style there. In a $/prompt model, this could soon become unaffordable.
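To make that concrete, here is a small sketch of how a per-prompt bill scales with usage. The $0.10 price is the hypothetical figure from the thought experiment above, not an announced price, and the three usage profiles are purely illustrative assumptions of mine, not survey data.

```python
# Hypothetical price from the thought experiment, kept in cents
# so the arithmetic avoids floating-point rounding surprises.
PRICE_PER_PROMPT_CENTS = 10


def daily_cost_usd(prompts_per_day: int, price_cents: int = PRICE_PER_PROMPT_CENTS) -> float:
    """Return the daily bill in dollars for a given number of prompts."""
    return prompts_per_day * price_cents / 100


# Illustrative usage profiles (assumptions, not measured behavior).
profiles = {
    "casual user": 5,
    "hobbyist": 50,
    "professional iterating on prompts": 300,
}

for label, prompts in profiles.items():
    print(f"{label}: {prompts} prompts/day -> ${daily_cost_usd(prompts):.2f}/day")
```

Under these assumptions, the casual user pays pocket change while the professional who refines prompts all day lands squarely in the dozens-of-dollars-per-day zone.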
However, it’s also possible they’ll decide to set up a tiered subscription model. Free, intermediate, and professional, for instance. That would completely change the current landscape (although it seems unlikely: they like the pay-as-you-go model because it tracks GPU-usage costs more closely).
Anyway, what would happen to the open-source lower-quality alternatives like DALL·E mini, RuDALL·E, DALL·E Flow, or the many Colab notebooks like Majesty Diffusion or Disco Diffusion?
Will OpenAI effectively build a monopoly the same way they did with GPT-3? The only equally-good alternatives to DALL·E 2, Imagen and Parti, are also private. Both are owned by Google and I’m confident the company won’t give access to outsiders.
I think some people may continue to use DALL·E mini, but the lack of availability of DALL·E 2 was a key ingredient in its immediate success. Open-source alternatives will remain for edge cases in which people want to create images with those particular styles.
But, for most cases, DALL·E 2 will be the go-to option. It will most likely dominate the text-to-image landscape for a few months until an initiative like BigScience/BLOOM emerges again to build a collaborative, open-source, open-science, DALL·E 2 based on ethical values and principles.
I’ll celebrate that day the same way I did when BLOOM reached the end of training. It’ll be a good day for the AI community and democratic AI.
Why is all this important?
Open-source always finds a way to beat private alternatives, even if those are more powerful or higher-quality (both DALL·E mini and BLOOM are far from the best in their respective categories). The very nature of LLMs and text-to-image foundation models makes them easy to replicate. The main bottleneck is the money needed to acquire computational resources.
In supporting public, collective initiatives like BigScience, Hugging Face, or EleutherAI, we may indirectly help provide a better future for everyone through democratized access to these systems, which are surely shaping the future in ways we’re just beginning to grasp.
You and I will soon use products and services powered by models like DALL·E 2 or GPT-3. Whether these services will remain accessible only to people with money to spare or become commoditized through open-source pressure remains to be seen. The fight is happening now.