1. What is DALL-E?
DALL-E is a generative AI technology that enables users to create new images from text-to-graphics prompts. Functionally, DALL-E is a neural network that can generate entirely new images in any number of different styles, as specified by the user's prompts.
The name DALL-E is an homage to the two core themes of the technology, hinting at its goal of merging art and AI. The first part (DALL) is meant to evoke the famous Spanish surrealist Salvador Dalí, while the second part (E) refers to the animated Pixar robot WALL-E. The combination of the two names reflects the abstract, somewhat surreal illustrative power of the technology, automated by a machine.
Thumbnails from OpenAI
DALL-E was developed by AI vendor OpenAI and first launched in January 2021. The technology uses deep learning models alongside the GPT-3 large language model as a base to understand natural language user prompts and generate new images.
DALL-E is an evolution of a concept OpenAI first discussed in June 2020, originally called Image GPT, which was an initial attempt to demonstrate how a neural network can be used to create new high-quality images. With DALL-E, OpenAI extended the Image GPT concept to let users generate new images from a text prompt, much as GPT-3 can generate new text in response to natural language text prompts.
2. What is GPT?
GPT is a family of AI models built by OpenAI. It stands for Generative Pre-trained Transformer, which is basically a description of what the AI models do and how they work.
The GPT models are designed to generate human-like text in response to a prompt. Initially, these prompts had to be text-based, but the latest versions of GPT can also work with images. This allows GPT-based tools to do things like the following (there's a quick code sketch after the list):
- Answer questions in a conversational manner
- Edit content for tone, style, and grammar
- Summarize long passages of text
- Translate text to different languages
- Brainstorm ideas
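To make one of these tasks concrete, here is a minimal sketch of asking GPT to summarize a passage programmatically. It assumes the pre-1.0 `openai` Python package and your own API key; the model name and the placeholder text are illustrative, not something shown in this post.

```python
# A minimal sketch, assuming the pre-1.0 "openai" Python package and your own API key.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder: your own OpenAI API key

response = openai.ChatCompletion.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "system", "content": "You summarize long passages of text."},
        {"role": "user", "content": "Summarize the following passage: <paste a long passage here>"},
    ],
)
print(response["choices"][0]["message"]["content"])  # the generated summary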
GPT-4 Basic Question
The AI image generator that works most closely with this text-based GPT feature is none other than DALL-E.
3. How DALL-E works
DALL-E operates by taking a text input and generating an image as output. The model is trained on a large dataset of image-text pairs, enabling it to learn the relationship between text and images. It processes the input text through a series of transformer layers, then generates the image piece by piece as a sequence of image tokens. The model is trained to optimize a loss function that measures the difference between the generated image and the target image, and this process is repeated millions of times until the model can produce high-quality images from text inputs.
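To make that training idea concrete, here is a deliberately tiny sketch, not the real DALL-E architecture: a toy transformer maps a text prompt to an image and is optimized against a target image with a simple pixel loss. Every size, name, and the fake data below are made up for illustration (assumes PyTorch).

```python
# A toy sketch of "text in, image out, minimize the difference to the target image".
# This is NOT DALL-E's actual architecture; everything here is illustrative.
import torch
import torch.nn as nn

class ToyTextToImage(nn.Module):
    def __init__(self, vocab_size=1000, dim=128, image_size=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)            # text tokens -> vectors
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.to_pixels = nn.Linear(dim, image_size * image_size * 3)  # flat RGB image
        self.image_size = image_size

    def forward(self, text_tokens):
        h = self.transformer(self.embed(text_tokens))          # (batch, seq, dim)
        h = h.mean(dim=1)                                       # pool over the prompt
        return self.to_pixels(h).view(-1, 3, self.image_size, self.image_size)

model = ToyTextToImage()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # stands in for "difference between generated and target image"

# One training step on fake (text, image) pairs standing in for the real dataset.
text = torch.randint(0, 1000, (8, 16))        # 8 prompts, 16 tokens each
target_images = torch.rand(8, 3, 32, 32)      # 8 matching target images
loss = loss_fn(model(text), target_images)
loss.backward()
optimizer.step()
```

In the real system this loop runs over millions of image-text pairs; the sketch only shows the shape of one step.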
It has a different vibe from the AI Image Generators I explained earlier, right?
The third version of DALL-E has already been released. Instead of the traditional approach of downloading and installing tools to call the API, it can now generate images directly within ChatGPT. (I'll explain the original API method in a later post.)
Now, let me give you a brief explanation of how to use it!
4. How to use DALL-E 3 (Beta) in ChatGPT
To start with, in order to use the updated features of ChatGPT, especially the beta version of DALL-E, you need to subscribe to ChatGPT Plus, which gives you access to GPT-4. While you can carry out most tasks on the Free plan, if you want more detail and assistance for work, you really should subscribe to Plus.
The pricing is as shown below.
Once you've finished subscribing, just click “New Chat” in the left sidebar, and when you hover over GPT-4, you'll see the following options:
- Default, which is the basic functionality of GPT
- Browse with Bing (Beta), which lets GPT-4 search the web
- DALL-E 3 (Beta)
You should click on the third option, DALL-E 3.
If you’ve followed the steps above, then all preparations are complete. Now, just type your desired prompt into the chatbox below.
For demonstration purposes, I’ve entered the word “Eagle”.
Just by entering the single word “Eagle”, it provided detailed sentence-style prompts depicting two different scenarios and compositions. Based on those prompts, it promises to create two distinct images!
It not only helps with the brainstorming but also does the work for you… What an impressive service!
The result was an artwork depicting an eagle. The image also comes with a text description explaining it.
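For readers curious how a comparable request looks outside the ChatGPT interface, here is a hedged sketch against OpenAI's image endpoint in the pre-1.0 `openai` package. Note that this endpoint targets the earlier DALL-E API rather than the DALL-E 3 beta shown above, and it won't do the prompt-expansion step ChatGPT did for me; the key is a placeholder, and I'll cover the API properly in a later post.

```python
# A minimal sketch, assuming the pre-1.0 "openai" package; this image endpoint
# targets the earlier DALL-E API, not the DALL-E 3 beta inside ChatGPT.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

result = openai.Image.create(
    prompt="Eagle",      # the same single-word prompt used in the demo above
    n=2,                 # ask for two distinct images, like the ChatGPT demo
    size="1024x1024",
)
for item in result["data"]:
    print(item["url"])   # each generated image comes back as a URL
```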
While the current result is already impressive, I wanted to explore more beta features. To enhance the first image, I wrote a chat message asking to add a hoodie to the eagle.
The eagle was indeed reimagined with a hoodie while maintaining its original mood and tone. Clicking on the image, I could see the generated image on the left and the written prompt used to create it on the right.
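This chat-based edit has a rough analogue in the API's image-edit endpoint. The sketch below is an assumption-heavy illustration, not the ChatGPT beta flow: it uses the pre-1.0 `openai` package's edit call (which targets the earlier DALL-E edit API), and `eagle.png` is a hypothetical square PNG whose transparent region marks where the hoodie should appear.

```python
# A hedged sketch of the image-edit endpoint in the pre-1.0 "openai" package.
# "eagle.png" is a placeholder square PNG; its transparent area acts as the edit mask.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

result = openai.Image.create_edit(
    image=open("eagle.png", "rb"),        # square PNG, transparent region = area to redraw
    prompt="An eagle wearing a hoodie",   # describe the full desired image, not just the change
    n=1,
    size="1024x1024",
)
print(result["data"][0]["url"])
```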
Having used various AI services, I’m once again amazed at the convenience and fun DALL-E provides, even just with its beta service.
As you can see, the usage is incredibly simple. Of course, you can create outputs tailored to your needs by composing more detailed and in-depth prompts.
In future posts, I'll also introduce how to use the DALL-E API and how to utilize it through Browse with Bing.
Next time, I'll explain the third tool in the Top 5, Deep AI.