Blog 16

DALL-E 2
Gizem Baruk | 10.07.2022

The innovation

DALL-E 2 is the new, revolutionary text-to-image generator from OpenAI.
It lets users create images from entered text. To do this, the generator uses artificial intelligence that interprets the meaning of the entered words (natural language input) and renders it as an image. Its predecessor, DALL-E, was built on a version of GPT-3; DALL-E 2 builds on CLIP instead. The generator allows users to turn their own creative ideas into vivid images. DALL-E 2 can depict realistic objects as well as interpret text describing things that do not actually exist. For example, generating a realistic scene is no problem for DALL-E 2.

How does DALL-E 2 work?

The DALL-E 2 generator uses natural language processing and artificial intelligence to convert the information in a text into a multitude of images. Through deep learning, it learns which connections it has to make in order to generate the final image. For this learning process, it uses the existing CLIP technology (Contrastive Language-Image Pre-training). CLIP is trained on text-image pairs from the Internet and learns to find suitable text descriptions for an image. DALL-E 2 consists of the following two stages:

The first stage is the AI training process. Here, CLIP is used to encode text-image pairs and create a so-called latent code (an embedding).
In the second stage, the text is converted into a new image. The latent code of the text is passed through a so-called prior.
The decoder is then used to create variations of the image that match the text. Creating a new image variation involves the following steps:
1. First, the text is entered into the text encoder. This encoder comes from the CLIP model, which was trained to encode text-image pairs.
2. The prior establishes the connection between the CLIP text embedding and a CLIP image embedding that reflects the information from the text.
3. Finally, the decoder is used to generate new image variations that visually represent the entered text. This allows a variety of different images to be created using different text inputs.
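
The three steps above can be illustrated with a small toy sketch in Python. This is not OpenAI's code and contains no trained model: the function names (encode_text, prior, decode, generate), the embedding size DIM and the hash-based "encoder" are all assumptions made up for illustration. The sketch only shows how data flows from the text through the prior to several image variations.

```python
import hashlib
import math
import random

DIM = 8  # toy embedding size (assumption); real CLIP embeddings are far larger


def _unit(vec):
    """Scale a vector to length 1 (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def encode_text(prompt):
    """Step 1, stand-in for the CLIP text encoder: prompt -> text embedding.

    Hypothetical: each word is hashed into numbers instead of using a
    trained network.
    """
    vec = [0.0] * DIM
    for word in prompt.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        for i in range(DIM):
            vec[i] += digest[i] / 255.0 - 0.5
    return _unit(vec)


def prior(text_embedding):
    """Step 2, stand-in for the prior: text embedding -> image embedding.

    Hypothetical: a fixed mixing of neighbouring coordinates replaces the
    learned model.
    """
    shifted = text_embedding[1:] + text_embedding[:1]
    return _unit([0.5 * a + 0.5 * b for a, b in zip(text_embedding, shifted)])


def decode(image_embedding, n_variations=3, size=4, seed=0):
    """Step 3, stand-in for the decoder: image embedding -> image variations.

    Every variation gets different random noise, so the results differ
    although they are all based on the same embedding.
    """
    rng = random.Random(seed)
    images = []
    for _ in range(n_variations):
        image = [
            [
                image_embedding[(row * size + col) % DIM] + rng.gauss(0.0, 0.1)
                for col in range(size)
            ]
            for row in range(size)
        ]
        images.append(image)
    return images


def generate(prompt, n_variations=3):
    """Run the full chain: text encoder -> prior -> decoder."""
    return decode(prior(encode_text(prompt)), n_variations=n_variations)


variations = generate("an astronaut riding a horse", n_variations=3)
print(len(variations), "variations, each", len(variations[0]), "x", len(variations[0][0]))
```

In this sketch each "image" is just a small grid of numbers. The point is the structure: the same text always leads to the same embedding, while the decoder adds noise to produce several different variations of it.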


