
This is how the Stable Diffusion AI works, step by step, when creating images from text


Until recently, if you wanted to create a digital image, you had to know how to draw and use tools like Photoshop. As of 2022, however, everything has changed, and all thanks to AI and tools like Stable Diffusion. Let’s see how it works.

Image generation is the most recent AI capability that is leaving people speechless. The ability to create stunning images from text descriptions has a magical quality to it and clearly points to a change in the way humans create art.

Stable Diffusion, specifically, is an open-source machine learning model that can generate images from text, modify images based on text, or fill in details in low-resolution or low-detail images.

It has been trained on billions of images and can produce results comparable to those of DALL-E 2 and MidJourney. It was developed by Stability AI and first publicly released on August 22, 2022.

Stable Diffusion doesn’t have an official user interface (yet) like some AI image generators, but it does have a very permissive license and, best of all, it’s completely free to use on your own PC or Mac. It is a clear milestone in this development because it made a high-performance image generation model available to the masses.
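To give you an idea of what running it yourself looks like, here is a minimal sketch using the open-source diffusers library. The model ID and the arguments are assumptions on our part, so check the library’s documentation for whatever matches your setup:

```python
# Minimal sketch: generating an image locally with the open-source
# diffusers library. Model ID and arguments are assumptions; adjust
# them to your installed versions and hardware.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",    # the August 2022 public release
    torch_dtype=torch.float16,          # half precision for consumer GPUs
).to("cuda")                            # or "mps" on a Mac, "cpu" otherwise

image = pipe("an astronaut riding a horse, digital art").images[0]
image.save("astronaut.png")
```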

Breaking down how Stable Diffusion works (image from text)

Thanks to Jay Alammar, an expert in machine learning, we are going to delve into how this curious tool works. Note that we will focus on how it generates an image from a text input, which can be anything from a phrase to a single word (images can also be used as input).


First of all, let’s look under the hood and see that this tool is made up of several components and models:

  • A ClipText encoder for text encoding.
  • An image information creator that processes the information step by step.
  • A decoder that paints the final image.
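To make the division of labor clearer, here is a toy, self-contained sketch of how the three components chain together. All the function bodies are trivial stand-ins we wrote for illustration, not the real networks:

```python
# Toy illustration of the three-stage structure. The "models" here
# are trivial stand-ins, NOT the real networks.
import numpy as np

rng = np.random.default_rng(0)

def clip_text_encoder(prompt):
    # Stand-in: the real ClipText maps up to 77 tokens to 77x768 embeddings.
    return rng.standard_normal((77, 768))

def denoise_step(latents, t, text_condition):
    # Stand-in: the real UNet predicts the noise present at step t,
    # and the scheduler subtracts it from the latents.
    predicted_noise = 0.1 * latents
    return latents - predicted_noise

def image_decoder(latents):
    # Stand-in: the real decoder turns a small latent array into
    # a full-resolution RGB image.
    return np.clip(latents[..., :3], -1.0, 1.0)

text_condition = clip_text_encoder("a horse on the beach")   # Step 1
latents = rng.standard_normal((64, 64, 4))                   # start from noise
for t in range(50):                                          # Step 2
    latents = denoise_step(latents, t, text_condition)
image = image_decoder(latents)                               # Step 3
print(image.shape)   # (64, 64, 3) in this toy; 512x512x3 in the real model
```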

On the one hand, for text-based image generation, we find the component responsible for translating that text into numbers: a text encoder called ClipText (Step 1).

Briefly, this model takes the input text and produces a list of numbers (a vector) representing each token of the text; in other words, it encodes the prompt so that the image generator can be guided by it.
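As a concrete illustration, this is roughly what that encoding step looks like with the openly available CLIP text encoder used by the v1 models (via the transformers library; treat the exact model name and sizes as our assumptions):

```python
# Sketch: encoding a prompt with the CLIP text encoder that Stable
# Diffusion v1 conditions on. Model name and sizes are assumptions.
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    "a horse on the beach",
    padding="max_length",
    max_length=77,              # prompts are padded/truncated to 77 tokens
    return_tensors="pt",
)
text_embeddings = text_encoder(tokens.input_ids).last_hidden_state
print(text_embeddings.shape)    # torch.Size([1, 77, 768]): one vector per token
```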

After this, the information passes through the image generator in two stages (Steps 2 and 3):

  • The word “diffusion” perfectly describes what happens in the first component, the image information creator. It performs the step-by-step information processing that leads to the final generation of a high-quality image.

In this process, a UNet neural network and a scheduling algorithm come into play; they are responsible for progressively removing noise from the array of processed information (Step 2). This is done over a number of steps, with each step adding more information and eliminating more noise (we sketch this loop in code after the list).

  • On the other hand, the image decoder creates the image from the information array it receives from the image information creator. It runs only once, at the end of the process, to produce the final image. Basically, it is in charge of painting the image (its red, green and blue channels) and giving it its dimensions (width and height): it makes an image emerge from all that noise (Step 3).
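To make the two stages more concrete, here is a rough sketch of the denoising loop (Step 2) and the decoding pass (Step 3) using building blocks from the diffusers library. The model IDs, the step count and the 0.18215 latent scaling constant match the v1 release as we understand it, but treat the details as assumptions; `text_embeddings` is the output of the text encoder sketched above.

```python
# Rough sketch of Steps 2 and 3 with diffusers building blocks.
# Model IDs and arguments are assumptions; adjust to your setup.
import torch
from diffusers import UNet2DConditionModel, PNDMScheduler, AutoencoderKL

repo = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
scheduler = PNDMScheduler.from_pretrained(repo, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae")

# Step 2: start from pure noise and denoise it step by step,
# guided by the text embeddings produced in Step 1.
scheduler.set_timesteps(50)
latents = torch.randn(1, 4, 64, 64)          # 4-channel 64x64 latent "canvas"
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample  # remove noise

# Step 3: the decoder runs once and paints the final RGB pixels.
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample   # undo v1 latent scaling
image = ((image / 2 + 0.5).clamp(0, 1) * 255).to(torch.uint8)
print(image.shape)   # torch.Size([1, 3, 512, 512]): RGB channels, 512x512 pixels
```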

Below we leave you an example we made using the same phrase, so that you can see how the creations are not really fixed and vary from one user to another; in addition, the tool presents a multitude of options so that you can choose the one you like the most.

Is it still art if the image is generated through the use of artificial intelligence?

The great dilemma we are currently experiencing, as usually happens whenever a new digital tool appears that makes our lives easier, is whether we are losing our essence as creative human beings. And yes, it may seem that there is little merit in what is generated by a machine, but someone had to be behind it, devising it and shaping it (the neural networks).

Some artists, such as Ryan Murdoch, have argued for the recognition of prompt-based imagery as art. He points to experienced AI artist Helena Sarin as an example, and it certainly wouldn’t be a bad first step.


Recently, the US Copyright Office granted the first known copyright for an AI-generated image to a New York artist named Kris Kashtanova.

Of course, the debate for and against artificial intelligence in general and, above all, these new tools, is raising a series of quite worrying ethical and legal dilemmas, but the art that resides in these creations should be clear.
