TOP5 AI Art Generators of 2023 – Stable Diffusion

As 2023 kicked off, we witnessed transformative changes.

Technology is advancing faster than ever, but it’s in the realm of AI that we’re seeing truly astonishing developments. Among these, AI generators stand at the forefront of the innovation, demonstrating impressive capabilities that range from automating simple tasks to suggesting creative ideas.

Among the myriad AI generators, however, a few are attracting particular attention. They have received rave reviews and high praise from users, and their emergence has driven significant shifts in traditional design paradigms and workflows.

In this article, we delve into the top 5 groundbreaking AI generators that have been the focus of fervent discussion and interest. Readers can expect insights into the remarkable abilities of these AI generators and hints at future design trends.

Santa riding a bike

This is a comprehensive list of the best AI art generators.

I’ll introduce each one briefly.

TOP5 AI Generator: WebUI Stable Diffusion

As we step into a technologically advanced future, one AI model that has caught the attention of many is Stable Diffusion. This deep learning model builds on the paper “High-Resolution Image Synthesis with Latent Diffusion Models” from the Machine Vision & Learning Group (CompVis) at the University of Munich, Germany, and was brought to life with the support of entities such as Stability AI and Runway ML.

Behind the curtain, Stability AI was the passion project of Emad Mostaque, a Brit of Bangladeshi origin. The company provided the computational resources needed to train Stable Diffusion on the massive LAION-5B dataset. What sets it apart from counterparts like OpenAI’s DALL-E 2 or Google’s Imagen is its efficiency: it’s optimized to the point that even consumer graphics cards with as little as 4GB of VRAM can run it smoothly.

Impressively, despite the potential high development costs, it has been open-sourced, allowing enthusiasts and general users alike to tap into its potential. It’s fair to say that Stable Diffusion heralded the era of art-oriented AIs. The decision to make it open-source has triggered an explosion of AI image services built on its backbone, solidifying Stable Diffusion’s reputation as one of the most popular image-generation AIs.

One of its standout features is the ControlNet extension, which enables pose specification. Using skeleton sketches like the stick-figure line art produced by OpenPose, users can refine poses with colored bones corresponding to body parts. It also works with several other ControlNet-compatible auxiliary models, such as the Canny edge model, further extending its capabilities.

Model Architecture

[Figure: Stable Diffusion model architecture]

At its core, Stable Diffusion is composed of three primary neural networks: CLIP, UNet, and a VAE (Variational Autoencoder). When a user inputs text, the text encoder (CLIP) converts it into token embeddings that the UNet can interpret. The UNet then uses these embeddings to repeatedly denoise randomly generated noise; as the denoising process is iterated, a coherent latent image forms, and it is the VAE’s role to decode that latent image into pixels.

Unlike traditional diffusion-based image generation models, whose resource consumption grows rapidly as resolution increases, Stable Diffusion places autoencoders at both ends of the process. Instead of working on the full-resolution image, it adds and removes noise in a much lower-dimensional latent space. This approach dramatically reduces the resources required, allowing the model to generate relatively high-resolution images even on ordinary household graphics cards.
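To make the three-network pipeline concrete, here is a minimal sketch using the Hugging Face diffusers library (the runwayml/stable-diffusion-v1-5 checkpoint is just an example; any Stable Diffusion checkpoint would do). It loads the pipeline and inspects the CLIP text encoder, the UNet, and the VAE, along with the factor by which the VAE shrinks the working resolution.

```python
# Minimal sketch (diffusers assumed; the model ID is an example checkpoint).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

print(type(pipe.text_encoder).__name__)  # CLIPTextModel: turns the prompt into embeddings
print(type(pipe.unet).__name__)          # UNet2DConditionModel: denoises the latent step by step
print(type(pipe.vae).__name__)           # AutoencoderKL: decodes the latent into pixels

# The VAE works at 1/8 of the image resolution per side, so a 512x512 image
# corresponds to a much smaller 64x64 latent.
print(pipe.vae_scale_factor)  # 8
```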

How to Use Stable Diffusion?

When you access the web UI of Stable Diffusion, this is what you’ll see on the screen.

[Screenshot: Stable Diffusion web UI]

The usage is incredibly straightforward. Just input your desired keywords into the ‘Prompt’ field and click ‘Generate’. If you have difficulty describing an object in English, you can make use of tools like Google Translate or Papago.

For the ‘Negative prompt’, input any keywords you’d prefer not to appear in the results.

Here is an example, followed by a small code sketch:

  • Positive Prompt

woman, solo, 1girl, black hair, best quality, looking at viewer, proper eyes, upper body, dress, 4k, highly detailed, photorealistic, ultra realistic, sunlight

  • Negative Prompt

(semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), EasyNegative, (worst quality, low quality:1.6), lowres, low quality anatomy, ugly, more than two people, split picture, split screen, two or more pictures, Ignoring prompts, individual screen, low quality four fingers and low quality thumb, complicated fingers, fingerless, extra limbs, signature, watermark, username, fat, Chubby, dark skin, ugly face, ugly nose, outline, low quality face, low quality eyes, polydactyly, low quality body, low quality ratio, broad shoulders, low detail clothes, cheekbone, animal, dolphin, nsfw, nipples, nude
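As a rough illustration, here is how the same idea could be scripted with the diffusers library instead of the web UI. The model ID is an example, and the negative prompt is shortened because the web UI’s attention-weight syntax such as (worst quality:1.6) is not interpreted the same way outside the web UI.

```python
# Minimal sketch (diffusers and a CUDA GPU assumed; model ID is an example).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = (
    "woman, solo, 1girl, black hair, best quality, looking at viewer, "
    "proper eyes, upper body, dress, 4k, highly detailed, photorealistic, "
    "ultra realistic, sunlight"
)
# Shortened negative prompt: keywords we do NOT want in the result.
negative_prompt = "worst quality, low quality, lowres, watermark, signature, nsfw"

image = pipe(prompt, negative_prompt=negative_prompt).images[0]
image.save("portrait.png")
```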

Sampling Method

Stable Diffusion generates images using a chosen sampling method, and the differences between samplers are usually subtle. If you want to experiment with those subtle variations, you can switch the sampling method (see the sketch below).

Available samplers include:
Euler a, Euler, LMS, Heun, DPM2, DPM2 a, DPM++ 2S a, DPM++ 2M, DPM++ SDE, DPM fast, DPM adaptive, LMS Karras, DPM2 Karras, DPM2 a Karras, DPM++ 2S a Karras, DPM++ 2M Karras, DPM++ SDE Karras, DDIM, PLMS.

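In script form, the sampler roughly corresponds to the scheduler attached to the pipeline. As a hedged example, this is how a DPM++ 2M Karras-style sampler might be selected with diffusers (class and option names assume a recent diffusers version):

```python
# Minimal sketch (diffusers assumed) of swapping the sampler/scheduler.
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Replace the default scheduler with a DPM++ 2M Karras-style solver.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
```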

Sampling Steps

The higher the number of sampling steps, the higher the quality of the generated image tends to be. Typically, 20–25 steps are used. Going above 25 may produce slightly different images, but it doesn’t necessarily guarantee better quality, and the higher this number, the longer it takes to generate the image.

Width and Height

This refers to the dimensions of the image; higher values use more resources. Common formats are 512×512 and 768×768 px, and as of the current update a 1024×1024 format is also supported.
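In script form, both the sampling steps and the image dimensions are simply per-call parameters. A minimal sketch (diffusers assumed; the prompt and model ID are examples):

```python
# Minimal sketch: step count and image size as generation parameters.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a lion",
    num_inference_steps=25,  # sampling steps; 20-25 is a common range
    width=512,               # higher resolutions cost more VRAM and time
    height=512,
).images[0]
image.save("lion_512.png")
```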

Batch Count (Number of Batches)

This sets how many batches of images are generated, one after another. Because batches run sequentially, increasing the batch count doesn’t require extra VRAM; it simply takes proportionally longer.

Batch Size (Images per Batch)

This option sets how many images are generated in parallel within a single batch. The higher the number, the faster the images are produced overall, but the more VRAM is used.

So, the final number of images produced equals batch count multiplied by batch size. If your graphics card memory is limited, increase the batch count. If there’s plenty of memory, you can increase the batch size.

Still, image creation takes time, so find the optimal balance by experimenting. In my case, I start with 1, and once I find the desired image style, I increase both the batch count and the batch size to 2.
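The same idea can be sketched in code: the batch count becomes a sequential loop (no extra VRAM), while the batch size maps to how many images are produced in parallel per call (diffusers assumed; the values are just examples):

```python
# Minimal sketch: batch count (sequential) vs. batch size (parallel).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

batch_count, batch_size = 2, 2
images = []
for _ in range(batch_count):  # batches run one after another
    out = pipe("a lion", num_images_per_prompt=batch_size)  # parallel within a batch
    images.extend(out.images)

print(len(images))  # batch_count * batch_size = 4 images in total
```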

CFG Scale

This controls how much freedom the model has during image creation. A lower CFG scale gives the model more freedom, producing more varied results that may stray from the prompt, while a higher value makes the image follow the prompt more closely. The default value is 7.

Seed

This is the number that initializes the random noise used to create the image. Using the same prompt, settings, and seed will always produce the exact same image. Setting it to -1 means a new random seed is used each time.
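Putting the last two settings together, here is a minimal sketch (diffusers assumed) in which guidance_scale plays the role of the CFG scale and a fixed seed makes the result reproducible:

```python
# Minimal sketch: CFG scale and a fixed seed for reproducible output.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(1234)  # same seed -> same image

image = pipe(
    "a lion",
    guidance_scale=7.0,   # CFG scale; around 7 is the usual default
    generator=generator,  # omit (or reseed) for a different image each run
).images[0]
image.save("lion_seed1234.png")
```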

ControlNet

ControlNet is a technique that lets you supply a structural guide, such as a pose skeleton or an edge map, so that the generated image follows that structure.
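As a rough sketch of the idea (diffusers assumed; the ControlNet checkpoint and the input file name are examples), a pre-computed Canny edge map can be passed alongside the prompt so the output follows that structure:

```python
# Minimal sketch: ControlNet guiding the image with a Canny edge map.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

edge_map = load_image("canny_edges.png")  # hypothetical pre-computed edge map

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a lion", image=edge_map).images[0]  # the result keeps the edge structure
image.save("lion_controlnet.png")
```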

Try generating an image. Let’s see what it creates!

Write ‘Lion’.

How does it look? Did it turn out well?

[Generated images: a lion]

I’m watching you Grrrrr…!!

Even with just one word, you can see such an amazing result with Stable Diffusion..!! Try it out right now.

Next time, I’ll cover the second entry in the Top 5: Midjourney~!
