Are you enjoying your day?
Today, I’d like to give a brief explanation of ComfyUI, which is currently one of the most widely used web UIs for Stable Diffusion. It’s a bit tricky to use at first, but once you get it set up the way you want, it’s quite convenient. Coincidentally, there’s an introductory article on ComfyUI at stable-diffusion-art.com, so I decided to summarize it.
1. What is ComfyUI?
ComfyUI is a node-based graphical user interface (GUI) designed for Stable Diffusion. It uses node-shaped boxes that users connect to build an image-generation workflow. New users might find the node-based approach unfamiliar, but those in the 3D industry may recognize similarities to node-based material creation, which offers a sense of familiarity.
The most popular web GUI for Stable Diffusion to date is AUTOMATIC1111. But why opt for ComfyUI over AUTOMATIC1111? Let’s delve into the reasons.
Advantages of Using ComfyUI:
- Lightweight: It operates quickly.
- Flexible: Allows for extensive customization.
- Transparent: Users can see the data flow directly.
- Shareable: Workflows are saved as JSON inside the generated files, making them easy to share (see the short sketch below for how to read one back out).
- Prototyping: Facilitates prototype creation with a graphical interface rather than coding.
Disadvantages of Using ComfyUI:
- Lack of Interface Consistency: Workflows may require different node arrangements, and users must figure out what needs changing.
- Overly Detailed: The average user doesn’t need to understand the flow, just that the image generation works.
- Limited Inpainting Tools: External programs are necessary for inpainting.
ComfyUI can be visually challenging for first-time users. Yet, it’s constantly under development, and with a multitude of expanding plugins, a little study can yield results beyond one’s expectations.
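As a quick illustration of the “Shareable” point above, here is a minimal sketch of reading a workflow back out of a generated image with Python. ComfyUI embeds the node graph as JSON in the PNG’s text metadata; the filename and the metadata key names here reflect current default behavior, so treat them as assumptions that may change between versions.

```python
# A minimal sketch: reading the workflow ComfyUI embeds in a saved PNG.
# Assumes a file produced by the default Save Image node; the key names
# ("workflow", "prompt") are ComfyUI's current defaults.
import json
from PIL import Image

img = Image.open("ComfyUI_00001_.png")    # hypothetical output filename
workflow_json = img.info.get("workflow")  # the full node graph, if present
if workflow_json:
    workflow = json.loads(workflow_json)
    print(f"Nodes in this workflow: {len(workflow['nodes'])}")
```

Dragging such a PNG onto the ComfyUI canvas restores the whole workflow, which is what makes sharing so painless.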
2. How to use ComfyUI (Txt2Img)
1) Installing ComfyUI
Installing ComfyUI is straightforward. To get started with ComfyUI, simply head over to the provided GitHub link and look for the “Direct link to download.” Click on it to download the file, and then unzip it in your desired directory. It’s as easy as that.
2) Download SDXL Models
For your information, ComfyUI rose to prominence alongside the release of SDXL (Stable Diffusion XL), which it supported from early on. This UI allows you to create images using the new SDXL model, which is different from the previously used SD 1.x models. I will explain how to install and run SDXL and demonstrate its usage through examples.
SDXL, short for Stable Diffusion XL, is a highly anticipated open-source generative AI model recently released to the public by Stability AI. It is an upgrade from previous versions of SD such as 1.5, 2.0, and 2.1, and offers significant improvements in image quality, aesthetics, and versatility.
- SDXL 1.0 base Model Download Link –> Go to Download Link
- SDXL 1.0 refiner Model Download Link –> Go to Download Link
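Once downloaded, the checkpoint files go into ComfyUI’s models/checkpoints folder. Below is a small sanity-check sketch; the folder layout assumes a default install, and the filenames are the ones the official downloads currently use, so adjust both if yours differ.

```python
# Quick check that the SDXL checkpoints are where ComfyUI expects them.
# Paths and filenames assume a default install and the official download names.
from pathlib import Path

models_dir = Path("ComfyUI/models/checkpoints")
for name in ["sd_xl_base_1.0.safetensors", "sd_xl_refiner_1.0.safetensors"]:
    status = "found" if (models_dir / name).exists() else "missing"
    print(f"{name}: {status}")
```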
3) Access ComfyUI
Once you’ve installed ComfyUI and placed the model files in the correct paths, the next step is to launch the application. If you’re using a desktop with an Nvidia GPU, you’ll need to start the run_nvidia_gpu.bat batch file. This will open the familiar command prompt window and, after initializing, it will provide you with a URL to access ComfyUI’s web interface, where you can start creating images with your installed models.
The interface presented in ComfyUI, once accessed through the browser, may appear as a sequence of connected boxes, each representing a different function or module within the user interface. Those who are encountering this setup for the first time may find it quite alien and complex, but for those who have experience with the WebUI of Automatic1111, the terminology will be familiar.
4) The basic UI configuration in ComfyUI
- Load Checkpoint
- CLIP Text Encode (Prompt)
- Empty Latent Image
- KSampler
- VAE Decode
- Save Image
Load Checkpoint:
This is the box used to load not only the SDXL model downloaded earlier but also any other model you want to use as a checkpoint (base) model.
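If you’re curious why this single box feeds several outputs (MODEL, CLIP, VAE), it’s because a checkpoint file bundles the UNet, the CLIP text encoder(s), and the VAE together. Here is a small sketch to peek inside one, assuming you have the safetensors package installed and the SDXL base file on disk:

```python
# Peek at the tensors bundled inside a checkpoint file. A Stable Diffusion
# checkpoint packs the UNet, CLIP text encoder(s), and VAE weights together,
# which is why Load Checkpoint exposes three separate outputs.
from safetensors import safe_open  # pip install safetensors

with safe_open("sd_xl_base_1.0.safetensors", framework="pt") as f:
    keys = list(f.keys())
print(f"{len(keys)} tensors; e.g. {keys[0]}")
```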
CLIP Text Encode (Prompt):
CLIP is a neural network that understands text and images and helps guide the image generation process based on the text input you provide. Consider the boxes where you write Positive and Negative prompts in ComfyUI as analogous to those frequently used in Automatic1111.
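For intuition, here is a hedged sketch of what “encoding” a prompt means, using the SD 1.x-era CLIP encoder from Hugging Face for illustration (SDXL actually uses two CLIP encoders internally, so this is a simplification):

```python
# Illustrative only: tokenize a prompt and turn it into the embedding vectors
# that condition the sampler. SD 1.x uses this exact encoder; SDXL uses two.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer("1 girl, looking at viewer, simple background",
                   padding="max_length", max_length=77, return_tensors="pt")
with torch.no_grad():
    cond = encoder(**tokens).last_hidden_state
print(cond.shape)  # torch.Size([1, 77, 768]) – one vector per token slot
```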
Empty Latent Image:
The “Empty Latent Image” box is used to adjust the size of the image you wish to create. It’s akin to the image size field where you would enter 512×512 in Automatic1111, but in this case, it’s presented as a box. In a future post about upscaling, I’ll show you how to apply different boxes instead of the “Latent Image” one!
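Under the hood, this box simply allocates a blank latent tensor. The sketch below mirrors roughly what it does; latent space is 8x smaller than pixel space in each dimension, which is why SD image sizes must be multiples of 8:

```python
# Roughly what Empty Latent Image produces: a zeroed latent tensor whose
# spatial size is 1/8 of the requested image size, with 4 latent channels.
import torch

batch_size, width, height = 1, 1024, 1024
latent = torch.zeros([batch_size, 4, height // 8, width // 8])
print(latent.shape)  # torch.Size([1, 4, 128, 128])
```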
KSampler:
The KSampler is the core of any workflow and can be used to perform both text-to-image and image-to-image generation. In an image-to-image task, for example, you connect a model, positive and negative conditioning, and a latent image, and set a denoise value of less than 1.0. That way parts of the original image are preserved when it is noised up, guiding the denoising process toward similar-looking images.
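Here is a conceptual sketch of what that denoise value does; this is a mental model, not ComfyUI’s exact scheduler code:

```python
# Mental model: denoise < 1.0 starts sampling partway into the noise schedule,
# so only that fraction of the steps actually reshapes the image.
steps = 20
denoise = 0.6
start_step = round(steps * (1 - denoise))  # the first 8 steps are skipped
print(f"Runs steps {start_step}..{steps}; the lower the denoise, "
      f"the more of the original image survives")
```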
VAE Decode:
VAE stands for Variational Autoencoder, which is a component of the generative model that takes the sampled latent space representation and decodes it into a final image.
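To make the shapes concrete, here is an illustrative decode using the standalone SDXL VAE from the diffusers library. This is an assumption for demonstration; ComfyUI normally uses the VAE baked into your checkpoint, and running this downloads the VAE weights:

```python
# Illustration: decoding a 128x128 latent into a 1024x1024 RGB image.
# Uses the standalone SDXL VAE from diffusers, not ComfyUI's internal one.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
latent = torch.zeros([1, 4, 128, 128])  # a 1024x1024 image in latent space
with torch.no_grad():
    image = vae.decode(latent / vae.config.scaling_factor).sample
print(image.shape)  # torch.Size([1, 3, 1024, 1024])
```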
Save Image:
After the image is generated, this function allows you to save the output to your computer.
5) ComfyUI Workflow
Now, I will quickly demonstrate how to use ComfyUI. As explained earlier, when you first start ComfyUI, the basic UI will appear, allowing you to start testing after a few steps. I assume that those who have used Automatic1111 will already have various models. Even if you don’t have any separate models, there’s no need to worry because you can test using the SDXL model provided earlier.
First, when you click on the model name displayed in Load Checkpoint, you will see a list of the models in your models/checkpoints folder, as shown below. From here, you can select the base model you want to use. For our purposes, choose the SDXL base model that was downloaded earlier.
Next, you will fill in the CLIP box with the prompt values you want to use. The CLIP interface in ComfyUI usually consists of two sections: Positive Prompts and Negative Prompts.
Positive Prompt:
8k, highly detailed, photorealistic, high resolution, 1 girl, looking at viewer, simple background
Negative Prompt:
(semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), (worst quality, low quality:1.6), nude
When you right-click on a box in ComfyUI, you have the option to change its color attribute. This lets you set Positive prompts to green and Negative prompts to red, providing a visual cue to differentiate them easily. Feel free to customize the color coding to your preference. You can also rename the boxes to reflect their contents or to organize your workflow better, which is particularly useful when you’re handling multiple prompts or want to keep your creative process structured.
Now we will change the Empty Latent Image size to 1024 x 1024. Unlike SD 1.x, SDXL was trained on 1024 x 1024 images, so it can create clear images at that size without the need for upscaling.
Finally, if you leave the settings for KSampler and VAE as they are and click the “Queue Prompt” button on the right, green lines will sequentially appear around the box outlines, showing the order of processing, and after the process, a single image that has been created will appear in the “Save Image” box.
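Incidentally, the Queue Prompt button just sends your workflow to the local ComfyUI server over HTTP, which means you can script generation too. Here is a minimal sketch, assuming the default server address and a workflow exported via “Save (API Format)” (enabled through the dev mode option in the settings); the filename is hypothetical:

```python
# Queue a generation programmatically, mimicking the Queue Prompt button.
# Assumes ComfyUI is running at its default address and that the workflow
# was exported with "Save (API Format)".
import json
import urllib.request

with open("workflow_api.json") as f:  # hypothetical exported workflow file
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns a prompt_id
```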
Now that we’ve seen what beautiful images the SDXL model can create, let’s apply the Barbarian Warrior style LoRA, provided by A2SET for free, to the same prompt values and see how it changes the results.
You can download the “Barbarian Warrior Woman Model” via the link below.
By double-clicking an empty space to bring up a search dialog, typing “LoRA”, and selecting a LoRA Loader, you can add it to your canvas as a new box. Then, similar to how you selected a Checkpoint model previously, you would choose a LoRA model.
Finally, you’d reconnect the boxes to include the LoRA Loader in the workflow, ensuring that the data flow is correctly established to apply the LoRA adjustments to your generated images.
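If you’re wondering what the LoRA Loader actually does to the model, here is a conceptual sketch: a LoRA ships two small low-rank matrices per patched weight, and applying it adds their product onto the base weight, scaled by the strength you set on the node. Shapes and values below are purely illustrative.

```python
# Conceptual sketch of applying a LoRA: W' = W + strength * (B @ A).
# A and B are small low-rank matrices, which is why a LoRA file stays
# tiny compared to a full checkpoint.
import torch

W = torch.randn(768, 768)         # one base-model weight matrix
A = torch.randn(8, 768) * 0.01    # LoRA "down" projection, rank 8
B = torch.randn(768, 8) * 0.01    # LoRA "up" projection
strength = 1.0                    # the strength value on the LoRA Loader box
W_patched = W + strength * (B @ A)
print(W_patched.shape)            # same shape as W: the model swaps it in
```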
Tip: When applying LoRA, you need to write the trigger prompt for that LoRA in the Positive Prompt to ensure it is 100% applied.
It’s applied the same way you’ve experienced in the Automatic1111 WebUI for Stable Diffusion. The trigger prompt for the Barbarian Woman Model we downloaded is “BWwarrior woman”, and you can also write it as <BWwarrior woman:1>, so please make sure to include it. Now, shall we press the Queue Prompt button and wait? For reference, the Barbarian Model was trained on the face of a woman with braided hair and a slightly wild beauty. Let’s see if that face comes through!
Trigger Prompt: BWwarrior woman or <BWwarrior woman:1>
Wow~! Can you see that a warrior appears just by connecting the LoRA data and writing only one trigger prompt?
Please note that this LoRA data is exclusive to SDXL, so you can only use it with the SDXL base model!
Looking at this picture, I really want to create a bunch of noble-looking barbarian warriors, haha. Everyone, try to create your own artwork and wonderful results with various models other than this one!
I’ll give you another great tip for full compatibility with SDXL: there is an SDXL-specific version of the CLIP Text Encode box that you can apply instead of the standard one.
When it’s applied as below, you can get a taste of the kind of results that come out 🙂
I personally really love the node approach… It’s amazing what a variety of workflows you can create with it.
So today, we’ve learned about Txt2Img with ComfyUI. I’m sure everyone has felt that it’s not as complicated as it looks. Of course, you can create even more detailed outputs by adjusting each setting one by one, so let’s slowly learn about each one together!
Next time, I’ll explain the features of ComfyUI in detail!