Stable Diffusion using Reference-only (attn & AdaIN) ControlNet with Z by HP and Google Colab

Chanran Kim
5 min read · May 29, 2023


Diffusion models are a type of generative model that can create realistic images from text descriptions. They are trained by gradually adding noise to an image and learning to reverse the process; at generation time, they start from pure noise and gradually denoise it into an image that matches the description.
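This post works through the webUI rather than code, but as a rough sketch of what such a pipeline does, here is text-to-image generation with Hugging Face's diffusers library (the model ID and the availability of a GPU are my assumptions, not something from this post):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (assumed model ID) onto the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The pipeline starts from random noise and iteratively denoises it
# into an image that matches the text prompt.
image = pipe("ring with diamond", num_inference_steps=20).images[0]
image.save("ring.png")
```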

In this blog post, I will discuss a new method for guiding image generation more stably. This method, called reference-only, uses a reference image in the ControlNet preprocessing step, guiding the model to produce images related to the reference.

I will also discuss how to run the reference-only method with ControlNet in two environments: on the edge with a Z by HP workstation and online with Google Colab.

ControlNet

> ControlNet is a neural network structure to control diffusion models by adding extra conditions. It copies the weights of neural network blocks into a “locked” copy and a “trainable” copy. The “trainable” one learns your condition. The “locked” one preserves your model. Thanks to this, training with a small dataset of image pairs will not destroy the production-ready diffusion models. The “zero convolution” is a 1×1 convolution with both weight and bias initialized as zeros. Before training, all zero convolutions output zeros, so ControlNet causes no distortion. No layer is trained from scratch; you are still fine-tuning, and your original model is safe. This allows training on small-scale or even personal devices. It is also friendly to merging/replacement/offsetting of models/weights/blocks/layers.

Before and after using ControlNet

reference: https://github.com/lllyasviel/ControlNet
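To make the “zero convolution” idea concrete, here is a minimal sketch in PyTorch (my own illustration, not code from the ControlNet repository):

```python
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # A 1x1 convolution whose weight and bias start at zero, so it
    # outputs zeros before training and initially adds no distortion
    # to the locked model's features.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

x = torch.randn(1, 320, 64, 64)       # a UNet-block feature map
print(zero_conv(320)(x).abs().max())  # tensor(0.) before any training
```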

Using Reference-only Diffusion with ControlNet

The reference-only preprocessor does not require any control model. It can guide the diffusion directly, using images as references. This reference-only ControlNet links the attention layers of your SD model directly to any independent image, so that your SD model reads arbitrary images for reference. You need at least ControlNet 1.1.153 to use it.

ControlNet using Reference-only in webUI

reference: https://github.com/Mikubill/sd-webui-controlnet/discussions/1236

ref: https://www.reddit.com/r/StableDiffusion/comments/13h3jn7/controlnet_reference_only_test/
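The extension implements this by hooking the webUI's internals, but purely as a conceptual sketch of the two reference mechanisms named in the title (attention sharing and AdaIN), assuming 4-D PyTorch feature maps, it might look like this:

```python
import torch
import torch.nn.functional as F

def reference_attention(q, k, v, k_ref, v_ref):
    # Attention sharing (conceptual): let self-attention also attend
    # to keys/values computed from the reference image's features,
    # pulling the generated features toward the reference.
    k = torch.cat([k, k_ref], dim=1)  # concatenate along the sequence dim
    v = torch.cat([v, v_ref], dim=1)
    return F.scaled_dot_product_attention(q, k, v)

def adain(x, ref, eps=1e-5):
    # AdaIN (conceptual): renormalize feature map x to the per-channel
    # mean/std statistics of the reference features.
    mu_x, std_x = x.mean((2, 3), keepdim=True), x.std((2, 3), keepdim=True)
    mu_r, std_r = ref.mean((2, 3), keepdim=True), ref.std((2, 3), keepdim=True)
    return (x - mu_x) / (std_x + eps) * std_r + mu_r
```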

Generate Images using Reference-only with Google Colab

As mentioned above, you will need the latest version of ControlNet, so the webUI should also be on its latest version. We experimented with AUTOMATIC1111’s webUI code; if you have already set it up, upgrade via git pull, as sketched after the link below.

ref: https://github.com/AUTOMATIC1111/stable-diffusion-webui
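A minimal sketch of the update step from Python (for example, in a Colab cell), assuming the repository was cloned into a stable-diffusion-webui directory:

```python
import subprocess

# Update an existing AUTOMATIC1111 webUI checkout to the latest version
# (equivalent to running `git pull` inside the repository directory).
subprocess.run(["git", "-C", "stable-diffusion-webui", "pull"], check=True)

# For a fresh install, clone instead:
# subprocess.run(
#     ["git", "clone",
#      "https://github.com/AUTOMATIC1111/stable-diffusion-webui"],
#     check=True,
# )
```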

*There is an issue these days with using the webUI in Google Colab: you may have trouble using it unless you have a paid subscription.

*Note that it can take a really long time to set up.

When you run the webUI, the following screen will appear. With the txt2img tab selected, open the ControlNet panel below it. If ControlNet is not installed, go to the Extensions tab and install it. If the version is lower than specified above, update it via Check for updates (or verify the version through the API, as sketched below).
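If the webUI is launched with the --api flag, you can also check the installed extension version programmatically; this sketch assumes the /controlnet/version endpoint documented by the sd-webui-controlnet project:

```python
import requests

# Ask the running webUI (started with --api) for the installed
# ControlNet extension version; it should be at least 1.1.153.
resp = requests.get("http://127.0.0.1:7860/controlnet/version")
print(resp.json())
```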

Difference between Google Colab and Z4 (Z by HP) environment

With identical settings, there is little to compare other than the time required to create an image. The test was conducted through the webUI. The main difference between this experiment and other typical experiments is whether the reference-only ControlNet is applied.

Performance comparison when creating images without applying ControlNet

Comparison of image generation speed when ControlNet is applied

Images created through experimentation

generated image with “ring with diamond”
generated image with “ring with diamond” prompt without reference-only ControlNet
a reference (input) image for the ControlNet
generated image with “ring with diamond” prompt with reference-only ControlNet

The generation quality is a bit lower than before, but the image captures the feel of a volcanic area, like the reference image.

Config details

  • Image size: 512×512
  • Sampling Method: Euler a
  • Sampling Steps: 20
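For reference, the same generation can be scripted against the webUI REST API (webUI launched with --api). The ControlNet unit fields below follow the sd-webui-controlnet API docs, but field names have changed between extension versions, so treat this as a sketch:

```python
import base64
import requests

URL = "http://127.0.0.1:7860"  # local webUI started with --api

# Encode the reference (input) image for the ControlNet unit.
with open("reference.png", "rb") as f:
    ref_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "ring with diamond",
    "width": 512,
    "height": 512,
    "steps": 20,
    "sampler_name": "Euler a",
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "image": ref_b64,            # "input_image" in older versions
                "module": "reference_only",  # preprocessor; needs no model
                "model": "None",
            }]
        }
    },
}

resp = requests.post(f"{URL}/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()
images = resp.json()["images"]  # base64-encoded generated images
```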

In both cases, there is a big difference in generation speed. It is an obvious point, but the Z4 with an A6000 is far more powerful than Google Colab. Moreover, Google Colab, unlike before, now caps the compute credits you can use even with a subscription; if you exceed a certain amount, you must purchase additional credits.

Opinion

A society in which generative models matter more and more is coming. It will be difficult for AI to fully replace humans, but at least this much can be said: “People who use AI will replace people who do not.”

However, to make good use of these generative models, you need to know the various tools and environments well, and above all you need to be able to generate quickly in your own environment. In other words, the productivity of the generative model is tied directly to my own productivity.

To increase the productivity of your work, I hope you will make use of generative models, and they become even more valuable when combined with powerful hardware. A time will come when everyone needs a personal generative model, and I think it is very likely to run at the edge.

Conclusion

In this blog post, I have discussed a new method for guiding diffusion models to generate images in a more stable way. This method, called reference-only, uses a reference image to guide the ControlNet process.

I have also discussed how to use local/edge equipment (the Z4 from Z by HP) versus Google Colab. I hope this blog post has been informative and that you find the reference-only method useful for your own projects, and useful as a criterion for choosing a development/research environment.
