
Optimizing AI Image Generation: Streamlining Stable Diffusion with ControlNet in Containerized Environments

Shawn Rushefsky

Implementing Stable Diffusion with ControlNet in a containerized environment comes with a number of challenges, mainly due to the large volume of additional model weights that must be baked into the container image. In this blog, we discuss these challenges and how they can be mitigated.

What is ControlNet for Stable Diffusion?

ControlNet is a neural network structure that lets users control diffusion models by adding extra conditions. It gives users fine-grained control over the images they generate with Stable Diffusion, using a single reference image, without noticeably inflating VRAM requirements. ControlNet has revolutionized AI image generation and demonstrates a key advantage of open models like Stable Diffusion over proprietary competitors such as Midjourney.

ControlNet for Stable Diffusion: Reference Image (left), Depth-Midas (middle), Output Image (right)

Note how the image composition remains the same between the reference image and the final image. This is accomplished by using the depth map as a control image. Images from Dreamup.ai using the Dreamshaper model, MiDaS depth estimation, and the depth ControlNet.
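To make this concrete, here is a minimal sketch of depth-guided generation with the diffusers library. The model IDs and filenames are illustrative assumptions, not necessarily Dreamup.ai's exact configuration:

```python
# Minimal sketch: depth-guided generation with a ControlNet in diffusers.
# Model IDs are assumptions for illustration, not Dreamup.ai's exact setup.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Load the depth ControlNet and attach it to a Stable Diffusion 1.5 pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "Lykon/DreamShaper", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The control image is a precomputed depth map of the reference image
depth_map = load_image("depth_map.png")
image = pipe(
    "a cozy cabin in a snowy forest",
    image=depth_map,
    num_inference_steps=30,
).images[0]
image.save("output.png")
```

The key point is that the prompt steers the content while the depth map pins down the composition, which is why the output mirrors the reference image's layout.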

Challenges in Implementing ControlNet for Stable Diffusion

However, this remarkable feature presents challenges. At the time of this writing, there are 14 distinct ControlNets compatible with Stable Diffusion, each offering unique control over the output and each requiring a different kind of "control image."

All of these models are freely available on Hugging Face, each in its own repository, and each repository amounts to roughly 4.3GB, adding up to about 60.8GB of additional storage. This represents a nearly tenfold increase compared to a minimal Stable Diffusion image.

Furthermore, each ControlNet model comes with one or more image preprocessors that generate the "control image." For instance, one of the ControlNets requires a depth map estimated from the reference image. These additional model weights further bloat the total VRAM requirement, exceeding the capacity of commonly used graphics cards like the RTX3060 12GB.
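As an illustration, here is a sketch of producing a depth control image with the MiDaS annotator from the controlnet_aux package. The package and annotator repo are assumptions for demonstration, not necessarily what the services discussed below use:

```python
# Sketch: generate a depth map to use as a ControlNet control image.
# controlnet_aux and the "lllyasviel/Annotators" repo are illustrative choices.
from controlnet_aux import MidasDetector
from diffusers.utils import load_image

midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
reference = load_image("reference.png")
depth_map = midas(reference)  # returns a PIL image usable as a control image
depth_map.save("depth_map.png")
```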

The basic approach to implementing ControlNet in a containerized environment.

Optimizing ControlNet Implementation in a Containerized Environment on a Distributed Cloud

Building a container image for continuous ControlNet stable diffusion inference, if approached without optimization in mind, can rapidly inflate in size. This results in prohibitively long cold-start times as the image is downloaded onto the server running it. This problem is prominent in data centers and becomes even more pronounced in a distributed cloud such as SaladCloud, which depends on residential internet connections of varying speed and reliability.

A two-pronged strategy can effectively address this issue. First, isolate ControlNet annotation as a separate service, or leverage a pre-built service like this one from Dreamup.ai. This division of labor not only reduces VRAM requirements and model storage in the Stable Diffusion service, but also improves efficiency when creating numerous output images from a single input image and prompt. A minimal sketch of such a service follows.
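The sketch below shows the preprocessor as its own HTTP service, assuming FastAPI; the endpoint name and payload shape are hypothetical, not Dreamup.ai's actual API:

```python
# Hypothetical split-out annotation service (endpoint/payload are assumptions).
import io
from fastapi import FastAPI, UploadFile
from fastapi.responses import Response
from PIL import Image
from controlnet_aux import MidasDetector

app = FastAPI()
midas = MidasDetector.from_pretrained("lllyasviel/Annotators")

@app.post("/annotate/depth")
async def annotate_depth(file: UploadFile):
    reference = Image.open(io.BytesIO(await file.read())).convert("RGB")
    depth_map = midas(reference)
    buf = io.BytesIO()
    depth_map.save(buf, format="PNG")
    # The Stable Diffusion service consumes this PNG as its control image
    return Response(content=buf.getvalue(), media_type="image/png")
```

Because the annotator models live only in this service, the Stable Diffusion service's image shrinks, and a single annotated control image can be reused across many generations.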

Second, rather than cloning entire repositories for each model, perform a shallow clone without pulling the Git LFS blobs, then use wget to selectively download only the necessary weights, as exemplified by this Dockerfile from the Dreamup.ai Stable Diffusion Service. This tactic alone can save more than 40GB of storage.
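An alternative way to achieve the same selective download, swapping the Dockerfile's git-clone-plus-wget approach for the huggingface_hub library, is sketched below. The filenames are illustrative; check each repository's file listing for the actual names:

```python
# Sketch: fetch only the fp16 weights and config instead of the full repo.
# Repo ID and filenames are illustrative, not a prescribed layout.
from huggingface_hub import hf_hub_download

weights = hf_hub_download(
    repo_id="lllyasviel/control_v11f1p_sd15_depth",
    filename="diffusion_pytorch_model.fp16.safetensors",
)
config = hf_hub_download(
    repo_id="lllyasviel/control_v11f1p_sd15_depth",
    filename="config.json",
)
print(weights, config)
```

Skipping the full-precision and duplicate-format weights in each repository is where the bulk of the 40GB+ savings comes from.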

The optimized approach: ControlNet preprocessing and Stable Diffusion split into separate services.

The end result? Two services, both with manageable container image sizes. The ControlNet Preprocessor Service, with all annotator models included, comes in at 7.5GB and operates seamlessly on an RTX3060 12GB.

The Stable Diffusion Service, even with every ControlNet packaged in, comes in at 21.1GB and also runs smoothly on an RTX3060 12GB. Further reductions can be achieved by tailoring which ControlNets the service supports. For instance, Dreamup.ai excludes the MLSD, Shuffle, and Segmentation ControlNets from its production image, saving about 4GB of storage.

Have questions about enterprise pricing for SaladCloud?

Book a 15 min call with our team.

