SaladCloud Blog

Optimizing AI Image Generation: Streamlining Stable Diffusion with ControlNet in Containerized Environments

Shawn Rushefsky

Running Stable Diffusion with ControlNet in a containerized environment poses several challenges, chiefly because of the large volume of additional model weights that must be included in the container image. In this post, we discuss these challenges and how to mitigate them.

What is ControlNet for Stable Diffusion?

ControlNet is a network structure that lets users steer diffusion models by supplying extra conditions. It gives users fine-grained control over the images they generate with Stable Diffusion from a single reference image, without noticeably inflating VRAM requirements. ControlNet has revolutionized AI image generation, demonstrating a key advantage of open models like Stable Diffusion over proprietary competitors such as Midjourney.

ControlNet for Stable Diffusion: Reference Image (left), Depth-Midas (middle), Output Image (right)

Note how the image composition remains the same between the reference image and the final image. This is accomplished by using the depth map as a control image. Images from Dreamup.ai, using the Dreamshaper model, MiDaS depth estimation, and the depth ControlNet.

Challenges in Implementing ControlNet for Stable Diffusion

However, this remarkable feature presents challenges. At the time of writing, there are 14 distinct ControlNets compatible with Stable Diffusion, each offering a different kind of control over the output and each requiring a different “control image.”

All of these models are freely available on Hugging Face in separate repositories of roughly 4.3GB each, for a total of 60.8GB of additional storage. This is a nearly tenfold increase over a minimal Stable Diffusion image.

Furthermore, each ControlNet model comes with one or more image preprocessors used to produce the “control image” from the reference image. For instance, one of the ControlNets requires a depth map to be generated from the input image first. These additional model weights further bloat the total VRAM requirement, pushing it beyond the capacity of commonly used graphics cards like the RTX3060 12GB.

The basic approach to implementing ControlNet in a containerized environment.

Optimizing ControlNet Implementation in a Containerized Environment on a Distributed Cloud

A container image for continuous ControlNet Stable Diffusion inference, if built without optimization in mind, can rapidly inflate in size. This results in prohibitively long cold-start times while the image is downloaded onto the server that runs it. The problem is significant in data centers and becomes even more pronounced in a distributed cloud such as Salad, which depends on residential internet connections of varying speed and reliability.

A two-pronged strategy can effectively address this issue. Firstly, isolate ControlNet annotation as a separate service or leverage a pre-built service like this one from Dreamup.ai. This division of labor not only reduces VRAM requirements and model storage in the Stable Diffusion service but also improves efficiency when creating many output images from a single input image and prompt, since the control image only needs to be generated once.
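The efficiency gain from separating annotation can be sketched as follows. This is a minimal illustration, not the Dreamup.ai implementation: the function `generate_variations` and the `annotate` and `generate` callables are hypothetical stand-ins for calls to the preprocessor service and the Stable Diffusion service, and the fake services exist only so the sketch runs without a GPU or network.

```python
from typing import Callable, List


def generate_variations(
    reference_image: bytes,
    prompt: str,
    count: int,
    annotate: Callable[[bytes], bytes],
    generate: Callable[[str, bytes, int], bytes],
) -> List[bytes]:
    """Annotate the reference image once, then reuse the control image."""
    # One call to the (hypothetical) preprocessor service.
    control_image = annotate(reference_image)
    # Many calls to the (hypothetical) Stable Diffusion service, one per seed.
    return [generate(prompt, control_image, seed) for seed in range(count)]


# Stand-in services that only count how often they are called.
calls = {"annotate": 0, "generate": 0}


def fake_annotate(image: bytes) -> bytes:
    calls["annotate"] += 1
    return b"depth:" + image


def fake_generate(prompt: str, control_image: bytes, seed: int) -> bytes:
    calls["generate"] += 1
    return f"{prompt}|{seed}|".encode() + control_image


outputs = generate_variations(b"ref", "a castle", 4, fake_annotate, fake_generate)
print(calls)  # {'annotate': 1, 'generate': 4}
```

With the annotator models living in their own service, the Stable Diffusion container never has to load them, and the preprocessing cost is paid once per reference image rather than once per output.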

Secondly, rather than cloning entire repositories for each model, perform a shallow clone of the model repository without git lfs. Then, use wget to selectively download only the necessary weights, as exemplified by this Dockerfile from the Dreamup.ai Stable Diffusion Service. This tactic alone can save more than 40GB of storage.
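As a rough sketch of this second step, the helper below builds the commands for a shallow clone that skips LFS objects, followed by targeted `wget` downloads. It is not the Dockerfile referenced above. The function name `plan_fetch` is hypothetical, and the repository ID and file name in the usage example are illustrative assumptions that should be checked against the actual repository listing. It relies on two things that do exist: the `GIT_LFS_SKIP_SMUDGE=1` environment variable, which tells Git LFS to leave pointer files in place instead of downloading their contents, and the Hugging Face `resolve/<revision>/<file>` URL pattern for fetching a single file.

```python
import shlex
from typing import Dict, List

HF_BASE = "https://huggingface.co"


def plan_fetch(repo_id: str, files: List[str], revision: str = "main") -> Dict[str, object]:
    """Build commands for a shallow, LFS-free clone plus targeted downloads."""
    target_dir = repo_id.split("/")[-1]
    # --depth 1 fetches only the latest commit, without history.
    clone = ["git", "clone", "--depth", "1", f"{HF_BASE}/{repo_id}", target_dir]
    # Each wget overwrites the small LFS pointer file with the real weights.
    downloads = [
        ["wget", "-O", f"{target_dir}/{name}",
         f"{HF_BASE}/{repo_id}/resolve/{revision}/{name}"]
        for name in files
    ]
    return {"env": {"GIT_LFS_SKIP_SMUDGE": "1"}, "clone": clone, "downloads": downloads}


# Illustrative assumption: repository ID and file name are examples only.
plan = plan_fetch(
    "lllyasviel/control_v11f1p_sd15_depth",
    ["diffusion_pytorch_model.fp16.safetensors"],
)
env_prefix = " ".join(f"{key}={value}" for key, value in plan["env"].items())
print(env_prefix, shlex.join(plan["clone"]))
for command in plan["downloads"]:
    print(shlex.join(command))
```

The printed lines can be pasted into a Dockerfile `RUN` instruction. Because only the listed files are downloaded, any duplicate weight formats stored in the same repository never reach the image.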

Stable Diffusion with ControlNet, split into separate services.

The end result? Two services, both with manageable container image sizes. The ControlNet Preprocessor Service, including all annotator models, comes to 7.5GB and runs comfortably on an RTX3060 12GB.

The Stable Diffusion Service, even with every ControlNet packaged in, comes to 21.1GB and also runs smoothly on an RTX3060 12GB. Further reductions can be achieved by choosing which ControlNets to support. For instance, Dreamup.ai excludes the MLSD, Shuffle, and Segmentation ControlNets from their production image, saving about 4GB of storage.
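To relate these sizes to cold-start time, a back-of-the-envelope estimate helps. The sketch below computes the ideal transfer time for each size mentioned in this post. The link speeds are assumptions chosen for illustration, not measurements from Salad nodes, and the calculation ignores layer compression, registry throughput, and image extraction time, so real cold starts will differ.

```python
def download_seconds(size_gb: float, mbps: float) -> float:
    """Ideal transfer time for decimal gigabytes over a link rated in megabits/s."""
    if mbps <= 0:
        raise ValueError("mbps must be positive")
    # 1 GB = 8000 megabits when using decimal units.
    return size_gb * 8000 / mbps


# Sizes taken from this post; the first is the extra weights alone when
# every ControlNet repository is cloned in full.
SIZES_GB = {
    "ControlNet weights, full repositories": 60.8,
    "Stable Diffusion Service": 21.1,
    "ControlNet Preprocessor Service": 7.5,
}

# Assumed link speeds in Mbps, for illustration only.
ASSUMED_LINKS_MBPS = (100, 500)

for name, size_gb in SIZES_GB.items():
    for mbps in ASSUMED_LINKS_MBPS:
        minutes = download_seconds(size_gb, mbps) / 60
        print(f"{name}: {size_gb} GB at {mbps} Mbps is about {minutes:.1f} min")
```

Under these assumptions, the 7.5GB preprocessor image transfers in about 10 minutes on a 100 Mbps link, while the full set of ControlNet repositories alone would take well over an hour.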
