Optimizing AI Image Generation: Streamlining Stable Diffusion with ControlNet in Containerized Environments

INSIDE SALAD

Optimizing AI Image Generation: Streamlining Stable Diffusion with ControlNet in Containerized Environments

Published: August 19, 2023

Shawn Rushefsky

Streamlining_stable_diffusion_with_controlnet on SaladCloud

Implementing Stable Diffusion with ControlNet in any containerized environment comes with a plethora of challenges, mainly due to the sizable amount of additional model weights required to be incorporated in the image. In this blog, we discuss these challenges and how they can be mitigated.

What is ControlNet for Stable Diffusion?

ControlNet is a network structure that empowers users to manage diffusion models by setting extra conditions. It gives users immense control over the images they generate using Stable Diffusion, using a single reference image without noticeably inflating VRAM requirements. ControlNet has revolutionized AI image generation, demonstrating a key advantage of open models like Stable Diffusion against proprietary competitors such as Midjourney.

ControlNet for Stable Diffusion: Reference Image (left), Depth-Midas (middle), Output Image (right)

Note how the image composition remains the same between the reference image and the final image. This is accomplished by using the depth map as a control image. Images from Dreamup.ai using the Dreamshaper model, MiDas Depth Estimation, and the depth controlnet.

Challenges in Implementing ControlNet for Stable Diffusion

However, this remarkable feature presents challenges. There are (at the time of this writing) 14 distinct controlnets compatible with stable diffusion, each offering unique control over the output, necessitating a different “control image.”

All these models, freely accessible on Huggingface, in separate repositories, amount to roughly 4.3GB each, totaling up to an additional storage need of 60.8GB. This represents a near tenfold increase compared to a minimal Stable Diffusion image.

Furthermore, each ControlNet model comes with one or more image preprocessors used to decipher the “control image.” For instance, generating a depth map from an image is a prerequisite for one of the ControlNets. These additional model weights further bloat the total VRAM requirement, exceeding the capacity of commonly used graphics cards like the RTX3060 12GB.

The basic approach to implementing ControlNet in a containerized environment.

Optimizing ControlNet Implementation in a Containerized Environment on a Distributed Cloud

Building a container image for continuous ControlNet stable diffusion inference, if approached without optimization in mind, can rapidly inflate in size. This results in prohibitively long cold-start times as the image is downloaded onto the server running it. This problem is prominent in data centers and becomes even more pronounced in a distributed cloud such as Salad, which depends on residential internet connections of varying speed and reliability.

A two-pronged strategy can effectively address this issue. Firstly, isolate ControlNet annotation as a separate service or leverage a pre-built service like this one from Dreamup.ai. This division of labor not only reduces VRAM requirements and model storage in the stable diffusion service but also enhances efficiency when creating numerous output images from a single input image and prompt.

Secondly, rather than cloning entire repositories for each model, perform a shallow clone of the model repository without git lfs. Then, use wget to selectively download only the necessary weights, as exemplified by this Dockerfile from the Dreamup.ai Stable Diffusion Service. This tactic alone can save more than 40GB of storage.

Stable Diffusion-split-service-controlnet

The end result? Two services, both with manageable container image sizes. The ControlNet Preprocessor Service, inclusive of all annotator models, sizes up to 7.5GB and operates seamlessly on an RTX3060 12GB.

The Stable Diffusion Service, even with every ControlNet packaged in, comes up to 21.1GB, and also runs smoothly on an RTX3060 12GB. Further reductions can be achieved by tailoring what ControlNets to support. For instance, Dreamup.ai excludes MLSD, Shuffle, and Segmentation ControlNets in their production image, thereby saving about 4GB of storage.

Shawn Rushefsky

Shawn Rushefsky is a passionate technologist and systems thinker with deep experience across a number of stacks. As Generative AI Solutions Architect at Salad, Shawn designs resilient and scalable generative ai systems to run on our distributed GPU cloud. He is also the founder of Dreamup.ai, an AI image generation tool that donates 30% of its proceeds to artists.

Have questions about SaladCloud for your workload?