SaladCloud Blog

Blend cuts AI inference cost by 85% on Salad while running 3X more scale

Startups often have a unique origin story, and Blend is no different. Theirs lies in a WhatsApp channel and a local fireworks seller who printed 2,000 physical posters for a peculiar reason. Blend is an AI copilot for e-commerce that helps sellers create professional product photos and designs in two clicks, without hiring an agency. Their mission is to help entrepreneurs and small sellers grow sales online with compelling social graphics, product photos, and SEO-optimized copy. Today, Blend serves thousands of sellers, generating around 6,000 images every hour on Salad's distributed network.

In this chat with Jamsheed Kamardeen, Chief Technology Officer (CTO) at Blend, we discuss their growth, the switch to a distributed cloud, inference cost, and more.

How did the idea for Blend come about?

It was during COVID-19. We were in many WhatsApp groups looking for common problems faced by e-commerce sellers, and we found a peculiar thing: there were many coaching sessions on how to use photo and design apps. It turns out many of the sellers didn't have a design team of their own, but they needed to promote their products on social media and create posters, ads, and the like.

In fact, one of my cousins had a fireworks shop in a small village in India, and 70% of his sales came from posters on WhatsApp and Instagram. He'd go to a local printing shop and use the designer there to create and print 2,000 posters just so he could get a soft copy of the design for ads and promotions. He didn't even bother distributing the physical posters. So the idea came from looking at the challenges local sellers face in promoting their products in a digital world.

Photo editing and design apps have been out there for a long time, right? Weren't they sufficient?

Yes. There were a lot of apps and horizontal design tools which were good for designers. But the sellers aren't designers. Plus, there is the paradox of choice: most tools had hundreds of templates, colors, and so on, and needed a significant time commitment. Often the sellers ended up with a design that looked terrible because they tweaked too much or too little. So we decided to create Blend to offload the design decision-making. Just upload a picture of the product and tell us what you want the offer to be. We'll remove the background, put the product in an appropriate setting, and deliver a design with text. Our goal was always to get them the final design in the fewest clicks possible.

Today, you have millions of downloads for your app. How crucial was the arrival of generative AI in your user growth?

Our initial version included background removal and adding an appropriate background, along with some other features. But generative AI completely changed the game for us. For example, if a shoe store wants to run a 25% off Diwali promotion, all they have to do is upload the product photo and describe the offer and event. With generative AI and Stable Diffusion models, we can identify that it's a shoe, have LLMs decide what to paint, create an aesthetically pleasing urban background, automatically generate appropriate text with the right color scheme, and deliver the copy. All it takes is a couple of clicks. This is what led to our massive user growth. Today, 40% of our users are individual sellers, so we are introducing a separate web app for them as well.

With big growth comes big cloud bills. That must have been the case for Blend as well. What infrastructure challenges did you face here?
Right. Since we are an AI-first design company, inference became our biggest cost factor, and we needed powerful GPUs to run diffusion models. Sourcing GPUs to keep up with surges in demand quickly became a nightmare. The existing providers didn't have the right options for a company like us. AWS only had multi-cluster A100s, with no single-cluster A100 option. GCP and Azure had them, but they were expensive. So we started looking for alternatives.

We found a local provider who offered A100s at a cheaper price, but that came with reliability and scalability issues. We didn't always have enough GPUs during times of higher traffic. I started losing a lot of sleep over this GPU shortage. We're a small team, so when the machines go down, my sleep goes away. So again, we were looking for an alternative. That's when we found Salad.

How has switching to SaladCloud impacted your cost and scaling?

When we switched from the hyperscalers to A100s with a local provider, we didn't really think the cost could go any lower. But switching to Salad was eye-opening. On Salad's consumer GPUs, we are running 3X more scale at half the cost of A100s on our local provider, and almost 85% less cost than the two major hyperscalers we were using before. Plus, Salad is much more reliable. We've migrated all current and new workloads to Salad. I'm not losing sleep over scaling issues anymore.

"On Salad's consumer GPUs, we are running 3X more scale at half the cost of A100s on our local provider and almost 85% less cost than the two major hyperscalers we were using before. I'm not losing sleep over scaling issues anymore."

Jamsheed Kamardeen, Chief Technology Officer (CTO) at Blend

As a CTO, making the switch to a distributed cloud is a huge decision. What was the decision-making process?

That's a good question. I was very skeptical initially about the reliability of Salad. From a technical standpoint, my major question was this: compared to data centers with reliable internet, how am I going to run reliable workloads on random people's computers on a distributed cloud? We needed to implement some solutions to make reliability strong, but it wasn't as difficult as I initially perceived it to be. One thing that helped us was the engineering support offered by Salad, which made

Stable Diffusion XL (SDXL) Benchmark – 769 Images Per Dollar on Salad

Stable Diffusion XL (SDXL) Benchmark

A couple months back, we showed you how to get almost 5,000 images per dollar with Stable Diffusion 1.5. Now, with the release of Stable Diffusion XL, we're fielding a lot of questions regarding the potential of consumer GPUs for serving SDXL inference at scale. The answer from our Stable Diffusion XL (SDXL) benchmark: a resounding yes.

In this benchmark, we generated 60.6k hi-res images with randomized prompts on 39 nodes equipped with RTX 3090 and RTX 4090 GPUs. We saw an average image generation time of 15.60s, at a per-image cost of $0.0013. At 769 SDXL images per dollar, consumer GPUs on Salad's distributed cloud are still the best bang for your buck for AI image generation, even with no optimizations enabled on Salad and all optimizations enabled on AWS.

Architecture

We used an inference container based on SDNext, along with a custom worker written in TypeScript that implemented the job processing pipeline. The worker used HTTP to communicate with both the SDNext container and with our batch framework. Our simple batch processing framework comprises:

Discover our open-source code for a deeper dive:

Deployment on Salad

We set up a container group targeting nodes with 4 vCPUs, 32GB of RAM, and GPUs with 24GB of VRAM, which includes the RTX 3090, 3090 Ti, and 4090. We filled a queue with randomized prompts in the following format:

We used ChatGPT to generate roughly 100 options for each variable in the prompt, and queued up jobs with 4 images per prompt. SDXL is composed of two models, a base and a refiner. We generated each image at 1216 x 896 resolution, using the base model for 20 steps and the refiner model for 15 steps. You can see the exact settings we sent to the SDNext API.

Results – 60,600 Images for $79

Over the benchmark period, we generated more than 60k images and uploaded more than 90GB of content to our S3 bucket, incurring only $79 in charges from Salad. That is far less expensive than using an A10G on AWS, and orders of magnitude cheaper than fully managed services like the Stability API. We did see slower image generation times on consumer GPUs than on datacenter GPUs, but the cost difference gives Salad the edge. While an optimized model on an A100 did provide the best image generation time, it was by far the most expensive per image of all the methods we evaluated. Grab a fork and see all the salads we made here on our GitHub page.

Future Improvements

For comparison with AWS, we gave them several advantages that we did not implement in the container we ran on Salad. In particular, torch.compile isn't practical on Salad, because it adds 40+ minutes to the container's start time and Salad's nodes are ephemeral. However, such a long start time might be an acceptable tradeoff in a datacenter context with dedicated nodes that can be expected to stay up for a very long time, so we did use torch.compile on AWS. Additionally, we used the default fp32 variational autoencoder (VAE) in our Salad worker and an fp16 VAE in our AWS worker, giving another performance edge to the legacy cloud provider. Unlike re-compiling the model at start time, including an alternate VAE is something that would be practical to do on Salad, and it is an optimization we would pursue in future projects.
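The exact worker code and the settings we sent to SDNext live in the open-source repos referenced above. As a rough, hedged illustration of the generation settings used in this benchmark (1216 x 896, 20 base steps, 15 refiner steps), here is a minimal sketch using Hugging Face diffusers rather than SDNext; the model IDs, fp16 weights, and the latent hand-off between base and refiner are illustrative assumptions, not the benchmark's actual container code.

```python
# Hedged sketch: SDXL base + refiner roughly at the benchmark's settings, via diffusers
# (the benchmark itself used SDNext; model IDs and dtype choices here are assumptions).
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # share components to fit comfortably in 24GB VRAM
    vae=base.vae,
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

prompt = "a bowl of salad in front of a computer, studio lighting"  # randomized in the real benchmark

# Base pass: 20 steps at 1216 x 896, returning latents for the refiner.
latents = base(
    prompt=prompt, width=1216, height=896,
    num_inference_steps=20, output_type="latent",
).images

# Refiner img2img pass over the base latents. diffusers applies its own denoising
# strength to these steps, so this approximates rather than reproduces SDNext's
# 20 + 15 step split.
image = refiner(prompt=prompt, image=latents, num_inference_steps=15).images[0]
image.save("sdxl_benchmark_sample.png")
```

Swapping in an fp16-friendly VAE, as discussed under Future Improvements, would amount to changing the `vae` argument in a sketch like this.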
SaladCloud – Still The Best Value for AI/ML Inference at Scale

SaladCloud remains the most cost-effective platform for AI/ML inference at scale. The recent benchmarking of Stable Diffusion XL further highlights the competitive edge this distributed cloud platform offers, even as models get larger and more demanding.

Shawn Rushefsky is a passionate technologist and systems thinker with deep experience across a number of stacks. As Generative AI Solutions Architect at Salad, Shawn designs resilient and scalable generative AI systems to run on our distributed GPU cloud. He is also the founder of Dreamup.ai, an AI image generation tool that donates 30% of its proceeds to artists.

Whisper Large Inference Benchmark: 137 Days of Audio Transcribed in 15 Hours for Just $117

Save Over 99% On Audio Transcription Using Whisper-Large-v2 and Consumer GPUs

Harnessing the power of OpenAI's Whisper Large V2, an automatic speech recognition model, we've dramatically reduced audio transcription costs and time. Here's a deep dive into our benchmark against the substantial English CommonVoice dataset and how we achieved a 99.1% cost reduction.

A Costly Comparison

Traditionally, using a managed service like AWS Transcribe would set you back about $10,500 for transcribing the entirety of the English CommonVoice dataset. Using a custom model? That's an even steeper $13,134. In contrast, our approach using Whisper on SaladCloud incurred just $117 to achieve the same result.

Behind The Scenes: Our Architecture

Our simple batch processing framework comprises:

We wanted to keep the framework components fully managed and serverless, to provide as close an analogue as possible to using managed transcription services. The framework itself incurred a cost of $28 during transcription, mainly due to S3 costs associated with uploading and downloading millions of files. This amount does not include any costs from the node pool.

Discover our open-source code for a deeper dive:

Deployment on SaladCloud

With our inference container and services ready, we leveraged SaladCloud's Public API. We used the API to deploy 2 identical container groups with 100 replicas each, all using the modest RTX 3060 with only 12GB of VRAM. We filled the job queue with URLs to the 2.2 million audio clips included in the dataset and hit start on our container groups. Our tasks were completed in a mere 15 hours, incurring $89 in costs from Salad and $28 in costs from our batch framework.

Performance Comparison of Whisper-Large-v2 Across Different Clouds

The result? An average transcription rate of one hour of audio every 16.47 seconds, translating to an impressive $0.00059 per audio minute. Notably, SaladCloud's cost-performance ratio dramatically outshined major competitors, even when deploying custom models. It's worth noting that AWS Transcribe's billing structure can greatly inflate costs for shorter audio clips (which comprise most of the CommonVoice corpus), a setback not encountered on per-second billing platforms, and their cost-performance would likely improve somewhat when transcribing longer content. We tried to set up an apples-to-apples comparison by running our same batch inference architecture on AWS ECS, but we couldn't get any GPUs. The GPU shortage strikes again.

Optimizing Further

While our benchmark results are already quite compelling, there are areas we've identified for potential performance enhancements:

By integrating these process improvements, we anticipate that overall transcription throughput could improve by 20-50% on this dataset. This would not only reduce processing time but also lead to even more significant cost savings, maximizing the efficiency of this approach.
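For a feel of the core workload each replica runs, here is a minimal, hedged sketch of transcribing a single clip with Whisper-Large-v2 via the Hugging Face transformers pipeline; the model ID, fp16 dtype, and chunking parameters are illustrative assumptions rather than the benchmark container's exact configuration, which lives in the open-source code referenced above.

```python
# Hedged sketch: transcribing one audio clip with Whisper-Large-v2 on a single GPU
# (the benchmark's actual container, batching, and queue logic are in the open-source repos).
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    torch_dtype=torch.float16,  # halves memory so the model fits on a 12GB RTX 3060
    device="cuda:0",
)

# Long files are chunked and batched; short CommonVoice clips transcribe in one pass.
result = asr("audio_clip.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```

In the benchmark itself, each replica pulled clip URLs from the job queue, ran a step like this, and moved files through S3, which is where the framework's $28 in storage costs came from.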
SaladCloud: The Most Affordable GPU Cloud for AI Audio Transcription

For startups and developers eyeing cost-effective, powerful GPU solutions, SaladCloud is a game changer. Boasting the market's most competitive GPU prices, it offers a solution to sky-high cloud bills and limited GPU availability. In an era where cost-efficiency and performance are paramount, leveraging the right tools and architecture can make all the difference. Our Whisper Large inference benchmark is a testament to the savings and efficiency achievable with innovative approaches. We invite developers and startups to explore our open-source resources and discover the potential for themselves.

Shawn Rushefsky is a passionate technologist and systems thinker with deep experience across a number of stacks. As Generative AI Solutions Architect at Salad, Shawn designs resilient and scalable generative AI systems to run on our distributed GPU cloud. He is also the founder of Dreamup.ai, an AI image generation tool that donates 30% of its proceeds to artists.

Stable Diffusion v1.4 Inference Benchmark – GPUs & Clouds Compared

Stable Diffusion v1.4 GPU Benchmark – Inference

Stable Diffusion v1.4 is an impressive text-to-image diffusion model developed by stability.ai. By utilizing the principles of diffusion processes, Stable Diffusion v1.4 produces visually appealing and coherent images that accurately depict the given input text. Its stable and reliable performance makes it a valuable asset for applications such as visual storytelling, content creation, and artistic expression. In this benchmark, we evaluate the inference performance of Stable Diffusion v1.4 on different compute clouds and GPUs. Our goal is to answer a few key questions that developers ask when deploying a Stable Diffusion model to production:

Benchmark Parameters

For the benchmark, we compared consumer-grade, mid-range GPUs on two community clouds – SaladCloud and RunPod – with higher-end GPUs on three big-box cloud providers. To deploy on SaladCloud, we used the 1-click deployment for Stable Diffusion (SD) v1.4 on the Salad Portal via pre-built recipes.

Cloud providers considered: Google Cloud Platform (GCP), Amazon Web Services (AWS), Microsoft Azure Cloud, RunPod, and SaladCloud

GPUs considered: RTX 3060, RTX 3090, A100, V100, T4, RTX A5000

Link to model: https://huggingface.co/CompVis/stable-diffusion-v1-4

Prompt: 'a bowl of salad in front of a computer'

The benchmark analysis uses a text prompt as input. Outputs were images at 512×512 resolution with 50 inference steps, as recommended in this HuggingFace blog.

Image: A bowl of salad in front of a computer – generated from the benchmark

For the comparison, we focused on two main criteria:

Images Per Dollar (Img/$)

Training Stable Diffusion definitely needs high-end GPUs with high VRAM, but for inference, the more relevant metric is images per dollar. There have been multiple instances of rapid user growth for a text-to-image platform either causing skyrocketing cloud bills or a mad scramble for GPUs. A high number of images generated per dollar means cloud costs are lower, and generative AI companies can grow at scale in a profitable manner.

Seconds Per Image (sec/img)

The user bases for SD-based image generation tools vary widely in how long users will wait for an image. In some cases, end users expect images in under 5 seconds (DALL-E, Canva, Picfinder, etc.). In others, like Secta.ai, users expect results in a few minutes to hours. Image generation times can also vary across pricing tiers: free-tier users can expect to wait a couple more seconds compared to users paying the highest price for access.

Stable Diffusion GPU Benchmark – Results

Image: Stable Diffusion benchmark results showing a comparison of images per dollar for different GPUs and clouds

The benchmark results show the consumer-grade GPUs outperforming the high-end GPUs, giving more images per dollar with comparable image generation times. For generative AI companies serving inference at scale, more images per dollar puts them on the path to profitable, scalable growth.

Image: Stable Diffusion benchmark results showing a comparison of image generation time

Some interesting observations from the benchmark:

Deploying Stable Diffusion v1.4 on SaladCloud

Stable Diffusion v1.4 is available for 1-click deployment as a 'Recipe' on the Salad Portal, accessible at https://portal.salad.com/. The recipe is served via an HTTP server; once it has been deployed to Salad, you will be provided with a unique URL that can be used to access the model. In order to secure your recipe, all requests must include the Salad-Api-Key header with your individual Salad API token, which can be found in your account settings.

Example API Request

Parameters required:

prompt – Your prompt for Stable Diffusion to generate
negativeprompt – Prompts for Stable Diffusion to not contain
numinferencesteps – The number of steps to generate each image
guidancescale – How close to the prompt your final image should be
width – Width in pixels of your final image
height – Height in pixels of your final image
seed – The seed to generate your images from
numimagesperprompt – The number of images to generate for your prompt
PIPELINE – Which pipeline to use
SCHEDULER – Which scheduler to use
safetychecker – Enable or disable the NSFW filter on models; note some models may force this enabled anyway

Example API Response
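As a rough, hypothetical illustration of calling a deployed recipe, the sketch below posts the parameters listed above to the recipe's unique URL with the Salad-Api-Key header. The endpoint URL, JSON field casing, and response shape shown here are assumptions for illustration, not the recipe's documented contract; use your deployment's actual URL and the parameter list above.

```python
# Hypothetical sketch of a recipe API call; the URL, payload casing, and response
# shape are assumptions. Consult your deployed recipe's unique URL and the
# parameter list above for the real contract.
import requests

RECIPE_URL = "https://<your-unique-recipe-url>/generate"  # placeholder for the URL Salad provides
API_KEY = "your-salad-api-token"  # found in your account settings

payload = {
    "prompt": "a bowl of salad in front of a computer",
    "negativeprompt": "blurry, low quality",
    "numinferencesteps": 50,
    "guidancescale": 7.5,
    "width": 512,
    "height": 512,
    "seed": 42,
    "numimagesperprompt": 1,
    "safetychecker": True,
    # PIPELINE and SCHEDULER are omitted here; valid values depend on the recipe.
}

response = requests.post(
    RECIPE_URL,
    headers={"Salad-Api-Key": API_KEY, "Content-Type": "application/json"},
    json=payload,
    timeout=300,
)
response.raise_for_status()

# The response typically carries the generated image data (for example, encoded
# in JSON); again, the exact shape depends on the deployed recipe.
print(response.json())
```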
Stable Diffusion XL 0.9 on consumer-grade GPUs

The pace of development in the generative AI space has been tremendous. Stability.ai just announced SDXL 0.9, the most advanced development in the Stable Diffusion text-to-image suite of models. SDXL 0.9 produces massively improved image and composition detail over its predecessor. In the announcement, Stability.ai noted that SDXL 0.9 can be run on a modern consumer GPU with just 16GB of RAM and a minimum of 8GB of VRAM. Chalk it up as another win for consumer-grade GPUs in the race to serve inference at scale.
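Whether on SD v1.4 today or SDXL tomorrow, the generation step itself stays simple. As a local reference point for the v1.4 benchmark settings above (512×512, 50 inference steps, the salad prompt), here is a minimal, hedged diffusers sketch of an equivalent generation on a consumer GPU; the fp16 dtype is an assumption, and the SaladCloud recipe wraps all of this behind its HTTP API for you.

```python
# Hedged sketch: one SD v1.4 generation with the benchmark's settings, via diffusers
# (the SaladCloud recipe exposes this behind an HTTP API; the fp16 dtype is an assumption).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a bowl of salad in front of a computer",
    height=512,
    width=512,
    num_inference_steps=50,
).images[0]
image.save("salad_benchmark_sample.png")
```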