SaladCloud Blog

INSIDE SALAD

GPU shortage isn’t the problem for Generative AI. GPU selection is.

Daniel Sarfati

Are we truly out of GPU compute power or are we just looking in the wrong places for the wrong type of GPUs? Recently, GPU shortage has been in the news everywhere. Just take a peek at the many articles on the topic here – The InformationIT BrewWall Street Journala16z. The explosive growth of generative AI has created a mad rush and long wait-times for AI-focused GPUs. For growing AI companies serving inference at scale, shortage of such GPUs is not the real problem. Selecting the right GPU is.

AI Inference Scalability and the “right-sized” GPU

Today’s ‘GPU shortage’ is really a function of inefficient usage and overpaying for GPUs that don’t align with the application’s needs for at-scale AI. The marketing machines at large cloud companies and hardware manufacturers have managed to convince developers that they ABSOLUTELY NEED the newest, most powerful hardware available in order to be a successful AI company.

The A100s and H100s – perfect for training and advanced models – certainly deserve the tremendous buzz for being the fastest, most advanced GPUs. But there aren’t enough of these GPUs around and when they are available, it requires pre-paying or having an existing contract.

A recent article by semianalysis has two points that confirm this:

  • Even OpenAI can’t get enough GPUs and this is severely bottlenecking its near-term roadmap. OpenAI cannot deploy its multi-modal models due to GPU shortages.
  • The highest-end Nvidia GPU, H100, will remain sold out until Q1 of next year, despite Nvidia’s attempts to increase production drastically.

Meanwhile, GPU benchmark data suggests that there are many use cases where you don’t need the newest, most powerful GPUs. Consumer-grade GPUs (RTX3090, A5000, RTX4090, etc.) not only have high availability but also deliver more inferences per dollar significantly reducing your cloud cost.

Selecting the “right sized” GPU at the right stage puts generative AI companies on the path to profitable, scalable growth, lower cloud costs and immune to ‘GPU shortages’.

How to Find the “Right Sized” GPU

When it comes to determining the “right sized” GPU for your application, there are several factors to consider. The first step is to evaluate the needs of your application at each stage of an AI model’s lifecycle. This means taking into account the varying compute, networking and storage requirements for tasks such as data preprocessing, training, and inference.

Training Models

During the training stage of machine learning models, it is common for large amounts of computational resources to be required. This includes the use of high-powered graphics processing units (GPUs) which can number from hundreds to thousands. These GPUs need to be connected through lightning-fast network connections in specially designed clusters to ensure that the machine learning models receive the necessary resources to train effectively. These specially designed clusters are optimized for the specific needs of machine learning and are capable of handling the intense computational demands required during the training stage.

Example: Training Stable Diffusion (Approximate Cost: $600k)

Serving Models (Inference)

When it comes to serving your model, scalability and throughput are particularly crucial. By carefully considering these factors, you can ensure that your infrastructure is able to accommodate the needs of your growing user base. This includes being mindful of both budgetary constraints and architectural considerations.

It’s worth noting that there are many examples in which the GPU requirements for serving inference are significantly lower than those for training. Despite this, many people continue to use the same GPUs for both tasks. This can lead to inefficiencies, as the hardware may not be optimized for the unique demands of each task. By taking the time to carefully assess your infrastructure needs and make necessary adjustments, you can ensure that your system is operating as efficiently and effectively as possible.

Example 1: 6X more images per dollar on consumer-grade GPUs

In a recent Stable Diffusion benchmark, consumer-grade GPUs generated 4X-8X more images per dollar compared to AI-focused GPUs. Most generative AI companies in the text-to-image space will be well served using consumer-grade GPUs for serving inference at scale. The economics and availability make them a winner for this use case.

Example 2: Serving Stable Diffusion SDXL

In the recent announcement introducing SDXL, Stability.ai noted that SDXL 0.9 can be run on a modern consumer GPU with just 16GB RAM and a minimum of 8GB of vRAM.

Stable Diffusion SDXL system requirements

Serving “Right Sized” AI Inference at Scale

At Salad, we understand the importance of being able to serve AI/ML inference at scale without breaking the bank. That’s why we’ve created a globally distributed network of consumer GPUs that are designed from the ground up to meet your needs. Our customers have found that turning to SaladCloud instead of relying on large cloud computing providers has allowed them to not only save up to 90% of their cloud cost, but also improve their product offerings and reduce their dev ops time.

Example: Generating 9M+ images in 24 hours for only $1872

In a recent benchmark for a customer, we generated 9.2 Million Stable Diffusion images in 24 hours for just $1872 – all on Nvidia’s 3000/4000 series GPUs. That’s ~5000 images per dollar leading to significant savings for this image generation company.

With SaladCloud, you won’t have to worry about costly infrastructure maintenance or unexpected downtime. If it works on your system, it works on Salad. Instead, you can focus on what really matters – serving your growing user base while remaining profitable.

To see if your use case is a fit for consumer-grade GPUs, contact our team today.

Have questions about SaladCloud for your workload?

Book a 15 min call with our team. Get $50 in testing credits.

Related Blog Posts

AI transcription - Parakeet TRT 1.1B batch transription compared against APIs

AI Transcription Benchmark: 1 Million Hours of Youtube Videos with Parakeet TDT 1.1B for Just $1260, a 1000-fold cost reduction 

Building upon the inference benchmark of Parakeet TDT 1.1B on SaladCloud and with our ongoing efforts to enhance the system architecture and implementation for batch jobs, we have achieved a 1000-fold...
Read More
Self-managed Openvoice vs Metavoice comparison: A Text to speech API alternative

Text-to-Speech (TTS) API Alternative: Self-Managed OpenVoice vs MetaVoice Comparison

A cost-effective alternative to Text-to-speech APIs In the realm of text-to-speech (TTS) technology, two open-source models have recently garnered everyone's attention: OpenVoice and MetaVoice. Each model has unique capabilities in...
Read More
Blog_Stable_diffusion_fine_tuning_api_service

Cost-effective Stable Diffusion fine tuning on Salad

Stable Diffusion XL (SDXL) fine tuning as a service I recently wrote a blog about fine tuning Stable Diffusion XL (SDXL) on interruptible GPUs at low cost, starring my dog...
Read More

Don’t miss anything!

Subscribe To SaladCloud Newsletter & Stay Updated.