The AI GPU Shortage: How Gaming PCs Offer a Solution and a Challenge

Reliability in Times of AI GPU Shortage

In the world of cloud computing, leading providers have traditionally relied on expansive, state-of-the-art data centers to ensure top-tier reliability. These data centers, boasting redundant power supplies, cooling systems, and vast network infrastructures, often promise uptime figures ranging from 99.9% to 99.9999% – terms you might have heard as "Three Nines" to "Six Nines." For those who have engaged with prominent cloud providers, these figures are seen as the gold standard of reliability.

However, the cloud computing horizon is expanding. Harnessing the untapped potential of idle gaming PCs is not only a revolutionary departure from conventional models but also a timely response to the massive compute demands of burgeoning AI businesses. The "AI GPU shortage" is everywhere today as GPU-hungry businesses fight for affordable, scalable computational power. Gaming PCs, often equipped with high-performance GPUs, provide an innovative way to meet these growing demands.

While this fresh approach offers unparalleled GPU inference rates and wider accessibility, it also presents a unique set of reliability factors to consider. The decentralized nature of a system built on individual gaming PCs does introduce variability. A single gaming PC might typically offer reliability figures between 90-95% (1 to 1.5 nines). At first glance, this seems far removed from the high "nines" many are familiar with. However, it's crucial to recognize that we're comparing two different models. While an individual gaming PC might occasionally face challenges, from software issues to local power outages, the collective strength of the distributed system ensures redundancy and robustness at a larger scale.

When exploring our cloud solutions, it's essential to view reliability from a broader perspective. Instead of concentrating solely on the performance of individual nodes, we highlight the overall resilience of our distributed system. This approach offers deeper insight into our next-generation cloud infrastructure, blending cost-efficiency with reliability in a transformative way, well suited to the computational needs of modern AI-driven businesses and to easing the ongoing AI GPU shortage.

Exploring the New Cloud Landscape

Embracing Distributed Systems

Unlike traditional centralized systems, distributed clouds, particularly those harnessing the power of gaming PCs, operate on a unique paradigm. Each node in this setup is a personal computer, potentially scattered anywhere across the globe, rather than being clustered in a single data center.

Navigating Reliability Differences

Nodes based on gaming PCs might individually present a reliability range of 90-95%. Various elements influence this, from software issues on the host machine to local power outages.

Unpacking the Benefits of Distributed Systems

Global Redundancy Amidst Climate Change

The diverse geographical distribution of nodes (geo-redundancy) offers an inherent safeguard against the increasing unpredictability of climate change. As extreme weather events, natural disasters, and environmental challenges become more frequent, centralized data centers in vulnerable regions are at heightened risk of disruption. With nodes spread across various parts of the world, however, a distributed cloud ensures that if one region faces climate-induced challenges or outages, the remaining global network can compensate, maintaining continuous availability. This decentralized approach not only ensures business continuity in the face of environmental uncertainties but also underscores the importance of forward-thinking infrastructure planning in our changing world.

Seamless Scalability

Distributed systems are designed for effortless horizontal scaling. Integrating more nodes into a group is a straightforward process.

Fortifying Against Localized Disruptions

Resilience against localized disruptions is pivotal to appreciating the strengths of distributed systems, especially when juxtaposed against the vulnerabilities of a centralized model, such as relying solely on a single AWS region like US-East-1.

Catering to AI's Growing Demands

Harnessing idle gaming PCs is not just innovative but also a strategic response to the escalating computational needs of emerging AI enterprises. As AI technologies advance, the quest for affordable, scalable computational power intensifies. Gaming PCs, often equipped with high-end GPUs, present an ingenious solution to this challenge.

Achieving Lower Latency

The vast geographic distribution of nodes means data can be processed or stored closer to end users, potentially offering reduced latency for specific applications.

Cost-Effective Solutions

Tapping into the dormant resources of idle gaming PCs can lead to substantial cost savings compared to the expense of building and maintaining dedicated data centers.

The Collective Reliability Factor

While individual nodes might have a reliability rate of 90-95%, the combined reliability of the entire system can be significantly higher, thanks to redundancy and the sheer number of nodes. Consider this analogy: flipping one coin has a 50% chance of landing tails. Flipping two coins simultaneously reduces the probability of both landing tails to 25%; for three coins it's 12.5%, and so on. Applying this to our nodes, if each node has a 10% chance of being offline, the probability of two given nodes being offline simultaneously is just 1%. As the number of nodes increases, the likelihood of all of them being offline at once diminishes exponentially. Thus, as a network grows, the chance of the entire system experiencing downtime decreases dramatically. Even if individual nodes occasionally falter, the distributed nature of the system keeps overall availability impressively high.

Here is a real example: 24 hours sampled from a production AI image generation workload with 100 requested nodes. As we would expect, it's fairly uncommon for all 100 to be running at the same time, but 100% of the time there were at least 82 live nodes. For this customer, 82 simultaneous nodes offered plenty of throughput to keep up with their internal SLOs and provided a zero-downtime experience.
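To make the collective math concrete, pool availability can be modeled with a binomial distribution. Below is a minimal sketch, assuming node failures are independent and using hypothetical per-node uptimes from the 90-95% range discussed above; it estimates how often at least 82 of 100 requested nodes are live, mirroring the workload described here.

```python
from math import comb

def p_at_least(n: int, k: int, p: float) -> float:
    """Probability that at least k of n nodes are live, assuming each
    node is independently up with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical per-node uptimes; real nodes are not perfectly independent.
for p in (0.90, 0.95):
    print(f"node uptime {p:.0%}: P(>= 82 of 100 live) = {p_at_least(100, 82, p):.4%}")
```

Even at 90% per-node uptime, this simple model keeps at least 82 of 100 nodes live well over 99% of the time, consistent with the zero-downtime behavior observed above.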
Gaming PCs as a Robust, High-Availability Solution for the AI GPU Shortage

While gaming PC nodes might seem to offer modest reliability compared to enterprise servers, viewed as part of a distributed system they present a robust, high-availability solution. This system, with its inherent benefits of redundancy, scalability, and resilience, can be expertly managed to provide a formidable alternative to traditional centralized systems. By leveraging the untapped potential of gaming PCs, we not only address the growing computational demands of industries like AI but also pave the way for a more resilient, cost-effective, and globally accessible cloud.

GPU shortage isn’t the problem for Generative AI. GPU selection is.

Are we truly out of GPU compute power, or are we just looking in the wrong places for the wrong type of GPUs? The GPU shortage has been in the news everywhere recently; just take a peek at the many articles on the topic from The Information, IT Brew, the Wall Street Journal, and a16z. The explosive growth of generative AI has created a mad rush and long wait times for AI-focused GPUs. But for growing AI companies serving inference at scale, the shortage of such GPUs is not the real problem. Selecting the right GPU is.

AI Inference Scalability and the "Right-Sized" GPU

Today's "GPU shortage" is really a function of inefficient usage and of overpaying for GPUs that don't align with the application's needs for at-scale AI. The marketing machines at large cloud companies and hardware manufacturers have convinced developers that they ABSOLUTELY NEED the newest, most powerful hardware available in order to be a successful AI company. The A100s and H100s, perfect for training and advanced models, certainly deserve the tremendous buzz for being the fastest, most advanced GPUs. But there aren't enough of them to go around, and when they are available, access typically requires pre-paying or an existing contract. A recent article by SemiAnalysis makes two points that confirm this.

Meanwhile, GPU benchmark data suggests there are many use cases where you don't need the newest, most powerful GPUs. Consumer-grade GPUs (RTX 3090, A5000, RTX 4090, etc.) not only have high availability but also deliver more inferences per dollar, significantly reducing your cloud cost. Selecting the "right-sized" GPU at the right stage puts generative AI companies on the path to profitable, scalable growth, lower cloud costs, and immunity to "GPU shortages."

How to Find the "Right-Sized" GPU

When it comes to determining the "right-sized" GPU for your application, there are several factors to consider. The first step is to evaluate the needs of your application at each stage of an AI model's lifecycle, taking into account the varying compute, networking, and storage requirements of tasks such as data preprocessing, training, and inference.

Training Models

The training stage of machine learning models commonly requires large amounts of computational resources, including high-powered GPUs numbering from the hundreds to the thousands. These GPUs need to be connected through lightning-fast network links in specially designed clusters, optimized for the specific needs of machine learning and capable of handling the intense computational demands of training.

Example: Training Stable Diffusion (approximate cost: $600k)

Serving Models (Inference)

When it comes to serving your model, scalability and throughput are particularly crucial. By carefully considering these factors, along with budgetary constraints and architectural considerations, you can ensure that your infrastructure can accommodate the needs of your growing user base. It's worth noting that in many cases the GPU requirements for serving inference are significantly lower than those for training. Despite this, many teams continue to use the same GPUs for both tasks, which can lead to inefficiencies, as the hardware may not be optimized for the unique demands of each task. By taking the time to assess your infrastructure needs and make the necessary adjustments, you can ensure that your system operates as efficiently and effectively as possible.

Example 1: 6X more images per dollar on consumer-grade GPUs

In a recent Stable Diffusion benchmark, consumer-grade GPUs generated 4X-8X more images per dollar compared to AI-focused GPUs. Most generative AI companies in the text-to-image space will be well served using consumer-grade GPUs for serving inference at scale; the economics and availability make them a winner for this use case.

Example 2: Serving Stable Diffusion SDXL

In the recent announcement introducing SDXL, Stability.ai noted that SDXL 0.9 can be run on a modern consumer GPU with just 16GB of RAM and a minimum of 8GB of VRAM.
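To illustrate how modest those requirements are in practice, here is a minimal sketch of serving SDXL on a consumer GPU with Hugging Face's diffusers library. The model ID and memory-saving settings shown are common community usage, not details from Stability.ai's announcement.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL base weights in half precision to fit consumer-GPU VRAM budgets.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Offload idle submodules to system RAM (requires the accelerate package),
# trading some speed for a lower peak VRAM footprint.
pipe.enable_model_cpu_offload()

image = pipe("a bowl of salad on a wooden table", num_inference_steps=30).images[0]
image.save("salad.png")
```

On cards with more headroom, such as an RTX 4090, you can skip the offloading and simply call pipe.to("cuda") instead.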
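And to make "images per dollar" comparisons like the one in Example 1 concrete, here is a small helper. The throughput and pricing figures below are placeholders for illustration only, not benchmark results.

```python
def images_per_dollar(images_per_second: float, price_per_hour: float) -> float:
    """Images generated per dollar of GPU time."""
    return images_per_second * 3600 / price_per_hour

# Placeholder figures for illustration only, not measured benchmarks.
consumer = images_per_dollar(images_per_second=0.5, price_per_hour=0.10)    # consumer-grade node
datacenter = images_per_dollar(images_per_second=1.0, price_per_hour=1.20)  # AI-focused instance
print(f"consumer: {consumer:,.0f} images/$, datacenter: {datacenter:,.0f} images/$, "
      f"advantage: {consumer / datacenter:.1f}x")
```

Note how the slower, cheaper card can still come out well ahead on a per-dollar basis; that gap is exactly what the benchmark above measures.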
Serving "Right-Sized" AI Inference at Scale

At Salad, we understand the importance of being able to serve AI/ML inference at scale without breaking the bank. That's why we've created a globally distributed network of consumer GPUs designed from the ground up to meet your needs. Our customers have found that turning to SaladCloud instead of relying on large cloud computing providers has allowed them not only to save up to 90% of their cloud cost but also to improve their product offerings and reduce their DevOps time.

Example: Generating 9M+ images in 24 hours for only $1872

In a recent benchmark for a customer, we generated 9.2 million Stable Diffusion images in 24 hours for just $1872, all on Nvidia's 3000/4000-series GPUs. That's roughly 5,000 images per dollar, leading to significant savings for this image generation company.

With SaladCloud, you won't have to worry about costly infrastructure maintenance or unexpected downtime. If it works on your system, it works on Salad. Instead, you can focus on what really matters: serving your growing user base while remaining profitable. To see if your use case is a fit for consumer-grade GPUs, contact our team today.