SaladCloud Blog

The Future of Scientific Compute is Decentralized

Scientific computing is undergoing a fundamental architectural transformation, from centralized supercomputers and cloud providers to distributed networks of consumer hardware and peer-to-peer marketplaces. This shift matters because it democratizes access to computational resources that were previously the exclusive domain of well-funded institutions, offering 60-85% cost reductions while enabling research at unprecedented scales. There are 10^60 molecules to explore in ‘chemical space’, and how many of those we can digitally synthesize is simply a function of cost: how many TFLOPs/$ you can achieve. The backstory runs from Stanford’s Folding@home reaching exascale computing in 2020 with volunteer PCs to today’s blockchain-based compute marketplaces like Golem and consumer GPU platforms like Salad Technologies. This evolution connects to the broader DeSci (Decentralized Science) movement and addresses critical bottlenecks as AI workloads create GPU shortages affecting the entire generative AI industry.

Folding@home pioneered planetary-scale volunteer computing for science

When Dr. Vijay Pande at Stanford launched Folding@home in October 2000, he applied an insight borrowed from Napster’s music-sharing revolution: protein folding simulations could be divided into thousands of independent parallel calculations distributed across volunteer computers worldwide. The founding vision proved prescient: within two decades, Folding@home would not only advance fundamental protein science but become the world’s first exascale computing system, more powerful than the top 500 supercomputers combined.

The technical innovation centered on algorithmic breakthroughs that transformed how simulations could be parallelized. Pande’s team developed Markov State Models (MSMs) starting in 2004, creating network representations of protein conformational ensembles that became foundational methodology across computational biophysics.
The FAST (Fluctuation Amplification of Specific Traits) algorithm balanced exploration and exploitation to capture slow conformational changes with orders of magnitude less simulation time. Critically, Folding@home pioneered GPU computing for molecular dynamics on October 2, 2006: it was the first distributed computing project to harness GPUs, delivering 20-30x speedups that later led to the widely used OpenMM software.

Scale achievements tell a remarkable story. On September 16, 2007, Folding@home became the first project to cross 1 petaFLOPS, recognized by Guinness World Records. Growth continued steadily: 2 petaFLOPS in 2008, 5 petaFLOPS in 2009 (another first), exceeding 40 petaFLOPS by 2014. Then COVID-19 catalyzed explosive expansion. When the Chodera Lab launched SARS-CoV-2 protein simulations on February 27, 2020, the user base grew from 30,000 to over 1 million volunteers within one month. By April 12, 2020, Folding@home peaked at 2.43 exaFLOPS of sustained performance, five-fold greater than Summit, then the world’s fastest supercomputer.

The scientific output validated the approach. Over 226 peer-reviewed publications emerged from Folding@home data, published in Nature, Science, PNAS, and other top-tier journals. The project achieved the first millisecond-timescale protein folding simulation (NTL9 in 2010), revealed hub-like folding pathway topologies with many parallel routes, and discovered “cryptic pockets”: hidden drug binding sites absent in crystal structures but revealed through molecular dynamics. For COVID-19, the platform simulated 0.1 seconds of the SARS-CoV-2 proteome in atomic detail, over 100,000 times more data than typical simulation papers. This revealed dramatic spike protein opening mechanisms far beyond experimental observations, identified over 50 novel cryptic pockets as drug targets, and screened 50,000+ compounds for the COVID Moonshot project, advancing 300 candidates toward clinical trials.
Folding@home’s pioneering of citizen science at planetary scale demonstrated that millions of uncoordinated volunteers could create computational resources exceeding traditional infrastructure. The gamification approach (point-based credits, team competitions, leaderboards), combined with transparent data sharing, engaged a diverse community from hardware enthusiasts to casual participants. Research on 400+ active participants revealed primarily altruistic motivations: a desire to help scientists, personal connections to the diseases studied, and interest in scientific questions. The platform provided tangible mental health benefits during COVID-19, offering agency when people felt helpless. Remarkably, nearly 150 publications were facilitated by contributions from the “overclocker” community: enthusiasts who custom-build high-performance computers specifically to maximize Folding@home contributions.

Modern platforms prove decentralized compute can serve real science

While Folding@home demonstrated volunteer computing’s potential, contemporary platforms are creating commercial marketplaces that make decentralized computing accessible on demand. These represent different architectural approaches to the same goal: aggregating underutilized hardware into research-grade infrastructure.

Golem Network launched as a blockchain-based peer-to-peer marketplace in 2016. The vision: create a “decentralized supercomputer” where users worldwide rent out unused computing power or access resources from others, paying with GLM tokens on Ethereum. The technical implementation, called Yagna (written in Rust), runs as a daemon on nodes that can function as both provider and requestor. The architecture handles market negotiations, executes tasks in sandboxed VM or WebAssembly runtimes, and settles payments via Polygon Layer 2, enabling transactions 150x cheaper than on Ethereum mainnet.

The LIFE@Golem project, partnering with Allchemy (founded by Prof.
Bartosz Grzybowski with 300+ publications including 40+ in Nature/Science), simulated prebiotic chemical syntheses to trace life’s origins. Starting from 9 primordial molecules and applying 4,000+ expert-coded reaction rules plausible under early Earth conditions, the project reached the 10th generation of molecular evolution with a pool approaching 1 billion molecules. Previous research had only reached thousands. This work was published in the prestigious journal Chem in January 2024, establishing that blockchain-based computing can support serious academic research and demonstrating what Golem calls a framework for interconnecting scientific facilities to decentralized protocols “almost on demand.”

Render Network, another decentralized blockchain-based project, arrived at a similar conclusion from a different starting point. Render’s founders came to a concrete realization: local compute and even centralized cloud infrastructure would inevitably hit a wall as creative and scientific workloads scaled up. While working on a large-format immersive display project tied to Madison Square Garden, the Render team calculated that fulfilling the job would require six months of all of Amazon’s West Coast GPU capacity, and they had only three months. That bottleneck made a strong case for a distributed alternative.

Rather than building just another render farm, Render Network created a GPU marketplace where node operators monetize idle hardware while creators and researchers access near-unlimited on-demand compute. The platform has since rendered millions of frames, powered productions for the Las Vegas Sphere and Super Bowl concerts, and even served NASA.

But the ambition extends well beyond rendering. The vision is a full compute marketplace where services and modules can be shared and monetized by everyone: creating artwork, media production, video games, scientific simulations, and AI model training and inference.
These blockchain-coordinated platforms demonstrate that decentralized infrastructure can support serious

1000s of GPUs, over 50% savings: Klyne accelerates AI drug discovery on SaladCloud

Klyne accelerates AI drug discovery with SaladCloud's low-cost, high-scale compute

Accelerating early-stage drug discovery at low cost with AI

The journey of drug discovery is a winding, often frustrating road defined by risk, long timelines, and staggering costs. A typical pharmaceutical company spends up to $2 billion and more than 15 years from the discovery of a drug target to an approved treatment. The high costs involved in early-stage drug discovery, the process of finding potential drug candidates, are especially frustrating and daunting.

This was the spark for Klyne AI. CEO Zachary Lawrence comes with a background in mathematics and computer science. He also led investments within the biotech and drug discovery sectors. The challenges of high-risk investments and slow, expensive drug development cycles led to a crucial realization: early-stage drug discovery needed a technological overhaul. Klyne was born in late 2023 to fix this: to help accelerate early-stage drug discovery at low cost with AI-powered software that improves the efficiency of finding novel hits and optimizing them.

“We help smaller companies and startups go from spending hundreds of thousands of dollars to just tens of thousands of dollars – an order of magnitude difference”, says Zack.

Klyne’s USP: Proprietary AI model. Pay-go business model.

Klyne’s USP lies in their proprietary AI models, including the SPARC model, designed to predict binding affinity, a key measure of how well a drug will interact with a target protein. Klyne’s machine learning-driven predictions are as accurate as anything currently available. They also demand just 1% of the total compute power compared with industry-standard approaches. This was made possible by the construction of a large proprietary training dataset.

There’s another model in Klyne’s arsenal that gives them an advantage: their business model.
Rather than charging high upfront subscription fees, Klyne offers pay-as-you-go services and software licenses, democratizing access to advanced drug discovery tools for startups and small businesses.

Companies can provide Klyne with a protein file, and their software will analyze trillions of compounds to deliver a list of chemical structures that are very likely to bind to the protein. Since many companies do not have in silico drug discovery expertise in-house, most elect to use Klyne’s white-glove service to ensure the discovery pipeline is set up in the most optimal way.

The challenge in scaling computational resources for drug discovery

However, like many startups in the biotech and life sciences space, Klyne faced significant challenges in scaling their operations, particularly around the high computational costs and burst GPU requirements associated with molecular dynamics (MD) simulations.

Zack adds, “We needed massive amounts of low-cost compute resources to run 100s of thousands of molecular dynamics simulations. Even with the most efficient algorithms, these simulations are slow, lengthy, and expensive to run. It’s crucial to have as many GPUs as possible at the cheapest price.”

Traditional cloud providers offered expensive, high-performance GPUs: ideal for AI training, but not cost-effective for molecular dynamics simulations. Klyne’s GPU needs varied from tens to hundreds to thousands, which made scaling on hyperscalers prohibitively expensive and challenging.

This is where SaladCloud’s flexible, cost-effective cloud computing services became an essential part of Klyne’s success.

“You don’t really want higher-end GPUs for your MD simulations. Other cloud providers are focused on providing high-end GPUs. But molecular dynamics is a different market with different needs. SaladCloud’s lower-end GPUs offer way better cost-efficiency for MD simulations”

“We may ramp up to 10,000 simulations and ramp down quickly and often.
So renting GPUs wasn’t an option. Spot pricing on GCP/AWS is so much more expensive for molecular dynamics simulations than SaladCloud. Plus they don’t always have availability of 1000s of GPUs. With SaladCloud, we can scale up to 1000s of GPUs quickly saving at least 50% compared to other alternatives.” Zachary Lawrence, CEO of Klyne AI

SaladCloud helps Klyne scale fast at a low cost on consumer GPUs

While traditional cloud providers offered high-end GPUs, SaladCloud’s distributed infrastructure is tailored to fit the specific needs of drug discovery simulations, allowing companies like Klyne to access 1000s of consumer GPUs at the lowest market cost.

“The downside of interruptible instances do exist but can be overcome with specific code to save snapshots”, added Zack.

Whether it’s running thousands of simulations or conducting a more targeted molecular dynamics simulation, Salad’s flexibility has been a game-changer for Klyne’s business. “Salad’s team has been a valuable partner for us, building Kelpie for our batch needs and being really customer friendly. We can directly connect with Salad’s engineering team and get the kind of support we wouldn’t from a bigger cloud provider”, adds Zack.

The future of AI drug discovery: Klyne, SaladCloud, and beyond

Like any AI startup, Klyne has faced infrastructure challenges along the way. The transition from traditional infrastructure to distributed cloud computing was a learning curve.
Salad’s support team worked directly with Klyne to develop best practices for running molecular dynamics simulations at low cost and high scale on a distributed cloud.

By enabling companies to predict drug affinity early in the process and reduce reliance on costly experimental testing, Klyne is helping to make the entire process more efficient and cost-effective. Their platform is used by a range of clients, from academic researchers at universities to enterprise-level drug discovery teams at large pharma companies. “We’re creating the next wave of drug discovery software,” said the Klyne founder. “And SaladCloud is helping us get there faster.”

SaladCloud is the world’s largest distributed cloud computing network with 11,000+ daily GPUs and 450,000 GPUs contributing compute, all at the lowest cost in the market.

Molecular Simulation: GROMACS Benchmark on 30 GPUs on SaladCloud, 90+% Cost Savings

Molecular Simulation GROMACS Benchmark on SaladCloud

Note: Prices have fallen considerably since this benchmark was conducted, so actual costs will be even lower!

Benchmarking GROMACS for Molecular Simulation on Consumer GPUs

In this deep dive, we will benchmark GROMACS on SaladCloud, analyzing simulation speed and cost-effectiveness across a spectrum of small, medium, and large molecular systems. Additionally, we will provide recommendations for selecting the most appropriate resource types for various workloads on SaladCloud.

Building on the OpenMM benchmark on SaladCloud and our continuous efforts to optimize system architecture and batch job implementation, we have achieved a 90% cost savings by using consumer GPUs for molecular simulations with GROMACS, compared to CPUs and data center GPUs.

GROMACS is a highly optimized, open-source software package for molecular dynamics simulations. Researchers in fields like biochemistry, biophysics, and materials science widely use it to study the physical movements of atoms and molecules over time. GROMACS stands out for its exceptional performance compared to other programs, efficiently leveraging both CPU and GPU resources. This capability enables effective static and dynamic load balancing across the system’s various components.

Are you running more than $250K/yr in MDS compute? Migrate to the lowest cost GPU cloud with free, white-glove engineering support.

GROMACS benchmark methodology

The gmx mdrun program is the main computational chemistry engine within GROMACS, and it is the command used to perform molecular dynamics simulations in the target environment. It reads the input TPR file (-s), which contains the initial molecular topology and parameters, and produces several output files (-deffnm) with different extension names for logs, trajectories, structures and energies.
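As a rough sketch of how such an invocation might be assembled, the snippet below builds a gmx mdrun argument list in Python from the options this methodology mentions (-s, -deffnm, -nb, -pme, -bonded, -update, -ntmpi, -ntomp). The helper name, the file names, the single thread-MPI rank, and the choice to offload every listed calculation to the GPU are illustrative assumptions, not the settings actually used in the benchmark.

```python
import shlex

# Offload options named in the text; "gpu" asks mdrun to run that task on the GPU.
GPU_OFFLOAD_FLAGS = ("-nb", "-pme", "-bonded", "-update")


def build_mdrun_command(tpr_file, output_prefix, omp_threads=8, offload_to_gpu=True):
    """Return a `gmx mdrun` argument list (hypothetical helper for illustration)."""
    command = [
        "gmx", "mdrun",
        "-s", tpr_file,              # input TPR file: topology and parameters
        "-deffnm", output_prefix,    # shared name for log/trajectory/structure/energy files
        "-ntmpi", "1",               # assumption: one thread-MPI rank per single-GPU node
        "-ntomp", str(omp_threads),  # OpenMP threads per rank
    ]
    if offload_to_gpu:
        for flag in GPU_OFFLOAD_FLAGS:
            command += [flag, "gpu"]
    return command


if __name__ == "__main__":
    # Placeholder file names; print the command instead of running it.
    print(shlex.join(build_mdrun_command("benchmark.tpr", "benchmark_run")))
```

On a machine where GROMACS is installed, the returned list could be passed to subprocess.run; keeping it as a list rather than a shell string avoids quoting problems with file paths.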
GROMACS relies on close collaboration between the CPU and GPU to achieve optimal performance. Although many calculations can be offloaded to the GPU using the options (-nb, -pme, -bonded, -update), the program still demands considerable CPU processing power and multiple threads for task management, communication, and I/O operations. To fully utilize a powerful GPU, GROMACS also depends on robust CPU performance.

Running more OpenMP threads than the number of physical cores can be beneficial for GROMACS in certain situations, but for our benchmark test we only selected Salad nodes with CPUs that have 8 or more cores and configured each node to run 8 OpenMP threads (-ntmpi, -ntomp).

We used GROMACS 2024.1 with CUDA 11.8 to build the container image. When running on SaladCloud, the container first runs the simulations against typical molecular systems, reports the test data to an AWS DynamoDB table, and then exits. Finally, the data is downloaded and analyzed using Pandas on JupyterLab.

Two key performance indicators are collected and analyzed during the test:

ns/day stands for nanoseconds per day. It measures simulation speed, indicating how many nanoseconds of simulated time can be computed in one day of real time.

ns/dollar stands for nanoseconds per dollar. It measures cost-effectiveness, showing how many nanoseconds of simulated time can be computed for one dollar.

Below are the two scenarios and the methods used to collect data and calculate the final results:

Scenario: Consumer GPUs
Resource: 8 cores for 8 OpenMP threads; 30 GPU types
Simulation Speed (ns/day): Create a container group with 100 instances with all GPU types on SaladCloud, and run it for a few hours. Once the code execution is finished on an instance, SaladCloud will allocate a new node and continuously run the instance. Collect test data from thousands of unique Salad nodes, ensuring sufficient samples for each GPU type. Calculate the average performance for each GPU type.
Cost Effectiveness (ns/dollar): Pricing from the SaladCloud Price Calculator: $0.072/hour for 16 vCPUs, 8GB RAM; $0.015 ~ $0.18/hour for different GPU types (Priority: Batch). https://salad.com/pricing

Scenario: Data Center GPUs
Resource: 16 cores for 16 OpenMP threads; A40 48GB, A100 40GB, H100 80GB
Simulation Speed (ns/day): Use the test data in the GROMACS benchmarks by NHR@FAU.
Cost Effectiveness (ns/dollar): The lowest prices that closely match the resource requirements are selected from the data center GPU market: $1.86/hour for A40 (24 vCPUs); $1.29/hour for A100 (30 vCPUs); $2.99/hour for H100 (30 vCPUs). https://getdeploying.com/reference/cloud-gpu

It is worth mentioning that performance can be influenced by many factors, such as operating systems (Windows, Linux, or WSL) and their versions, CPU models, GPU models and driver versions, CUDA framework versions, GROMACS versions, and additional features enabled in the runtime environment. It is very common to see different results between our benchmarks and those of others.

Benchmark Results

Here are six typical biochemical systems used to benchmark GROMACS:

Model 1: R-143a in hexane (20,248 atoms) with very high output rate. Size: Small
Model 2: A short RNA piece with explicit water (31,889 atoms). Size: Small
Model 3: A protein inside a membrane surrounded by explicit water (80,289 atoms). Size: Medium
Model 4: A protein in explicit water (170,320 atoms). Size: Medium
Model 5: A protein membrane channel with explicit water (615,924 atoms). Size: Large
Model 6: A huge virus protein (1,066,628 atoms). Size: Large

[Result charts for Models 1 to 6]

Observations from the GROMACS benchmark

Here are some interesting observations from the GROMACS benchmarks:

The VRAM usage for all simulations is only 1-2 GB, which means nearly all GPU types can theoretically be utilized to run these models.

GROMACS primarily utilizes the CUDA Cores of GPUs (not Tensor Cores), and typically operates in single-precision (FP32).

High-end GPUs generally outperform low-end models in simulation speed due to their greater number of CUDA cores and higher memory bandwidth. However, the flagship model of a GPU generation often surpasses the low-end models of the following generation.

For smaller models, GPUs are often underutilized, and communication between the CPU and GPU can become a bottleneck, making CPU performance a critical factor in overall system performance. On nodes with GPUs of similar performance, higher CPU clock speeds and more physical cores usually lead to better performance.

Data center GPUs are

Molecular Simulation: OpenMM Benchmark on 25 Consumer GPUs, 95% Less Cost


Note: Prices have fallen considerably since this benchmark was conducted, so actual costs will be even lower!

Benchmarking OpenMM for Molecular Simulation on consumer GPUs

OpenMM is one of the most popular toolkits for molecular dynamics simulations, renowned for its high performance, flexibility, and extensibility. It enables users to easily incorporate new features, such as novel forces, integration algorithms, and simulation protocols, which can run efficiently on both CPUs and GPUs.

This analysis uses typical biochemical systems to benchmark OpenMM on SaladCloud’s network of AI-enabled consumer GPUs. We will analyze simulation speed and cost-effectiveness in each case and discuss how to build high-performance and reliable molecular simulation workloads on SaladCloud. This approach supports unlimited throughput and offers over 95% cost savings compared to solutions based on data center GPUs.

Why run Molecular Simulations on GPUs?

GPUs have a high degree of parallelism, which means they can perform many calculations simultaneously. This is particularly useful for molecular simulations, which involve many repetitive calculations, such as evaluating forces between atoms. Using GPUs can significantly accelerate molecular simulations, offering nearly real-time feedback and allowing researchers to run more simulations in less time. This enhanced efficiency accelerates the pace of discovery and lowers computational costs.

OpenMM benchmark methodology

The OpenMM team has provided benchmarking code in Python, along with benchmarks of simulation speed for typical biochemical systems based on OpenMM 8.0. To conduct the benchmarking test, you can run the benchmark script on the target environment; the per-model commands are listed with the benchmark results. Following the OpenMM benchmarks, we used OpenMM 8.0 with CUDA 11.8 to build the container image.
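To make the benchmark invocation concrete, here is a minimal sketch that assembles the benchmark.py command line for each of the five test names quoted with the benchmark results (pme, apoa1pme, amber20-cellulose, amber20-stmv, amoebapme). The helper name and the validation are hypothetical conveniences; the sketch assumes benchmark.py from the OpenMM examples sits in the working directory and that python is on the PATH.

```python
import shlex

# Test names as quoted in the benchmark commands in this article.
OPENMM_TESTS = ("pme", "apoa1pme", "amber20-cellulose", "amber20-stmv", "amoebapme")
PLATFORMS = ("CUDA", "CPU")


def build_benchmark_command(test, platform="CUDA", seconds=60):
    """Return the argument list for one benchmark run (hypothetical helper)."""
    if test not in OPENMM_TESTS:
        raise ValueError(f"unknown test: {test}")
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform}")
    return [
        "python", "benchmark.py",
        f"--platform={platform}",
        f"--seconds={seconds}",
        f"--test={test}",
    ]


if __name__ == "__main__":
    # Print one command per model rather than executing anything.
    for test_name in OPENMM_TESTS:
        print(shlex.join(build_benchmark_command(test_name)))
```

Switching platform to "CPU" gives the CPU-only variant of each run, which is how the CPU and GPU scenarios can share one script.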
When running on SaladCloud, the container first executes the benchmarking code, reports the test data to an AWS DynamoDB table, and then exits. Finally, the data is downloaded and analyzed using Pandas on JupyterLab.

We primarily focused on two key performance indicators across three scenarios:

ns/day stands for nanoseconds per day. It measures simulation speed, indicating how many nanoseconds of simulated time can be computed in one day of real time.

ns/dollar stands for nanoseconds per dollar. It measures cost-effectiveness, showing how many nanoseconds of simulated time can be computed for one dollar.

Molecular simulations often operate on the timescale of nanoseconds to microseconds, as molecular motions and interactions occur very rapidly.

Below are the three scenarios and the methods used to collect data and calculate the final results:

Scenario: CPUs
Resource: 16 vCPUs, 8GB RAM
Simulation Speed (ns/day): Create a container group with 100 instances with all GPU types on SaladCloud and run it for a few hours. Collect test data from thousands of unique Salad nodes, ensuring sufficient samples for each GPU type. Calculate the average performance for each GPU type.
Cost Effectiveness (ns/dollar): Pricing from the SaladCloud Price Calculator: $0.040/hour for 8 vCPUs, 8GB RAM; $0.072/hour for 16 vCPUs, 8GB RAM; $0.02 ~ $0.30/hour for different GPU types. https://salad.com/pricing

Scenario: Consumer GPUs
Resource: 8 vCPUs, 8GB RAM; 20+ GPU types
Simulation Speed (ns/day): Create a container group with 100 instances with all GPU types on SaladCloud and run it for a few hours. Collect test data from thousands of unique Salad nodes, ensuring sufficient samples for each GPU type. Calculate the average performance for each GPU type.
Cost Effectiveness (ns/dollar): Pricing from the SaladCloud Price Calculator: $0.040/hour for 8 vCPUs, 8GB RAM; $0.072/hour for 16 vCPUs, 8GB RAM; $0.02 ~ $0.30/hour for different GPU types. https://salad.com/pricing

Scenario: Datacenter GPUs
Resource: A100, H100
Simulation Speed (ns/day): Use the test data in the OpenMM benchmarks.
Cost Effectiveness (ns/dollar): Pricing from the AWS EC2 Capacity Blocks: $1.844/hour for 1 A100; $4.916/hour for 1 H100. https://aws.amazon.com/ec2/capacityblocks/pricing/

It is worth mentioning that performance can be influenced by many factors, such as operating systems (Windows, Linux, or WSL) and their versions, CPU models, GPU models and driver versions, CUDA framework versions, OpenMM versions, and additional features enabled in the runtime environment. It is very common to see different results between our benchmarks and those of others.

Benchmark Results

Here are five typical biochemical systems used to benchmark OpenMM 8.0, along with the corresponding test scripts:

Model 1: Dihydrofolate Reductase (DHFR), Explicit-PME
Description: This is a 159 residue protein with 2489 atoms. The version used for explicit solvent simulations included 7023 TIP3P water molecules, giving a total of 23,558 atoms. All simulations used the AMBER99SB force field and a Langevin integrator.
Test script: python benchmark.py --platform=CUDA (or CPU) --seconds=60 --test=pme

Model 2: Apolipoprotein A1 (ApoA1), PME
Description: This consists of 392 protein residues, 160 POPC lipids, and 21,458 water molecules, for a total of 92,224 atoms. All simulations used the AMBER14 force field.
Test script: python benchmark.py --platform=CUDA (or CPU) --seconds=60 --test=apoa1pme

Model 3: Cellulose, PME
Description: It consists of a set of cellulose molecules (91,044 atoms) solvated with 105,855 water molecules, for a total of 408,609 atoms.
Test script: python benchmark.py --platform=CUDA (or CPU) --seconds=60 --test=amber20-cellulose

Model 4: Satellite Tobacco Mosaic Virus (STMV), PME
Description: It consists of 8820 protein residues, 949 RNA bases, 300,053 water molecules, and 649 sodium ions, for a total of 1,067,095 atoms.
Test script: python benchmark.py --platform=CUDA (or CPU) --seconds=60 --test=amber20-stmv

Model 5: AMOEBA DHFR, PME
Description: Full mutual polarization was used, with induced dipoles iterated until they converged to a tolerance of 1e-5.
Test script: python benchmark.py --platform=CUDA (or CPU) --seconds=60 --test=amoebapme

[Result charts for Models 1 to 5]

Observations from the OpenMM GPU benchmarks

Here are some interesting observations from the OpenMM GPU benchmarks:

The VRAM usage for all simulations is only 1-2 GB, which means nearly all platforms (CPU-only or GPU) and all GPU types can theoretically be utilized to run these models.

For all models, the simulation speed of GPUs is significantly higher than that of CPUs, ranging from nearly hundreds of times in Model 1 to more than tens of thousands of times in Model 5.

In general, high-end GPUs outperform low-end GPUs in terms of simulation speed. However, the flagship model of a given GPU family often surpasses the low-end models of the next family.

As models become more complex with additional molecules and atoms, the performance differences between low-end