Molecular Simulation: OpenMM Benchmark on 25 consumer GPUs, 95% less cost
Benchmarking OpenMM for Molecular Simulation on consumer GPUs

OpenMM is one of the most popular toolkits for molecular dynamics simulations, renowned for its high performance, flexibility, and extensibility. It enables users to easily incorporate new features, such as novel forces, integration algorithms, and simulation protocols, which can run efficiently on both CPUs and GPUs. In this analysis, we use typical biochemical systems to benchmark OpenMM on SaladCloud’s network of AI-enabled consumer GPUs. We analyze simulation speed and cost-effectiveness in each case and discuss how to build high-performance, reliable molecular simulation workloads on SaladCloud. This approach supports unlimited throughput and offers over 95% cost savings compared to solutions based on data center GPUs.

Why run Molecular Simulations on GPUs?

GPUs have a high degree of parallelism, which means they can perform many calculations simultaneously. This is particularly useful for molecular simulations, which involve a large number of repetitive calculations, such as evaluating forces between atoms. Using GPUs can significantly accelerate molecular simulations, offering nearly real-time feedback and allowing researchers to run more simulations in less time. This efficiency accelerates the pace of discovery and lowers computational costs.

OpenMM benchmark methodology

The OpenMM team has provided benchmarking code in Python, along with benchmarks of simulation speed for typical biochemical systems based on OpenMM 8.0. To conduct the benchmarking test, you can run the benchmark.py script on the target environment; the command for each model is listed under Benchmark Results.

Following the OpenMM benchmarks, we used OpenMM 8.0 with CUDA 11.8 to build the container image. When running on SaladCloud, the container first executes the benchmarking code, reports the test data to an AWS DynamoDB table, and then exits. Finally, the data is downloaded and analyzed using Pandas on JupyterLab.
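The analysis step can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the Pandas notebook used for the article, and the record layout, GPU names, prices, and sample values are all hypothetical placeholders, not measured results. It averages the reported speed (ns/day) per GPU type and converts it to cost-effectiveness (ns/dollar) by dividing by the cost of 24 hours of compute.

```python
from collections import defaultdict
from statistics import mean


def ns_per_dollar(ns_per_day: float, price_per_hour: float) -> float:
    """Convert simulation speed to cost-effectiveness.

    One day of compute costs 24 * price_per_hour dollars, so
    ns/dollar = (ns/day) / (dollars/day).
    """
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return ns_per_day / (24.0 * price_per_hour)


def summarize(records, prices):
    """Average ns/day per GPU type, then derive ns/dollar from the hourly price."""
    by_gpu = defaultdict(list)
    for rec in records:
        by_gpu[rec["gpu"]].append(rec["ns_per_day"])

    summary = {}
    for gpu, speeds in by_gpu.items():
        avg = mean(speeds)
        summary[gpu] = {
            "samples": len(speeds),
            "ns_per_day": avg,
            "ns_per_dollar": ns_per_dollar(avg, prices[gpu]),
        }
    return summary


# Hypothetical placeholder records (one per benchmark run on a node).
records = [
    {"gpu": "gpu-a", "ns_per_day": 400.0},
    {"gpu": "gpu-a", "ns_per_day": 600.0},
    {"gpu": "gpu-b", "ns_per_day": 1200.0},
]
# Hypothetical hourly prices in dollars, for illustration only.
prices = {"gpu-a": 0.10, "gpu-b": 0.25}

summary = summarize(records, prices)
for gpu, row in sorted(summary.items()):
    print(gpu, row)
```

With Pandas, the same aggregation would typically be a groupby on the GPU column followed by a mean; the plain-Python version is used here only to keep the sketch dependency-free.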
We primarily focused on two key performance indicators across three scenarios:

ns/day stands for nanoseconds per day. It measures simulation speed, indicating how many nanoseconds of simulated time can be computed in one day of real time.

ns/dollar stands for nanoseconds per dollar. It measures cost-effectiveness, showing how many nanoseconds of simulated time can be computed for one dollar.

Molecular simulations often operate on the timescale of nanoseconds to microseconds, as molecular motions and interactions occur very rapidly.

Below are the three scenarios and the methods used to collect data and calculate the final results:

Scenario: CPUs
Resource: 16 vCPUs, 8 GB RAM
Simulation Speed (ns/day): Create a container group with 10 CPU-only instances on SaladCloud, and run it for a few hours. Once the code execution is finished on an instance, SaladCloud will allocate a new node and continuously run the instance. Collect test data from tens of unique Salad nodes to calculate the average performance.
Cost Effectiveness (ns/dollar): Pricing from the Salad Price Calculator (https://salad.com/pricing): $0.040/hour for 8 vCPUs, 8 GB RAM; $0.072/hour for 16 vCPUs, 8 GB RAM; $0.02 to $0.30/hour for different GPU types.

Scenario: Consumer GPUs
Resource: 8 vCPUs, 8 GB RAM, 20+ GPU types
Simulation Speed (ns/day): Create a container group with 100 instances with all GPU types on SaladCloud, and run it for a few hours. Collect test data from thousands of unique Salad nodes, ensuring sufficient samples for each GPU type. Calculate the average performance for each GPU type.
Cost Effectiveness (ns/dollar): Same Salad Price Calculator pricing as for the CPU scenario.

Scenario: Datacenter GPUs
Resource: A100, H100
Simulation Speed (ns/day): Use the test data in the OpenMM benchmarks.
Cost Effectiveness (ns/dollar): Pricing from the AWS EC2 Capacity Blocks (https://aws.amazon.com/ec2/capacityblocks/pricing/): $1.844/hour for 1 A100; $4.916/hour for 1 H100.

It is worth mentioning that performance can be influenced by many factors, such as operating systems (Windows, Linux, or WSL) and their versions, CPU models, GPU models and driver versions, CUDA framework versions, OpenMM versions, and additional features enabled in the runtime environment. It is therefore common for our results to differ from those reported by others.

Benchmark Results

Here are five typical biochemical systems used to benchmark OpenMM 8.0, along with the corresponding test scripts. Each script is run with either --platform=CUDA or --platform=CPU; the commands below show the CUDA variant.

Model 1: Dihydrofolate Reductase (DHFR), Explicit-PME
Description: This is a 159-residue protein with 2,489 atoms. The version used for explicit solvent simulations included 7,023 TIP3P water molecules, giving a total of 23,558 atoms. All simulations used the AMBER99SB force field and a Langevin integrator.
Test script: python benchmark.py --platform=CUDA --seconds=60 --test=pme

Model 2: Apolipoprotein A1 (ApoA1), PME
Description: This consists of 392 protein residues, 160 POPC lipids, and 21,458 water molecules, for a total of 92,224 atoms. All simulations used the AMBER14 force field.
Test script: python benchmark.py --platform=CUDA --seconds=60 --test=apoa1pme

Model 3: Cellulose, PME
Description: It consists of a set of cellulose molecules (91,044 atoms) solvated with 105,855 water molecules, for a total of 408,609 atoms.
Test script: python benchmark.py --platform=CUDA --seconds=60 --test=amber20-cellulose

Model 4: Satellite Tobacco Mosaic Virus (STMV), PME
Description: It consists of 8,820 protein residues, 949 RNA bases, 300,053 water molecules, and 649 sodium ions, for a total of 1,067,095 atoms.
Test script: python benchmark.py --platform=CUDA --seconds=60 --test=amber20-stmv

Model 5: AMOEBA DHFR, PME
Description: Full mutual polarization was used, with induced dipoles iterated until they converged to a tolerance of 1e-5.
Test script: python benchmark.py --platform=CUDA --seconds=60 --test=amoebapme

Model 1: Dihydrofolate Reductase (DHFR), Explicit-PME

Model 2: Apolipoprotein A1 (ApoA1), PME

Model 3: Cellulose, PME

Model 4: Satellite Tobacco Mosaic Virus (STMV), PME

Model 5: AMOEBA DHFR, PME

Observations from the OpenMM GPU benchmarks

Here are some interesting observations:

The VRAM usage for all simulations is only 1-2 GB, which means nearly all platforms (CPU-only or GPU) and all GPU types can theoretically be used to run these models.

For all models, the simulation speed of GPUs is significantly higher than that of CPUs, ranging from roughly hundreds of times in Model 1 to more than tens of thousands of times in Model 5.

In general, high-end GPUs outperform low-end GPUs in terms of simulation speed. However, the flagship model of a given GPU family often surpasses the low-end models of the next family.

As models become more complex with additional molecules and atoms, the performance differences between low-end GPUs and high-end GPUs become more pronounced, ranging from a few times to tens of times. For example, in Model