Llama 3.1 8B GPU benchmark – $0.228 per Million output tokens on SaladCloud

Published: December 9, 2024

Maksim Gorkii

Llama 3.1 is a family of Large Language Models (LLMs) released by Meta, available under a friendly community license in a variety of sizes (8B, 70B, and 405B). In this benchmark, we evaluate the throughput and cost-efficiency of running the Llama 3.1 8B variant with Ollama across 9 different GPUs on SaladCloud.

Benchmark Design

The benchmark was conducted using k6, a load testing tool from Grafana Labs, to simulate a gradually increasing load from 10 to 35 virtual users over approximately 1 hour. Each virtual user asked the model to write a recipe for a salad, with a maximum of 1024 output tokens. See the exact configuration in GitHub. The test environment consisted of multiple container groups on SaladCloud, each running 8 to 10 replicas (most commonly 9).
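
To make the load shape concrete, here is a minimal Python sketch of the two pieces described above: a ramp from 10 to 35 virtual users over about an hour, and a request body capped at 1024 output tokens. The benchmark itself used a k6 script, so this is only an illustration. The linear ramp, the prompt wording, the `llama3.1:8b` model tag, and the helper names are all assumptions. The payload follows the request format of Ollama's `/api/generate` endpoint, where the `num_predict` option limits the number of generated tokens.

```python
import json

START_USERS = 10          # from the benchmark description
END_USERS = 35            # from the benchmark description
DURATION_S = 3600         # "approximately 1 hour"
MAX_OUTPUT_TOKENS = 1024  # output cap per request


def target_users(elapsed_s: float) -> int:
    """Virtual users at a given time, assuming a linear ramp (an assumption)."""
    # Clamp progress to [0, 1] so times outside the test window stay in range.
    fraction = min(max(elapsed_s / DURATION_S, 0.0), 1.0)
    return round(START_USERS + fraction * (END_USERS - START_USERS))


def build_request_body(model: str = "llama3.1:8b") -> str:
    """JSON body for Ollama's /api/generate endpoint (prompt text is hypothetical)."""
    payload = {
        "model": model,
        "prompt": "Write a recipe for a salad.",
        "stream": False,
        "options": {"num_predict": MAX_OUTPUT_TOKENS},
    }
    return json.dumps(payload)


if __name__ == "__main__":
    for minute in (0, 15, 30, 45, 60):
        print(f"minute {minute}: {target_users(minute * 60)} virtual users")
    print(build_request_body())
```

A real load test would also need to issue these requests concurrently and record successes and token counts, which k6 handles in the original setup.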

Deploying on SaladCloud

We deployed the “Ollama Llama 3.1” recipe on SaladCloud, using the default configuration, but setting priority to “batch”, and requesting 10 replicas. This setup was duplicated for each of 9 GPU types. We started each benchmark when the container group first hit 10 replicas.

Results of the Llama 3.1 8B benchmark

The RTX 3090 was a standout performer, achieving the best cost-per-token and remarkably high throughput at peak load. I was surprised to see how small the performance difference was between the RTX 3090 and the RTX 4090 in this task. With the 3090 being quite a bit less expensive, it's an easy pick in this case. Even the modest RTX 3060 performed reasonably well, though it was still slightly less cost-effective than the RTX 3090.

Conclusions

Llama 3.1 8B with Ollama shows solid performance across a wide range of devices, including lower-end last-generation GPUs. The RTX 3090 24GB stood out: 99.983% of requests succeeded, and the cluster generated over 1700 tokens per second with 35 concurrent users, which comes out to a cost of just $0.228 per million output tokens.
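
The relationship between throughput and cost per token is simple arithmetic, sketched below in Python. The function names are illustrative, and the sketch assumes the $0.228 figure was computed at the peak cluster throughput of roughly 1700 tokens per second. It does not use any per-GPU hourly price, since none is quoted here.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """USD per million output tokens for a cluster with the given hourly cost."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def implied_hourly_cost(cost_per_million_usd: float, tokens_per_second: float) -> float:
    """Invert the formula: hourly cluster cost implied by a cost-per-token figure."""
    return cost_per_million_usd * tokens_per_second * 3600 / 1_000_000


if __name__ == "__main__":
    # Figures reported above: ~1700 tokens/s across the cluster,
    # $0.228 per million output tokens.
    cluster_cost = implied_hourly_cost(0.228, 1700)
    print(f"Implied cluster cost: ${cluster_cost:.2f}/hour")
```

Under that assumption, the reported numbers imply a total cluster cost of roughly $1.40 per hour. Because throughput was "over 1700" tokens per second, treat this as an approximation.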
