SaladCloud Blog

Llama 3.1 8B GPU benchmark – $0.228 per Million output tokens on SaladCloud

Maksim Gorkii

Llama 3.1 is a family of Large Language Models (LLMs) released by Meta under a community license, in three sizes (8B, 70B, and 405B). In this benchmark, we evaluate the throughput and cost-efficiency of running the Llama 3.1 8B variant with Ollama across 9 different GPUs on SaladCloud.

Benchmark Design

The benchmark was conducted using k6, a load testing tool from Grafana Labs, to simulate a gradually increasing load from 10 to 35 virtual users over approximately 1 hour. Each virtual user asked the model to write a recipe for a salad, with output capped at 1024 tokens. See the exact configuration in GitHub. The test environment consisted of one container group per GPU type on SaladCloud, each running 8-10 replicas (most commonly 9).

Benchmark design for Llama 3.1 8B on SaladCloud’s consumer GPUs
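
To make the request shape concrete, here is a minimal sketch of what one virtual user's request could look like, and how per-request throughput can be derived from the response. It assumes Ollama's native /api/generate endpoint with the num_predict option as the token cap; the model tag, the prompt wording, and the helper names build_request and tokens_per_second are illustrative assumptions, not taken from the benchmark's GitHub configuration. The sample response values are hypothetical, not measured data.

```python
import json


def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a JSON body for Ollama's /api/generate endpoint.

    Assumption: the model tag "llama3.1:8b" and non-streaming mode;
    the real benchmark script may differ.
    """
    return {
        "model": "llama3.1:8b",
        "prompt": prompt,
        "stream": False,
        # num_predict caps the number of generated (output) tokens.
        "options": {"num_predict": max_tokens},
    }


def tokens_per_second(response: dict) -> float:
    """Output tokens per second for one completed request.

    Ollama reports eval_count (output tokens) and eval_duration
    (generation time in nanoseconds) in the final response object.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


payload = build_request("Write a recipe for a salad.")
print(json.dumps(payload, indent=2))

# Hypothetical response fields, for illustration only (not benchmark data).
sample_response = {"eval_count": 512, "eval_duration": 8_000_000_000}
print(tokens_per_second(sample_response))  # 64.0
```

Cluster-wide throughput, as reported in this post, would be the sum of output tokens across all concurrent requests divided by wall-clock time, rather than an average of these per-request rates.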

Deploying on SaladCloud

We deployed the “Ollama Llama 3.1” recipe on SaladCloud, using the default configuration, but setting priority to “batch”, and requesting 10 replicas. This setup was duplicated for each of 9 GPU types. We started each benchmark when the container group first hit 10 replicas.

Results of the Llama 3.1 8B benchmark

The RTX 3090 was a standout performer, achieving the best cost-per-token and remarkably high throughput at peak load. I was surprised by how small the performance gap between the RTX 3090 and the RTX 4090 was in this task, and with the 3090 being quite a bit less expensive, it's an easy pick in this case. Even the modest RTX 3060 performed reasonably well, though it was still slightly less cost-effective than the RTX 3090.

Conclusions

Llama 3.1 8B with Ollama shows solid performance across a wide range of devices, including lower-end last-generation GPUs. The RTX 3090 24GB stood out: 99.983% of requests succeeded, and the cluster generated over 1700 tokens per second with 35 concurrent users, which comes out to a cost of just $0.228 per million output tokens.
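
The relationship between throughput and cost per million tokens can be sketched as simple arithmetic. The snippet below is an illustration, not the author's calculation: the function names are hypothetical, and since the post does not state the hourly GPU price, the example works backward from the two published figures (about 1700 tokens per second and $0.228 per million output tokens) to the cluster-wide hourly cost they imply.

```python
def cost_per_million_tokens(cluster_cost_per_hour: float,
                            tokens_per_second: float) -> float:
    """Dollars per one million output tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return cluster_cost_per_hour / tokens_per_hour * 1_000_000


def implied_cluster_cost_per_hour(cost_per_million: float,
                                  tokens_per_second: float) -> float:
    """Invert the formula above: hourly cluster cost implied by the figures."""
    return cost_per_million * tokens_per_second * 3600 / 1_000_000


# Figures from the post: $0.228 per million tokens at ~1700 tokens/second.
hourly = implied_cluster_cost_per_hour(0.228, 1700)
print(round(hourly, 3))  # 1.395

# Round trip: feeding the implied hourly cost back in recovers $0.228.
print(round(cost_per_million_tokens(hourly, 1700), 3))  # 0.228
```

Because the post reports "over 1700" tokens per second, the implied figure of roughly $1.40 per hour for the whole cluster is an approximation rather than an exact price.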
