We wanted to build something practical with OpenClaw not a demo, but an actual workflow we can use every day. The idea: an agent that continuously pulls news about topics we are interested in, summarizes them using a Salad-hosted LLM, and delivers the summaries straight to a Telegram chat. Every five minutes, around the clock.
Here’s how we set it up, what it actually costs, and why the numbers make running your own model on SaladCloud an easy choise.
The Setup
The architecture has three parts:
SaladCloud deployment. We deployed an Ollama container running gpt-oss:20b model on SaladCloud using an RTX 3090 on the lowest priority tier, which costs $0.09 per hour. This model handles all the summarization work.
OpenClaw running locally. We installed OpenClaw on a local machine and configured it with two model providers: our SaladCloud-hosted model as the primary model, and Claude Opus 4.5 available for any tasks that need heavier reasoning.
Telegram integration. OpenClaw connects directly to Telegram. The agent sends news summaries to a designated chat, so we do not miss anything of our interest. We can also pull new posts from our channels on Telegram and have the agent summarize those too but we will keep this of this project for now.
The Workflow
Every five minutes, the agent:
- Pulls the latest news on several topics we defined.
- Sends the content to our gpt-oss:20b model on SaladCloud for summarization.
- Posts the summary to our Telegram chat.
The Numbers
After running the workflow, we measured what a typical summarization request actually costs in tokens. Each request uses roughly 8,000 input tokens (the raw news content being summarized) and 500 output tokens (the summary itself). With a request every five minutes, that gets to:
- 102,000 tokens per hour (96K input + 6K output)
- ~2.4 million tokens over 24 hours
Now here’s where the cost comparison gets interesting.
If we ran this on Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens, the daily bill would be roughly $15 per day – or about $450 per month just for automated news summaries.
On SaladCloud, we’re running an RTX 3090 at $0.09/hour on the lowest priority tier. Running it 24 hours a day, that’s $2.16 per day – or about $65 per month. That’s an 86% cost reduction.
We are also only using the model once every five minutes. The GPU sits idle between requests. We could run far more summarization tasks, add more topics, summarize Telegram channels, or run entirely different workloads on the same deployment – all without spending a single dollar more. On SaladCloud, you pay per hour of compute, not per token. Whether you send 100 requests or 10,000, the cost is the same.
Why Smaller Models Work Here
A 20B parameter model is more than capable of producing high-quality summaries. Summarization is a well-understood task that doesn’t require frontier-model reasoning since the model needs to read content, identify what matters, and condense it clearly. Modern open-source models at 14B–20B parameters do this extremely well. Same model can be used for other purposes as well, translation for example, or key points extraction.
Where you might still want a bigger model is for the initial setup. Defining the workflow, writing the prompts, configuring the agent behavior, and debugging edge cases might be much quicker and easy on the big models. For that, having Opus 4.5 available as an option in the same OpenClaw config is useful. However once the workflow is running the self-hosted model handles the repetitive summarization work without any quality issues.
The Config
Here’s the relevant portion of the OpenClaw configuration we used:
{
"models": {
"providers": {
"ollama": {
"baseUrl": "<https://your-salad-deployment-url.salad.cloud/v1>",
"apiKey": "ollama-local",
"api": "openai-completions",
"models": [
{
"id": "gpt-oss:20b",
"name": "gpt-oss:20b",
"reasoning": false,
"input": ["text"],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 128000,
"maxTokens": 8192
}
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "ollama/gpt-oss:20b",
"fallbacks": [
"anthropic/claude-opus-4-5"
]
},
"models": {
"ollama/gpt-oss:20b": { "alias": "gpt-oss" },
"anthropic/claude-opus-4-5": { "alias": "opus" }
},
"heartbeat": {
"every": "5m",
"model": "ollama/gpt-oss:20b",
"target": "last"
}
}
}
}
The SaladCloud-hosted model is the primary for all automated work. Opus is there when we need it – switch with /model opus – but the daily grind of summarization runs entirely on our $0.09/hour GPU.
Cost Summary
| Claude Opus 4.5 | SaladCloud (RTX 3090) | |
|---|---|---|
| Billing model | Per token | Per hour |
| Hourly cost (this workload) | ~$0.63 | $0.09 |
| Daily cost (24h) | ~$15 | $2.16 |
| Monthly cost | ~$453 | ~$65 |
| Additional usage | Costs scale linearly | Already included |
The SaladCloud cost stays flat regardless of how much more we use the model. The Opus cost scales with every additional token.
Takeaway
For repetitive, well-defined tasks like news summarization, a smaller model hosted on SaladCloud is dramatically cheaper than using a model API and the quality is more than sufficient. In addition hourly compute model means you can keep adding workloads without extra cost. Our news bot runs every five minutes, but the same deployment could simultaneously handle Telegram channel digests, document summaries, emails, or any other summarization or other task we give it.
Summarization is just one example. Smaller self-hosted models are capable of handling a wide range of everyday agent tasks:
- Email drafting. Have the agent scan your inbox on a schedule, flag what’s urgent, and draft replies for routine messages. The input/output pattern is similar to news summarization – mostly reading, with short structured output.
- Code review. Point OpenClaw at a git diff and ask for a review. Models at 14B–20B can catch bugs, suggest improvements, and flag style issues reliably, especially coding-focused models like Qwen Coder.
- Meeting prep and follow-ups. Pull calendar events and related documents, generate briefing notes before meetings, and draft follow-up action items from notes afterward.
- Content repurposing. Take a blog post and have the agent generate social media posts, email newsletter, or internal summaries – all different output formats from the same source material.
- And many more. The range of options is huge. Most agents need far less “thinking” than LLM’s you talk to directly, if you provide well-defined instructions. Also new open-source models get released daily and improve quickly.
The common thing is that these are all well-defined, repeatable tasks where the model’s job is to read, process, and produce structured output – not to reason about novel problems. That’s the sweet spot for self-hosted models on affordable hardware.
The frontier model might still be needed for building the workflow, handling complex reasoning, and tackling tasks that need it. But for the 90% of agent work that’s routine, a 20B model on a $0.09/hour GPU gets the job done.
