Understanding Cost Sources
The primary cost in OpenClaw comes from token consumption in model API calls. Understanding the cost breakdown is the first step to optimization.
openclaw models cost --period 30d --detailed
Cost Breakdown (last 30 days):
Provider Model Input Tokens Output Tokens Cost
──────────────────────────────────────────────────────────────
openai gpt-4o 2.5M 1.8M $28.50
openai gpt-4o-mini 5.0M 3.2M $2.67
anthropic claude-sonnet 800K 600K $8.40
──────────────────────────────────────────────────────────────
Total 8.3M 5.6M $39.57
Top consumers:
telegram-main: $22.00 (56%)
discord-dev: $12.00 (30%)
webchat: $5.57 (14%)
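Each row in the breakdown is just token counts multiplied by per-million-token prices. A minimal sketch of that arithmetic (the prices in `PRICES` are hypothetical placeholders; check your provider's current rates):

```python
# Per-model cost = input_tokens * input_price + output_tokens * output_price,
# with prices quoted per million tokens. These prices are placeholders.
PRICES = {  # model -> ($ per 1M input tokens, $ per 1M output tokens)
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def model_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# 5.0M input + 3.2M output on gpt-4o-mini:
print(round(model_cost("gpt-4o-mini", 5_000_000, 3_200_000), 2))  # 2.67
```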
Strategy 1: Tiered Model Usage
Don't use the most expensive model for every request:
{
"routing": {
"rules": [
{"match": {"contentLength": {"max": 50}}, "model": "fast"},
{"match": {"content": ".*"}, "model": "smart"}
]
},
"models": {
"fast": {
"provider": "openai",
"model": "gpt-4o-mini",
"maxTokens": 1024
},
"smart": {
"provider": "openai",
"model": "gpt-4o",
"maxTokens": 2048
}
}
}
Simple greetings and short questions use the cheaper model; complex questions use the premium one.
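How the router evaluates these rules isn't spelled out here; a plausible first-match-wins reading, sketched in Python (the rule shape and names are illustrative, not OpenClaw's actual internals):

```python
# First-match-wins routing: short messages go to the cheap model,
# everything else falls through to the catch-all pattern rule.
import re

RULES = [
    {"max_length": 50, "model": "fast"},
    {"pattern": ".*", "model": "smart"},
]

def route(message: str) -> str:
    for rule in RULES:
        if "max_length" in rule and len(message) <= rule["max_length"]:
            return rule["model"]
        if "pattern" in rule and re.fullmatch(rule["pattern"], message, re.DOTALL):
            return rule["model"]
    return "smart"  # defensive default if no rule matches

print(route("hi"))          # short greeting -> fast
print(route("x" * 100))     # long message -> smart
```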
Strategy 2: Control Output Length
Output tokens are typically priced several times higher than input tokens, so capping response length bounds the most expensive part of each call:
{
"models": {
"main": {
"maxTokens": 2048,
"systemPrompt": "Answer questions concisely. Unless the user asks for details, keep responses under 200 words."
}
}
}
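A back-of-the-envelope bound on daily output spend under this cap (the $10 per 1M output tokens is a hypothetical rate):

```python
# Worst-case daily output spend if every reply hits the maxTokens cap.
OUTPUT_PRICE_PER_M = 10.00  # hypothetical $ per 1M output tokens

def max_output_cost(max_tokens: int, calls_per_day: int) -> float:
    return max_tokens * calls_per_day / 1e6 * OUTPUT_PRICE_PER_M

# Halving the cap halves the worst case:
print(round(max_output_cost(2048, 1000), 2))  # 20.48
print(round(max_output_cost(1024, 1000), 2))  # 10.24
```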
Strategy 3: Optimize the Context Window
Conversation history often dominates input-token usage, because the full history is re-sent with every request:
{
"sessions": {
"maxHistory": 10,
"contextStrategy": "smart-trim",
"summaryAfter": 20
}
}
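One way to read `smart-trim` with `summaryAfter` (an illustrative guess, not OpenClaw's documented mechanism): keep the system prompt, fold turns older than the window into a running summary, and re-send only the most recent turns verbatim.

```python
# Context assembly: system prompt + rolling summary + last N turns.
MAX_HISTORY = 10  # mirrors "maxHistory" above

def build_context(system_prompt, summary, turns):
    context = [{"role": "system", "content": system_prompt}]
    if summary:  # older turns compressed into one short message
        context.append({"role": "system", "content": f"Summary so far: {summary}"})
    context.extend(turns[-MAX_HISTORY:])
    return context

turns = [{"role": "user", "content": f"msg {i}"} for i in range(25)]
ctx = build_context("Be concise.", "User is debugging a config file.", turns)
print(len(ctx))  # 12 messages sent instead of 26
```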
Strategy 4: Use Caching
For repetitive Q&A (such as FAQs), enable response caching:
{
"cache": {
"enabled": true,
"ttl": 3600,
"maxEntries": 1000,
"strategy": "semantic",
"similarityThreshold": 0.95
}
}
Semantic caching matches questions that are phrased differently but mean the same thing, so repeated FAQs are answered from the cache instead of a new model call.
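A toy version of the idea, using word-count vectors in place of a real embedding model (a production semantic cache would embed with a dedicated model; everything here is illustrative):

```python
# Semantic cache: answer from cache when a new question's embedding is
# close enough (cosine >= threshold) to a previously answered one.
import math
import re
from collections import Counter

THRESHOLD = 0.95  # mirrors "similarityThreshold" above

def embed(text):
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self):
        self.entries = []  # (embedding, cached answer)

    def get(self, question):
        q = embed(question)
        for emb, answer in self.entries:
            if cosine(q, emb) >= THRESHOLD:
                return answer  # cache hit: no model call needed
        return None

    def put(self, question, answer):
        self.entries.append((embed(question), answer))

cache = SemanticCache()
cache.put("what are your opening hours", "We are open 9am-5pm.")
print(cache.get("What are your opening hours?"))  # hit despite different surface form
print(cache.get("how do I reset my password"))    # None: miss, call the model
```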
Strategy 5: Set Budget Caps
{
"budget": {
"daily": 10.00,
"monthly": 200.00,
"perUser": {
"daily": 1.00
},
"actions": {
"warning": 0.8,
"downgrade": 0.9,
"stop": 1.0
},
"downgradeModel": "fast"
}
}
- At 80% of budget: send an alert
- At 90%: automatically downgrade to the cheaper model
- At 100%: stop the service
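The three thresholds reduce to a simple check on the spend-to-budget ratio; a minimal sketch:

```python
# Map the spend/budget ratio to the configured budget actions.
def budget_action(spent: float, budget: float) -> str:
    ratio = spent / budget
    if ratio >= 1.0:
        return "stop"
    if ratio >= 0.9:
        return "downgrade"  # switch routing to the cheaper "fast" model
    if ratio >= 0.8:
        return "warning"
    return "ok"

print(budget_action(8.50, 10.00))   # warning
print(budget_action(9.20, 10.00))   # downgrade
print(budget_action(10.00, 10.00))  # stop
```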
Strategy 6: Use Local Models
For non-critical traffic, route to locally hosted models, which have no per-token cost:
{
"models": {
"local": {
"provider": "ollama",
"model": "llama3.1:8b",
"maxTokens": 2048
}
},
"routing": {
"rules": [
{"match": {"channel": "internal-chat"}, "model": "local"},
{"match": {"content": ".*"}, "model": "smart"}
]
}
}
Strategy 7: Rate Limiting
Prevent any single user from consuming excessive resources:
{
"channels": {
"telegram-main": {
"rateLimit": {
"maxMessages": 20,
"window": 60,
"maxTokensPerDay": 50000
}
}
}
}
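A sliding-window interpretation of `maxMessages`/`window` (20 messages per 60 seconds), sketched as an illustrative limiter rather than OpenClaw's actual implementation:

```python
# Sliding-window limiter: allow at most MAX_MESSAGES per WINDOW seconds.
import time
from collections import defaultdict, deque

MAX_MESSAGES = 20
WINDOW = 60  # seconds

class RateLimiter:
    def __init__(self):
        self.hits = defaultdict(deque)  # user_id -> recent timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[user_id]
        while q and now - q[0] >= WINDOW:  # evict hits outside the window
            q.popleft()
        if len(q) >= MAX_MESSAGES:
            return False  # over the limit: reject (or queue) the message
        q.append(now)
        return True

rl = RateLimiter()
results = [rl.allow("alice", now=float(i)) for i in range(25)]
print(results.count(True))  # 20 allowed, the last 5 rejected
```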
Cost Monitoring and Alerts
# View real-time cost
openclaw cost today
# View monthly trend
openclaw cost trend --period 30d
# Set alerts
openclaw cost alert --daily 10 --notify telegram-admin
Cost Reports
# Generate a monthly cost report
openclaw cost report --period monthly --output cost-report.json
ROI Calculation
Monthly AI cost: $39.57
Alternative (human agent 4h/day × 30 days): $3,000+
Savings: ~$2,960/month
ROI: ~7,480%
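Recomputing with the exact monthly figure (ROI here defined as savings divided by AI cost):

```python
# ROI = (human cost - AI cost) / AI cost, expressed as a percentage.
ai_cost = 39.57          # monthly API spend from the report above
human_cost = 3000.00     # 4 h/day x 30 days at ~$25/h
savings = human_cost - ai_cost
roi_pct = savings / ai_cost * 100
print(round(savings, 2))  # 2960.43
print(round(roi_pct))     # 7482, i.e. the ~7,480% quoted above
```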
Summary
The core of cost optimization is "using the right model for the right scenario." Combining model tiering, context optimization, caching, and budget controls typically cuts costs by 50-70% while maintaining service quality. Regular review of cost reports and continuous tuning of routing rules are what sustain those savings over time.