
AI Model Cost Optimization: A Practical Guide


Understanding Cost Sources

The primary cost in OpenClaw comes from token consumption in model API calls. Understanding the cost breakdown is the first step to optimization.

openclaw models cost --period 30d --detailed
Cost Breakdown (last 30 days):
  Provider     Model          Input Tokens  Output Tokens  Cost
  ──────────────────────────────────────────────────────────────
  openai       gpt-4o         2.5M          1.8M           $28.50
  openai       gpt-4o-mini    5.0M          3.2M           $2.67
  anthropic    claude-sonnet  800K          600K           $8.40
  ──────────────────────────────────────────────────────────────
  Total                       8.3M          5.6M           $39.57

  Top consumers:
    telegram-main: $22.00 (56%)
    discord-dev: $12.00 (30%)
    webchat: $5.57 (14%)

Strategy 1: Tiered Model Usage

Don't use the most expensive model for every request:

{
  "routing": {
    "rules": [
      {"match": {"contentLength": {"max": 50}}, "model": "fast"},
      {"match": {"content": ".*"}, "model": "smart"}
    ]
  },
  "models": {
    "fast": {
      "provider": "openai",
      "model": "gpt-4o-mini",
      "maxTokens": 1024
    },
    "smart": {
      "provider": "openai",
      "model": "gpt-4o",
      "maxTokens": 2048
    }
  }
}

Messages of 50 characters or fewer match the first rule and go to the cheaper gpt-4o-mini; everything else falls through to the catch-all rule and uses gpt-4o. Simple greetings and short questions stay cheap, while complex questions get the premium model.
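The routing above evaluates rules top to bottom, with the catch-all `.*` rule last. A minimal sketch of that first-match-wins logic (function and rule names here are illustrative, not OpenClaw internals):

```python
import re

# Mirrors the routing rules above; evaluated top to bottom,
# first match wins (the catch-all ".*" rule comes last).
RULES = [
    {"match": {"contentLength": {"max": 50}}, "model": "fast"},
    {"match": {"content": ".*"}, "model": "smart"},
]

def pick_model(message: str) -> str:
    """Return the model key for the first rule the message satisfies."""
    for rule in RULES:
        match = rule["match"]
        if "contentLength" in match and len(message) <= match["contentLength"]["max"]:
            return rule["model"]
        if "content" in match and re.fullmatch(match["content"], message, re.DOTALL):
            return rule["model"]
    return "smart"  # defensive fallback; the catch-all rule normally fires

print(pick_model("hi!"))  # -> fast
print(pick_model("Walk me through migrating our billing service to event sourcing."))  # -> smart
```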

Strategy 2: Control Output Length

{
  "models": {
    "main": {
      "maxTokens": 2048,
      "systemPrompt": "Answer questions concisely. Unless the user asks for details, keep responses under 200 words."
    }
  }
}
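This works because `maxTokens` bounds the worst case: no single reply can cost more than the cap allows, and the system prompt keeps typical replies far below it. A rough ceiling calculation (the $10 per 1M output-token rate is an assumption for illustration, not a quoted price):

```python
# Worst-case output cost of one reply under a maxTokens cap.
RATE_PER_M_OUTPUT_TOKENS = 10.00  # dollars per 1M output tokens (assumed)

def max_output_cost(max_tokens: int) -> float:
    return max_tokens / 1_000_000 * RATE_PER_M_OUTPUT_TOKENS

print(f"Ceiling per reply at 2048 tokens: ${max_output_cost(2048):.4f}")
```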

Strategy 3: Optimize the Context Window

Conversation history is often the largest share of input tokens, because prior messages are re-sent with every request:

{
  "sessions": {
    "maxHistory": 10,
    "contextStrategy": "smart-trim",
    "summaryAfter": 20
  }
}
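The intent of these settings, as a sketch: keep the last `maxHistory` messages verbatim, and once a session passes `summaryAfter` messages, collapse the older turns into one summary. The `summarize` hook below is a hypothetical stand-in for a call to a cheap model:

```python
MAX_HISTORY = 10     # mirrors "maxHistory" above
SUMMARY_AFTER = 20   # mirrors "summaryAfter" above

def summarize(messages):
    # Placeholder: a real implementation would call a cheap model here.
    return {"role": "system", "content": f"[summary of {len(messages)} earlier messages]"}

def build_context(history):
    """Return the message list actually sent to the model."""
    if len(history) <= SUMMARY_AFTER:
        return history[-MAX_HISTORY:]
    older, recent = history[:-MAX_HISTORY], history[-MAX_HISTORY:]
    return [summarize(older)] + recent
```

A 25-message session is thus sent as 11 messages: one summary stub plus the 10 most recent turns.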

Strategy 4: Use Caching

For repetitive Q&A (such as FAQs), enable response caching:

{
  "cache": {
    "enabled": true,
    "ttl": 3600,
    "maxEntries": 1000,
    "strategy": "semantic",
    "similarityThreshold": 0.95
  }
}

Semantic caching can match questions that are "phrased differently but mean the same thing."
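At its core, a semantic cache embeds each question and serves a stored answer when a new question's similarity clears the threshold. The sketch below uses a bag-of-words cosine as a stand-in embedding; a real deployment would use an embedding model and would also honor `ttl` and `maxEntries`, both omitted here:

```python
import math
from collections import Counter

THRESHOLD = 0.95  # mirrors "similarityThreshold" above

def embed(text):
    # Stand-in embedding: word counts. Real systems use an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self):
        self.entries = []  # list of (embedding, answer)

    def get(self, question):
        q = embed(question)
        for emb, answer in self.entries:
            if cosine(q, emb) >= THRESHOLD:
                return answer  # cache hit: no API call, no token cost
        return None

    def put(self, question, answer):
        self.entries.append((embed(question), answer))
```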

Strategy 5: Set Budget Caps

{
  "budget": {
    "daily": 10.00,
    "monthly": 200.00,
    "perUser": {
      "daily": 1.00
    },
    "actions": {
      "warning": 0.8,
      "downgrade": 0.9,
      "stop": 1.0
    },
    "downgradeModel": "fast"
  }
}
  • At 80% of budget: send an alert
  • At 90%: automatically downgrade to the cheaper model
  • At 100%: stop the service
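The threshold logic reduces to a ratio check against the cap, evaluated highest threshold first. A minimal sketch (function name and return values are illustrative):

```python
# Thresholds mirror the "actions" block above:
# warn at 80%, downgrade at 90%, stop at 100% of the cap.
ACTIONS = [(1.0, "stop"), (0.9, "downgrade"), (0.8, "warning")]

def budget_action(spent, cap):
    ratio = spent / cap
    for threshold, action in ACTIONS:  # checked highest first
        if ratio >= threshold:
            return action
    return None  # under 80% of budget: no action needed

print(budget_action(8.50, 10.00))   # -> warning
print(budget_action(9.20, 10.00))   # -> downgrade
print(budget_action(10.00, 10.00))  # -> stop
```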

Strategy 6: Use Local Models

For non-critical scenarios, use free local models:

{
  "models": {
    "local": {
      "provider": "ollama",
      "model": "llama3.1:8b",
      "maxTokens": 2048
    }
  },
  "routing": {
    "rules": [
      {"match": {"channel": "internal-chat"}, "model": "local"},
      {"match": {"content": ".*"}, "model": "smart"}
    ]
  }
}

Strategy 7: Rate Limiting

Prevent any single user from consuming excessive resources:

{
  "channels": {
    "telegram-main": {
      "rateLimit": {
        "maxMessages": 20,
        "window": 60,
        "maxTokensPerDay": 50000
      }
    }
  }
}
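Conceptually this combines a sliding-window message counter with a daily token budget. A self-contained sketch (class and method names are illustrative, not OpenClaw internals; defaults mirror the config above):

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_messages=20, window=60, max_tokens_per_day=50_000):
        self.max_messages = max_messages
        self.window = window                # seconds
        self.max_tokens = max_tokens_per_day
        self.timestamps = deque()           # send times inside the window
        self.tokens_today = 0

    def allow(self, now=None):
        """Return True if this user may send another message now."""
        now = now if now is not None else time.time()
        # Drop timestamps that have aged out of the sliding window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_messages:
            return False  # too many messages in the window
        if self.tokens_today >= self.max_tokens:
            return False  # daily token budget exhausted
        self.timestamps.append(now)
        return True

    def record_tokens(self, n):
        self.tokens_today += n  # reset this counter at midnight
```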

Cost Monitoring and Alerts

# View real-time cost
openclaw cost today

# View monthly trend
openclaw cost trend --period 30d

# Set alerts
openclaw cost alert --daily 10 --notify telegram-admin

Cost Reports

# Generate a monthly cost report
openclaw cost report --period monthly --output cost-report.json

ROI Calculation

Monthly AI cost: $39.57
Alternative (human agent 4h/day × 30 days): $3,000+
Savings: ~$2,960/month
ROI: 7,480%
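Checking the arithmetic explicitly (the $25/hour agent rate is the assumption implied by 4h/day × 30 days ≈ $3,000):

```python
ai_cost = 39.57                           # monthly AI spend from the report
human_cost = 25 * 4 * 30                  # 25 $/h (assumed) * 4 h/day * 30 days
savings = human_cost - ai_cost
roi = round(savings / ai_cost * 100, -1)  # rounded to the nearest 10%
print(f"Savings: ${savings:,.2f}/month, ROI: {roi:,.0f}%")
# -> Savings: $2,960.43/month, ROI: 7,480%
```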

Summary

The core of cost optimization is "using the right model for the right scenario." By combining model tiering, context optimization, caching, and budget controls, you can typically reduce costs by 50-70% while maintaining service quality. Regularly reviewing cost reports and tuning routing rules are the keys to long-term savings.

OpenClaw is a free, open-source personal AI assistant that supports WhatsApp, Telegram, Discord, and many more platforms