OpenClaw Resource Monitoring and Alert Configuration

Introduction

Understanding OpenClaw's resource consumption is key to keeping the service stable. This article covers how to monitor OpenClaw's resource usage (memory, CPU, network connections, and message throughput) and how to configure automated alert notifications when a metric crosses its threshold.

1. Built-in Resource Monitoring

1.1 The openclaw stats Command

OpenClaw provides a built-in statistics command for a quick overview of resource usage:

# View current resource usage
openclaw stats

# Example output
# ┌─────────────────────────────────────────┐
# │         OpenClaw Runtime Statistics     │
# ├─────────────────────────────────────────┤
# │ Uptime:       3d 12h 45m                │
# │ Process PID:  12345                     │
# │ Memory (Heap): 168MB / 512MB (32%)      │
# │ Memory (RSS):  245MB                    │
# │ CPU (1m avg):  2.3%                     │
# │ Active Channels: 3 / 3                  │
# │ Today's Messages: 342                   │
# │ Today's Tokens:   125,800               │
# │ Today's Cost:     $1.85                 │
# │ Avg Response:     1.8s                  │
# │ Error Rate:       0.3%                  │
# └─────────────────────────────────────────┘

1.2 Real-Time Monitoring Dashboard

# Start real-time monitoring (similar to the top command)
openclaw stats --live

# Custom refresh interval (seconds)
openclaw stats --live --interval 5

The real-time dashboard continuously updates the following metrics:

Memory usage trend graph (ASCII chart)
Messages processed per minute
Current active connections
API call latency
Error count
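If you collect samples yourself, you can approximate the ASCII trend graph. One way is to poll heapUsed from the HTTP endpoint in section 2. The Python sketch below renders a list of memory readings as a one-line text sparkline. It only illustrates the idea and does not show how `openclaw stats --live` actually draws its chart. The sample readings are made up.

```python
# Illustrative only: render memory samples as a one-line text sparkline.
# This is not the algorithm `openclaw stats --live` uses internally.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values: list[float]) -> str:
    """Scale each value between the series min and max onto eight block glyphs."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range to scale; draw it at the lowest level.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / (hi - lo) * top)] for v in values)


# Made-up heap readings in MB, e.g. sampled every few seconds.
print(sparkline([160, 162, 165, 171, 168, 180, 176]))
```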

1.3 Historical Statistics Queries

# View message statistics for the past 24 hours
openclaw stats --period 24h

# View resource trends for the past 7 days
openclaw stats --period 7d --metric memory

# Export statistics as CSV
openclaw stats --period 30d --format csv > openclaw-stats.csv

2. HTTP API Monitoring Endpoints

2.1 Retrieving Runtime Metrics

# Basic runtime metrics
curl -s http://localhost:18789/health/stats | jq .

Example response (memory values in bytes, response times in milliseconds, uptime in seconds):

{
  "uptime": 302400,
  "memory": {
    "heapUsed": 168000000,
    "heapTotal": 536870912,
    "rss": 257000000,
    "external": 15000000
  },
  "cpu": {
    "user": 125000,
    "system": 45000,
    "percent": 2.3
  },
  "messages": {
    "today": 342,
    "thisHour": 28,
    "total": 15680
  },
  "tokens": {
    "today": {
      "input": 89500,
      "output": 36300
    }
  },
  "responseTime": {
    "avg": 1800,
    "p50": 1500,
    "p95": 3200,
    "p99": 5100
  },
  "errors": {
    "today": 3,
    "rate": 0.003
  }
}
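The endpoint returns plain JSON, so a script can post-process it easily. Below is a minimal Python sketch that uses only the standard library. It assumes the gateway listens on localhost:18789 and that the response has exactly the shape shown above. The derived field names, such as heap_percent, are our own and are not part of the API.

```python
import json
import urllib.request

# Assumption: gateway HTTP API on localhost:18789, payload shaped as above.
STATS_URL = "http://localhost:18789/health/stats"


def fetch_stats(url: str = STATS_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the runtime metrics JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(stats: dict) -> dict:
    """Turn raw bytes/ms/seconds into a few human-readable figures."""
    mem = stats["memory"]
    tokens = stats["tokens"]["today"]
    return {
        "uptime_hours": round(stats["uptime"] / 3600, 1),
        "heap_percent": round(mem["heapUsed"] / mem["heapTotal"] * 100, 1),
        "rss_mib": round(mem["rss"] / 2**20, 1),
        "tokens_today": tokens["input"] + tokens["output"],
        "p95_seconds": stats["responseTime"]["p95"] / 1000,
        "error_percent": round(stats["errors"]["rate"] * 100, 2),
    }
```

Try it against the sample response above by calling `summarize(fetch_stats())`. You get about 31.3% for heap (168,000,000 of 536,870,912 bytes) and 125,800 tokens for the day.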

2.2 Channel-Level Statistics

# Get message statistics per channel
curl -s http://localhost:18789/health/channels | jq .

{
  "channels": [
    {
      "name": "whatsapp",
      "status": "connected",
      "uptime": 302400,
      "messagesReceived": 180,
      "messagesSent": 175,
      "avgResponseTime": 1650,
      "errors": 1
    },
    {
      "name": "telegram",
      "status": "connected",
      "uptime": 302400,
      "messagesReceived": 120,
      "messagesSent": 118,
      "avgResponseTime": 1950,
      "errors": 2
    }
  ]
}
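The per-channel counters make it easy to spot a channel that has dropped or is failing more often than the others. Below is a Python sketch that assumes the payload shape above. Two choices are ours, not OpenClaw defaults: treating errors / messagesReceived as the error ratio, and the 2% default threshold.

```python
import json
import urllib.request

# Assumption: the gateway's HTTP API is on localhost:18789 (as above).
CHANNELS_URL = "http://localhost:18789/health/channels"


def fetch_channels(url: str = CHANNELS_URL, timeout: float = 5.0) -> dict:
    """Fetch the per-channel statistics JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def channel_problems(payload: dict, max_error_ratio: float = 0.02) -> list[str]:
    """List channels that are disconnected or whose error ratio is too high."""
    problems = []
    for ch in payload.get("channels", []):
        if ch["status"] != "connected":
            problems.append(f"{ch['name']}: status is {ch['status']}")
            continue
        received = ch["messagesReceived"]
        # Our own definition: errors per received message.
        ratio = ch["errors"] / received if received else 0.0
        if ratio > max_error_ratio:
            problems.append(f"{ch['name']}: error ratio {ratio:.1%}")
    return problems
```

With the sample response above, whatsapp sits at about 0.6% and telegram at about 1.7%, so neither trips the 2% default.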

3. Prometheus Metrics Collection

3.1 Enabling the Prometheus Endpoint

// ~/.config/openclaw/openclaw.json5
{
  "monitoring": {
    "prometheus": {
      "enabled": true,
      "port": 9191,
      "path": "/metrics"
    }
  }
}

3.2 Key Prometheus Metrics

The main metrics exported by OpenClaw, grouped by category:

Message Processing Metrics:

Metric Name	Type	Description
`openclaw_messages_received_total`	Counter	Total received messages (labeled by channel)
`openclaw_messages_sent_total`	Counter	Total sent messages
`openclaw_messages_failed_total`	Counter	Failed message count

Model Call Metrics:

Metric Name	Type	Description
`openclaw_model_requests_total`	Counter	Total model API calls
`openclaw_model_errors_total`	Counter	Model API error count
`openclaw_model_duration_seconds`	Histogram	Model response time distribution
`openclaw_model_tokens_total`	Counter	Total token usage (input/output labels)

Resource Metrics:

Metric Name	Type	Description
`openclaw_memory_heap_bytes`	Gauge	Heap memory usage
`openclaw_memory_rss_bytes`	Gauge	Resident set size
`openclaw_active_connections`	Gauge	Active connection count
`openclaw_queue_length`	Gauge	Request queue length
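Before wiring up Prometheus, you can read the endpoint directly to check what it exposes. The Python sketch below parses the Prometheus text exposition format into (name, labels, value) tuples. It assumes the port and path configured in 3.1. It deliberately skips details such as label-value unescaping, so for anything serious prefer the official prometheus_client parser (`text_string_to_metric_families`).

```python
import re
import urllib.request

# Assumption: port and path as configured in section 3.1.
METRICS_URL = "http://localhost:9191/metrics"

# name, optional {labels}, value (an optional trailing timestamp is ignored)
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw exposition text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> list[tuple[str, dict, float]]:
    """Simplified parser: skips # HELP/# TYPE lines, keeps label values raw."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def metric_sum(samples: list[tuple[str, dict, float]], name: str) -> float:
    """Sum a metric across all of its label combinations."""
    return sum(value for n, _, value in samples if n == name)
```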

3.3 Useful PromQL Queries

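# Messages received per minute, summed across channels
sum(rate(openclaw_messages_received_total[5m])) * 60

# Message volume by channel over the past 24 hours
sum by (channel) (increase(openclaw_messages_received_total[24h]))

# Model call P95 latency (aggregate buckets by le before taking the quantile)
histogram_quantile(0.95, sum by (le) (rate(openclaw_model_duration_seconds_bucket[5m])))

# Model API error rate
sum(rate(openclaw_model_errors_total[5m])) / sum(rate(openclaw_model_requests_total[5m]))

# Heap usage percentage (requires a heap-limit gauge, openclaw_memory_heap_max_bytes, not listed in the tables above)
openclaw_memory_heap_bytes / openclaw_memory_heap_max_bytes * 100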

# Token consumption rate (per hour)
rate(openclaw_model_tokens_total[1h]) * 3600
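The P95 query interpolates within histogram buckets rather than looking at individual requests. The simplified Python re-implementation below shows the idea for a single set of cumulative buckets that has already been aggregated (the `le` series). Real PromQL first applies rate() and handles further edge cases. The bucket boundaries and counts in the example are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Simplified sketch of PromQL's histogram_quantile.

    `buckets` holds (upper_bound, cumulative_count) pairs, one per `le`
    value, and must include the +Inf bucket. Upper bounds are assumed > 0.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the open-ended bucket: report the highest
                # finite upper bound, as Prometheus does.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan
```

Take buckets of 1s/2s/5s/+Inf holding 50/80/98/100 cumulative requests. Rank 95 of 100 falls in the 2 to 5 second bucket, 15 of its 18 requests in. The estimate is therefore 2 + 3 × 15/18 = 4.5 seconds.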

4. Alert Rule Configuration

4.1 Threshold-Based Alerts

Set alert rules in the OpenClaw configuration:

{
  "alerts": {
    "enabled": true,
    "rules": [
      {
        "name": "High Memory Usage",
        "condition": "memory.heapPercent > 85",
        "duration": "10m",
        "severity": "warning",
        "message": "Memory usage exceeds 85%, current: {value}%"
      },
      {
        "name": "Slow Response",
        "condition": "responseTime.p95 > 5000",
        "duration": "5m",
        "severity": "warning",
        "message": "P95 response time exceeds 5 seconds, current: {value}ms"
      },
      {
        "name": "High Error Rate",
        "condition": "errors.rate > 0.05",
        "duration": "5m",
        "severity": "critical",
        "message": "Error rate exceeds 5%, current: {value}"
      },
      {
        "name": "Channel Disconnected",
        "condition": "channels.disconnected > 0",
        "duration": "3m",
        "severity": "critical",
        "message": "{value} channel(s) disconnected"
      },
      {
        "name": "Queue Backlog",
        "condition": "queue.length > 30",
        "duration": "2m",
        "severity": "warning",
        "message": "Request queue backlog: {value} items"
      }
    ]
  }
}
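This section does not spell out how OpenClaw evaluates `duration`. One reasonable reading, matching Prometheus's `for` clause, is that the condition must hold continuously for that long before the alert fires. Below is a small Python state machine under that assumption. The Rule class is hypothetical and is not OpenClaw's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical threshold rule: fire after `duration_s` of continuous breach."""
    name: str
    threshold: float
    duration_s: float
    pending_since: Optional[float] = None
    firing: bool = False

    def evaluate(self, value: float, now: float) -> Optional[str]:
        """Feed one sample; return "firing" or "resolved" on a state change."""
        if value > self.threshold:
            if self.pending_since is None:
                self.pending_since = now  # breach starts: alert is pending
            if not self.firing and now - self.pending_since >= self.duration_s:
                self.firing = True
                return "firing"
        else:
            self.pending_since = None  # any healthy sample resets the timer
            if self.firing:
                self.firing = False
                return "resolved"
        return None


# "High Memory Usage": heapPercent > 85 for 10m, sampled once a minute.
rule = Rule("High Memory Usage", threshold=85, duration_s=600)
samples = [88, 90, 91, 89, 92, 90, 93, 94, 90, 91, 92, 70]
events = [rule.evaluate(v, minute * 60) for minute, v in enumerate(samples)]
```

Under this reading a brief spike never alerts. The rule fires only on the eleventh consecutive breaching sample (minute 10) and resolves as soon as usage drops back under 85%.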

4.2 Alert Notification Channels

{
  "alerts": {
    "notifications": [
      {
        "type": "telegram",
        "botToken": "YOUR_BOT_TOKEN",
        "chatId": "YOUR_CHAT_ID",
        // Only receive critical-level alerts
        "minSeverity": "critical"
      },
      {
        "type": "webhook",
        "url": "https://hooks.slack.com/services/xxx",
        "minSeverity": "warning"
      },
      {
        "type": "email",
        "to": "YOUR_EMAIL_ADDRESS",
        "minSeverity": "critical"
      }
    ],
    // Alert throttling: minimum interval between identical alerts
    "throttle": "15m",
    // Send notification on resolution
    "notifyOnResolve": true
  }
}
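The routing implied by minSeverity, throttle, and notifyOnResolve can be sketched in Python. This sketch makes three assumptions: severities are ordered warning < critical, "identical alerts" means alerts from the same rule name, and a resolution clears the throttle window so the next breach notifies immediately. The Notifier class and parse_duration helper are hypothetical and are not OpenClaw internals.

```python
SEVERITY_ORDER = {"warning": 1, "critical": 2}
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert strings such as "15m" or "2h" into seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


class Notifier:
    """Hypothetical router mirroring the notification settings above."""

    def __init__(self, channels: list[dict], throttle: str = "15m",
                 notify_on_resolve: bool = True) -> None:
        self.channels = channels
        self.throttle_s = parse_duration(throttle)
        self.notify_on_resolve = notify_on_resolve
        self.last_sent: dict[str, float] = {}

    def route(self, alert: str, severity: str, now: float,
              resolved: bool = False) -> list[str]:
        """Return the channel types that should receive this alert event."""
        if resolved:
            self.last_sent.pop(alert, None)  # next breach notifies at once
            if not self.notify_on_resolve:
                return []
        else:
            last = self.last_sent.get(alert)
            if last is not None and now - last < self.throttle_s:
                return []  # identical alert still inside the throttle window
            self.last_sent[alert] = now
        level = SEVERITY_ORDER[severity]
        return [c["type"] for c in self.channels
                if level >= SEVERITY_ORDER[c["minSeverity"]]]
```

With the three channels configured above, a warning reaches only the Slack webhook. A critical alert reaches Telegram, the webhook, and email.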

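4.3 Prometheus and Grafana Alert Rules

If you run Prometheus alongside Grafana, you can define more flexible alerts as Prometheus alerting rules, with Alertmanager routing the notifications. The YAML below uses Prometheus rule-file syntax rather than Grafana's own alert-rule format:

# Prometheus alerting rules; load this file via rule_files in prometheus.yml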
groups:
  - name: openclaw
    rules:
      - alert: OpenClawHighMemory
        expr: openclaw_memory_heap_bytes > 400 * 1024 * 1024
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "OpenClaw memory usage too high"
          description: "Heap memory: {{ $value | humanize1024 }}B"

      - alert: OpenClawMessageBacklog
        expr: openclaw_queue_length > 20
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "OpenClaw message queue backlog"

      - alert: OpenClawDown
        expr: up{job="openclaw"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "OpenClaw service unavailable"

5. Message Volume and Cost Statistics

5.1 Daily Report

# View today's summary statistics
openclaw stats --period today --summary

# Output
# Today's Statistics (2026-03-14)
# ──────────────────────
# Total Messages:    342
# Token Usage:       125,800 (input: 89,500 / output: 36,300)
# Estimated Cost:    $1.85
# Avg Response:      1.8s
# Slowest Response:  6.2s
# Errors:            3 (0.9%)
# Active Users:      28
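The derived figures in this report are simple arithmetic. The Python sketch below reproduces the error percentage: 3 errors out of 342 messages is about 0.9%. It also shows how a token-based cost estimate is typically computed. The per-million-token prices are placeholders and are not the rates behind the $1.85 above, so substitute your provider's actual pricing.

```python
# Placeholder prices in USD per million tokens; replace with your model's rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price: float = PRICE_PER_M_INPUT,
                  out_price: float = PRICE_PER_M_OUTPUT) -> float:
    """Token-based cost estimate, rounded to cents."""
    cost = input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price
    return round(cost, 2)


def error_percent(errors: int, messages: int) -> float:
    """Errors as a percentage of messages, to one decimal place."""
    return round(errors / messages * 100, 1) if messages else 0.0


print(error_percent(3, 342))           # the report's errors line
print(estimate_cost(89_500, 36_300))   # today's tokens at placeholder prices
```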

5.2 Cost Forecasting

# View this month's cost trend and forecast
openclaw stats --cost --period month

# Output
# Monthly Cost Statistics
# ──────────────────────
# Spent:            $42.50
# Daily Average:    $3.04
# Month-End Forecast: $94.20
# Highest Cost Channel: WhatsApp ($22.30)
# Highest Cost User:    user_abc ($8.50)
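A month-end forecast like this can come from simple linear extrapolation of the average daily spend. Below is a Python sketch that assumes the current day counts as a full elapsed day. This section does not document the CLI's exact forecasting method, so results may differ slightly.

```python
import calendar
from datetime import date


def forecast_month_end(spent: float, today: date) -> tuple[float, float]:
    """Linear projection: (daily average, projected month total), in cents."""
    days_elapsed = today.day  # assumes today counts as a full day
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_avg = spent / days_elapsed
    return round(daily_avg, 2), round(daily_avg * days_in_month, 2)


print(forecast_month_end(42.50, date(2026, 3, 14)))
```

Apply this to the figures above: $42.50 after 14 days of March. You get a daily average of $3.04 and a projection of about $94.11, close to the $94.20 the command reports. The small gap suggests the CLI rounds or weights its average differently.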

6. Monitoring Architecture Recommendations

Choose a monitoring approach that matches your deployment scale:

Personal / Small Team (1-5 users):

Use openclaw stats for manual checks
cron + watchdog scripts for basic health checks
Telegram alert notifications

Medium Scale (5-50 users):

Enable Prometheus metrics collection
Deploy a Grafana dashboard
Configure multi-level alert rules
Regularly review cost reports

Large Scale / Enterprise:

Full Prometheus + Grafana + Alertmanager stack
Integrate with enterprise monitoring platforms (Datadog/New Relic)
Centralized log collection (ELK/Loki)
SLA monitoring and automated operations
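For the personal / small-team tier, a cron-driven watchdog can be as small as the Python sketch below. It reads /health/stats and applies the 85% heap and 5% error-rate thresholds from section 4.1. On trouble it returns a non-zero exit status so cron or a wrapper can react. The endpoint address, thresholds, and exit-code convention are assumptions for illustration.

```python
import json
import sys
import urllib.request

# Assumption: gateway HTTP API on localhost:18789, as in section 2.
STATS_URL = "http://localhost:18789/health/stats"


def check(stats: dict, max_heap_pct: float = 85.0,
          max_error_rate: float = 0.05) -> list[str]:
    """Return human-readable problems found in a /health/stats payload."""
    problems = []
    mem = stats["memory"]
    heap_pct = mem["heapUsed"] / mem["heapTotal"] * 100
    if heap_pct > max_heap_pct:
        problems.append(f"heap at {heap_pct:.0f}%")
    rate = stats["errors"]["rate"]
    if rate > max_error_rate:
        problems.append(f"error rate {rate:.1%}")
    return problems


def main() -> int:
    """Exit status for cron: 0 when healthy, 1 on any problem or failure."""
    try:
        with urllib.request.urlopen(STATS_URL, timeout=5) as resp:
            problems = check(json.load(resp))
    except (OSError, ValueError, KeyError) as exc:
        # OSError covers URLError and timeouts; ValueError covers bad JSON.
        problems = [f"health check failed: {exc!r}"]
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Give the script an entry point that calls `sys.exit(main())` and schedule it from cron every few minutes. Then chain a notifier on non-zero exit, for example the Telegram bot already used for alerts.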

Whatever the scale, avoid over-engineering, but leave no blind spots on critical metrics: continuous monitoring is the foundation for keeping OpenClaw running stably.