OpenClaw Performance Tuning and Resource Usage Control

Introduction

OpenClaw runs as a long-lived AI gateway daemon, so its resource consumption directly affects server costs and service stability. This article covers how to optimize OpenClaw's resource usage across memory, CPU, model invocation, and storage and network, so the service runs efficiently on limited resources.

1. Memory Optimization

1.1 Understanding Memory Consumption Sources

OpenClaw's memory consumption primarily comes from:

  • Conversation history cache: Context for each active user's conversations
  • Channel connection pool: WebSocket persistent connections to various messaging platforms
  • Skill runtime: Loaded skill modules
  • Request queue: Message buffer awaiting processing

Check current memory usage:

# Via the health check endpoint
curl -s http://localhost:18789/health/detail | jq '.memory'

# Example output
# {
#   "heapUsed": "128MB",
#   "heapTotal": "256MB",
#   "rss": "310MB",
#   "external": "15MB"
# }
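
If you want to act on these numbers, the ratio of heapUsed to heapTotal is the one to watch. A minimal watchdog sketch, with the values hardcoded in place of the live curl call (the 85% threshold mirrors the health table in section 5.3):

```shell
# Heap pressure check based on the /health/detail shape shown above.
# Values are hardcoded for illustration; in practice they would come
# from curl + jq (e.g. jq -r '.memory.heapUsed').
heap_used=128    # MB, .memory.heapUsed
heap_total=256   # MB, .memory.heapTotal
usage=$(( heap_used * 100 / heap_total ))
if [ "$usage" -ge 85 ]; then
  echo "heap at ${usage}% - restart recommended"
else
  echo "heap at ${usage}% - OK"
fi
```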

1.2 Limiting Node.js Heap Memory

Control the V8 engine's maximum heap size with Node's --max-old-space-size flag, either through the OpenClaw config:

// ~/.config/openclaw/openclaw.json5
{
  "runtime": {
    "nodeOptions": "--max-old-space-size=384"  // in MB
  }
}

Or set it via environment variable:

export NODE_OPTIONS="--max-old-space-size=384"
openclaw restart

Recommended values:

| Server Memory | Suggested Heap Limit | Use Case |
| --- | --- | --- |
| 512MB | 256MB | Personal use, 1-2 channels |
| 1GB | 384MB | Small team, 3-5 channels |
| 2GB | 512MB | Medium load, multiple channels |
| 4GB+ | 1024MB | High-load production |
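
If you script your deployments, the table collapses into a small lookup helper. A sketch (the function name is hypothetical; the values are copied verbatim from the table):

```shell
# Map server memory (MB) to the suggested --max-old-space-size value
suggest_heap_mb() {
  case "$1" in
    512)  echo 256 ;;
    1024) echo 384 ;;
    2048) echo 512 ;;
    *)    echo 1024 ;;   # 4GB and up
  esac
}
suggest_heap_mb 1024   # 384
```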

1.3 Controlling Conversation History Length

Conversation history is a major memory consumer. The longer each user's history, the greater the token consumption and memory usage per API call:

// ~/.config/openclaw/openclaw.json5
{
  "conversation": {
    // Maximum conversation turns to keep
    "maxHistory": 20,
    // Maximum token limit (older messages are truncated when exceeded)
    "maxTokens": 8000,
    // Conversation timeout (history cleared after this idle period)
    "idleTimeout": "30m"
  }
}

For resource-constrained environments, you can compress further:

{
  "conversation": {
    "maxHistory": 10,
    "maxTokens": 4000,
    "idleTimeout": "15m",
    // Enable history summary compression: automatically summarizes old conversations when limits are exceeded
    "summarize": true
  }
}
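
Whether maxHistory or maxTokens kicks in first depends on how large a typical turn is. Rough arithmetic, assuming around 200 tokens per turn (an illustrative guess, not a measured OpenClaw figure):

```shell
max_history=10
max_tokens=4000
avg_tokens_per_turn=200            # assumption for illustration
turns_by_tokens=$(( max_tokens / avg_tokens_per_turn ))
if [ "$max_history" -lt "$turns_by_tokens" ]; then
  effective_turns=$max_history     # maxHistory binds first
else
  effective_turns=$turns_by_tokens # maxTokens binds first
fi
echo "effective window: $effective_turns turns"
```

With these numbers the token budget allows 20 turns, so maxHistory=10 is the binding limit; longer average turns would flip that.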

1.4 Memory Cache Strategy

{
  "cache": {
    // User profile cache size
    "userProfileMaxSize": 100,
    // Skill result cache TTL
    "skillResultTTL": "5m",
    // Maximum cache entries
    "maxEntries": 500
  }
}
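
The TTL here presumably works the usual way: an entry older than skillResultTTL counts as a miss, and the skill runs again. The mechanics in shell terms (assumed semantics, not OpenClaw internals):

```shell
ttl_secs=300                   # "5m"
now=$(date +%s)
cached_at=$(( now - 600 ))     # entry written 10 minutes ago
age=$(( now - cached_at ))
if [ "$age" -gt "$ttl_secs" ]; then
  hit=0    # expired: recompute and re-store
else
  hit=1    # fresh: serve from cache
fi
```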

2. CPU Optimization

2.1 Limiting Concurrent Requests

Processing too many requests simultaneously causes CPU spikes and increased response latency. Control concurrency via queuing:

{
  "gateway": {
    // Maximum concurrent model requests
    "maxConcurrentRequests": 5,
    // Maximum request queue length
    "queueMaxSize": 50,
    // Queue wait timeout
    "queueTimeout": "30s"
  }
}
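
It is worth sanity-checking queueTimeout against how fast a full queue can drain. Back-of-envelope arithmetic, assuming an average model call takes about 2 seconds (an assumption, not an OpenClaw default):

```shell
max_concurrent=5
queue_max=50
avg_request_secs=2
# a full queue drains at max_concurrent requests per avg_request_secs
drain_secs=$(( queue_max * avg_request_secs / max_concurrent ))
echo "full queue drains in ~${drain_secs}s"
```

At 20 seconds that fits inside the 30s queueTimeout; slower average calls would push queued requests into timeout territory.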

2.2 Optimizing Skill Loading

Unnecessary skills consume CPU and memory resources. Only load the skills you actually use:

{
  "skills": {
    // Explicitly specify enabled skills instead of loading all
    "enabled": ["weather", "reminder", "rss"],
    // Skill execution timeout in milliseconds
    "timeout": 10000,
    // Skill result caching (reduces redundant computation)
    "cache": true
  }
}

2.3 Using Systemd CPU Quotas

If you manage OpenClaw with systemd, you can hard-cap its CPU usage:

# /etc/systemd/system/openclaw.service
[Service]
# Hard cap: at most 50% of one CPU core
CPUQuota=50%
# Relative scheduling weight (default is 100)
CPUWeight=80

2.4 PM2 Cluster Mode

On multi-core servers, use PM2's cluster mode to distribute the load:

// openclaw-ecosystem.config.js
module.exports = {
  apps: [{
    name: "openclaw",
    script: "openclaw",
    args: "up",
    instances: 2,          // Start 2 instances
    exec_mode: "cluster",
    max_memory_restart: "400M"
  }]
};

Note: Cluster mode requires OpenClaw to support multi-instance operation. Make sure shared session storage is configured.

3. Model Invocation Optimization

3.1 Choosing the Right Model

Model selection directly affects response speed and cost:

{
  "model": {
    // Use a lightweight model for everyday conversations
    "default": "claude-3-5-haiku",
    // Use an advanced model for complex tasks
    "advanced": "claude-3.5-sonnet",
    // Automatically switch based on message complexity
    "autoSwitch": true,
    "autoSwitchThreshold": 100  // Use advanced model when input exceeds 100 characters
  }
}
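
The autoSwitch behavior described above amounts to length-based routing. A sketch of that rule (the real switching logic inside OpenClaw may weigh more than character count):

```shell
pick_model() {
  # route by input length against the 100-character threshold
  if [ "${#1}" -gt 100 ]; then
    echo "claude-3.5-sonnet"
  else
    echo "claude-3-5-haiku"
  fi
}
pick_model "What's the weather today?"
```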

3.2 Enabling Streaming Responses

Streaming responses reduce time-to-first-byte latency and improve user experience:

{
  "model": {
    "stream": true,
    // Chunk flush interval in milliseconds (for channels without native streaming)
    "streamChunkInterval": 500
  }
}

3.3 Setting Request Timeouts

Prevent abnormal requests from occupying resources for extended periods:

{
  "model": {
    "timeout": 30000,     // Single request timeout: 30 seconds
    "maxRetries": 2,      // Maximum retry count
    "retryDelay": 2000    // Retry interval: 2 seconds
  }
}
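
These retry settings translate to a loop like the following (hypothetical wrapper; the delay is zeroed so the sketch runs instantly, where production would wait 2 seconds):

```shell
max_retries=2
retry_delay=0   # would be 2 (seconds) with retryDelay: 2000
attempts=0
# stand-in for the real API call: fails twice, then succeeds
request() { attempts=$(( attempts + 1 )); [ "$attempts" -ge 3 ]; }
until request; do
  [ "$attempts" -gt "$max_retries" ] && break
  sleep "$retry_delay"
done
echo "finished after $attempts attempts"
```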

4. Storage and Network Optimization

4.1 Log File Control

Log file growth can consume significant disk space:

{
  "log": {
    "level": "info",        // Don't use debug in production
    "rotation": {
      "maxSize": "30MB",
      "maxAge": 14,         // Retain for 14 days
      "compress": true
    }
  }
}

4.2 Session Data Cleanup

Periodically clean up expired session data:

# Check session data size
du -sh ~/.openclaw/sessions/

# Clean up sessions inactive for more than 30 days
openclaw session cleanup --older-than 30d

# Set up auto-cleanup
openclaw config set session.autoCleanup true
openclaw config set session.cleanupInterval "24h"
openclaw config set session.maxAge "30d"
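
Under the hood this kind of cleanup is essentially an age-based file sweep. An approximation with find, run here against a temporary directory (the actual on-disk layout of ~/.openclaw/sessions/ is an assumption):

```shell
sessions_dir=$(mktemp -d)             # stand-in for ~/.openclaw/sessions/
touch -d '40 days ago' "$sessions_dir/old.json"
touch "$sessions_dir/fresh.json"
# list files untouched for more than 30 days; append -delete to remove them
stale=$(find "$sessions_dir" -type f -mtime +30)
echo "$stale"
```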

4.3 Network Connection Optimization

{
  "gateway": {
    // HTTP Keep-Alive timeout in milliseconds
    "keepAliveTimeout": 30000,
    // Connection pool size
    "maxSockets": 20,
    // Enable gzip compression
    "compression": true
  }
}

5. Performance Monitoring and Benchmarking

5.1 Built-in Performance Metrics

# View runtime performance statistics
curl -s http://localhost:18789/health/stats | jq .

# Example output
# {
#   "avgResponseTime": "1.8s",
#   "p95ResponseTime": "3.2s",
#   "messagesPerMinute": 12,
#   "activeConnections": 3,
#   "queueLength": 0
# }
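
For context on where a figure like p95ResponseTime comes from: sort the raw samples and take the value 95% of the way in. One common convention, sketched with awk:

```shell
# five sample latencies (seconds); index int(0.95 * N) after sorting
p95=$(printf '%s\n' 1.2 1.5 1.8 2.0 3.2 | sort -n \
      | awk '{ a[NR] = $1 } END { i = int(0.95 * NR); if (i < 1) i = 1; print a[i] }')
echo "p95 = ${p95}s"
```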

5.2 Resource Usage Trends

Track resource usage trends over time with the built-in stats command, or export the same metrics to Prometheus for dashboards:

# View memory trend over the past 24 hours
openclaw stats --metric memory --period 24h

# View response time trend
openclaw stats --metric latency --period 7d

5.3 Verifying Optimization Results

After each configuration adjustment, it's recommended to observe the following metrics for at least 24 hours:

| Metric | Healthy Range | Requires Attention |
| --- | --- | --- |
| Heap memory usage | < 70% | > 85% |
| Average response time | < 3s | > 5s |
| Error rate | < 1% | > 5% |
| CPU usage | < 40% | > 70% |
| Queue length | 0-5 | > 20 |
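
Those thresholds are easy to wire into an alert script. A sketch for one metric (thresholds copied from the table; the reading is a stand-in for a real /health/stats value):

```shell
avg_response_ms=1800
if [ "$avg_response_ms" -gt 5000 ]; then
  status="attention"
elif [ "$avg_response_ms" -le 3000 ]; then
  status="healthy"
else
  status="watch"   # between the healthy and attention bands
fi
echo "avg response: $status"
```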

6. Optimization for Low-Spec Servers

For entry-level VPS instances with 512MB of memory, the following aggressive optimization configuration is recommended:

{
  "runtime": {
    "nodeOptions": "--max-old-space-size=256"
  },
  "conversation": {
    "maxHistory": 8,
    "maxTokens": 3000,
    "idleTimeout": "10m",
    "summarize": true
  },
  "gateway": {
    "maxConcurrentRequests": 2,
    "queueMaxSize": 20
  },
  "model": {
    "default": "claude-3-5-haiku",
    "stream": true,
    "timeout": 20000,
    "maxRetries": 1
  },
  "skills": {
    "enabled": [],
    "cache": true
  },
  "log": {
    "level": "warn",
    "rotation": {
      "maxSize": "10MB",
      "maxAge": 7
    }
  }
}

With the optimizations above, OpenClaw can run stably on a server with 512MB of memory while maintaining acceptable response speeds. Gradually adjust the parameters based on your actual load to find the best balance between performance and resources.
