Error Handling Overview
AI services face multiple potential failures: provider API outages, rate limits, network timeouts, channel disconnections, and more. Proper error handling configuration can significantly improve service availability.
Provider Error Handling
Auto-Retry
{
"providers": {
"openai": {
"retry": {
"maxAttempts": 3,
"initialDelay": 1000,
"maxDelay": 10000,
"backoffMultiplier": 2,
"retryableErrors": [429, 500, 502, 503, 504, "ECONNRESET", "ETIMEDOUT"]
}
}
}
}
Retries use exponential backoff: 1s, then 2s, then 4s, with each delay capped at maxDelay.
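The delay schedule implied by the config above can be sketched as a small helper. This is an illustration only; `retry_delay` is a hypothetical function, not an OpenClaw API:

```python
def retry_delay(attempt, initial_delay=1000, max_delay=10000, multiplier=2):
    """Delay in ms before retry number `attempt` (1-based),
    growing geometrically and capped at max_delay."""
    return min(initial_delay * multiplier ** (attempt - 1), max_delay)

# With maxAttempts = 3, the waits are 1s, 2s, 4s:
schedule = [retry_delay(n) for n in (1, 2, 3)]  # [1000, 2000, 4000]
```

Note that the cap matters only for longer chains: with these settings, attempt 5 and beyond would all wait the full 10 seconds.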
Model Failover
{
"models": {
"primary": {
"provider": "openai",
"model": "gpt-4o",
"fallback": "secondary"
},
"secondary": {
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"fallback": "local"
},
"local": {
"provider": "ollama",
"model": "llama3.1:8b"
}
}
}
Failover chain: gpt-4o -> Claude -> Llama (local)
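Walking a fallback chain like the one above amounts to following `fallback` links until a model has none. A minimal sketch (the `resolve_chain` helper is illustrative, not part of OpenClaw):

```python
# Mirrors the "models" config above; each entry may name a fallback.
MODELS = {
    "primary":   {"provider": "openai",    "model": "gpt-4o",
                  "fallback": "secondary"},
    "secondary": {"provider": "anthropic", "model": "claude-sonnet-4-20250514",
                  "fallback": "local"},
    "local":     {"provider": "ollama",    "model": "llama3.1:8b"},
}

def resolve_chain(name):
    """Return the ordered list of model entries to try."""
    chain, seen = [], set()
    while name and name not in seen:  # `seen` guards against fallback cycles
        seen.add(name)
        chain.append(name)
        name = MODELS[name].get("fallback")
    return chain
```

A request to `primary` would try `["primary", "secondary", "local"]` in order, stopping at the first model that answers.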
Circuit Breaker Pattern
When a provider fails repeatedly, temporarily stop sending requests:
{
"providers": {
"openai": {
"circuitBreaker": {
"enabled": true,
"failureThreshold": 5,
"resetTimeout": 60000,
"halfOpenRequests": 2
}
}
}
}
- After 5 consecutive failures, the circuit breaker opens
- After 60 seconds, it enters a half-open state
- 2 probe requests are allowed through
- If the probes succeed, the circuit breaker closes; if they fail, it reopens for another reset period
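The state machine described above can be sketched in a few lines. This is a minimal illustration of the closed / open / half-open transitions using the parameters from the config, not OpenClaw's actual implementation:

```python
import time

class CircuitBreaker:
    """Closed -> open after repeated failures; open -> half-open after
    reset_timeout; half-open admits a limited number of probe requests."""

    def __init__(self, failure_threshold=5, reset_timeout=60.0,
                 half_open_requests=2):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.half_open_requests = half_open_requests
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.probes = 0

    def allow_request(self):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state, self.probes = "half-open", 0  # begin probing
            else:
                return False
        if self.state == "half-open":
            if self.probes >= self.half_open_requests:
                return False  # probe budget spent, wait for results
            self.probes += 1
        return True

    def record_success(self):
        self.state, self.failures = "closed", 0

    def record_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "open", time.monotonic()
            self.failures = 0
```

A single probe failure in the half-open state reopens the circuit immediately, which is the usual conservative choice.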
Rate Limit Handling
{
"providers": {
"openai": {
"rateLimit": {
"respectHeaders": true,
"queueOverflow": true,
"maxQueueSize": 100,
"queueTimeout": 30000
}
}
}
}
When a 429 error is received, OpenClaw will:
- Read the Retry-After header
- Queue the request
- Resend after the specified wait time
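The behavior can be sketched as a bounded queue that records the server-specified wait. The class and method names are illustrative only; they are not OpenClaw internals:

```python
from collections import deque

class RateLimitQueue:
    """Hold rate-limited requests, bounded by maxQueueSize."""

    def __init__(self, max_queue_size=100):
        self.max_queue_size = max_queue_size
        self.queue = deque()

    def handle_429(self, request, headers):
        """Queue the request; return the wait (seconds) from Retry-After.
        Falls back to a 1-second wait if the header is absent."""
        if len(self.queue) >= self.max_queue_size:
            raise OverflowError("rate-limit queue full")  # exceeds maxQueueSize
        self.queue.append(request)
        return float(headers.get("Retry-After", 1))

q = RateLimitQueue(max_queue_size=2)
wait = q.handle_429({"prompt": "hi"}, {"Retry-After": "20"})  # 20.0
```

Requests arriving once the queue is full fail fast rather than waiting indefinitely, which matches the spirit of the queueTimeout setting.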
Channel Error Handling
Auto-Reconnect
{
"channels": {
"whatsapp-main": {
"reconnect": {
"enabled": true,
"maxAttempts": 10,
"interval": 30000,
"backoff": true,
"notifyOnDisconnect": true,
"notifyChannel": "telegram-admin"
}
}
}
}
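The reconnect schedule above can be sketched as a generator. The doubling behavior when `"backoff": true` is an assumption for illustration; the config does not specify the backoff multiplier:

```python
def reconnect_intervals(max_attempts=10, interval=30000, backoff=True):
    """Yield the wait (ms) before each reconnect attempt.
    With backoff enabled, the interval is assumed to double per attempt."""
    for attempt in range(max_attempts):
        yield interval * (2 ** attempt) if backoff else interval

# First three waits with backoff: [30000, 60000, 120000]
waits = list(reconnect_intervals(max_attempts=3))
```

Without backoff, every attempt simply waits the fixed 30-second interval.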
Webhook Error Handling
{
"channels": {
"telegram": {
"webhook": {
"errorResponse": {
"onProviderError": "retry",
"onTimeout": "apologize",
"onRateLimit": "queue"
},
"errorMessages": {
"providerDown": "Sorry, the AI service is temporarily unavailable. Please try again later.",
"rateLimit": "Too many requests. Please wait a moment.",
"timeout": "Processing timed out. Please resend your message."
}
}
}
}
}
Global Error Handling
{
"errorHandling": {
"global": {
"uncaughtException": "restart",
"unhandledRejection": "log",
"memoryLimit": {
"threshold": "450MB",
"action": "restart"
}
},
"notifications": {
"enabled": true,
"channels": ["telegram-admin"],
"minSeverity": "error",
"cooldown": 300
}
}
}
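The `"cooldown": 300` setting suppresses repeat alerts for the same error within a 5-minute window. A minimal sketch of that deduplication logic (the `Notifier` class is hypothetical):

```python
import time

class Notifier:
    """Send at most one notification per error key per cooldown window."""

    def __init__(self, cooldown=300.0):
        self.cooldown = cooldown
        self.last_sent = {}  # error key -> timestamp of last notification

    def should_notify(self, error_key, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_sent.get(error_key)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown window
        self.last_sent[error_key] = now
        return True
```

Keying on the error type means a provider outage and a rate-limit storm each get their own alert, while a flood of identical errors produces only one.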
Graceful Degradation
When primary features are unavailable, provide degraded service:
{
"degradation": {
"rules": [
{
"condition": "provider.openai.down",
"actions": [
{"switch": "model", "to": "local"},
{"disable": "tools", "except": ["basic"]},
{"notify": "admin"}
]
},
{
"condition": "memory.high",
"actions": [
{"reduce": "maxHistory", "to": 5},
{"disable": "memory_search"}
]
}
]
}
}
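Evaluating rules like these boils down to collecting the actions of every rule whose condition is currently active. A sketch, treating conditions as simple flags (the flag names come from the config above; the evaluation helper itself is illustrative):

```python
# Mirrors the "degradation" rules above.
RULES = [
    {"condition": "provider.openai.down",
     "actions": [{"switch": "model", "to": "local"},
                 {"disable": "tools", "except": ["basic"]},
                 {"notify": "admin"}]},
    {"condition": "memory.high",
     "actions": [{"reduce": "maxHistory", "to": 5},
                 {"disable": "memory_search"}]},
]

def active_actions(flags):
    """Collect the actions of every rule whose condition flag is set."""
    return [action
            for rule in RULES if rule["condition"] in flags
            for action in rule["actions"]]
```

When both conditions fire at once, the actions simply accumulate, so rule order determines which adjustments are applied first.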
Error Log Analysis
openclaw errors stats --period 24h
Error Statistics (last 24h):
Total errors: 23
By type:
Provider timeout: 12 (52%)
Rate limit: 8 (35%)
Channel disconnect: 2 (9%)
Unknown: 1 (4%)
Recovery:
Auto-retried: 18 (78%)
Fell back: 3 (13%)
Failed: 2 (9%)
Test Error Handling
# Simulate a provider failure
openclaw test failover --provider openai
# Simulate a timeout
openclaw test timeout --model main --duration 30
# Simulate rate limiting
openclaw test rate-limit --provider openai
Summary
Reliable error handling is the backbone of OpenClaw's high availability. By combining auto-retry, failover, circuit breakers, and graceful degradation, you can achieve 99.9%+ service uptime. The key is having a response plan for every possible failure scenario.