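Error Handling Overview
AI services can fail in many ways: provider API outages, rate limits, network timeouts, dropped channel connections, and so on. A well-configured error handling setup can substantially improve service availability.
Provider Error Handling
Automatic Retry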
{
  "providers": {
    "openai": {
      "retry": {
        "maxAttempts": 3,
        "initialDelay": 1000,
        "maxDelay": 10000,
        "backoffMultiplier": 2,
        "retryableErrors": [429, 500, 502, 503, 504, "ECONNRESET", "ETIMEDOUT"]
      }
    }
  }
}
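Retries use exponential backoff: the wait starts at initialDelay (1000 ms) and is multiplied by backoffMultiplier after each attempt, giving 1s → 2s → 4s, up to the maxDelay ceiling of 10s.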
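The backoff schedule can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not OpenClaw's implementation: the function names are hypothetical, and because the source does not say whether maxAttempts counts the initial request, the number of retries is passed in explicitly.

```python
# Values copied from the retry config above.
RETRYABLE_ERRORS = {429, 500, 502, 503, 504, "ECONNRESET", "ETIMEDOUT"}


def is_retryable(error):
    """Return True if an HTTP status or error code is in the retryable list."""
    return error in RETRYABLE_ERRORS


def backoff_delays(retries, initial_delay_ms, max_delay_ms, multiplier):
    """Return the wait in milliseconds before each retry.

    Each delay is the previous one times `multiplier`, capped at
    `max_delay_ms` (assumed meaning of maxDelay).
    """
    delays = []
    delay = initial_delay_ms
    for _ in range(retries):
        delays.append(min(delay, max_delay_ms))
        delay *= multiplier
    return delays


if __name__ == "__main__":
    print(backoff_delays(3, 1000, 10000, 2))
```

With three retries this yields 1000, 2000, and 4000 ms, matching the sequence above; with more retries the delay would stop growing once it reaches 10000 ms.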
Model Failover
{
  "models": {
    "primary": {
      "provider": "openai",
      "model": "gpt-4o",
      "fallback": "secondary"
    },
    "secondary": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-20250514",
      "fallback": "local"
    },
    "local": {
      "provider": "ollama",
      "model": "llama3.1:8b"
    }
  }
}
Each entry's fallback field names the next entry to try, so the failover chain is: gpt-4o → Claude → Llama (local)
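To make the chain concrete, the following Python sketch walks the fallback links in the config above and tries each entry in turn. It is an illustration under assumptions, not OpenClaw's code: `failover_chain`, `call_with_failover`, and the `send` callable are hypothetical names, and the cycle check is an extra safeguard the source does not mention.

```python
# Model entries copied from the config above.
MODELS = {
    "primary": {"provider": "openai", "model": "gpt-4o", "fallback": "secondary"},
    "secondary": {"provider": "anthropic", "model": "claude-sonnet-4-20250514",
                  "fallback": "local"},
    "local": {"provider": "ollama", "model": "llama3.1:8b"},
}


def failover_chain(models, start):
    """Follow "fallback" links from `start` and return entry names in order."""
    chain, seen = [], set()
    name = start
    while name is not None:
        if name in seen:
            raise ValueError(f"fallback cycle at {name!r}")
        if name not in models:
            raise KeyError(f"unknown model entry {name!r}")
        seen.add(name)
        chain.append(name)
        name = models[name].get("fallback")  # None ends the chain
    return chain


def call_with_failover(models, start, send):
    """Try each entry in the chain until `send` succeeds.

    `send` is a hypothetical callable(provider, model) that returns a reply
    or raises an exception on failure.
    """
    last_error = None
    for name in failover_chain(models, start):
        entry = models[name]
        try:
            return name, send(entry["provider"], entry["model"])
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError("all models in the chain failed") from last_error
```

Starting from "primary", the chain resolves to primary, secondary, local, which corresponds to the three models listed above.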
Circuit Breaker Pattern
When a provider keeps failing, temporarily stop sending requests to it:
{
  "providers": {
    "openai": {
      "circuitBreaker": {
        "enabled": true,
        "failureThreshold": 5,
        "resetTimeout": 60000,
        "halfOpenRequests": 2
      }
    }
  }
}
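- The breaker opens after 5 consecutive failures (failureThreshold)
- After 60 seconds (resetTimeout), it enters the half-open state
- 2 probe requests (halfOpenRequests) are allowed through
- If they succeed, the breaker closes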
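The state transitions listed above can be sketched as a small state machine. This is a generic circuit breaker written for illustration, not OpenClaw's implementation. The class and method names are hypothetical, and two behaviors are assumptions the source does not spell out: all probe requests must succeed before the breaker closes, and any probe failure reopens it.

```python
import time


class CircuitBreaker:
    """Minimal closed / open / half-open circuit breaker."""

    def __init__(self, failure_threshold=5, reset_timeout_ms=60000,
                 half_open_requests=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout_ms / 1000.0  # seconds
        self.half_open_requests = half_open_requests
        self.clock = clock  # injectable so tests can control time
        self.state = "closed"
        self.failures = 0      # consecutive failures while closed
        self.opened_at = 0.0
        self.probes_sent = 0
        self.probes_ok = 0

    def allow_request(self):
        """Return True if a request may be sent right now."""
        if self.state == "open" and self.clock() - self.opened_at >= self.reset_timeout:
            self.state = "half_open"
            self.probes_sent = 0
            self.probes_ok = 0
        if self.state == "closed":
            return True
        if self.state == "half_open" and self.probes_sent < self.half_open_requests:
            self.probes_sent += 1
            return True
        return False

    def record_success(self):
        if self.state == "half_open":
            self.probes_ok += 1
            if self.probes_ok >= self.half_open_requests:  # assumption: all probes must pass
                self.state = "closed"
                self.failures = 0
        elif self.state == "closed":
            self.failures = 0  # a success breaks the failure streak

    def record_failure(self):
        if self.state == "open":
            return  # late failures while open change nothing
        if self.state == "half_open":
            self._open()  # assumption: a failed probe reopens the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

A caller would check `allow_request()` before each provider call and report the outcome with `record_success()` or `record_failure()`.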
Rate Limit Handling
{
  "providers": {
    "openai": {
      "rateLimit": {
        "respectHeaders": true,
        "queueOverflow": true,
        "maxQueueSize": 100,
        "queueTimeout": 30000
      }
    }
  }
}
When it receives a 429 error, OpenClaw will:
- Read the Retry-After header
- Put the request in the queue
- Wait for the specified time, then resend it
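The steps above can be sketched in Python as follows. This is a simplified illustration rather than OpenClaw's actual logic. The names `retry_after_seconds` and `RetryQueue` are hypothetical, as is the 1-second default used when the header is missing or unparseable. Retry-After may carry either a number of seconds or an HTTP date, so the sketch handles both forms. The queue mirrors maxQueueSize and queueTimeout under the assumption that a full queue rejects new requests and that requests waiting longer than the timeout are dropped.

```python
from collections import deque
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=1.0):
    """Convert a Retry-After header value into a wait in seconds."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


class RetryQueue:
    """Bounded FIFO queue for rate-limited requests (times in seconds)."""

    def __init__(self, max_size=100, timeout_ms=30000):
        self.max_size = max_size
        self.timeout = timeout_ms / 1000.0
        self.items = deque()

    def enqueue(self, request, now):
        """Queue a request; return False if the queue is already full."""
        if len(self.items) >= self.max_size:
            return False
        self.items.append((now, request))
        return True

    def pop_ready(self, now):
        """Return the oldest request still within the timeout, else None."""
        while self.items:
            queued_at, request = self.items.popleft()
            if now - queued_at <= self.timeout:
                return request
            # otherwise the request has expired and is dropped
        return None
```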
Channel Error Handling
Automatic Reconnection
{
  "channels": {
    "whatsapp-main": {
      "reconnect": {
        "enabled": true,
        "maxAttempts": 10,
        "interval": 30000,
        "backoff": true,
        "notifyOnDisconnect": true,
        "notifyChannel": "telegram-admin"
      }
    }
  }
}
Webhook Error Handling
{
  "channels": {
    "telegram": {
      "webhook": {
        "errorResponse": {
          "onProviderError": "retry",
          "onTimeout": "apologize",
          "onRateLimit": "queue"
        },
        "errorMessages": {
          "providerDown": "Sorry, the AI service is temporarily unavailable. Please try again later.",
          "rateLimit": "Too many requests. Please wait a moment.",
          "timeout": "Processing timed out. Please send your message again."
        }
      }
    }
  }
}
Global Error Handling
{
  "errorHandling": {
    "global": {
      "uncaughtException": "restart",
      "unhandledRejection": "log",
      "memoryLimit": {
        "threshold": "450MB",
        "action": "restart"
      }
    },
    "notifications": {
      "enabled": true,
      "channels": ["telegram-admin"],
      "minSeverity": "error",
      "cooldown": 300
    }
  }
}
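The notifications block combines a severity floor with a cooldown. The sketch below shows one way such a filter could behave. It rests on several assumptions not stated in the source: the severity ordering, that cooldown is measured in seconds, and that the cooldown is tracked per error type. `NotificationGate` and its method are hypothetical names.

```python
# Assumed ordering, lowest to highest; the source only names "error".
SEVERITY_ORDER = ["debug", "info", "warning", "error", "critical"]


class NotificationGate:
    """Decide whether an error should trigger an admin notification."""

    def __init__(self, min_severity="error", cooldown=300):
        self.min_level = SEVERITY_ORDER.index(min_severity)
        self.cooldown = cooldown  # assumed to be seconds
        self.last_sent = {}       # error key -> time of last notification

    def should_notify(self, key, severity, now):
        """Return True if `key` at `severity` should be reported at time `now`."""
        if SEVERITY_ORDER.index(severity) < self.min_level:
            return False  # below minSeverity
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down for this error type
        self.last_sent[key] = now
        return True
```

Under these assumptions, a repeated provider timeout would notify once and then stay silent for 300 seconds, while a different error type would still get through.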
Graceful Degradation
When primary functionality is unavailable, fall back to a degraded service:
{
  "degradation": {
    "rules": [
      {
        "condition": "provider.openai.down",
        "actions": [
          {"switch": "model", "to": "local"},
          {"disable": "tools", "except": ["basic"]},
          {"notify": "admin"}
        ]
      },
      {
        "condition": "memory.high",
        "actions": [
          {"reduce": "maxHistory", "to": 5},
          {"disable": "memory_search"}
        ]
      }
    ]
  }
}
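To show how rules of this shape might be evaluated, here is a small Python sketch. It treats each condition as an opaque string that is either active or not, and collects the actions of every matching rule. How OpenClaw actually evaluates conditions is not described in the source, so `actions_for` and `describe` are hypothetical helpers for illustration only.

```python
# Rules copied from the degradation config above.
RULES = [
    {"condition": "provider.openai.down",
     "actions": [{"switch": "model", "to": "local"},
                 {"disable": "tools", "except": ["basic"]},
                 {"notify": "admin"}]},
    {"condition": "memory.high",
     "actions": [{"reduce": "maxHistory", "to": 5},
                 {"disable": "memory_search"}]},
]


def actions_for(rules, active_conditions):
    """Collect the actions of every rule whose condition is currently active."""
    selected = []
    for rule in rules:
        if rule["condition"] in active_conditions:
            selected.extend(rule["actions"])
    return selected


def describe(action):
    """Render one action as a short human-readable string."""
    if "switch" in action:
        return f"switch {action['switch']} to {action['to']}"
    if "disable" in action:
        kept = action.get("except")
        suffix = f" except {', '.join(kept)}" if kept else ""
        return f"disable {action['disable']}{suffix}"
    if "reduce" in action:
        return f"reduce {action['reduce']} to {action['to']}"
    if "notify" in action:
        return f"notify {action['notify']}"
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    for action in actions_for(RULES, {"provider.openai.down"}):
        print(describe(action))
```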
Error Log Analysis
# View error statistics
openclaw errors stats --period 24h
Error Statistics (last 24h):
  Total errors: 23
  By type:
    Provider timeout: 12 (52%)
    Rate limit: 8 (35%)
    Channel disconnect: 2 (9%)
    Unknown: 1 (4%)
  By provider:
    openai: 15
    anthropic: 5
    ollama: 0
  Recovery:
    Auto-retried: 18 (78%)
    Fell back: 3 (13%)
    Failed: 2 (9%)
Testing Error Handling
# Simulate a provider failure
openclaw test failover --provider openai
# Simulate a timeout
openclaw test timeout --model main --duration 30
# Simulate rate limiting
openclaw test rate-limit --provider openai
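Summary
Reliable error handling is what underpins OpenClaw's high availability. Combining automatic retry, failover, circuit breakers, and graceful degradation can raise service availability to 99.9% or higher. The key is to have a response prepared for every failure scenario you can anticipate.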