Error Handling and Automatic Recovery Configuration

Error Handling Overview

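An AI service can fail in many ways: a provider API becomes unavailable, a rate limit is hit, a network request times out, a channel disconnects, and so on. A well-configured error handling setup can substantially improve service availability.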

Provider Error Handling

Automatic Retry

{
  "providers": {
    "openai": {
      "retry": {
        "maxAttempts": 3,
        "initialDelay": 1000,
        "maxDelay": 10000,
        "backoffMultiplier": 2,
        "retryableErrors": [429, 500, 502, 503, 504, "ECONNRESET", "ETIMEDOUT"]
      }
    }
  }
}

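Retries use exponential backoff. With the settings above, the wait starts at initialDelay and is multiplied by backoffMultiplier each time, 1 s → 2 s → 4 s, and never exceeds maxDelay (10 s).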
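
To make the schedule concrete, here is a minimal sketch of how such delays could be computed from the four numeric settings. It is an illustration only, not OpenClaw's implementation; the function name `backoff_delays` is hypothetical, and it assumes `maxAttempts` counts retries and that all delays are in milliseconds.

```python
def backoff_delays(max_attempts, initial_delay_ms, max_delay_ms, multiplier):
    """Return the wait (in ms) before each retry.

    Assumption: max_attempts is the number of retries, not counting
    the original request.
    """
    delays = []
    delay = initial_delay_ms
    for _ in range(max_attempts):
        # Cap every wait at max_delay_ms.
        delays.append(min(delay, max_delay_ms))
        delay *= multiplier
    return delays


print(backoff_delays(3, 1000, 10000, 2))  # [1000, 2000, 4000]
```

With more attempts the cap starts to matter: five retries under the same settings would wait 1 s, 2 s, 4 s, 8 s, and then 10 s.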

Model Failover

{
  "models": {
    "primary": {
      "provider": "openai",
      "model": "gpt-4o",
      "fallback": "secondary"
    },
    "secondary": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-20250514",
      "fallback": "local"
    },
    "local": {
      "provider": "ollama",
      "model": "llama3.1:8b"
    }
  }
}

Failover chain: gpt-4o → Claude → Llama (local)
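
The chain is implied by the `fallback` keys. As a rough sketch of how it could be resolved (the helper `failover_chain` is hypothetical and only mirrors the config shape shown above), one can follow each `fallback` reference until an entry has none:

```python
MODELS = {
    "primary": {"provider": "openai", "model": "gpt-4o", "fallback": "secondary"},
    "secondary": {
        "provider": "anthropic",
        "model": "claude-sonnet-4-20250514",
        "fallback": "local",
    },
    "local": {"provider": "ollama", "model": "llama3.1:8b"},
}


def failover_chain(models, start):
    """Follow `fallback` links from `start`; stop at the end or on a cycle."""
    chain = []
    seen = set()
    name = start
    while name is not None and name not in seen:
        seen.add(name)
        chain.append(models[name]["model"])
        name = models[name].get("fallback")
    return chain


print(failover_chain(MODELS, "primary"))
```

The `seen` set guards against a misconfigured loop, for example `local` pointing back to `primary`.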

Circuit Breaker Pattern

When a provider keeps failing, temporarily stop sending requests to it:

{
  "providers": {
    "openai": {
      "circuitBreaker": {
        "enabled": true,
        "failureThreshold": 5,
        "resetTimeout": 60000,
        "halfOpenRequests": 2
      }
    }
  }
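}

  • The breaker opens after 5 consecutive failures
  • After 60 seconds it enters the half-open state
  • Up to 2 probe requests are allowed through
  • If they succeed, the breaker closes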
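
The state transitions listed above can be sketched as a small state machine. This is a generic circuit breaker, not OpenClaw's code: the class and method names are hypothetical, time is passed in explicitly as milliseconds, and it assumes that all probe requests must succeed to close the breaker and that any probe failure reopens it.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_ms=60000,
                 half_open_requests=2):
        self.failure_threshold = failure_threshold
        self.reset_timeout_ms = reset_timeout_ms
        self.half_open_requests = half_open_requests
        self.state = "closed"
        self.failures = 0      # consecutive failures while closed
        self.opened_at = None
        self.probes_sent = 0
        self.probes_ok = 0

    def allow(self, now_ms):
        """Return True if a request may be sent at time now_ms."""
        if (self.state == "open"
                and now_ms - self.opened_at >= self.reset_timeout_ms):
            self.state = "half_open"
            self.probes_sent = 0
            self.probes_ok = 0
        if self.state == "closed":
            return True
        if (self.state == "half_open"
                and self.probes_sent < self.half_open_requests):
            self.probes_sent += 1
            return True
        return False

    def record_success(self):
        if self.state == "half_open":
            self.probes_ok += 1
            # Assumption: every probe must succeed before closing.
            if self.probes_ok >= self.half_open_requests:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0

    def record_failure(self, now_ms):
        if self.state == "half_open":
            self._open(now_ms)  # a failed probe reopens the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open(now_ms)

    def _open(self, now_ms):
        self.state = "open"
        self.opened_at = now_ms
        self.failures = 0
```

A caller would check `allow()` before each provider request and report the outcome with `record_success()` or `record_failure()`; while the breaker is open, requests can go straight to the fallback model.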

Rate Limit Handling

{
  "providers": {
    "openai": {
      "rateLimit": {
        "respectHeaders": true,
        "queueOverflow": true,
        "maxQueueSize": 100,
        "queueTimeout": 30000
      }
    }
  }
}

When it receives a 429 error, OpenClaw:

  1. Reads the Retry-After header
  2. Places the request in a queue
  3. Resends the request after the specified wait
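
The three steps above can be sketched as follows. This is an illustrative outline rather than OpenClaw's implementation: `retry_after_seconds` and `RetryQueue` are hypothetical names, and the one-second default for a missing or unparsable header is an assumption. The parser accepts both forms the HTTP specification allows for Retry-After, a number of seconds or an HTTP date.

```python
from collections import deque
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=1.0):
    """Convert a Retry-After header value into seconds to wait."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():                      # delta-seconds form
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


class RetryQueue:
    """Bounded queue of (ready_at, request) pairs, like maxQueueSize above."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self.items = deque()

    def enqueue(self, request, ready_at):
        if len(self.items) >= self.max_size:
            return False                     # queue full, caller must reject
        self.items.append((ready_at, request))
        return True

    def pop_ready(self, now):
        """Remove and return every request whose wait has elapsed."""
        ready = [r for t, r in self.items if t <= now]
        self.items = deque((t, r) for t, r in self.items if t > now)
        return ready
```

A real queue would also drop entries that have waited longer than `queueTimeout`; that part is omitted here to keep the sketch short.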

Channel Error Handling

Automatic Reconnection

{
  "channels": {
    "whatsapp-main": {
      "reconnect": {
        "enabled": true,
        "maxAttempts": 10,
        "interval": 30000,
        "backoff": true,
        "notifyOnDisconnect": true,
        "notifyChannel": "telegram-admin"
      }
    }
  }
}

Webhook Error Handling

{
  "channels": {
    "telegram": {
      "webhook": {
        "errorResponse": {
          "onProviderError": "retry",
          "onTimeout": "apologize",
          "onRateLimit": "queue"
        },
        "errorMessages": {
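          "providerDown": "Sorry, the AI service is temporarily unavailable. Please try again later.",
          "rateLimit": "Too many requests. Please wait a moment.",
          "timeout": "The request timed out. Please send your message again."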
        }
      }
    }
  }
}
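
One way to read this block is as a lookup from an error event to an action plus a user-facing message. The sketch below illustrates that reading only. The pairing of events with message keys (`EVENT_TO_MESSAGE`) is an assumption, since the config does not state it, the function name is hypothetical, and the message strings are English stand-ins.

```python
ERROR_RESPONSE = {
    "onProviderError": "retry",
    "onTimeout": "apologize",
    "onRateLimit": "queue",
}

ERROR_MESSAGES = {
    "providerDown": "Sorry, the AI service is temporarily unavailable.",
    "rateLimit": "Too many requests. Please wait a moment.",
    "timeout": "The request timed out. Please send your message again.",
}

# Assumed pairing between error events and message keys.
EVENT_TO_MESSAGE = {
    "onProviderError": "providerDown",
    "onTimeout": "timeout",
    "onRateLimit": "rateLimit",
}


def handle_webhook_error(event):
    """Return (action, message) for an error event.

    Unknown events fall back to an apology with no specific message.
    """
    action = ERROR_RESPONSE.get(event, "apologize")
    message = ERROR_MESSAGES.get(EVENT_TO_MESSAGE.get(event))
    return action, message


print(handle_webhook_error("onRateLimit"))
```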

Global Error Handling

{
  "errorHandling": {
    "global": {
      "uncaughtException": "restart",
      "unhandledRejection": "log",
      "memoryLimit": {
        "threshold": "450MB",
        "action": "restart"
      }
    },
    "notifications": {
      "enabled": true,
      "channels": ["telegram-admin"],
      "minSeverity": "error",
      "cooldown": 300
    }
  }
}
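
The `minSeverity` and `cooldown` settings suggest that notifications are filtered by severity and throttled. A minimal sketch of that behavior follows, under stated assumptions: the severity ordering, the per-error-type throttling, and the reading of `cooldown` as seconds are all guesses for illustration, and `Notifier` is a hypothetical class.

```python
# Assumed ordering, lowest to highest.
SEVERITY_ORDER = ["debug", "info", "warning", "error", "critical"]


class Notifier:
    def __init__(self, min_severity="error", cooldown_s=300):
        self.min_rank = SEVERITY_ORDER.index(min_severity)
        self.cooldown_s = cooldown_s
        self.last_sent = {}  # error type -> time of last notification

    def should_notify(self, error_type, severity, now_s):
        """Return True if an alert should go out, and record it if so."""
        if SEVERITY_ORDER.index(severity) < self.min_rank:
            return False     # below the minimum severity
        last = self.last_sent.get(error_type)
        if last is not None and now_s - last < self.cooldown_s:
            return False     # still inside the cooldown window
        self.last_sent[error_type] = now_s
        return True
```

Throttling per error type keeps a burst of timeouts from flooding the admin channel while still letting a different kind of failure through immediately.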

Graceful Degradation

When a primary capability is unavailable, fall back to a reduced level of service:

{
  "degradation": {
    "rules": [
      {
        "condition": "provider.openai.down",
        "actions": [
          {"switch": "model", "to": "local"},
          {"disable": "tools", "except": ["basic"]},
          {"notify": "admin"}
        ]
      },
      {
        "condition": "memory.high",
        "actions": [
          {"reduce": "maxHistory", "to": 5},
          {"disable": "memory_search"}
        ]
      }
    ]
  }
}
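
Each rule pairs a condition with a list of actions. As a sketch of how such rules could be evaluated (the function `actions_for` is hypothetical, and conditions are treated as opaque strings that some health monitor marks as active), the active conditions select the actions to run:

```python
RULES = [
    {
        "condition": "provider.openai.down",
        "actions": [
            {"switch": "model", "to": "local"},
            {"disable": "tools", "except": ["basic"]},
            {"notify": "admin"},
        ],
    },
    {
        "condition": "memory.high",
        "actions": [
            {"reduce": "maxHistory", "to": 5},
            {"disable": "memory_search"},
        ],
    },
]


def actions_for(rules, active_conditions):
    """Collect the actions of every rule whose condition is active."""
    selected = []
    for rule in rules:
        if rule["condition"] in active_conditions:
            selected.extend(rule["actions"])
    return selected


print(actions_for(RULES, {"memory.high"}))
```

Rules are applied in the order they are listed, so when both conditions hold, the provider actions run before the memory ones.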

Error Log Analysis

# View error statistics
openclaw errors stats --period 24h
Error Statistics (last 24h):
  Total errors: 23
  By type:
    Provider timeout: 12 (52%)
    Rate limit: 8 (35%)
    Channel disconnect: 2 (9%)
    Unknown: 1 (4%)

  By provider:
    openai: 15
    anthropic: 5
    ollama: 0

  Recovery:
    Auto-retried: 18 (78%)
    Fell back: 3 (13%)
    Failed: 2 (9%)
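
The percentages in this report are each count divided by the 23 total errors, rounded to a whole number. A short sketch that reproduces them from the counts shown above (the helper name is hypothetical):

```python
def percentages(counts):
    """Map each category to its rounded share of the total, in percent."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total) for name, n in counts.items()}


by_type = {
    "Provider timeout": 12,
    "Rate limit": 8,
    "Channel disconnect": 2,
    "Unknown": 1,
}
recovery = {"Auto-retried": 18, "Fell back": 3, "Failed": 2}

print(percentages(by_type))
print(percentages(recovery))
```

Read this way, about nine in ten errors in the sample were recovered automatically, either by a retry or by a fallback, and provider timeouts are the first thing to investigate.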

Testing Error Handling

# Simulate a provider failure
openclaw test failover --provider openai

# Simulate a timeout
openclaw test timeout --model main --duration 30

# Simulate a rate limit
openclaw test rate-limit --provider openai

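Summary

Reliable error handling is what keeps OpenClaw highly available. Combining automatic retry, failover, circuit breakers, and graceful degradation can raise service availability above 99.9%. The key is to have a response prepared for every likely failure scenario.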

OpenClaw is a free, open-source personal AI assistant that connects to WhatsApp, Telegram, Discord, and other platforms.