Introduction
Conversation data is one of the most valuable assets in operating an AI Agent. By analyzing conversation logs you can understand users' real needs, spot the Agent's weaknesses, tune prompt effectiveness, and account for costs. OpenClaw stores all session data in JSONL format and provides flexible export tools and rich analysis dimensions.
This article walks through the structure of JSONL session files, how to export them, and practical data-analysis scenarios.
JSONL Session File Structure
File Location
By default, OpenClaw stores session data under:
~/.openclaw/agents/<agentId>/sessions/
├── session-abc123.jsonl
├── session-def456.jsonl
└── session-ghi789.jsonl
Message Format
Each line is an independent JSON object representing one message:
{"id":"msg_001","parentId":null,"role":"user","content":"Hello","timestamp":1710400000,"metadata":{"userId":"user_123","platform":"telegram","channelType":"dm"}}
{"id":"msg_002","parentId":"msg_001","role":"assistant","content":"Hello! How can I help you?","timestamp":1710400002,"metadata":{"model":"claude-sonnet-4-20250514","inputTokens":45,"outputTokens":12}}
{"id":"msg_003","parentId":"msg_002","role":"user","content":"Help me write some Python code","timestamp":1710400010,"metadata":{"userId":"user_123"}}
{"id":"msg_004","parentId":"msg_003","role":"assistant","content":"Sure, here's a sample snippet...","timestamp":1710400015,"metadata":{"model":"claude-sonnet-4-20250514","inputTokens":120,"outputTokens":280,"toolsUsed":["run_code"]}}
Message Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique message ID |
| parentId | string/null | Parent message ID; forms a tree structure |
| role | string | Role: user / assistant / system / tool |
| content | string | Message content |
| timestamp | number | Unix timestamp (seconds) |
| type | string | Special type: compaction / edit / branch |
| metadata | object | Metadata |
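The parentId field is what turns a flat JSONL file back into a conversation tree. A minimal reconstruction sketch, using only the field names from the table above (the sample messages are hypothetical):

```python
from collections import defaultdict

def build_tree(messages):
    """Index messages by parentId; root messages live under the None key."""
    children = defaultdict(list)
    for msg in messages:
        children[msg.get("parentId")].append(msg)
    return children

messages = [
    {"id": "msg_001", "parentId": None, "role": "user"},
    {"id": "msg_002", "parentId": "msg_001", "role": "assistant"},
    {"id": "msg_002b", "parentId": "msg_001", "role": "assistant"},  # a branch
]
tree = build_tree(messages)
print([m["id"] for m in tree[None]])       # root messages
print([m["id"] for m in tree["msg_001"]])  # two branches under msg_001
```

Two children under one parent mean the conversation branched at that point, for example after an edit.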
Metadata Fields
{
"metadata": {
"userId": "user_123",
"platform": "telegram",
"channelType": "dm",
"channelId": "chat_456",
"model": "claude-sonnet-4-20250514",
"inputTokens": 150,
"outputTokens": 320,
"totalTokens": 470,
"latencyMs": 2340,
"toolsUsed": ["search", "run_code"],
"costUsd": 0.0023
}
}
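When costUsd is missing from a message, it can be approximated from the token counts. A sketch with hypothetical per-million-token prices; substitute your model's actual rates:

```python
# Hypothetical prices in USD per million tokens; adjust to your model.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00

def estimate_cost(meta):
    """Prefer the recorded costUsd; otherwise estimate from token counts."""
    if meta.get("costUsd") is not None:
        return meta["costUsd"]
    return (meta.get("inputTokens", 0) * PRICE_IN_PER_M
            + meta.get("outputTokens", 0) * PRICE_OUT_PER_M) / 1_000_000

print(estimate_cost({"inputTokens": 150, "outputTokens": 320}))  # 0.00525
```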
Exporting Conversation Logs
Command-Line Export
# Export all sessions for an agent
openclaw export --agent my-agent --output ./export/
# Export a specific time range
openclaw export --agent my-agent \
  --from "2026-03-01" --to "2026-03-14" \
  --output ./export/march.jsonl
# Export as CSV
openclaw export --agent my-agent \
  --format csv --output ./export/conversations.csv
# Export as JSON (with the full tree structure)
openclaw export --agent my-agent \
  --format json --output ./export/conversations.json
# Export only a specific user's conversations
openclaw export --agent my-agent \
  --user-id "user_123" --output ./export/user123.jsonl
# Export only conversations from a specific platform
openclaw export --agent my-agent \
  --platform telegram --output ./export/telegram.jsonl
API Export
# Export via the API
curl -X GET "http://localhost:3000/api/v1/export/conversations" \
-H "Authorization: Bearer sk-openclaw-xxx" \
-G -d "agentId=my-agent" \
-d "from=2026-03-01" \
-d "to=2026-03-14" \
-d "format=jsonl" \
-o conversations.jsonl
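The same call can be made from Python with nothing but the standard library. The endpoint and query parameters below mirror the curl example; the base URL and token are placeholders:

```python
import urllib.parse
import urllib.request

def build_export_url(base_url, **params):
    """Compose the export URL from query parameters."""
    query = urllib.parse.urlencode(params)
    return f"{base_url}/api/v1/export/conversations?{query}"

def export_conversations(base_url, token, **params):
    """Fetch an export and return the raw response body."""
    req = urllib.request.Request(
        build_export_url(base_url, **params),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# "from" is a Python keyword, so the date params are passed via a dict:
# data = export_conversations("http://localhost:3000", "sk-openclaw-xxx",
#                             agentId="my-agent", format="jsonl",
#                             **{"from": "2026-03-01", "to": "2026-03-14"})
```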
Dashboard Export
The OpenClaw Web Dashboard provides a visual export interface:
- Open Dashboard → Session Management
- Set filters (time range, Agent, platform, etc.)
- Click the "Export" button
- Choose a format (JSONL / CSV / JSON)
- Download the file
Data Analysis in Practice
Python Analysis Scripts
Basic Data Loading
import json
from datetime import datetime
from collections import Counter, defaultdict

def load_sessions(filepath):
    """Read one JSONL file into a list of message dicts."""
    messages = []
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                messages.append(json.loads(line))
    return messages

messages = load_sessions("./export/conversations.jsonl")
print(f"Total messages: {len(messages)}")
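An export can also span many per-session files, as in the sessions directory listing earlier. A small variant that loads every .jsonl file under a directory into one list:

```python
import json
from pathlib import Path

def load_all_sessions(directory):
    """Load every *.jsonl file under a directory into a single message list."""
    messages = []
    for path in sorted(Path(directory).glob("*.jsonl")):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    messages.append(json.loads(line))
    return messages
```

For example, `load_all_sessions(Path.home() / ".openclaw/agents/my-agent/sessions")` pulls in an agent's entire history at once.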
Analysis 1: Token Usage
def analyze_token_usage(messages):
    total_input = 0
    total_output = 0
    daily_usage = defaultdict(lambda: {"input": 0, "output": 0})
    for msg in messages:
        if msg["role"] == "assistant" and "metadata" in msg:
            meta = msg["metadata"]
            input_t = meta.get("inputTokens", 0)
            output_t = meta.get("outputTokens", 0)
            total_input += input_t
            total_output += output_t
            date = datetime.fromtimestamp(msg["timestamp"]).strftime("%Y-%m-%d")
            daily_usage[date]["input"] += input_t
            daily_usage[date]["output"] += output_t
    print(f"Total input tokens: {total_input:,}")
    print(f"Total output tokens: {total_output:,}")
    print(f"Average daily input tokens: {total_input // max(len(daily_usage), 1):,}")
    return daily_usage

usage = analyze_token_usage(messages)
Analysis 2: User Activity
def analyze_user_activity(messages):
    user_messages = Counter()
    for msg in messages:
        if msg["role"] == "user" and "metadata" in msg:
            uid = msg["metadata"].get("userId", "unknown")
            user_messages[uid] += 1
    print("Top 10 active users:")
    for uid, count in user_messages.most_common(10):
        print(f"  {uid}: {count} messages")
    return user_messages

activity = analyze_user_activity(messages)
Analysis 3: Categorizing Common Questions
def categorize_topics(messages):
    """Naive keyword-based topic classification."""
    categories = {
        "Technical issues": ["code", "bug", "error", "deploy", "config"],
        "Product inquiries": ["price", "feature", "comparison", "trial"],
        "Usage help": ["how do", "how to", "tutorial", "steps"],
        "Feedback": ["suggest", "hope", "improve", "hard to use"]
    }
    results = Counter()
    for msg in messages:
        if msg["role"] == "user":
            content = msg["content"].lower()
            for category, keywords in categories.items():
                if any(kw in content for kw in keywords):
                    results[category] += 1
                    break
    return results

topics = categorize_topics(messages)
for topic, count in topics.most_common():
    print(f"  {topic}: {count}")
Analysis 4: Response Latency Distribution
def analyze_latency(messages):
    latencies = []
    for msg in messages:
        if msg["role"] == "assistant" and "metadata" in msg:
            latency = msg["metadata"].get("latencyMs")
            if latency:
                latencies.append(latency)
    if latencies:
        latencies.sort()
        print(f"Mean latency: {sum(latencies)/len(latencies):.0f}ms")
        print(f"P50 latency: {latencies[len(latencies)//2]:.0f}ms")
        print(f"P95 latency: {latencies[int(len(latencies)*0.95)]:.0f}ms")
        print(f"P99 latency: {latencies[int(len(latencies)*0.99)]:.0f}ms")

analyze_latency(messages)
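The index-based percentiles above are a rough cut; the standard library's statistics.quantiles gives interpolated values instead. The latency list here is made up for illustration:

```python
import statistics

latencies = [1200, 1500, 1800, 2100, 2400, 3000, 4200, 5100, 6000, 9500]
# n=100 yields the 99 percentile cut points; indexes 49/94/98 are P50/P95/P99.
pcts = statistics.quantiles(latencies, n=100, method="inclusive")
print(f"P50: {pcts[49]:.0f}ms  P95: {pcts[94]:.0f}ms  P99: {pcts[98]:.0f}ms")
```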
Analysis 5: Tool Usage
def analyze_tool_usage(messages):
    tool_counts = Counter()
    for msg in messages:
        if msg["role"] == "assistant" and "metadata" in msg:
            for tool in msg["metadata"].get("toolsUsed", []):
                tool_counts[tool] += 1
    print("Tool usage frequency:")
    for tool, count in tool_counts.most_common():
        print(f"  {tool}: {count} calls")

analyze_tool_usage(messages)
Data Visualization
Plotting with matplotlib
import matplotlib.pyplot as plt

def plot_daily_usage(daily_usage):
    dates = sorted(daily_usage.keys())
    input_tokens = [daily_usage[d]["input"] for d in dates]
    output_tokens = [daily_usage[d]["output"] for d in dates]
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.bar(dates, input_tokens, label="Input tokens", alpha=0.7)
    ax.bar(dates, output_tokens, bottom=input_tokens,
           label="Output tokens", alpha=0.7)
    ax.set_xlabel("Date")
    ax.set_ylabel("Tokens")
    ax.set_title("Daily Token Usage")
    ax.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig("daily_token_usage.png", dpi=150)

plot_daily_usage(usage)
Privacy and Compliance
Anonymized Export
# Automatically redact sensitive data on export
openclaw export --agent my-agent \
--anonymize \
--redact-patterns "phone,email,id_card" \
--output ./export/anonymized.jsonl
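If you need redaction outside the CLI flag, the same idea can be sketched in a few lines of Python. The patterns below are illustrative only; real PII detection needs locale-aware rules:

```python
import re

# Illustrative patterns; production redaction needs far more coverage.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-\s]?\d{3,4}[-\s]?\d{4}\b"),
}

def redact(text):
    """Replace matched substrings with [REDACTED:<kind>] markers."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text

print(redact("Reach me at alice@example.com or 138-1234-5678"))
# → "Reach me at [REDACTED:email] or [REDACTED:phone]"
```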
Data Retention Policy
{
  storage: {
    retention: {
      // Days to retain session data
      sessionTTL: 90,            // cleaned up automatically after 90 days
      // Days to retain exported data
      exportTTL: 30,
      // Archive before deleting
      archiveBeforeDelete: true,
      archiveDir: "./archive/"
    }
  }
}
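OpenClaw applies these TTLs itself, but the logic is easy to replicate if you manage archives externally. A sketch of the expiry check, keyed on file modification time (archiving and deletion are left as stubs):

```python
import time
from pathlib import Path

def find_expired(session_dir, ttl_days):
    """Return session files whose mtime is older than ttl_days."""
    cutoff = time.time() - ttl_days * 86400
    return [p for p in sorted(Path(session_dir).glob("*.jsonl"))
            if p.stat().st_mtime < cutoff]

# A real job would archive each expired file first (if archiveBeforeDelete),
# then unlink it.
```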
Summary
OpenClaw's JSONL session storage is simple yet powerful: one message per line, JSON that is trivial to parse, and a tree structure that fully records conversation branches. Once you have exported the data via the CLI, the API, or the Dashboard, tools such as Python let you analyze it in depth, from token usage and user activity to topic distribution and response latency, giving you a complete picture of how your AI Agent is running. These insights will help you keep improving the Agent's quality and cost efficiency.