Introduction
AI models have limited context windows, and conversation history keeps growing with each interaction. How to retain the most relevant information within a limited context while controlling API call costs is a problem every AI gateway must solve. OpenClaw provides flexible history length controls and automatic compression mechanisms, helping you find the right balance between conversation quality and resource consumption.
Why Control History Length
Not controlling history length leads to the following problems:
- Cost Spiral: Every API call sends the full context history — the longer the history, the more tokens consumed
- Slower Responses: Models take longer to process longer contexts
- Information Noise: Conversation content from long ago may be irrelevant to the current topic, actually interfering with the model's judgment
- Exceeding Limits: Directly exceeding the model's context window causes API errors
Basic Configuration
Configure history length limits in the sessions section of openclaw.json:
{
"sessions": {
"maxHistoryMessages": 50,
"maxHistoryTokens": 8000
}
}
Dual Limit Mechanism
OpenClaw uses a dual limit on both message count and token count, whichever threshold is reached first:
- maxHistoryMessages: Maximum number of messages retained in context. Both user inputs and AI responses count toward this limit.
- maxHistoryTokens: Maximum number of tokens retained in context. OpenClaw uses the corresponding model's tokenizer for a precise count.
For example, with maxHistoryMessages: 50 and maxHistoryTokens: 8000, if the token count reaches 8000 when only 30 messages are stored, truncation or compression begins even though the message limit has not been reached.
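The dual limit amounts to a simple "whichever comes first" check. Below is an illustrative sketch, not OpenClaw's actual code; in particular, `count_tokens` is a stand-in for the model's real tokenizer, which OpenClaw would call instead:

```python
def count_tokens(message: str) -> int:
    # Placeholder: OpenClaw uses the model's own tokenizer here.
    # A crude approximation is ~1 token per 4 characters of text.
    return max(1, len(message) // 4)

def needs_compaction(messages, max_messages=50, max_tokens=8000):
    """Return True once either limit is hit -- whichever comes first."""
    total_tokens = sum(count_tokens(m) for m in messages)
    return len(messages) >= max_messages or total_tokens >= max_tokens
```

With the defaults above, 30 short messages pass both checks, while 10 very long messages trip the token limit first.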
Differentiated Configuration by Channel Type
The usage patterns of DMs and group chats differ significantly. DMs are typically continuous long conversations that benefit from retaining more context; group chats tend to be fragmented short interactions where historical value is relatively lower. OpenClaw supports setting different limits by channel type:
{
"sessions": {
"maxHistoryMessages": 50,
"maxHistoryTokens": 8000,
"channelOverrides": {
"dm": {
"maxHistoryMessages": 100,
"maxHistoryTokens": 16000
},
"group": {
"maxHistoryMessages": 20,
"maxHistoryTokens": 4000
}
}
}
}
Supported Channel Types
| Type ID | Description | Suggested Configuration |
|---|---|---|
| dm | Direct message / one-on-one conversation | Longer history (50-100 messages) |
| group | Group chat / multi-person conversation | Shorter history (10-30 messages) |
| channel | Channel messages (e.g., Discord channels) | Medium history (20-50 messages) |
| thread | Topics / threads | Medium history (20-50 messages) |
Per-Platform Configuration
You can also set independent limits for specific messaging platforms:
{
"sessions": {
"channelOverrides": {
"dm": {
"maxHistoryMessages": 100
},
"group": {
"maxHistoryMessages": 20
}
},
"platformOverrides": {
"telegram": {
"maxHistoryMessages": 80,
"maxHistoryTokens": 12000
},
"discord": {
"maxHistoryMessages": 30,
"maxHistoryTokens": 6000
}
}
}
}
Priority order: platformOverrides > channelOverrides > global defaults.
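The priority order can be illustrated with a small resolution function. This is a sketch under the assumption that overrides are shallow-merged in priority order; the field names mirror the config examples above:

```python
def resolve_limits(config: dict, platform: str, channel_type: str) -> dict:
    """Merge limits: platformOverrides > channelOverrides > global defaults."""
    sessions = config["sessions"]
    # Start from the global defaults (if present)...
    limits = {k: sessions[k]
              for k in ("maxHistoryMessages", "maxHistoryTokens")
              if k in sessions}
    # ...then let the channel-type override win over the defaults...
    limits.update(sessions.get("channelOverrides", {}).get(channel_type, {}))
    # ...and finally let the platform override win over everything.
    limits.update(sessions.get("platformOverrides", {}).get(platform, {}))
    return limits
```

With the config above, a Telegram DM would resolve to the telegram values (maxHistoryMessages: 80), since the platform override outranks the dm override.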
Automatic Compression Mechanisms
When conversation history approaches the limit, OpenClaw provides two handling strategies: simple truncation and intelligent compression.
Truncation Strategy
{
"sessions": {
"autoCompaction": true,
"compactionStrategy": "truncate"
}
}
Simply discards the oldest messages, retaining only the most recent N messages. This approach is simple and efficient with no additional API calls, but completely loses information from earlier conversations.
Workflow:
Original history: [msg1, msg2, msg3, ..., msg50, msg51]
After truncation: [msg22, msg23, ..., msg50, msg51] (keeping the most recent 30)
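The truncation loop can be sketched as follows (illustrative only; `count_tokens` is passed in as a stand-in for the model's tokenizer):

```python
def truncate_history(messages, max_messages, max_tokens, count_tokens):
    """Drop the oldest messages until both limits are satisfied."""
    kept = list(messages)
    while kept and (len(kept) > max_messages
                    or sum(count_tokens(m) for m in kept) > max_tokens):
        kept.pop(0)  # discard the oldest message first
    return kept
```

Because only the tail is kept, this needs no extra API calls, but anything dropped is gone for good.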
Summary Strategy
{
"sessions": {
"autoCompaction": true,
"compactionStrategy": "summary",
"compactionThreshold": 0.8,
"summaryMaxTokens": 500
}
}
When context usage reaches the compactionThreshold (default 80%), OpenClaw automatically sends the older conversation history to the model for summarization. The summary result is inserted as a system message at the beginning of the context, replacing the original history messages.
Workflow:
Step 1: Context usage detected at 80%
Step 2: Send the first 30 messages to the model, requesting a summary
Step 3: Model returns the summary text
Step 4: Replace the original 30 messages with the summary
Final context: [summary message, msg31, msg32, ..., msg51]
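The steps above can be sketched in a few lines. This is a simplified illustration, not OpenClaw's implementation: `summarize` stands in for the extra API call that asks the model for a summary, and `count_tokens` for the tokenizer:

```python
def compact_with_summary(messages, summarize, count_tokens,
                         threshold=0.8, context_limit=8000,
                         preserve_last_n=10):
    """Once usage crosses the threshold, replace older messages with a
    model-written summary inserted as a system message."""
    used = sum(count_tokens(m["content"]) for m in messages)
    if used < threshold * context_limit:
        return messages                      # below threshold: keep as-is
    old = messages[:-preserve_last_n]        # Step 2: older messages...
    recent = messages[-preserve_last_n:]     # ...and the tail to keep
    summary_text = summarize(old)            # Steps 2-3: one extra API call
    # Step 4: the summary replaces the old messages at the front.
    return [{"role": "system", "content": summary_text}] + recent
```

Note the trade-off: the summary costs one extra (usually cheap) API call, but every later request sends far fewer tokens.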
Summary Configuration Details
{
"sessions": {
"compaction": {
"strategy": "summary",
"threshold": 0.8,
"summaryMaxTokens": 500,
"summaryModel": "default",
"summaryPrompt": "Please compress the following conversation history into a concise summary, retaining key information and user preferences:",
"preserveSystemMessages": true,
"preserveLastN": 10
}
}
}
| Parameter | Description |
|---|---|
| threshold | Context usage ratio that triggers compression |
| summaryMaxTokens | Maximum token count for the summary text |
| summaryModel | Model used to generate summaries; "default" uses the current model |
| summaryPrompt | Custom summarization instruction |
| preserveSystemMessages | Whether to keep original system messages out of compression |
| preserveLastN | Always keep the most recent N messages out of compression |
Rolling Compression
For extremely long conversations, compression may trigger multiple times. Each pass merges the previous summary with the next batch of older messages to produce an updated summary. This process is called "rolling compression":
First compression: [msg1-30] → Summary A
Second compression: [Summary A, msg31-50] → Summary B
Third compression: [Summary B, msg51-70] → Summary C
As compressions accumulate, the earliest information is condensed further and further, though key information is typically preserved.
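The rolling scheme above can be sketched as a loop that folds the previous summary into each new batch (a simplified illustration; `summarize` again stands in for the summarization API call, and `batch` for however many old messages each pass compresses):

```python
def rolling_compact(history, summarize, batch=20):
    """Repeatedly fold the previous summary plus the next batch of old
    messages into an updated summary: A -> B -> C ..."""
    summary = None
    while len(history) > batch:
        chunk, history = history[:batch], history[batch:]
        # The previous summary (if any) is summarized together with the
        # next batch, so earlier context keeps flowing into each pass.
        inputs = ([summary] if summary else []) + chunk
        summary = summarize(inputs)
    return ([summary] if summary else []) + history
```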
Cost Estimation
Different history limit strategies have a significant impact on API costs. Here is a rough estimate:
| Configuration | Avg tokens per request | Monthly cost (100 messages/day) |
|---|---|---|
| maxHistoryMessages: 10 | ~2,000 | Lower |
| maxHistoryMessages: 50 | ~8,000 | Medium |
| maxHistoryMessages: 100 | ~15,000 | Higher |
| summary compression | ~3,000 + compression overhead | Medium-low |
Actual costs depend on model pricing, average message length, and usage frequency.
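The table's order-of-magnitude comparison follows from simple arithmetic. The sketch below assumes a hypothetical rate of $0.002 per 1K input tokens purely for illustration; substitute your model's actual pricing:

```python
def monthly_cost(avg_tokens_per_request, requests_per_day=100,
                 price_per_1k_tokens=0.002, days=30):
    """Rough monthly input-token cost:
    tokens/request * requests/day * days * price per 1K tokens."""
    total_tokens = avg_tokens_per_request * requests_per_day * days
    return total_tokens / 1000 * price_per_1k_tokens

# At the assumed $0.002 per 1K input tokens:
# maxHistoryMessages: 10  -> monthly_cost(2000)  == 12.0   ($12/month)
# maxHistoryMessages: 100 -> monthly_cost(15000) == 90.0   ($90/month)
```

Output tokens, summarization calls, and cached-prompt discounts are deliberately ignored here; the point is that cost scales roughly linearly with retained context.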
Manual History Management
In addition to automatic mechanisms, you can manually manage session history via the command line:
# View history statistics for a session
openclaw session stats --session telegram_123456
# Manually trigger compression
openclaw session compact --session telegram_123456
# Trim history but keep the session
openclaw session trim --session telegram_123456 --keep 10
# Completely clear a session
openclaw session clear --session telegram_123456
Users can also use built-in commands in chat to manage their own history:
/clear - Clear current conversation history
/compact - Manually trigger conversation compression
/history - View current history message count and token usage
Recommended Configurations
Personal Use (Cost Priority)
{
"sessions": {
"maxHistoryMessages": 20,
"maxHistoryTokens": 4000,
"autoCompaction": true,
"compactionStrategy": "truncate"
}
}
Daily Use (Balanced Approach)
{
"sessions": {
"maxHistoryMessages": 50,
"maxHistoryTokens": 8000,
"autoCompaction": true,
"compactionStrategy": "summary",
"channelOverrides": {
"dm": { "maxHistoryMessages": 80 },
"group": { "maxHistoryMessages": 20 }
}
}
}
Deep Conversations (Quality Priority)
{
"sessions": {
"maxHistoryMessages": 150,
"maxHistoryTokens": 32000,
"autoCompaction": true,
"compactionStrategy": "summary",
"compaction": {
"summaryMaxTokens": 1000,
"preserveLastN": 30
}
}
}
Summary
Conversation history length control is a core tool for cost management and conversation quality optimization in OpenClaw. By configuring differentiated history limits by channel type, combined with summary compression or simple truncation strategies, you can find the optimal balance for different scenarios. It is recommended to start with a moderate configuration and gradually tune based on actual token usage and conversation quality feedback.