Introduction
AI model context windows are finite. Even the most advanced models have context windows of only a few hundred thousand tokens. As conversations continue and messages accumulate, managing context becomes a critical concern. OpenClaw provides a sophisticated context pruning mechanism that, combined with TTL (Time-To-Live) based caching strategies, maximizes token savings while maintaining conversation quality.
This article provides a detailed breakdown of the mechanism's design and configuration options.
The Challenge of Context Management
In long conversations, sending the entire message history as context to the model presents several problems:
- Soaring token costs: every API call resends the full history, so per-request input token costs grow with conversation length — and cumulative costs grow quadratically
- Context overflow: Exceeding the model's window limit causes the request to fail outright
- Noise interference: Older messages may be irrelevant to the current topic and can actually impair model judgment
- Increased latency: The more input tokens, the longer the model takes to generate its first response
OpenClaw systematically addresses these issues through a multi-layer pruning strategy.
Pruning Strategy Layers
OpenClaw's context pruning operates across four layers, applied from coarse to fine:
Original message history
↓
Layer 1: Message count limit (maxMessages)
↓
Layer 2: Token count limit (maxTokens)
↓
Layer 3: TTL time decay (cache TTL)
↓
Layer 4: Smart compaction
↓
Final context → Sent to model
Layer 1: Message Count Limit
The most basic strategy — limit the maximum number of messages retained:
{
agents: {
"my-agent": {
context: {
// Keep at most 50 messages
maxMessages: 50,
// Differentiate by channel type
channelOverrides: {
dm: { maxMessages: 80 },
group: { maxMessages: 20 }
}
}
}
}
}
When the limit is exceeded, the oldest messages are dropped from the context; they remain in the session's JSONL file on disk and are never physically deleted.
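The count-based pass is simple enough to sketch in a few lines. The following is a minimal illustration of the behavior described above (the Message shape and function names are assumptions, not OpenClaw's actual internals):

```typescript
type Message = { id: number; text: string };

interface CountConfig {
  maxMessages: number;
  channelOverrides?: Record<string, { maxMessages: number }>;
}

// Keep only the newest `maxMessages` entries, honoring per-channel overrides.
// Dropped messages are only excluded from the context; persistence is untouched.
function pruneByCount(history: Message[], config: CountConfig, channel?: string): Message[] {
  const limit = (channel && config.channelOverrides?.[channel]?.maxMessages) ?? config.maxMessages;
  return history.slice(-limit);
}

const history = Array.from({ length: 100 }, (_, i) => ({ id: i, text: `msg ${i}` }));
const config: CountConfig = { maxMessages: 50, channelOverrides: { group: { maxMessages: 20 } } };

console.log(pruneByCount(history, config).length);          // 50
console.log(pruneByCount(history, config, "group").length); // 20
console.log(pruneByCount(history, config, "group")[0].id);  // 80 (oldest kept)
```

Note that `slice(-limit)` keeps the tail of the array, i.e. the most recent messages, which matches the "oldest messages are dropped" rule.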
Layer 2: Token Count Limit
More precise control based on actual token counts:
{
agents: {
"my-agent": {
context: {
// Maximum context tokens (excluding system prompt)
maxTokens: 16000,
// Tokens reserved for model output
reservedOutputTokens: 4000,
// Token counting method
tokenCounter: "tiktoken" // Precise calculation
}
}
}
}
OpenClaw counts tokens starting from the newest messages going backward. When the cumulative count exceeds maxTokens, it stops and discards older messages.
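The newest-first accumulation can be sketched as follows. Token counts are precomputed here for simplicity; a real tokenizer such as tiktoken would supply them (this is an illustration of the described algorithm, not OpenClaw's code):

```typescript
type Message = { text: string; tokens: number };

// Walk the history newest-first, accumulating token counts. Stop as soon as
// adding the next-older message would exceed the budget, discarding it and
// everything before it.
function pruneByTokens(history: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let total = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    if (total + history[i].tokens > maxTokens) break;
    total += history[i].tokens;
    kept.unshift(history[i]); // restore chronological order
  }
  return kept;
}

const history: Message[] = [
  { text: "old", tokens: 9000 },
  { text: "mid", tokens: 5000 },
  { text: "new", tokens: 3000 },
];
// 3000 + 5000 fits in 16000, but adding 9000 more would not:
console.log(pruneByTokens(history, 16000).map(m => m.text)); // ["mid", "new"]
```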
Layer 3: TTL Time Decay (Core Feature)
This is OpenClaw's most distinctive context management strategy. Based on message timestamps, it applies different retention policies to messages of different "ages":
{
agents: {
"my-agent": {
context: {
ttl: {
enabled: true,
// TTL rules (from newest to oldest)
rules: [
{
// Messages within the last 5 minutes: keep all
maxAge: "5m",
keep: "all"
},
{
// Messages from 5 minutes to 1 hour ago: keep the most recent 20
maxAge: "1h",
keep: 20
},
{
// Messages from 1 to 6 hours ago: keep the most recent 10
maxAge: "6h",
keep: 10
},
{
// Messages from 6 to 24 hours ago: keep only a summary
maxAge: "24h",
keep: "summary"
},
{
// Messages older than 24 hours: not included in context
maxAge: "inf",
keep: "none"
}
]
}
}
}
}
}
Design Philosophy Behind the TTL Strategy
Human conversation has a natural time decay characteristic: messages from 5 minutes ago are almost always relevant, messages from an hour ago may be partially relevant, and yesterday's conversation has most likely moved to a different topic. The TTL strategy models this natural decay.
Keep Option Reference
| Value | Meaning |
|---|---|
| "all" | Keep all messages within this time window |
| number | Keep the most recent N messages within this time window |
| "summary" | Compress messages in this time window into a summary |
| "none" | Do not include any messages from this time window |
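Putting the rules and keep options together, TTL pruning amounts to bucketing messages by age and applying each window's policy. A runnable sketch of that behavior (field names are assumptions, and "summary" is stubbed with a placeholder rather than a model call):

```typescript
type Keep = "all" | "summary" | "none" | number;
interface TTLRule { maxAgeMs: number; keep: Keep }
type Message = { text: string; timestamp: number };

// Assign each message to the first rule whose window covers its age
// (rules ordered newest window first), then apply that window's keep policy.
function applyTTL(history: Message[], rules: TTLRule[], now: number): Message[] {
  const windows: Message[][] = rules.map(() => []);
  for (const msg of history) {
    const age = now - msg.timestamp;
    const idx = rules.findIndex(r => age <= r.maxAgeMs);
    if (idx >= 0) windows[idx].push(msg);
  }
  const result: Message[] = [];
  rules.forEach((rule, i) => {
    const msgs = windows[i];
    if (rule.keep === "all") result.push(...msgs);
    else if (typeof rule.keep === "number") result.push(...msgs.slice(-rule.keep));
    else if (rule.keep === "summary" && msgs.length > 0)
      result.push({ text: `[summary of ${msgs.length} messages]`, timestamp: msgs[msgs.length - 1].timestamp });
    // "none": window is dropped entirely
  });
  return result.sort((a, b) => a.timestamp - b.timestamp);
}

const MIN = 60_000;
const now = Date.now();
const rules: TTLRule[] = [
  { maxAgeMs: 5 * MIN, keep: "all" },
  { maxAgeMs: 60 * MIN, keep: 2 },
  { maxAgeMs: Infinity, keep: "none" },
];
const history: Message[] = [
  { text: "2h ago", timestamp: now - 120 * MIN },
  { text: "30m ago", timestamp: now - 30 * MIN },
  { text: "20m ago", timestamp: now - 20 * MIN },
  { text: "10m ago", timestamp: now - 10 * MIN },
  { text: "1m ago", timestamp: now - 1 * MIN },
];
console.log(applyTTL(history, rules, now).map(m => m.text));
// ["20m ago", "10m ago", "1m ago"]  — 30m-old message trimmed, 2h-old dropped
```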
Layer 4: Smart Compaction
When a TTL rule specifies keep: "summary", OpenClaw automatically compresses messages in that time window:
{
agents: {
"my-agent": {
context: {
compaction: {
enabled: true,
// Model used for compaction (a cheaper model works well)
model: "gpt-4o-mini",
// Maximum tokens for the summary
summaryMaxTokens: 512,
// Compaction prompt
summaryPrompt: "Please compress the following conversation content into a concise summary, preserving key information and user preferences:",
// Cache compaction results
cacheSummary: true,
// Summary cache TTL
summaryCacheTTL: "1h"
}
}
}
}
}
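Compaction itself is just "replace a window with one summary message, and reuse the summary if the same window was already compacted." The model call is stubbed below with deterministic truncation so the sketch runs offline; in a real deployment the summarizer would send summaryPrompt plus the transcript to the configured model and cap output at summaryMaxTokens (all names here are illustrative assumptions):

```typescript
type Message = { role: "user" | "assistant"; text: string };

// Stand-in for the compaction model call. Truncates deterministically so the
// example is runnable without API access.
function summarizeStub(messages: Message[], maxChars: number): string {
  const transcript = messages.map(m => `${m.role}: ${m.text}`).join(" | ");
  return transcript.length <= maxChars ? transcript : transcript.slice(0, maxChars - 1) + "…";
}

const summaryCache = new Map<string, string>();

// Compact a window into a single summary message, reusing a cached summary
// for an identical window (the cacheSummary behavior, minus TTL expiry).
function compact(window: Message[], maxChars: number): Message {
  const key = JSON.stringify(window);
  let summary = summaryCache.get(key);
  if (summary === undefined) {
    summary = summarizeStub(window, maxChars);
    summaryCache.set(key, summary);
  }
  return { role: "assistant", text: `[Summary] ${summary}` };
}

const oldWindow: Message[] = [
  { role: "user", text: "I prefer dark roast coffee." },
  { role: "assistant", text: "Noted, dark roast it is." },
];
console.log(compact(oldWindow, 120).text);
```

Caching the summary matters because the same aged-out window would otherwise be re-summarized on every request, paying the compaction model's cost repeatedly.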
Caching Mechanisms in Detail
Context Cache
OpenClaw caches previously built contexts to avoid recalculating on every request:
{
agents: {
"my-agent": {
context: {
cache: {
enabled: true,
// Cache invalidation conditions
invalidateOn: [
"new_message", // Invalidate when a new message arrives
"config_change" // Invalidate when configuration changes
],
// Cache storage method
storage: "memory", // memory / redis
// Maximum cache entries
maxEntries: 1000
}
}
}
}
}
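The invalidateOn semantics can be illustrated with a small in-memory cache: an event either evicts one session's entry or flushes everything. This is a sketch of the behavior, not OpenClaw's actual cache class:

```typescript
// Minimal in-memory context cache with event-based invalidation and a
// size cap (Map preserves insertion order, so the first key is the oldest).
class ContextCache<T> {
  private entries = new Map<string, T>();
  constructor(
    private maxEntries: number,
    private invalidateOn: Set<string>,
  ) {}

  get(sessionId: string): T | undefined {
    return this.entries.get(sessionId);
  }

  set(sessionId: string, context: T): void {
    // Evict the oldest entry once the cap is reached.
    if (this.entries.size >= this.maxEntries && !this.entries.has(sessionId)) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(sessionId, context);
  }

  // "new_message" invalidates one session; "config_change" flushes all.
  onEvent(event: string, sessionId?: string): void {
    if (!this.invalidateOn.has(event)) return;
    if (sessionId) this.entries.delete(sessionId);
    else this.entries.clear();
  }
}

const cache = new ContextCache<string>(1000, new Set(["new_message", "config_change"]));
cache.set("session-1", "…built context…");
cache.onEvent("new_message", "session-1"); // session-1's context is now stale
console.log(cache.get("session-1")); // undefined
```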
Prompt Caching (Provider Level)
Some AI providers, such as Anthropic and OpenAI, support prompt caching, and OpenClaw leverages the feature automatically:
{
agents: {
"my-agent": {
context: {
providerCache: {
enabled: true,
// Mark the system prompt as cacheable (since it rarely changes)
cacheSystemPrompt: true,
// Mark the most recent N messages as a cacheable prefix
cacheableMessageCount: 10
}
}
}
}
}
Prompt Caching can reduce the cost of repeated context prefixes by up to 90%, making it ideal for scenarios with lengthy system prompts.
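A quick back-of-envelope calculation shows where that saving comes from. The sketch below assumes cache reads are billed at 10% of the normal input rate — true of some providers, but check your provider's pricing — and an illustrative price of $3 per million input tokens:

```typescript
// Cost of one request's input: fresh tokens at full price, cached-prefix
// tokens at a discounted read rate (assumed 10% of the input rate).
function inputCost(
  totalTokens: number,
  cachedTokens: number,
  pricePerToken: number,
  cacheReadFactor = 0.1,
): number {
  const fresh = totalTokens - cachedTokens;
  return fresh * pricePerToken + cachedTokens * pricePerToken * cacheReadFactor;
}

// 10,000-token context, 8,000 of it a cached prefix, at $3 per million tokens:
const withCache = inputCost(10_000, 8_000, 3 / 1_000_000);
const withoutCache = inputCost(10_000, 0, 3 / 1_000_000);
console.log(withCache.toFixed(4));    // 0.0084
console.log(withoutCache.toFixed(4)); // 0.0300
```

With 80% of the context cached, the per-request input cost drops by 72% in this example; the closer the cacheable prefix is to the whole context, the closer the saving approaches the 90% ceiling.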
Practical Configuration Examples
Personal Assistant (long conversations, deep memory)
{
context: {
maxMessages: 100,
maxTokens: 32000,
ttl: {
enabled: true,
rules: [
{ maxAge: "30m", keep: "all" },
{ maxAge: "4h", keep: 30 },
{ maxAge: "24h", keep: "summary" },
{ maxAge: "inf", keep: "none" }
]
},
compaction: { enabled: true, summaryMaxTokens: 1024 }
}
}
Customer Service Bot (short conversations, fast responses)
{
context: {
maxMessages: 30,
maxTokens: 8000,
ttl: {
enabled: true,
rules: [
{ maxAge: "10m", keep: "all" },
{ maxAge: "1h", keep: 10 },
{ maxAge: "inf", keep: "none" }
]
}
}
}
Group Chat Bot (high message frequency, low context needs)
{
context: {
maxMessages: 15,
maxTokens: 4000,
ttl: {
enabled: true,
rules: [
{ maxAge: "5m", keep: "all" },
{ maxAge: "30m", keep: 5 },
{ maxAge: "inf", keep: "none" }
]
}
}
}
Monitoring and Tuning
View Context Usage
# View context statistics for a specific Agent
openclaw agent stats my-agent --context
# Example output:
# Active sessions: 45
# Average context tokens: 8,234
# Max context tokens: 28,102
# Compaction executions: 12
# Cache hit rate: 73%
Context Usage Alerts
{
monitoring: {
alerts: [{
name: "context-overflow-warning",
condition: "context_tokens > maxTokens * 0.9",
action: "log_warning"
}]
}
}
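The alert condition above is a simple threshold check. A sketch of what a monitoring loop might evaluate (field names are assumptions):

```typescript
interface ContextStats { contextTokens: number; maxTokens: number }

// Fire a warning when the built context exceeds 90% of the configured budget.
function shouldWarn(stats: ContextStats, threshold = 0.9): boolean {
  return stats.contextTokens > stats.maxTokens * threshold;
}

console.log(shouldWarn({ contextTokens: 15_000, maxTokens: 16_000 })); // true  (15,000 > 14,400)
console.log(shouldWarn({ contextTokens: 8_000, maxTokens: 16_000 }));  // false
```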
Summary
Context management is one of the most impactful factors in AI Agent operations, affecting both cost and quality. OpenClaw's four-layer pruning strategy — message count limit, token limit, TTL time decay, and smart compaction — provides precise control from coarse to fine. The TTL time decay strategy is a unique OpenClaw design that models the natural temporal decay of human conversation, preserving the integrity of recent context while significantly reducing token consumption for older messages. Combined with prompt caching and context caching mechanisms, you can achieve meaningful token cost optimization without sacrificing conversation quality.