Introduction
AI model context windows are finite. Even the most advanced models have context windows of only a few hundred thousand tokens. As conversations continue and messages accumulate, managing context becomes a critical concern. OpenClaw provides a sophisticated context pruning mechanism that, combined with TTL (Time-To-Live) based caching strategies, maximizes token savings while maintaining conversation quality.
This article provides a detailed breakdown of the mechanism's design and configuration options.
The Challenge of Context Management
In long conversations, sending the entire message history as context to the model presents several problems:
- Soaring token costs: every API call resends the full history, so per-request input token costs grow with conversation length — and cumulative costs grow quadratically
- Context overflow: Exceeding the model's window limit causes the request to fail outright
- Noise interference: Older messages may be irrelevant to the current topic and can actually impair model judgment
- Increased latency: The more input tokens, the longer the model takes to generate its first response
OpenClaw systematically addresses these issues through a multi-layer pruning strategy.
Pruning Strategy Layers
OpenClaw's context pruning operates across four layers, applied from coarse to fine:
Original message history
↓
Layer 1: Message count limit (maxMessages)
↓
Layer 2: Token count limit (maxTokens)
↓
Layer 3: TTL time decay (cache TTL)
↓
Layer 4: Smart compaction
↓
Final context → Sent to model
Layer 1: Message Count Limit
The most basic strategy — limit the maximum number of messages retained:
{
agents: {
"my-agent": {
context: {
// Keep at most 50 messages
maxMessages: 50,
// Differentiate by channel type
channelOverrides: {
dm: { maxMessages: 80 },
group: { maxMessages: 20 }
}
}
}
}
}
When the limit is exceeded, the oldest messages are dropped from the context; they remain in the session's JSONL file on disk and are never physically deleted.
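The count-based pass is simple enough to sketch in a few lines. The following is a minimal illustration of the behavior described above (the Message shape and function names are assumptions, not OpenClaw's actual internals):

```typescript
type Message = { id: number; text: string };

interface CountConfig {
  maxMessages: number;
  channelOverrides?: Record<string, { maxMessages: number }>;
}

// Keep only the newest `maxMessages` entries, honoring per-channel overrides.
// Dropped messages are only excluded from the context; persistence is untouched.
function pruneByCount(history: Message[], config: CountConfig, channel?: string): Message[] {
  const limit = (channel && config.channelOverrides?.[channel]?.maxMessages) ?? config.maxMessages;
  return history.slice(-limit);
}

const history = Array.from({ length: 100 }, (_, i) => ({ id: i, text: `msg ${i}` }));
const config: CountConfig = { maxMessages: 50, channelOverrides: { group: { maxMessages: 20 } } };

console.log(pruneByCount(history, config).length);          // 50
console.log(pruneByCount(history, config, "group").length); // 20
console.log(pruneByCount(history, config, "group")[0].id);  // 80 (oldest kept)
```

Note that `slice(-limit)` keeps the tail of the array, i.e. the most recent messages, which matches the "oldest messages are dropped" rule.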
Layer 2: Token Count Limit
More precise control based on actual token counts:
{
agents: {
"my-agent": {
context: {
// Maximum context tokens (excluding system prompt)
maxTokens: 16000,
// Tokens reserved for model output
reservedOutputTokens: 4000,
// Token counting method
tokenCounter: "tiktoken" // Precise calculation
}
}
}
}
OpenClaw counts tokens starting from the newest messages going backward. When the cumulative count exceeds maxTokens, it stops and discards older messages.
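The newest-first accumulation can be sketched as follows. Token counts are precomputed here for simplicity; a real tokenizer such as tiktoken would supply them (this is an illustration of the described algorithm, not OpenClaw's code):

```typescript
type Message = { text: string; tokens: number };

// Walk the history newest-first, accumulating token counts. Stop as soon as
// adding the next-older message would exceed the budget, discarding it and
// everything before it.
function pruneByTokens(history: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let total = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    if (total + history[i].tokens > maxTokens) break;
    total += history[i].tokens;
    kept.unshift(history[i]); // restore chronological order
  }
  return kept;
}

const history: Message[] = [
  { text: "old", tokens: 9000 },
  { text: "mid", tokens: 5000 },
  { text: "new", tokens: 3000 },
];
// 3000 + 5000 fits in 16000, but adding 9000 more would not:
console.log(pruneByTokens(history, 16000).map(m => m.text)); // ["mid", "new"]
```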
Layer 3: TTL Time Decay (Core Feature)
This is OpenClaw's most distinctive context management strategy. Based on message timestamps, it applies different retention policies to messages of different "ages":
{
agents: {
"my-agent": {
context: {
ttl: {
enabled: true,
// TTL rules (from newest to oldest)
rules: [
{
// Messages within the last 5 minutes: keep all
maxAge: "5m",
keep: "all"
},
{
// Messages from 5 minutes to 1 hour ago: keep the most recent 20
maxAge: "1h",
keep: 20
},
{
// Messages from 1 to 6 hours ago: keep the most recent 10
maxAge: "6h",
keep: 10
},
{
// Messages from 6 to 24 hours ago: keep only a summary
maxAge: "24h",
keep: "summary"
},
{
// Messages older than 24 hours: not included in context
maxAge: "inf",
keep: "none"
}
]
}
}
}
}
}
Design Philosophy Behind the TTL Strategy
Human conversation has a natural time decay characteristic: messages from 5 minutes ago are almost always relevant, messages from an hour ago may be partially relevant, and yesterday's conversation has most likely moved to a different topic. The TTL strategy models this natural decay.
Keep Option Reference
| Value | Meaning |
|---|---|
| "all" | Keep all messages within this time window |
| number | Keep the most recent N messages within this time window |
| "summary" | Compress messages in this time window into a summary |
| "none" | Do not include any messages from this time window |
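Putting the rules and keep options together, TTL pruning amounts to bucketing messages by age and applying each window's policy. A runnable sketch of that behavior (field names are assumptions, and "summary" is stubbed with a placeholder rather than a model call):

```typescript
type Keep = "all" | "summary" | "none" | number;
interface TTLRule { maxAgeMs: number; keep: Keep }
type Message = { text: string; timestamp: number };

// Assign each message to the first rule whose window covers its age
// (rules ordered newest window first), then apply that window's keep policy.
function applyTTL(history: Message[], rules: TTLRule[], now: number): Message[] {
  const windows: Message[][] = rules.map(() => []);
  for (const msg of history) {
    const age = now - msg.timestamp;
    const idx = rules.findIndex(r => age <= r.maxAgeMs);
    if (idx >= 0) windows[idx].push(msg);
  }
  const result: Message[] = [];
  rules.forEach((rule, i) => {
    const msgs = windows[i];
    if (rule.keep === "all") result.push(...msgs);
    else if (typeof rule.keep === "number") result.push(...msgs.slice(-rule.keep));
    else if (rule.keep === "summary" && msgs.length > 0)
      result.push({ text: `[summary of ${msgs.length} messages]`, timestamp: msgs[msgs.length - 1].timestamp });
    // "none": window is dropped entirely
  });
  return result.sort((a, b) => a.timestamp - b.timestamp);
}

const MIN = 60_000;
const now = Date.now();
const rules: TTLRule[] = [
  { maxAgeMs: 5 * MIN, keep: "all" },
  { maxAgeMs: 60 * MIN, keep: 2 },
  { maxAgeMs: Infinity, keep: "none" },
];
const history: Message[] = [
  { text: "2h ago", timestamp: now - 120 * MIN },
  { text: "30m ago", timestamp: now - 30 * MIN },
  { text: "20m ago", timestamp: now - 20 * MIN },
  { text: "10m ago", timestamp: now - 10 * MIN },
  { text: "1m ago", timestamp: now - 1 * MIN },
];
console.log(applyTTL(history, rules, now).map(m => m.text));
// ["20m ago", "10m ago", "1m ago"]  — 30m-old message trimmed, 2h-old dropped
```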
Layer 4: Smart Compaction
When a TTL rule specifies keep: "summary", OpenClaw automatically compresses messages in that time window:
{
agents: {
"my-agent": {
context: {
compaction: {
enabled: true,
// Model used for compaction (a cheaper model works well)
model: "gpt-4o-mini",
// Maximum tokens for the summary
summaryMaxTokens: 512,
// Compaction prompt
summaryPrompt: "Please compress the following conversation content into a concise summary, preserving key information and user preferences:",
// Cache compaction results
cacheSummary: true,
// Summary cache TTL
summaryCacheTTL: "1h"
}
}
}
}
}
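Compaction itself is just "replace a window with one summary message, and reuse the summary if the same window was already compacted." The model call is stubbed below with deterministic truncation so the sketch runs offline; in a real deployment the summarizer would send summaryPrompt plus the transcript to the configured model and cap output at summaryMaxTokens (all names here are illustrative assumptions):

```typescript
type Message = { role: "user" | "assistant"; text: string };

// Stand-in for the compaction model call. Truncates deterministically so the
// example is runnable without API access.
function summarizeStub(messages: Message[], maxChars: number): string {
  const transcript = messages.map(m => `${m.role}: ${m.text}`).join(" | ");
  return transcript.length <= maxChars ? transcript : transcript.slice(0, maxChars - 1) + "…";
}

const summaryCache = new Map<string, string>();

// Compact a window into a single summary message, reusing a cached summary
// for an identical window (the cacheSummary behavior, minus TTL expiry).
function compact(window: Message[], maxChars: number): Message {
  const key = JSON.stringify(window);
  let summary = summaryCache.get(key);
  if (summary === undefined) {
    summary = summarizeStub(window, maxChars);
    summaryCache.set(key, summary);
  }
  return { role: "assistant", text: `[Summary] ${summary}` };
}

const oldWindow: Message[] = [
  { role: "user", text: "I prefer dark roast coffee." },
  { role: "assistant", text: "Noted, dark roast it is." },
];
console.log(compact(oldWindow, 120).text);
```

Caching the summary matters because the same aged-out window would otherwise be re-summarized on every request, paying the compaction model's cost repeatedly.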
Caching Mechanisms in Detail
Context Cache
OpenClaw caches previously built contexts to avoid recalculating on every request:
{
agents: {
"my-agent": {
context: {
cache: {
enabled: true,
// Cache invalidation conditions
invalidateOn: [
"new_message", // Invalidate when a new message arrives
"config_change" // Invalidate when configuration changes
],
// Cache storage method
storage: "memory", // memory / redis
// Maximum cache entries
maxEntries: 1000
}
}
}
}
}
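The invalidateOn semantics can be illustrated with a small in-memory cache: an event either evicts one session's entry or flushes everything. This is a sketch of the behavior, not OpenClaw's actual cache class:

```typescript
// Minimal in-memory context cache with event-based invalidation and a
// size cap (Map preserves insertion order, so the first key is the oldest).
class ContextCache<T> {
  private entries = new Map<string, T>();
  constructor(
    private maxEntries: number,
    private invalidateOn: Set<string>,
  ) {}

  get(sessionId: string): T | undefined {
    return this.entries.get(sessionId);
  }

  set(sessionId: string, context: T): void {
    // Evict the oldest entry once the cap is reached.
    if (this.entries.size >= this.maxEntries && !this.entries.has(sessionId)) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(sessionId, context);
  }

  // "new_message" invalidates one session; "config_change" flushes all.
  onEvent(event: string, sessionId?: string): void {
    if (!this.invalidateOn.has(event)) return;
    if (sessionId) this.entries.delete(sessionId);
    else this.entries.clear();
  }
}

const cache = new ContextCache<string>(1000, new Set(["new_message", "config_change"]));
cache.set("session-1", "…built context…");
cache.onEvent("new_message", "session-1"); // session-1's context is now stale
console.log(cache.get("session-1")); // undefined
```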
Prompt Caching (Provider Level)
Some AI providers, such as Anthropic and OpenAI, support prompt caching, and OpenClaw leverages the feature automatically:
{
agents: {
"my-agent": {
context: {
providerCache: {
enabled: true,
// Mark the system prompt as cacheable (since it rarely changes)
cacheSystemPrompt: true,
// Mark the most recent N messages as a cacheable prefix
cacheableMessageCount: 10
}
}
}
}
}
Prompt Caching can reduce the cost of repeated context prefixes by up to 90%, making it ideal for scenarios with lengthy system prompts.
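A quick back-of-envelope calculation shows where that saving comes from. The sketch below assumes cache reads are billed at 10% of the normal input rate — true of some providers, but check your provider's pricing — and an illustrative price of $3 per million input tokens:

```typescript
// Cost of one request's input: fresh tokens at full price, cached-prefix
// tokens at a discounted read rate (assumed 10% of the input rate).
function inputCost(
  totalTokens: number,
  cachedTokens: number,
  pricePerToken: number,
  cacheReadFactor = 0.1,
): number {
  const fresh = totalTokens - cachedTokens;
  return fresh * pricePerToken + cachedTokens * pricePerToken * cacheReadFactor;
}

// 10,000-token context, 8,000 of it a cached prefix, at $3 per million tokens:
const withCache = inputCost(10_000, 8_000, 3 / 1_000_000);
const withoutCache = inputCost(10_000, 0, 3 / 1_000_000);
console.log(withCache.toFixed(4));    // 0.0084
console.log(withoutCache.toFixed(4)); // 0.0300
```

With 80% of the context cached, the per-request input cost drops by 72% in this example; the closer the cacheable prefix is to the whole context, the closer the saving approaches the 90% ceiling.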
Practical Configuration Examples
Personal Assistant (long conversations, deep memory)
{
context: {
maxMessages: 100,
maxTokens: 32000,
ttl: {
enabled: true,
rules: [
{ maxAge: "30m", keep: "all" },
{ maxAge: "4h", keep: 30 },
{ maxAge: "24h", keep: "summary" },
{ maxAge: "inf", keep: "none" }
]
},
compaction: { enabled: true, summaryMaxTokens: 1024 }
}
}
Customer Service Bot (short conversations, fast responses)
{
context: {
maxMessages: 30,
maxTokens: 8000,
ttl: {
enabled: true,
rules: [
{ maxAge: "10m", keep: "all" },
{ maxAge: "1h", keep: 10 },
{ maxAge: "inf", keep: "none" }
]
}
}
}
Group Chat Bot (high message frequency, low context needs)
{
context: {
maxMessages: 15,
maxTokens: 4000,
ttl: {
enabled: true,
rules: [
{ maxAge: "5m", keep: "all" },
{ maxAge: "30m", keep: 5 },
{ maxAge: "inf", keep: "none" }
]
}
}
}
Monitoring and Tuning
View Context Usage
# View context statistics for a specific Agent
openclaw agent stats my-agent --context
# Example output:
# Active sessions: 45
# Average context tokens: 8,234
# Max context tokens: 28,102
# Compaction executions: 12
# Cache hit rate: 73%
Context Usage Alerts
{
monitoring: {
alerts: [{
name: "context-overflow-warning",
condition: "context_tokens > maxTokens * 0.9",
action: "log_warning"
}]
}
}
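The alert condition above is a simple threshold check. A sketch of what a monitoring loop might evaluate (field names are assumptions):

```typescript
interface ContextStats { contextTokens: number; maxTokens: number }

// Fire a warning when the built context exceeds 90% of the configured budget.
function shouldWarn(stats: ContextStats, threshold = 0.9): boolean {
  return stats.contextTokens > stats.maxTokens * threshold;
}

console.log(shouldWarn({ contextTokens: 15_000, maxTokens: 16_000 })); // true  (15,000 > 14,400)
console.log(shouldWarn({ contextTokens: 8_000, maxTokens: 16_000 }));  // false
```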
Summary
Context management is one of the most impactful factors in AI Agent operations, affecting both cost and quality. OpenClaw's four-layer pruning strategy — message count limit, token limit, TTL time decay, and smart compaction — provides precise control from coarse to fine. The TTL time decay strategy is a unique OpenClaw design that models the natural temporal decay of human conversation, preserving the integrity of recent context while significantly reducing token consumption for older messages. Combined with prompt caching and context caching mechanisms, you can achieve meaningful token cost optimization without sacrificing conversation quality.