OpenClaw Web Search Skill Configuration and Optimization

Overview

Web search gives AI agents access to real-time information. OpenClaw's built-in web tool searches through multiple search engine backends and can work with the browser tool for deeper information extraction. This article covers how to configure and optimize OpenClaw's web search skill.

Web Tool Architecture

OpenClaw's web tool is one of the core tools injected during the "OpenClaw Built-in Tools" stage of the seven-stage tool pipeline. It runs directly within the Pi SDK embedding layer and can execute HTTP requests, call search APIs, and scrape web content.

Unlike the browser tool, the web tool does not require a browser instance. It sends requests directly through an HTTP client, which makes it lighter and faster and well suited to high-concurrency search operations.

Search Engine Configuration

Google Search API

tools:
  web:
    search:
      provider: google
      apiKey: ${GOOGLE_API_KEY}
      searchEngineId: ${GOOGLE_CX}
      defaultResultCount: 10
      safeSearch: moderate
      language: zh-CN
      region: CN

Bing Search API

tools:
  web:
    search:
      provider: bing
      apiKey: ${BING_API_KEY}
      defaultResultCount: 10
      market: zh-CN
      safeSearch: moderate

SearXNG (Self-Hosted)

For scenarios that prioritize privacy or require full control over search behavior, OpenClaw supports connecting to a self-hosted SearXNG instance:

tools:
  web:
    search:
      provider: searxng
      endpoint: "http://localhost:8888/search"
      format: json
      engines:
        - google
        - bing
        - duckduckgo
      defaultResultCount: 10
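
As a rough illustration of what this configuration amounts to on the wire, the sketch below builds a SearXNG search URL from the same values. It assumes SearXNG's standard query parameters (q, format, engines); the function name build_searxng_url is hypothetical and is not part of OpenClaw. Note that a SearXNG instance typically has to enable the json format in its own settings before it answers such requests.

```python
from urllib.parse import urlencode


def build_searxng_url(endpoint: str, query: str, engines: list[str], fmt: str = "json") -> str:
    """Build a SearXNG search URL (hypothetical helper, not an OpenClaw API)."""
    params = {
        "q": query,                    # the search query
        "format": fmt,                 # matches `format: json` in the config
        "engines": ",".join(engines),  # SearXNG takes a comma-separated engine list
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_searxng_url(
    "http://localhost:8888/search",
    "openclaw web tool",
    ["google", "bing", "duckduckgo"],
)
print(url)
```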

Search Result Processing

Result Formatting

Search results are formatted before being presented to the AI agent. Each result includes a title, URL, snippet, and source information. The AI agent uses this information to determine which results are worth reading in depth.
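
A minimal sketch of such a formatted result, assuming a simple record with the four fields named above. The SearchResult class and the text layout are illustrative assumptions, not OpenClaw's actual internal format.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Field names follow the four items listed above; the class itself is hypothetical.
    title: str
    url: str
    snippet: str
    source: str


def format_results(results: list[SearchResult]) -> str:
    """Render results as a numbered plain-text list an agent can scan quickly."""
    lines = []
    for i, r in enumerate(results, start=1):
        lines.append(f"[{i}] {r.title} ({r.source})")
        lines.append(f"    {r.url}")
        lines.append(f"    {r.snippet}")
    return "\n".join(lines)


example = [SearchResult("Example Domain", "https://example.com", "An illustrative page.", "google")]
print(format_results(example))
```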

Content Extraction

When the AI agent decides to read a search result in depth, the web tool fetches the target page and extracts the main content. The extraction process includes:

HTML parsing: Parsing the page DOM structure
Content identification: Using algorithms to identify the main content area, filtering out navigation, ads, sidebars, and other irrelevant content
Format conversion: Converting HTML to clean plain text or Markdown format
Length control: Truncating overly long content while preserving the most relevant parts

tools:
  web:
    extraction:
      method: readability
      maxContentLength: 5000
      includeImages: false
      includeLinks: true
      outputFormat: markdown
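
The four extraction steps can be sketched with the Python standard library. This is a deliberately simplified stand-in, not the Readability algorithm that method: readability refers to: it drops text inside a fixed set of tags instead of scoring content blocks, and it truncates from the start rather than by relevance. The names SKIP_TAGS, MainTextExtractor, and extract_text are hypothetical.

```python
from html.parser import HTMLParser

# Tags whose text is treated as non-content (an assumption of this sketch).
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any of SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str, max_content_length: int = 5000) -> str:
    """Parse HTML, keep the remaining text, and cut it to max_content_length characters."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)[:max_content_length]
```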

Caching Strategy

To reduce duplicate requests and improve response speed, the web tool caches at two levels, search results and page content:

tools:
  web:
    cache:
      searchResults:
        enabled: true
        ttl: 3600
        maxEntries: 1000
      pageContent:
        enabled: true
        ttl: 7200
        maxSize: 100MB

The search result cache TTL is typically shorter (3600 seconds, or 1 hour, in the configuration above) because search results can change frequently. The page content cache can use a longer TTL (7200 seconds, or 2 hours) because page content changes more slowly.
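
The behavior implied by ttl and maxEntries can be sketched as a small in-memory cache. This is a minimal illustration under assumed semantics (entries expire after ttl seconds, and the oldest entry is evicted when the cache is full); the TTLCache class is hypothetical, and OpenClaw's real eviction policy may differ.

```python
import time
from collections import OrderedDict


class TTLCache:
    """In-memory cache with per-entry expiry and a cap on the number of entries."""

    def __init__(self, ttl: float, max_entries: int, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock            # injectable so the cache can be tested without sleeping
        self._store = OrderedDict()   # key -> (expires_at, value), oldest first

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]      # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        if key in self._store:
            del self._store[key]      # re-inserting moves the key to the newest position
        elif len(self._store) >= self.max_entries:
            self._store.popitem(last=False)  # evict the oldest inserted entry
        self._store[key] = (self.clock() + self.ttl, value)


search_cache = TTLCache(ttl=3600, max_entries=1000)
search_cache.set("openclaw web tool", ["https://example.com"])
```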

Search Quality Optimization

Query Rewriting

Before calling the search tool, the AI agent rewrites the user's natural-language question, extracting keywords and combining them into a more effective search query.

tools:
  web:
    queryRewriting:
      enabled: true
      addDateFilter: auto
      expandAcronyms: true

When addDateFilter is set to auto, the AI agent automatically adds date filters based on the timeliness requirements of the question. For example, "latest tech news" will automatically restrict the search scope to recent content.
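
In OpenClaw the agent itself makes this decision. Purely to illustrate the idea, the sketch below uses a keyword heuristic; the hint table, the range labels ("day", "week", "month"), and the function name pick_date_filter are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical mapping from timeliness cues to a date range label.
RECENCY_HINTS = {
    "today": "day",
    "this week": "week",
    "latest": "week",
    "recent": "month",
}


def pick_date_filter(question: str) -> Optional[str]:
    """Return a date range label if the question looks time-sensitive, else None."""
    q = question.lower()
    for hint, date_range in RECENCY_HINTS.items():
        if hint in q:
            return date_range
    return None


print(pick_date_filter("latest tech news"))
print(pick_date_filter("history of the Roman Empire"))
```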

Multi-Round Search

For complex questions, a single search is often not enough. OpenClaw lets the AI agent search in multiple rounds: it first runs a broad search to establish the general direction, then refines the query based on the initial results.

tools:
  web:
    multiRound:
      enabled: true
      maxRounds: 3
      maxTotalResults: 30
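
One way to picture how maxRounds and maxTotalResults bound this loop is the sketch below. The search and refine callables are placeholders for the search backend and the agent's query refinement, and the function multi_round_search is hypothetical; the real control flow lives inside the agent.

```python
from typing import Callable, Optional


def multi_round_search(
    query: str,
    search: Callable[[str], list[str]],
    refine: Callable[[str, list[str]], Optional[str]],
    max_rounds: int = 3,
    max_total_results: int = 30,
) -> list[str]:
    """Run up to max_rounds searches, collecting at most max_total_results unique URLs."""
    collected: list[str] = []
    current: Optional[str] = query
    for _ in range(max_rounds):
        for url in search(current):
            if url not in collected and len(collected) < max_total_results:
                collected.append(url)
        if len(collected) >= max_total_results:
            break                           # result budget exhausted
        current = refine(current, collected)
        if current is None:
            break                           # nothing left to refine
    return collected
```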

Source Diversity

To avoid single-source information bias, you can configure source diversity requirements:

tools:
  web:
    diversity:
      minDomains: 3
      maxResultsPerDomain: 3
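
A minimal sketch of how these two settings could be applied to a ranked list of result URLs, assuming that domains are compared by exact hostname. The function apply_diversity is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def apply_diversity(urls: list[str], min_domains: int = 3, max_results_per_domain: int = 3):
    """Cap results per hostname and report whether enough distinct domains remain."""
    per_domain = Counter()
    kept = []
    for url in urls:
        domain = urlparse(url).hostname or ""
        if per_domain[domain] < max_results_per_domain:
            per_domain[domain] += 1
            kept.append(url)
    # Only domains with at least one kept result are counted.
    return kept, len(per_domain) >= min_domains
```

When the second return value is False, a caller might run another search round to bring in additional sources.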

Collaboration with the Browser Tool

The web tool and browser tool are complementary:

Web tool: Suitable for quick searches and lightweight content extraction, no JavaScript rendering needed
Browser tool: Suitable for pages that require interaction or JavaScript rendering

The AI agent automatically selects the appropriate tool based on page characteristics. When the web tool's extracted content is incomplete (e.g., single-page applications), the agent switches to the browser tool for complete rendering and extraction.
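
The fallback decision can be approximated with a simple heuristic: very little extracted text from a page that ships scripts suggests client-side rendering. The 200-character threshold and the function needs_browser are assumptions for illustration only; the agent's real criteria are not specified here.

```python
def needs_browser(html: str, extracted_text: str, min_chars: int = 200) -> bool:
    """Guess whether a page needs JavaScript rendering to show its content."""
    looks_empty = len(extracted_text.strip()) < min_chars
    has_scripts = "<script" in html.lower()
    return looks_empty and has_scripts


spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(needs_browser(spa_shell, ""))
```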

Channel Adaptation

Search results are displayed differently across channels:

Discord: Uses embed messages to display search result cards with title, snippet, and link
Telegram: Uses HTML formatting with directly previewable links
Slack: Uses Block Kit for structured search results
WhatsApp: Plain text format with clickable links
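
The sketch below shows roughly what one result looks like in each channel's format. The payload shapes follow the public conventions of each platform (Discord embed fields, Telegram HTML markup, Slack Block Kit section blocks with mrkdwn links); the function names are hypothetical, and OpenClaw's actual renderers may include more fields.

```python
from html import escape


def to_discord_embed(title: str, url: str, snippet: str) -> dict:
    # Discord embed object: title, url, and description are standard embed fields.
    return {"title": title, "url": url, "description": snippet}


def to_telegram_html(title: str, url: str, snippet: str) -> str:
    # Telegram's HTML parse mode requires &, <, and > to be escaped.
    return f'<a href="{escape(url)}">{escape(title)}</a>\n{escape(snippet)}'


def to_slack_block(title: str, url: str, snippet: str) -> dict:
    # Slack mrkdwn links use the <url|text> form.
    return {
        "type": "section",
        "text": {"type": "mrkdwn", "text": f"*<{url}|{title}>*\n{snippet}"},
    }


def to_whatsapp_text(title: str, url: str, snippet: str) -> str:
    # Plain text; the client turns the bare URL into a clickable link.
    return f"{title}\n{snippet}\n{url}"
```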

Security and Compliance

Domain Filtering

tools:
  web:
    security:
      blockedDomains:
        - "*.malware.com"
        - "phishing-site.example"
      allowedDomains: []  # Empty means all unblocked domains are allowed
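
A sketch of how these lists might be evaluated, assuming shell-style wildcard matching against the URL's hostname. The function is_allowed is hypothetical. Under this matching, "*.malware.com" covers subdomains but not the bare malware.com, so a real configuration would likely list both.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_allowed(url: str, blocked: list[str], allowed: list[str]) -> bool:
    """Apply the block list first, then the allow list (empty means allow everything else)."""
    host = (urlparse(url).hostname or "").lower()
    if any(fnmatchcase(host, pattern) for pattern in blocked):
        return False
    if not allowed:
        return True
    return any(fnmatchcase(host, pattern) for pattern in allowed)


BLOCKED = ["*.malware.com", "phishing-site.example"]
print(is_allowed("http://cdn.malware.com/payload", BLOCKED, []))
print(is_allowed("https://example.org/article", BLOCKED, []))
```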

Content Filtering

Search results and extracted content undergo content safety checks to filter inappropriate content.

Rate Limiting

tools:
  web:
    rateLimit:
      searchesPerMinute: 30
      pagesPerMinute: 60
      perUser:
        searchesPerMinute: 5
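
A per-minute limit like the ones above is commonly enforced with a sliding window. The sketch below is one such approach, shown for the per-user search limit; the class SlidingWindowLimiter is hypothetical, and OpenClaw may use a different algorithm (for example, a token bucket).

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock                   # injectable for testing
        self._events = defaultdict(deque)    # key -> timestamps of recent events

    def allow(self, key: str = "global") -> bool:
        now = self.clock()
        events = self._events[key]
        while events and now - events[0] >= self.window:
            events.popleft()                 # forget events that left the window
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


per_user_searches = SlidingWindowLimiter(limit=5)  # mirrors perUser.searchesPerMinute
print(per_user_searches.allow("user-42"))
```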

Monitoring and Analytics

OpenClaw records search tool usage statistics, including search count, average response time, cache hit rate, and common query terms. This data helps you understand search tool usage patterns and continuously optimize your configuration.
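
To make these metrics concrete, the sketch below shows how they could be derived from per-search records. The SearchStats class and its field names are assumptions for illustration and do not describe OpenClaw's actual storage or reporting format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SearchStats:
    count: int = 0
    total_seconds: float = 0.0
    cache_hits: int = 0
    queries: Counter = field(default_factory=Counter)

    def record(self, query: str, seconds: float, cache_hit: bool) -> None:
        """Register one completed search."""
        self.count += 1
        self.total_seconds += seconds
        self.cache_hits += int(cache_hit)
        self.queries[query] += 1

    def summary(self, top_n: int = 3) -> dict:
        """Return search count, average response time, cache hit rate, and top queries."""
        if self.count == 0:
            return {"searches": 0, "avg_seconds": 0.0, "cache_hit_rate": 0.0, "top_queries": []}
        return {
            "searches": self.count,
            "avg_seconds": self.total_seconds / self.count,
            "cache_hit_rate": self.cache_hits / self.count,
            "top_queries": self.queries.most_common(top_n),
        }
```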

Summary

The web search skill is the AI agent's window to the internet. By properly configuring search engines, optimizing query strategies, and setting up caching and security rules, you can enable the AI agent to efficiently and securely access real-time information, providing users with accurate and timely answers.