Overview
Web search is how AI agents reach real-time information. OpenClaw's built-in web tool provides search functionality, supports multiple search engine backends, and can work with the browser tool for deeper information extraction. This article covers how to configure and optimize OpenClaw's web search skill.
Web Tool Architecture
OpenClaw's web tool is one of the core tools injected during the "OpenClaw Built-in Tools" stage of the seven-stage tool pipeline. It runs directly within the Pi SDK embedding layer and can execute HTTP requests, call search APIs, and scrape web content.
Unlike the browser tool, the web tool does not require an actual browser instance. It makes requests directly through an HTTP client, which makes it lighter and faster and well suited to high-concurrency search operations.
Search Engine Configuration
Google Search API
tools:
  web:
    search:
      provider: google
      apiKey: ${GOOGLE_API_KEY}
      searchEngineId: ${GOOGLE_CX}
      defaultResultCount: 10
      safeSearch: moderate
      language: zh-CN
      region: CN
Bing Search API
tools:
  web:
    search:
      provider: bing
      apiKey: ${BING_API_KEY}
      defaultResultCount: 10
      market: zh-CN
      safeSearch: moderate
SearXNG (Self-Hosted)
For scenarios that prioritize privacy or require full control over search behavior, OpenClaw supports connecting to a self-hosted SearXNG instance:
tools:
  web:
    search:
      provider: searxng
      endpoint: "http://localhost:8888/search"
      format: json
      engines:
        - google
        - bing
        - duckduckgo
      defaultResultCount: 10
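To make the SearXNG settings concrete, here is a minimal Python sketch of how they could map onto a request URL. The query parameters q, format, and engines belong to SearXNG's search API; the function name build_searxng_url is hypothetical, and how OpenClaw builds the request internally is not documented here. JSON output usually has to be enabled in the SearXNG instance's own settings before format=json is accepted.

```python
from urllib.parse import urlencode


def build_searxng_url(endpoint: str, query: str, engines: list, fmt: str = "json") -> str:
    """Build a SearXNG search URL (hypothetical helper, for illustration only)."""
    params = {
        "q": query,                    # the search terms
        "format": fmt,                 # "json" asks SearXNG for machine-readable output
        "engines": ",".join(engines),  # comma-separated list of upstream engines
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_searxng_url(
    "http://localhost:8888/search",
    "openclaw web tool",
    ["google", "bing", "duckduckgo"],
)
print(url)
```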
Search Result Processing
Result Formatting
Search results are formatted before being presented to the AI agent. Each result includes a title, URL, snippet, and source information. The AI agent uses this information to determine which results are worth reading in depth.
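As an illustration of the four fields named above, the following Python sketch renders results as a compact numbered list. The SearchResult class and the output layout are assumptions made for this example, not OpenClaw's actual internal format.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Field names mirror the description above; the class itself is hypothetical.
    title: str
    url: str
    snippet: str
    source: str


def format_results(results: list) -> str:
    """Render results as a numbered plain-text list an agent can scan quickly."""
    lines = []
    for index, result in enumerate(results, start=1):
        lines.append(f"[{index}] {result.title} ({result.source})")
        lines.append(f"    {result.url}")
        lines.append(f"    {result.snippet}")
    return "\n".join(lines)


sample = [SearchResult("Example Domain", "https://example.com/", "An example page.", "google")]
print(format_results(sample))
```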
Content Extraction
When the AI agent decides to read a search result in depth, the web tool fetches the target page and extracts the main content. The extraction process includes:
- HTML parsing: Parsing the page DOM structure
- Content identification: Using algorithms to identify the main content area, filtering out navigation, ads, sidebars, and other irrelevant content
- Format conversion: Converting HTML to clean plain text or Markdown format
- Length control: Truncating overly long content while preserving the most relevant parts
tools:
  web:
    extraction:
      method: readability
      maxContentLength: 5000
      includeImages: false
      includeLinks: true
      outputFormat: markdown
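The length-control step can be sketched as follows. This Python example assumes maxContentLength is measured in characters and uses a deliberately simple rule: keep whole leading paragraphs until the limit is reached. It does not rank paragraphs by relevance, so it stands in for, rather than reproduces, whatever selection logic the web tool applies.

```python
def truncate_content(text: str, max_length: int = 5000) -> str:
    """Cut text to at most max_length characters, preferring paragraph boundaries."""
    if len(text) <= max_length:
        return text

    kept = []
    used = 0
    for paragraph in text.split("\n\n"):
        # Account for the blank-line separator between kept paragraphs.
        extra = len(paragraph) + (2 if kept else 0)
        if used + extra > max_length:
            break
        kept.append(paragraph)
        used += extra

    if not kept:
        # The first paragraph alone exceeds the limit: fall back to a hard cut.
        return text[:max_length]
    return "\n\n".join(kept)
```

Cutting at paragraph boundaries avoids handing the agent a sentence that stops mid-word, at the cost of sometimes returning noticeably less than the limit.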
Caching Strategy
To reduce duplicate requests and improve response speed, the web tool includes multi-layer caching:
tools:
  web:
    cache:
      searchResults:
        enabled: true
        ttl: 3600
        maxEntries: 1000
      pageContent:
        enabled: true
        ttl: 7200
        maxSize: 100MB
The search result cache uses a shorter TTL (3600 seconds, or 1 hour) because search results can change frequently. The page content cache can use a longer TTL (7200 seconds, or 2 hours in the example above) because page content changes more slowly.
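The interaction between ttl and maxEntries can be illustrated with a small in-memory cache. This is a sketch under assumptions: the TTLCache class is hypothetical, ttl is taken to be in seconds, and least-recently-used eviction is one plausible policy when maxEntries is exceeded; the source does not say which policy OpenClaw uses.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Tiny TTL cache with LRU eviction (illustrative, not OpenClaw's implementation)."""

    def __init__(self, ttl_seconds: float, max_entries: int, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock           # injectable so expiry can be tested without sleeping
        self._store = OrderedDict()   # key -> (expires_at, value), oldest use first

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]      # expired: drop it and report a miss
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)
        self._store.move_to_end(key)
        while len(self._store) > self._max:
            self._store.popitem(last=False)  # evict the least recently used entry


search_cache = TTLCache(ttl_seconds=3600, max_entries=1000)
search_cache.set("openclaw web tool", ["https://example.com/"])
```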
Search Quality Optimization
Query Rewriting
Before calling the search tool, the AI agent automatically rewrites the user's natural language question, extracting keywords and combining them into a more effective search query.
tools:
  web:
    queryRewriting:
      enabled: true
      addDateFilter: auto
      expandAcronyms: true
When addDateFilter is set to auto, the AI agent automatically adds date filters based on the timeliness requirements of the question. For example, "latest tech news" will automatically restrict the search scope to recent content.
Multi-Round Search
For complex questions, a single search is often not enough. OpenClaw supports multi-round searches: the AI agent first runs a broad search to understand the general direction, then refines its queries for deeper searches based on the initial results.
tools:
  web:
    multiRound:
      enabled: true
      maxRounds: 3
      maxTotalResults: 30
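The two limits above bound a loop that can be sketched roughly as follows. The search and refine callables are placeholders: in practice the agent itself decides how to refine a query, so this Python example only shows how maxRounds and maxTotalResults could cap the process, with results assumed to be dictionaries carrying a url key.

```python
from typing import Callable, Optional


def multi_round_search(
    initial_query: str,
    search: Callable[[str], list],                   # hypothetical: query -> list of result dicts
    refine: Callable[[str, list], Optional[str]],    # hypothetical: next query, or None to stop
    max_rounds: int = 3,
    max_total_results: int = 30,
) -> list:
    """Run up to max_rounds searches, collecting at most max_total_results unique URLs."""
    collected = []
    seen_urls = set()
    query = initial_query

    for _ in range(max_rounds):
        for result in search(query):
            if result["url"] in seen_urls:
                continue  # skip duplicates across rounds
            seen_urls.add(result["url"])
            collected.append(result)
            if len(collected) >= max_total_results:
                return collected
        next_query = refine(query, collected)
        if next_query is None:
            break  # nothing left to refine
        query = next_query

    return collected
```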
Source Diversity
To avoid single-source information bias, you can configure source diversity requirements:
tools:
  web:
    diversity:
      minDomains: 3
      maxResultsPerDomain: 3
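One way to read these two settings is as a per-domain cap plus a check on how many distinct domains remain. The Python sketch below assumes exactly that interpretation and groups results by full hostname; the function name apply_diversity is hypothetical, and OpenClaw may group domains differently (for example by registrable domain).

```python
from collections import Counter
from urllib.parse import urlparse


def apply_diversity(urls: list, min_domains: int = 3, max_per_domain: int = 3):
    """Cap results per hostname and report whether enough distinct hostnames remain."""
    counts = Counter()
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if counts[host] >= max_per_domain:
            continue  # this hostname already has its share of results
        counts[host] += 1
        kept.append(url)
    meets_minimum = len(counts) >= min_domains
    return kept, meets_minimum
```

When meets_minimum is false, a caller could run another search round to look for additional sources.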
Collaboration with the Browser Tool
The web tool and browser tool are complementary:
- Web tool: Suitable for quick searches and lightweight content extraction on pages that do not need JavaScript rendering
- Browser tool: Suitable for pages that require interaction or JavaScript rendering
The AI agent automatically selects the appropriate tool based on page characteristics. When the web tool's extracted content is incomplete (e.g., single-page applications), the agent switches to the browser tool for complete rendering and extraction.
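The source does not describe how incomplete extraction is detected. Purely as an illustration, the Python sketch below uses one simple heuristic: the page ships scripts but yields very little extracted text. The function name, the 200-character threshold, and the heuristic itself are all assumptions.

```python
import re


def needs_browser_fallback(html: str, extracted_text: str, min_chars: int = 200) -> bool:
    """Guess whether a page probably needs JavaScript rendering (heuristic sketch)."""
    has_scripts = re.search(r"<script\b", html, re.IGNORECASE) is not None
    too_short = len(extracted_text.strip()) < min_chars
    # A script-driven page with almost no static text is likely rendered client-side.
    return has_scripts and too_short


spa_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(needs_browser_fallback(spa_shell, ""))
```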
Channel Adaptation
Search results are displayed differently across channels:
- Discord: Uses embed messages to display search result cards with title, snippet, and link
- Telegram: Uses HTML formatting with directly previewable links
- Slack: Uses Block Kit for structured search results
- WhatsApp: Plain text format with clickable links
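The per-channel differences can be sketched as a single rendering function. The payload shapes follow each platform's public message formats (a Discord embed object, Telegram HTML, a Slack Block Kit section, plain text for WhatsApp); the render_result function and the exact layout are assumptions, not OpenClaw's adapter code.

```python
import html


def _slack_escape(text: str) -> str:
    # Slack asks senders to escape these three characters in message text.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def render_result(channel: str, title: str, url: str, snippet: str):
    """Return a channel-specific representation of one search result."""
    if channel == "discord":
        # Embed object fields: title, url, description.
        return {"title": title, "url": url, "description": snippet}
    if channel == "telegram":
        # Telegram HTML mode: escape text and attribute values.
        link = f'<a href="{html.escape(url, quote=True)}">{html.escape(title, quote=False)}</a>'
        return f"{link}\n{html.escape(snippet, quote=False)}"
    if channel == "slack":
        # Block Kit section with a mrkdwn link in the form <url|text>.
        text = f"*<{url}|{_slack_escape(title)}>*\n{_slack_escape(snippet)}"
        return {"type": "section", "text": {"type": "mrkdwn", "text": text}}
    if channel == "whatsapp":
        # Plain text; a bare URL is typically rendered as a clickable link.
        return f"{title}\n{snippet}\n{url}"
    raise ValueError(f"unsupported channel: {channel}")
```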
Security and Compliance
Domain Filtering
tools:
  web:
    security:
      blockedDomains:
        - "*.malware.com"
        - "phishing-site.example"
      allowedDomains: [] # Empty means all unblocked domains are allowed
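The block-then-allow logic implied by this configuration can be sketched in Python as follows. It assumes shell-style wildcard matching against the hostname, which is one plausible reading of the "*.malware.com" pattern; the function name is_allowed is hypothetical. Note that under this reading "*.malware.com" matches subdomains but not the bare malware.com, so the apex would need its own entry.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_allowed(url: str, blocked: list, allowed: list) -> bool:
    """Apply the blocklist first, then the allowlist (an empty allowlist permits everything)."""
    host = (urlparse(url).hostname or "").lower()
    if any(fnmatchcase(host, pattern.lower()) for pattern in blocked):
        return False
    if not allowed:
        return True  # matches the comment above: empty means all unblocked domains
    return any(fnmatchcase(host, pattern.lower()) for pattern in allowed)


blocked_domains = ["*.malware.com", "phishing-site.example"]
print(is_allowed("https://cdn.malware.com/payload", blocked_domains, []))
```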
Content Filtering
Search results and extracted content undergo content safety checks to filter inappropriate content.
Rate Limiting
tools:
  web:
    rateLimit:
      searchesPerMinute: 30
      pagesPerMinute: 60
      perUser:
        searchesPerMinute: 5
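A per-minute limit such as perUser.searchesPerMinute can be enforced with a sliding window, sketched below in Python. The SlidingWindowLimiter class is hypothetical, and the source does not say which algorithm OpenClaw uses (fixed window and token bucket are common alternatives).

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float = 60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock                # injectable for testing
        self._events = defaultdict(deque)  # key -> timestamps of recent allowed events

    def allow(self, key: str = "global") -> bool:
        now = self._clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._limit:
            return False
        events.append(now)
        return True


per_user_searches = SlidingWindowLimiter(limit=5)  # mirrors perUser.searchesPerMinute: 5
print(per_user_searches.allow("user-123"))
```

A second limiter with limit=30 and a single shared key would cover the global searchesPerMinute setting in the same way.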
Monitoring and Analytics
OpenClaw records search tool usage statistics, including search count, average response time, cache hit rate, and common query terms. This data helps you understand search tool usage patterns and continuously optimize your configuration.
Summary
The web search skill is the AI agent's window to the internet. By properly configuring search engines, optimizing query strategies, and setting up caching and security rules, you can enable the AI agent to efficiently and securely access real-time information, providing users with accurate and timely answers.