Overview
The Browser Tool is one of OpenClaw's core built-in tools, giving AI agents the ability to directly interact with web pages. With this tool, agents can navigate pages, extract content, fill out forms, click buttons, take screenshots, and even execute complex multi-step web interaction workflows.
Tool Architecture
OpenClaw's browser tool communicates with real browser instances through a sandboxed browser bridge mechanism. Since the Pi SDK is directly embedded, the browser tool is injected as part of the "OpenClaw Built-in Tools" stage in the seven-stage tool pipeline, requiring no additional installation.
The browser tool's communication chain is: AI Agent -> Tool Call -> Sandboxed Browser Bridge -> Browser Instance (WebSocket protocol). This architecture ensures that browser operations execute in a controlled environment.
Enabling and Configuring
Basic Configuration
tools:
  browser:
    enabled: true
    headless: true
    bridgeUrl: "ws://localhost:9222"
    defaultTimeout: 15000
    viewport:
      width: 1280
      height: 720
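Before starting an agent, it can help to sanity-check these basic settings. The following is a minimal sketch, assuming the configuration has already been loaded into a Python dict. The helper name `validate_browser_config` and the checks themselves (including treating `defaultTimeout` as milliseconds) are illustrative assumptions, not rules enforced by OpenClaw.

```python
from urllib.parse import urlparse


def validate_browser_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in a browser config dict.

    Hypothetical helper: the keys mirror the basic configuration, but the
    checks are assumptions made for illustration.
    """
    problems = []

    # The bridge is reached over WebSocket, so expect a ws:// or wss:// URL.
    bridge = urlparse(cfg.get("bridgeUrl", ""))
    if bridge.scheme not in ("ws", "wss") or not bridge.hostname:
        problems.append("bridgeUrl must be a ws:// or wss:// URL")

    # The timeout is assumed to be in milliseconds and must be positive.
    timeout = cfg.get("defaultTimeout")
    if not isinstance(timeout, int) or timeout <= 0:
        problems.append("defaultTimeout must be a positive integer")

    # Both viewport dimensions must be present and positive.
    viewport = cfg.get("viewport", {})
    for side in ("width", "height"):
        value = viewport.get(side)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"viewport.{side} must be a positive integer")

    return problems
```

An empty list means none of the assumed checks failed; each entry otherwise names the offending key.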
Advanced Options
tools:
  browser:
    userAgent: "OpenClaw-Browser/1.0"
    acceptLanguage: "zh-CN,zh;q=0.9,en;q=0.8"
    ignoreHTTPSErrors: false
    extraHTTPHeaders:
      X-Custom-Header: "openclaw"
    proxy:
      server: "http://proxy.example.com:8080"
Core Operations
Page Navigation
The browser tool supports standard page navigation operations. The AI agent can instruct the tool to open a specified URL, wait for the page to finish loading, and then proceed with subsequent operations. Navigation supports configurable timeouts and wait conditions (such as waiting for a specific element to appear).
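The idea of a wait condition bounded by a timeout can be sketched as a simple polling loop. This is a generic illustration, not OpenClaw's implementation: the function `wait_for`, its parameters, and the millisecond units are all assumptions chosen to match the `defaultTimeout` value shown earlier.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_ms: int = 15000,
             poll_interval_ms: int = 100) -> bool:
    """Poll `condition` until it returns True or `timeout_ms` elapses.

    Returns True if the condition was met, False on timeout. The condition
    is always checked at least once.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_ms / 1000)
```

In practice the condition would be something like "the target element is present in the page", supplied by whatever code talks to the browser.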
Element Interaction
The tool provides a rich set of element interaction capabilities:
- Click: Locate and click elements via selectors
- Fill: Enter text content into input fields
- Select: Interact with dropdown menu options
- Hover: Move the mouse cursor over an element
- Scroll: Scroll within pages or containers
Element targeting supports both CSS selector and text content matching methods. AI agents typically use the content extraction feature first to understand the page structure, then choose an appropriate targeting strategy.
Content Extraction
The browser tool can extract various types of content from pages:
- Text content: Extracts plain text from the page, automatically handling hidden elements and invisible text
- Structured data: Extracts structured information like tables and lists
- Link information: Retrieves all links and their text from the page
- Metadata: Reads page titles, descriptions, Open Graph tags, and more
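To make the link and metadata categories above concrete, here is a small sketch that pulls the title, meta tags, and links out of a static HTML string using only the Python standard library. It illustrates the kind of result such extraction yields. The class name `PageSummary` is hypothetical, and the browser tool itself works against a live page rather than a string.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the title, meta tags, and links from an HTML string."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}      # e.g. {"description": "...", "og:title": "..."}
        self.links = []     # list of (href, link text) pairs
        self._in_title = False
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments collected inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use `property`; standard ones use `name`.
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            self._href, self._text = attrs["href"], []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
```

Usage is `p = PageSummary(); p.feed(html)`, after which `p.title`, `p.meta`, and `p.links` hold the collected values.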
Screenshot Capability
Screenshots are an important feature of the browser tool. It supports full-page and region-specific screenshots, with output in PNG or JPEG format. Screenshot results can be displayed directly in the conversation or saved to the file system for later use.
tools:
  browser:
    screenshot:
      format: png
      quality: 80
      fullPage: false
      maxWidth: 1920
      maxHeight: 1080
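One plausible reading of `maxWidth` and `maxHeight` is that they cap the output image, with larger captures scaled down proportionally. The source does not spell out this behavior, so the sketch below only illustrates that assumption; `fit_within` is a hypothetical helper.

```python
def fit_within(width: int, height: int, max_width: int = 1920,
               max_height: int = 1080) -> tuple[int, int]:
    """Scale (width, height) down to fit the bounds, keeping aspect ratio.

    Images already inside the bounds are returned unchanged (never upscaled).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under this assumption, a tall full-page capture is constrained by its height, so its width shrinks well below `maxWidth`.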
Multi-Tab Management
The browser tool supports managing multiple tabs simultaneously. AI agents can switch between different tabs to compare information and consolidate data across pages. To prevent resource abuse, the sandbox limits the maximum number of tabs (default is 3).
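A tab cap of this kind can be modeled as a small bookkeeping class. The sketch below is illustrative only: `TabPool` and `TabLimitError` are hypothetical names, and the real sandbox may enforce its limit differently.

```python
class TabLimitError(RuntimeError):
    """Raised when opening another tab would exceed the limit."""


class TabPool:
    """Track open tabs and refuse to go past `max_tabs`."""

    def __init__(self, max_tabs: int = 3):
        self.max_tabs = max_tabs
        self._tabs = {}     # tab id -> URL
        self._next_id = 1

    def open(self, url: str) -> int:
        """Register a new tab and return its id, or raise if at the limit."""
        if len(self._tabs) >= self.max_tabs:
            raise TabLimitError(f"tab limit of {self.max_tabs} reached")
        tab_id = self._next_id
        self._next_id += 1
        self._tabs[tab_id] = url
        return tab_id

    def close(self, tab_id: int) -> None:
        """Forget a tab; closing an unknown id is a no-op."""
        self._tabs.pop(tab_id, None)
```

Closing a tab frees a slot, so an agent comparing many pages would need to close earlier tabs before opening new ones.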
Cookie and State Management
Cookie Operations
The tool provides cookie read and write capabilities. This is essential for operations that require authentication. Administrators can pre-configure authentication cookies so AI agents can use them automatically when needed.
Session Persistence
Browser state (including cookies and local storage) can be persisted between sessions. This means a login completed by an AI agent in one conversation remains valid in subsequent conversations, eliminating the need for repeated authentication.
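Persistence of this kind amounts to writing the state out at the end of one session and reading it back at the start of the next. The sketch below shows that round trip with a JSON file. The file layout and the names `save_state` and `load_state` are assumptions for illustration; OpenClaw's actual storage format is not described here.

```python
import json
from pathlib import Path


def save_state(path: Path, cookies: list[dict], local_storage: dict) -> None:
    """Write cookies and local storage to a JSON file (hypothetical layout)."""
    state = {"cookies": cookies, "localStorage": local_storage}
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def load_state(path: Path) -> dict:
    """Read saved state back, or return an empty state if none exists yet."""
    if not path.exists():
        return {"cookies": [], "localStorage": {}}
    return json.loads(path.read_text(encoding="utf-8"))
```

Because the saved file contains session credentials, it should be treated like any other secret.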
Collaboration with Other Tools
The browser tool is rarely used in isolation; it typically works alongside other tools to complete complex tasks:
- With the web tool: The browser handles interactive operations while the web tool handles API calls
- With the canvas tool: Data extracted from web pages can be visualized through the canvas tool
- With the messaging tool: Browser screenshots or extracted information can be sent via messaging tools
Security Considerations
- Domain allowlist: Always configure allowedDomains to prevent AI agents from accessing sites they shouldn't
- Credential protection: Never pass passwords in plaintext through conversations; use pre-configured authentication methods
- Download restrictions: The sandbox blocks file downloads by default; if needed, carefully configure allowed file types
- JavaScript execution: The browser tool supports injecting scripts into pages. This is a powerful capability and should be used with caution
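The domain allowlist check is easy to get subtly wrong, so a sketch may help. The version below assumes each allowlist entry permits that domain and its subdomains; whether OpenClaw's `allowedDomains` follows the same rule is an assumption, and `is_allowed` is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    # Reject anything that is not plain web traffic (file://, data:, etc.).
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname.lower().rstrip(".")
    for domain in allowed_domains:
        domain = domain.lower().strip(".")
        # Match on a label boundary so "evilexample.com" does not pass
        # an allowlist entry of "example.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Comparing the parsed hostname, rather than searching the raw URL string, avoids being fooled by URLs such as `https://example.com.evil.net/`.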
Troubleshooting
- Page load timeout: Check network connectivity and defaultTimeout settings
- Element not found: Verify that the selector is correct and the page has fully loaded
- Bridge connection failure: Check the browser instance status and bridgeUrl configuration
- Blank screenshot: Check viewport size configuration and page rendering state
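For bridge connection failures, a quick first check is whether anything is listening at the configured address at all. The sketch below attempts a plain TCP connection to the host and port in the bridge URL. `bridge_reachable` is a hypothetical helper, and a successful connection only shows the port is open, not that the WebSocket handshake will succeed.

```python
import socket
from urllib.parse import urlparse


def bridge_reachable(bridge_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to the bridge host and port succeeds."""
    try:
        parsed = urlparse(bridge_url)
        if not parsed.hostname:
            return False
        # Fall back to the default WebSocket ports when none is given.
        port = parsed.port or (443 if parsed.scheme == "wss" else 80)
        with socket.create_connection((parsed.hostname, port), timeout=timeout_s):
            return True
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and DNS failures;
        # ValueError covers a malformed port in the URL.
        return False
```

If this returns False for the configured address, the browser instance is likely not running or is listening on a different port.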
Summary
The browser tool gives OpenClaw's AI agents genuine web interaction capabilities. With proper configuration and security constraints, it can safely and efficiently handle everything from simple information lookups to complex multi-step web operations.