OpenClaw Image Generation Skill Configuration

Overview

Image generation is one of the most popular capabilities of AI agents. OpenClaw supports integrating multiple AI image generation services through its skill plugin system, enabling agents to generate images based on users' text descriptions. This article covers how to configure and optimize the image generation skill in OpenClaw.

Skill Architecture

The image generation skill is part of OpenClaw's tool system, where it is registered and managed within the seven-stage tool pipeline. It combines the graphics-processing capabilities of the built-in canvas tool with external AI image generation APIs.

The Pi SDK's direct embedding feature shortens the image generation call chain: everything from the user request to the API call happens within the same runtime, with no inter-process communication.

Supported Generation Services

OpenAI DALL-E

skills:
  imageGen:
    provider: openai
    model: dall-e-3
    apiKey: ${OPENAI_API_KEY}
    defaultSize: "1024x1024"
    defaultQuality: standard
    defaultStyle: vivid
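
To make the mapping concrete, here is a minimal sketch of how these defaults might be validated and turned into a request body for OpenAI's image generation endpoint. The allowed sizes, qualities, and styles are the ones DALL-E 3 accepts; the function name build_dalle3_request and the dict form of the config are assumptions for illustration, not OpenClaw's actual code.

```python
# Values accepted by DALL-E 3 in the OpenAI Images API.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
DALLE3_QUALITIES = {"standard", "hd"}
DALLE3_STYLES = {"vivid", "natural"}


def build_dalle3_request(prompt: str, config: dict) -> dict:
    """Hypothetical helper: map imageGen defaults onto a request body."""
    size = config.get("defaultSize", "1024x1024")
    quality = config.get("defaultQuality", "standard")
    style = config.get("defaultStyle", "vivid")

    # Fail early on values the model would reject.
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size for dall-e-3: {size}")
    if quality not in DALLE3_QUALITIES:
        raise ValueError(f"unsupported quality for dall-e-3: {quality}")
    if style not in DALLE3_STYLES:
        raise ValueError(f"unsupported style for dall-e-3: {style}")

    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "n": 1,  # DALL-E 3 generates one image per request
        "size": size,
        "quality": quality,
        "style": style,
    }
```

Validating up front means a size such as "512x512", which suits Stable Diffusion but not DALL-E 3, is caught before a billed API call is attempted.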

Stable Diffusion (Local or Remote)

skills:
  imageGen:
    provider: stable-diffusion
    endpoint: "http://localhost:7860/sdapi/v1/txt2img"
    defaultSteps: 30
    defaultSampler: "DPM++ 2M Karras"
    defaultSize: "512x512"
    defaultCfgScale: 7
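
As a rough illustration, the sketch below maps these settings onto a JSON body for the txt2img endpoint in the config above. The field names (prompt, steps, sampler_name, cfg_scale, width, height) follow the AUTOMATIC1111 web UI API that serves the /sdapi/v1/txt2img path; the function name build_txt2img_payload and the dict form of the config are assumptions, not OpenClaw internals.

```python
import json


def build_txt2img_payload(prompt: str, config: dict) -> dict:
    """Hypothetical helper: map imageGen defaults onto a txt2img body."""
    # Split a "WIDTHxHEIGHT" string such as "512x512" into two integers.
    width, height = (int(part) for part in config["defaultSize"].lower().split("x"))
    return {
        "prompt": prompt,
        "steps": config["defaultSteps"],
        "sampler_name": config["defaultSampler"],
        "cfg_scale": config["defaultCfgScale"],
        "width": width,
        "height": height,
    }


if __name__ == "__main__":
    sd_config = {
        "defaultSteps": 30,
        "defaultSampler": "DPM++ 2M Karras",
        "defaultSize": "512x512",
        "defaultCfgScale": 7,
    }
    print(json.dumps(build_txt2img_payload("a lighthouse at dusk", sd_config), indent=2))
```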

Midjourney (via Proxy)

skills:
  imageGen:
    provider: midjourney
    proxyEndpoint: "https://mj-proxy.example.com"
    apiKey: ${MJ_API_KEY}
    defaultAspectRatio: "1:1"

Configuration Details

Basic Parameters

provider: Image generation service provider
apiKey: API authentication key (recommended to use environment variable references)
defaultSize: Default image dimensions
maxGenerationsPerDay: Maximum daily generation count (cost control)
outputFormat: Output format (png / jpg / webp)
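
The environment variable references recommended for apiKey use the ${NAME} form seen in the provider examples. A minimal sketch of how such references could be resolved at load time follows; resolve_env_refs is a hypothetical helper, and how OpenClaw itself performs the substitution is not covered here.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(value: str, env=None) -> str:
    """Replace every ${NAME} in value with the matching environment value."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Fail loudly rather than send an empty key to the provider.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(substitute, value)
```

Raising on a missing variable surfaces a misconfigured key at startup instead of as an authentication error on the first generation request.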

Prompt Translation

Since most image generation models perform best with English prompts, OpenClaw has a built-in prompt translation feature. When users describe their needs in another language, the AI agent first translates the description to English before sending it as a prompt to the generation service.

skills:
  imageGen:
    promptTranslation:
      enabled: true
      targetLanguage: en
      enhancePrompt: true

When enhancePrompt is enabled, the AI agent not only translates the prompt but also optimizes it according to image generation best practices, adding quality descriptors, style keywords, lighting instructions, and similar details.
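
In OpenClaw the enhancement is performed by the AI agent itself. The sketch below is only a rule-based stand-in that shows the shape of the transformation: it appends style, lighting, and quality terms that the prompt does not already contain. The function name enhance_prompt and the QUALITY_TERMS list are hypothetical.

```python
# Hypothetical default descriptors; a real agent would choose these per request.
QUALITY_TERMS = ["highly detailed", "sharp focus"]


def enhance_prompt(prompt, style=None, lighting=None):
    """Append style, lighting, and quality terms not already in the prompt."""
    parts = [prompt.strip().rstrip(".,")]
    lowered = prompt.lower()
    for extra in (style, lighting, *QUALITY_TERMS):
        # Skip unset options and terms the user already wrote.
        if extra and extra.lower() not in lowered:
            parts.append(extra)
    return ", ".join(parts)


if __name__ == "__main__":
    print(enhance_prompt("a red fox in snow", style="watercolor", lighting="soft morning light"))
```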

Negative Prompt

You can configure a global negative prompt that is automatically appended to every generation request:

skills:
  imageGen:
    negativePrompt: "low quality, blurry, watermark, text, deformed"
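
One way to picture the appending step is the sketch below, which combines the global negative prompt with an optional per-request one and drops duplicate terms. The per-request negative prompt and the function name merge_negative_prompts are assumptions for illustration; the source only states that the global value is appended automatically.

```python
def merge_negative_prompts(global_negative: str, request_negative: str = "") -> str:
    """Join comma-separated negative terms, keeping first occurrences only."""
    seen = set()
    merged = []
    for term in (global_negative + "," + request_negative).split(","):
        cleaned = term.strip()
        key = cleaned.lower()
        # Ignore empty fragments and case-insensitive repeats.
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return ", ".join(merged)
```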

Channel Adaptation

Different channels handle images differently, and the image generation skill automatically adapts based on channel type.

Discord

Generated images are sent as embed messages with prompt descriptions and generation parameters. When combined with the discord_embed channel tool, "Regenerate" buttons can also be added.

Telegram

Images are sent directly as photo messages. Telegram automatically compresses large images; to preserve original quality, you can configure them to be sent as files instead.

Slack

Images are sent via Slack's file upload API, with automatic alt text for accessibility.

WhatsApp

Images are sent as media messages, subject to WhatsApp's file size limit (16MB maximum).
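
The per-channel behavior described in this section can be summarized as a small dispatch function. This is a sketch under stated assumptions: DeliveryPlan, plan_delivery, the channel identifiers, and the method labels are all hypothetical names, and only the behaviors themselves come from the descriptions above.

```python
from dataclasses import dataclass

WHATSAPP_MAX_BYTES = 16 * 1024 * 1024  # the 16MB limit stated above


@dataclass(frozen=True)
class DeliveryPlan:
    method: str                    # how the image is sent on this channel
    attach_alt_text: bool = False  # whether alt text accompanies the image


def plan_delivery(channel: str, size_bytes: int, preserve_quality: bool = False) -> DeliveryPlan:
    """Pick a delivery method for a generated image on the given channel."""
    if channel == "discord":
        return DeliveryPlan("embed")
    if channel == "telegram":
        # Photo messages are compressed; sending as a file keeps the original.
        return DeliveryPlan("file" if preserve_quality else "photo")
    if channel == "slack":
        return DeliveryPlan("file_upload", attach_alt_text=True)
    if channel == "whatsapp":
        if size_bytes > WHATSAPP_MAX_BYTES:
            raise ValueError("image exceeds WhatsApp's 16MB media limit")
        return DeliveryPlan("media")
    raise ValueError(f"unsupported channel: {channel}")
```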

Image Processing Pipeline

Generated images can pass through a processing pipeline before being sent:

Resizing: Automatic scaling based on the target channel
Format conversion: Converting to the most suitable format for the channel
Watermark addition: Optional custom watermark overlay
Metadata injection: Writing generation parameters into image EXIF data
Content moderation: Optional NSFW detection to filter inappropriate content

skills:
  imageGen:
    pipeline:
      resize: auto
      format: auto
      watermark:
        enabled: false
        text: "Generated by OpenClaw"
      contentFilter:
        enabled: true
        strictness: medium
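
What resize: auto computes is not specified here, but a common approach is to scale the image down so its longer side fits a channel-specific limit while preserving the aspect ratio. The sketch below shows that arithmetic only; fit_within and the max_side limit are hypothetical, and no image library is involved.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple:
    """Return dimensions scaled so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Never upscale; small images pass through unchanged.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and keep each side at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(2048, 1024, 1024))
```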

Image Management in Sessions

Generated images are bound to sessions. When sessions are persisted using JSONL format, image reference paths and metadata are recorded, but the image files themselves are stored in a separate file store.

When session compaction occurs, older image references may be cleaned up. You can control image retention time with the imageRetentionDays configuration.
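
A minimal sketch of how a retention check over a JSONL session log might look is shown below. The record layout (type, path, and createdAt fields) and the function name expired_image_paths are assumptions for illustration; OpenClaw's actual session schema may differ.

```python
import json
from datetime import datetime, timedelta, timezone


def expired_image_paths(jsonl_text: str, retention_days: int, now: datetime) -> list:
    """List image paths whose records are older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        if record.get("type") != "image":
            continue  # only image references are subject to retention
        created = datetime.fromisoformat(record["createdAt"])
        if created < cutoff:
            expired.append(record["path"])
    return expired


if __name__ == "__main__":
    log = '{"type": "image", "path": "a.png", "createdAt": "2025-01-01T00:00:00+00:00"}\n'
    print(expired_image_paths(log, 7, datetime(2025, 1, 31, tzinfo=timezone.utc)))
```

Because the image files live in a separate store, the returned paths would still need to be deleted there; the sketch only identifies candidates.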

Cost Control

AI image generation is typically billed per generation. OpenClaw provides several cost control mechanisms:

Daily quota: Limit the number of generations per day
User quota: Set individual usage limits for each user
Channel quota: Control generation frequency per channel
Caching: Identical prompts within a short time window won't trigger duplicate generations

skills:
  imageGen:
    quotas:
      daily: 100
      perUser: 10
      perChannel: 30
    cache:
      enabled: true
      ttl: 3600
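
The interaction between quotas and caching can be sketched as a small gate that is consulted before each generation. Everything here is an illustration under assumptions: the class name GenerationGate, treating ttl as seconds, normalizing prompts by case and whitespace, and resetting the counters once per day from outside the class.

```python
import hashlib
import time
from collections import defaultdict


class GenerationGate:
    """Hypothetical quota counter plus TTL cache keyed by prompt hash."""

    def __init__(self, quotas: dict, cache_ttl: float, clock=time.monotonic):
        self.quotas = quotas
        self.cache_ttl = cache_ttl
        self.clock = clock
        self.counts = defaultdict(int)
        self.cache = {}  # prompt hash -> (expires_at, image_path)

    @staticmethod
    def _cache_key(prompt: str) -> str:
        # Collapse case and whitespace so trivially different prompts match.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def cached(self, prompt: str):
        """Return a cached image path if one is still fresh, else None."""
        entry = self.cache.get(self._cache_key(prompt))
        if entry and entry[0] > self.clock():
            return entry[1]
        return None

    def store(self, prompt: str, image_path: str) -> None:
        self.cache[self._cache_key(prompt)] = (self.clock() + self.cache_ttl, image_path)

    def try_consume(self, user: str, channel: str) -> bool:
        """Count one generation if every quota still has room."""
        limits = [
            ("daily", self.quotas["daily"]),
            (f"user:{user}", self.quotas["perUser"]),
            (f"channel:{channel}", self.quotas["perChannel"]),
        ]
        if any(self.counts[key] >= limit for key, limit in limits):
            return False
        for key, _ in limits:
            self.counts[key] += 1
        return True

    def reset(self) -> None:
        """Clear counters; intended to be called once per day."""
        self.counts.clear()
```

A caller would check cached() first, and only call try_consume() on a cache miss, so repeated prompts cost neither quota nor an API call.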

Collaboration with Other Tools

The image generation skill can work in conjunction with other OpenClaw tools:

browser + imageGen: First fetch reference materials from web pages, then generate images based on them
cron + imageGen: Scheduled image generation (e.g., daily wallpaper recommendations)
canvas + imageGen: Secondary editing and annotation of generated images

Troubleshooting

Common issues and solutions:

Generation timeout: Image generation typically takes 10-30 seconds; ensure timeout settings are long enough
API rate limiting: Add request queuing and retry logic
Content rejection: Adjust prompts or review content policies
Poor image quality: Enable enhancePrompt or adjust generation parameters
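
For the rate limiting case, retry logic is often implemented as exponential backoff with jitter. The sketch below shows one such approach; RateLimitError and call_with_retry are hypothetical names, and the generate callable stands in for whatever function actually contacts the provider.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for a provider's rate limit response (for example HTTP 429)."""


def call_with_retry(generate, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call generate(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return generate()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts; let the caller handle it
            # Delay doubles each attempt: 1s, 2s, 4s for the default base.
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from concurrent requests.
            sleep(delay + random.uniform(0, delay / 2))
```

Passing sleep as a parameter keeps the function testable without real waiting.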

Summary

OpenClaw's image generation skill seamlessly integrates AI image creation into the conversation experience. Through flexible provider configuration, intelligent prompt optimization, and comprehensive cost controls, it provides users with a convenient and efficient image creation tool.