OpenClaw Image Generation Skill Configuration

Overview

Image generation is one of the most popular capabilities of AI agents. OpenClaw supports integrating multiple AI image generation services through its skill plugin system, enabling agents to generate images based on users' text descriptions. This article covers how to configure and optimize the image generation skill in OpenClaw.

Skill Architecture

The image generation skill is part of OpenClaw's tool system and is registered and managed within the seven-stage tool pipeline. It combines the graphics processing capabilities of the built-in canvas tool with external AI image generation APIs.

The Pi SDK's direct embedding feature shortens the image generation call chain: everything from the user's request to the API call happens within the same runtime, with no inter-process communication.

Supported Generation Services

OpenAI DALL-E

skills:
  imageGen:
    provider: openai
    model: dall-e-3
    apiKey: ${OPENAI_API_KEY}
    defaultSize: "1024x1024"
    defaultQuality: standard
    defaultStyle: vivid
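
As a rough illustration of how these defaults could map onto a provider request, the sketch below builds the JSON body for OpenAI's image generation endpoint. The `IMAGE_GEN_CONFIG` dictionary and the `build_openai_request` helper are hypothetical names used only for this example, and the mapping is an assumption about how the skill applies its defaults, not a description of OpenClaw's internals.

```python
import json

# Hypothetical mirror of the skills.imageGen block above (not an OpenClaw type).
IMAGE_GEN_CONFIG = {
    "provider": "openai",
    "model": "dall-e-3",
    "defaultSize": "1024x1024",
    "defaultQuality": "standard",
    "defaultStyle": "vivid",
}

OPENAI_IMAGES_URL = "https://api.openai.com/v1/images/generations"


def build_openai_request(config, prompt, size=None, quality=None, style=None):
    """Return (url, json_body); per-request values override the configured defaults."""
    body = {
        "model": config["model"],
        "prompt": prompt,
        "n": 1,  # dall-e-3 generates one image per request
        "size": size or config["defaultSize"],
        "quality": quality or config["defaultQuality"],
        "style": style or config["defaultStyle"],
    }
    return OPENAI_IMAGES_URL, body


url, body = build_openai_request(IMAGE_GEN_CONFIG, "a lighthouse at dusk", quality="hd")
print(url)
print(json.dumps(body, indent=2))
```

The API key is deliberately left out of the body; it would be sent as a bearer token in the `Authorization` header.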

Stable Diffusion (Local or Remote)

skills:
  imageGen:
    provider: stable-diffusion
    endpoint: "http://localhost:7860/sdapi/v1/txt2img"
    defaultSteps: 30
    defaultSampler: "DPM++ 2M Karras"
    defaultSize: "512x512"
    defaultCfgScale: 7
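
The `/sdapi/v1/txt2img` path suggests the AUTOMATIC1111 Stable Diffusion web UI API. Assuming that is the target, the sketch below shows how the configured defaults might translate into a request body. `build_txt2img_payload` and `parse_size` are hypothetical helpers written for this example.

```python
def parse_size(size):
    """Split a "WIDTHxHEIGHT" string into two integers."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def build_txt2img_payload(config, prompt, negative_prompt=""):
    """Build a JSON body for the web UI's txt2img endpoint from config defaults."""
    width, height = parse_size(config["defaultSize"])
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": config["defaultSteps"],
        "sampler_name": config["defaultSampler"],
        "cfg_scale": config["defaultCfgScale"],
        "width": width,
        "height": height,
    }


# Hypothetical mirror of the YAML block above.
sd_config = {
    "defaultSteps": 30,
    "defaultSampler": "DPM++ 2M Karras",
    "defaultSize": "512x512",
    "defaultCfgScale": 7,
}
print(build_txt2img_payload(sd_config, "a lighthouse at dusk", "blurry"))
```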

Midjourney (via Proxy)

skills:
  imageGen:
    provider: midjourney
    proxyEndpoint: "https://mj-proxy.example.com"
    apiKey: ${MJ_API_KEY}
    defaultAspectRatio: "1:1"

Configuration Details

Basic Parameters

  • provider: Image generation service provider
  • apiKey: API authentication key (an environment variable reference is recommended)
  • defaultSize: Default image dimensions
  • maxGenerationsPerDay: Maximum daily generation count (cost control)
  • outputFormat: Output format (png / jpg / webp)
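
To make the environment variable recommendation concrete, here is a minimal sketch of how a `${NAME}` reference such as `${OPENAI_API_KEY}` could be expanded when the configuration is loaded. The function name `resolve_env_refs` and the choice to fail when a variable is missing are assumptions for illustration; OpenClaw's own loader may behave differently.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(value, env=None):
    """Replace every ${NAME} in value with the matching environment variable."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Failing early avoids sending an empty key to the provider.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(substitute, value)


print(resolve_env_refs("${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-example"}))
```

Keeping the key out of the configuration file means the file can be committed or shared without leaking credentials.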

Prompt Translation

Since most image generation models perform best with English prompts, OpenClaw has a built-in prompt translation feature. When users describe what they want in another language, the AI agent first translates the description into English and then sends it as the prompt to the generation service.

skills:
  imageGen:
    promptTranslation:
      enabled: true
      targetLanguage: en
      enhancePrompt: true

When enhancePrompt is enabled, the AI agent not only translates the prompt but also optimizes it according to image generation best practices, adding quality descriptors, style keywords, lighting instructions, and similar details.
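
The translate-then-enhance flow can be sketched as follows. In OpenClaw both steps are carried out by the agent's model; here the translator is passed in as a plain function and the enhancement is reduced to appending a fixed list of descriptors. `prepare_prompt`, `QUALITY_HINTS`, and `fake_translate` are hypothetical and exist only for this example.

```python
from typing import Callable

# Hypothetical descriptors; a real agent would choose these per request.
QUALITY_HINTS = ["highly detailed", "sharp focus", "soft natural lighting"]


def prepare_prompt(text: str, translate: Callable[[str, str], str], settings: dict) -> str:
    """Translate the description, then optionally append missing quality hints."""
    if not settings.get("enabled", False):
        return text
    prompt = translate(text, settings.get("targetLanguage", "en"))
    if settings.get("enhancePrompt", False):
        lowered = prompt.lower()
        extras = [hint for hint in QUALITY_HINTS if hint not in lowered]
        prompt = ", ".join([prompt.rstrip(" ,.")] + extras)
    return prompt


def fake_translate(text, target_language):
    """Stand-in for the model's translation step; returns unknown text unchanged."""
    return {"一只在月光下的猫": "a cat in the moonlight"}.get(text, text)


settings = {"enabled": True, "targetLanguage": "en", "enhancePrompt": True}
print(prepare_prompt("一只在月光下的猫", fake_translate, settings))
```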

Negative Prompt

You can configure a global negative prompt that is automatically appended to every generation request:

skills:
  imageGen:
    negativePrompt: "low quality, blurry, watermark, text, deformed"
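
One way to picture "appended to every generation request" is a merge that puts any request-specific negative terms first and the global terms after them, skipping duplicates. The helper below is a hypothetical sketch of that idea. Whether OpenClaw deduplicates terms is not stated in this article, so treat that part as an assumption.

```python
def merge_negative_prompts(global_negative, request_negative=""):
    """Append global negative terms to request terms, dropping duplicates."""
    seen = set()
    merged = []
    for source in (request_negative, global_negative):
        for term in source.split(","):
            term = term.strip()
            key = term.lower()  # compare case-insensitively
            if term and key not in seen:
                seen.add(key)
                merged.append(term)
    return ", ".join(merged)


GLOBAL_NEGATIVE = "low quality, blurry, watermark, text, deformed"
print(merge_negative_prompts(GLOBAL_NEGATIVE, "extra fingers, Blurry"))
```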

Channel Adaptation

Different channels handle images differently, and the image generation skill automatically adapts based on channel type.

Discord

Generated images are sent as embed messages with prompt descriptions and generation parameters. When combined with the discord_embed channel tool, "Regenerate" buttons can also be added.

Telegram

Images are sent directly as photo messages. Telegram automatically compresses large images; to preserve original quality, you can configure them to be sent as files instead.

Slack

Images are sent via Slack's file upload API, with automatic alt text for accessibility.

WhatsApp

Images are sent as media messages and are subject to WhatsApp's file size limit (16 MB maximum).
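
The per-channel behavior described above amounts to a dispatch on channel type. The following sketch assumes a hypothetical `plan_delivery` function and `Delivery` record; the method names are labels for this example and are not OpenClaw or platform API identifiers.

```python
from dataclasses import dataclass

# The 16 MB limit from the text, interpreted here as 16 * 1024 * 1024 bytes.
WHATSAPP_MAX_BYTES = 16 * 1024 * 1024


@dataclass
class Delivery:
    channel: str
    method: str
    caption: str = ""


def plan_delivery(channel, size_bytes, prompt, send_as_file=False):
    """Decide how an image of size_bytes should be sent on the given channel."""
    if channel == "discord":
        return Delivery(channel, "embed", caption=prompt)
    if channel == "telegram":
        # Sending as a file avoids Telegram's photo compression.
        return Delivery(channel, "document" if send_as_file else "photo")
    if channel == "slack":
        return Delivery(channel, "file_upload", caption=f"alt text: {prompt}")
    if channel == "whatsapp":
        if size_bytes > WHATSAPP_MAX_BYTES:
            raise ValueError(f"{size_bytes} bytes exceeds the WhatsApp media limit")
        return Delivery(channel, "media")
    raise ValueError(f"unsupported channel: {channel}")


print(plan_delivery("telegram", 2_000_000, "a lighthouse at dusk", send_as_file=True))
```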

Image Processing Pipeline

Generated images can pass through a processing pipeline before being sent:

  1. Resizing: Automatic scaling based on the target channel
  2. Format conversion: Converting to the most suitable format for the channel
  3. Watermark addition: Optional custom watermark overlay
  4. Metadata injection: Writing generation parameters into image EXIF data
  5. Content moderation: Optional NSFW detection to filter inappropriate content

skills:
  imageGen:
    pipeline:
      resize: auto
      format: auto
      watermark:
        enabled: false
        text: "Generated by OpenClaw"
      contentFilter:
        enabled: true
        strictness: medium
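
A simplified model of such a pipeline is an ordered list of stages selected by the configuration and applied in sequence. The sketch below tracks only image metadata instead of pixels, covers resizing, format conversion, and watermarking, and leaves out metadata injection and content moderation. `ImageInfo`, `build_stages`, `run_pipeline`, and the `channel_target` values are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ImageInfo:
    width: int
    height: int
    fmt: str
    watermark: str = ""


def resize_to_fit(image, max_side):
    """Scale down so the longest side is at most max_side, keeping aspect ratio."""
    longest = max(image.width, image.height)
    if longest <= max_side:
        return image
    scale = max_side / longest
    return replace(image, width=round(image.width * scale), height=round(image.height * scale))


def build_stages(pipeline_config, channel_target):
    """Select stages from the config; 'auto' defers to the channel's target values."""
    stages = []
    if pipeline_config.get("resize") == "auto":
        stages.append(lambda img: resize_to_fit(img, channel_target["max_side"]))
    if pipeline_config.get("format") == "auto":
        stages.append(lambda img: replace(img, fmt=channel_target["format"]))
    watermark = pipeline_config.get("watermark", {})
    if watermark.get("enabled"):
        stages.append(lambda img: replace(img, watermark=watermark["text"]))
    return stages


def run_pipeline(image, stages):
    """Apply each stage in order and return the final image description."""
    for stage in stages:
        image = stage(image)
    return image


pipeline_config = {
    "resize": "auto",
    "format": "auto",
    "watermark": {"enabled": False, "text": "Generated by OpenClaw"},
}
channel_target = {"max_side": 512, "format": "jpg"}  # hypothetical channel limits
print(run_pipeline(ImageInfo(1024, 768, "png"), build_stages(pipeline_config, channel_target)))
```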

Image Management in Sessions

Generated images are bound to sessions. When sessions are persisted in JSONL format, the image reference paths and metadata are recorded in the session, while the image files themselves are kept in a separate file store.

When session compaction occurs, older image references may be cleaned up. You can control image retention time with the imageRetentionDays configuration.
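
As an illustration of retention-based cleanup, the sketch below filters JSONL session records and drops image references older than a given number of days. The record layout (`type`, `path`, `createdAt`) is invented for this example, since the article does not describe OpenClaw's actual session schema, and `prune_image_refs` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta, timezone


def prune_image_refs(jsonl_lines, retention_days, now=None):
    """Return parsed records, omitting image references older than retention_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    kept = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") == "image":
            created = datetime.fromisoformat(record["createdAt"])
            if created < cutoff:
                continue  # reference expired; the stored file could be deleted too
        kept.append(record)
    return kept


session_lines = [
    '{"type": "message", "text": "draw a cat"}',
    '{"type": "image", "path": "images/a.png", "createdAt": "2025-01-01T00:00:00+00:00"}',
    '{"type": "image", "path": "images/b.png", "createdAt": "2025-01-30T00:00:00+00:00"}',
]
reference_time = datetime(2025, 1, 31, tzinfo=timezone.utc)
for record in prune_image_refs(session_lines, retention_days=7, now=reference_time):
    print(record)
```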

Cost Control

AI image generation is typically billed per generation. OpenClaw provides several cost control mechanisms:

  • Daily quota: Limit the number of generations per day
  • User quota: Set individual usage limits for each user
  • Channel quota: Control generation frequency per channel
  • Caching: Identical prompts within a short time window won't trigger duplicate generations

skills:
  imageGen:
    quotas:
      daily: 100
      perUser: 10
      perChannel: 30
    cache:
      enabled: true
      ttl: 3600
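
The quota and cache mechanisms can be sketched with two small classes. This example assumes that `perUser` and `perChannel` are per-day counts and that `ttl` is measured in seconds; neither is stated explicitly above. `QuotaTracker` and `PromptCache` are hypothetical names, and a production version would need persistence and locking.

```python
import hashlib
import time
from collections import defaultdict


class QuotaTracker:
    """Counts generations per day, per user, and per channel (assumed daily)."""

    def __init__(self, daily, per_user, per_channel):
        self.limits = {"daily": daily, "user": per_user, "channel": per_channel}
        self.counts = defaultdict(int)

    def try_consume(self, day, user, channel):
        keys = {
            "daily": ("daily", day),
            "user": ("user", day, user),
            "channel": ("channel", day, channel),
        }
        # Refuse the request if any scope has already reached its limit.
        if any(self.counts[key] >= self.limits[scope] for scope, key in keys.items()):
            return False
        for key in keys.values():
            self.counts[key] += 1
        return True


class PromptCache:
    """Returns a stored result for an identical prompt within ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}

    @staticmethod
    def key(prompt, params):
        # Collapse whitespace so trivially different spacing still hits the cache.
        raw = " ".join(prompt.split()) + "|" + repr(sorted(params.items()))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, prompt, params):
        entry = self.entries.get(self.key(prompt, params))
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            return None  # entry expired
        return value

    def put(self, prompt, params, value):
        self.entries[self.key(prompt, params)] = (self.clock(), value)


quotas = QuotaTracker(daily=100, per_user=10, per_channel=30)
cache = PromptCache(ttl_seconds=3600)
params = {"size": "1024x1024"}
if cache.get("a red fox", params) is None and quotas.try_consume("2025-01-31", "alice", "discord"):
    cache.put("a red fox", params, "images/fox.png")
print(cache.get("a  red fox", params))
```

Checking the cache before the quota means a cache hit does not count against any limit, which matches the goal of avoiding duplicate billable generations.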

Collaboration with Other Tools

The image generation skill can work in conjunction with other OpenClaw tools:

  • browser + imageGen: First fetch reference materials from web pages, then generate images based on them
  • cron + imageGen: Scheduled image generation (e.g., daily wallpaper recommendations)
  • canvas + imageGen: Secondary editing and annotation of generated images

Troubleshooting

Common issues and solutions:

  1. Generation timeout: Image generation typically takes 10-30 seconds; ensure timeout settings are long enough
  2. API rate limiting: Add request queuing and retry logic
  3. Content rejection: Adjust prompts or review content policies
  4. Poor image quality: Enable enhancePrompt or adjust generation parameters
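
For the rate limiting case, retry logic usually means exponential backoff with a little jitter. The sketch below wraps an arbitrary `generate` callable; `RateLimitError` is a hypothetical exception standing in for whatever a provider client raises on an HTTP 429 response, and the attempt count and delays are illustrative defaults only.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised by a provider client when it is rate limited."""


def generate_with_retry(generate, prompt, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call generate(prompt), backing off exponentially after rate limit errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return generate(prompt)
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay * 0.1))  # up to 10% jitter


def make_flaky_generator(failures):
    """Build a fake generator that is rate limited `failures` times, then succeeds."""
    state = {"remaining": failures}

    def generate(prompt):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RateLimitError("429 Too Many Requests")
        return f"image for: {prompt}"

    return generate


waits = []
print(generate_with_retry(make_flaky_generator(2), "a red fox", sleep=waits.append))
print(len(waits), "retries")
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; real code would use the default `time.sleep`.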

Summary

OpenClaw's image generation skill seamlessly integrates AI image creation into the conversation experience. Through flexible provider configuration, intelligent prompt optimization, and comprehensive cost controls, it provides users with a convenient and efficient image creation tool.
