
Hugging Face Model Integration Guide


Hugging Face Introduction

Hugging Face is the world's largest open-source AI model community, hosting hundreds of thousands of pretrained models. Through the Hugging Face Inference API, you can easily call these models from OpenClaw without building your own inference infrastructure.

Get an API Token

  1. Register and log in at huggingface.co
  2. Go to Settings -> Access Tokens
  3. Click "New token"
  4. Select the "Read" role (sufficient for inference, including on private models you can access; "Write" is only needed if you also upload models)
  5. Copy the generated token
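Before wiring the token into OpenClaw, you can sanity-check it with a quick curl call. The `whoami-v2` endpoint is a public Hugging Face Hub API route that returns the account a token belongs to; replace the placeholder with your real token:

```shell
# A valid token returns your account details as JSON;
# an invalid one returns an "Invalid credentials" error.
curl -s https://huggingface.co/api/whoami-v2 \
  -H "Authorization: Bearer hf_your_token_here"
```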

Basic Configuration

{
  "providers": {
    "huggingface": {
      "type": "openai",
      "baseUrl": "https://api-inference.huggingface.co/models/",
      "apiKey": "{{HF_API_TOKEN}}",
      "models": ["mistralai/Mistral-7B-Instruct-v0.3"]
    }
  }
}
openclaw secrets set HF_API_TOKEN "hf_your_token_here"
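A malformed provider block is a common source of startup errors, so it is worth checking that the JSON parses before loading it. A minimal sketch (the file name providers.json is just an example):

```shell
# Write the provider block to a file and confirm it is well-formed JSON.
cat > providers.json <<'EOF'
{
  "providers": {
    "huggingface": {
      "type": "openai",
      "baseUrl": "https://api-inference.huggingface.co/models/",
      "apiKey": "{{HF_API_TOKEN}}",
      "models": ["mistralai/Mistral-7B-Instruct-v0.3"]
    }
  }
}
EOF
python3 -m json.tool providers.json > /dev/null && echo "config ok"
```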

Using Inference Endpoints

For production, consider Hugging Face Inference Endpoints for more stable performance and lower latency:

{
  "providers": {
    "hf-endpoint": {
      "type": "openai",
      "baseUrl": "https://your-endpoint-id.us-east-1.aws.endpoints.huggingface.cloud/v1",
      "apiKey": "{{HF_API_TOKEN}}",
      "models": ["tgi"]
    }
  }
}
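Because a dedicated endpoint speaks the OpenAI chat format, you can smoke-test it with curl before pointing OpenClaw at it. The URL below is the placeholder from the config above, and "tgi" is the model name TGI-backed endpoints accept:

```shell
# OpenAI-compatible chat completion against a dedicated Inference Endpoint.
curl -s https://your-endpoint-id.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions \
  -H "Authorization: Bearer hf_your_token_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "tgi", "messages": [{"role": "user", "content": "Hello"}]}'
```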

Recommended Models

Model                               Parameters   Use Case
mistralai/Mistral-7B-Instruct-v0.3  7B           General conversation
meta-llama/Llama-3.1-8B-Instruct    8B           General conversation
microsoft/Phi-3-mini-4k-instruct    3.8B         Lightweight conversation
Qwen/Qwen2.5-72B-Instruct           72B          Chinese language scenarios

Using TGI Format

Hugging Face's TGI (Text Generation Inference) service exposes an OpenAI-compatible API, so the same "openai" provider type works:

{
  "providers": {
    "hf-tgi": {
      "type": "openai",
      "baseUrl": "https://api-inference.huggingface.co/v1",
      "apiKey": "{{HF_API_TOKEN}}",
      "models": ["meta-llama/Llama-3.1-8B-Instruct"]
    }
  }
}
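You can exercise the same /v1 route directly with curl; unlike a dedicated endpoint, the model field here carries the full repo ID (an illustrative sketch, matching the model in the config above):

```shell
# OpenAI-format chat completion against the serverless /v1 route.
curl -s https://api-inference.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer hf_your_token_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}],
       "max_tokens": 64}'
```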

Common Questions

Q: Does the free API have limits? The free Inference API has rate limits of approximately 30 requests per minute. For production, consider a Pro subscription or Inference Endpoints.

Q: Why are model responses slow? Models on the free API may be cold: the first request can take tens of seconds while the model loads. Inference Endpoints keep the model loaded in memory and avoid this delay.
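On the serverless API, a cold model normally answers with a 503 and an "estimated_time" field; per Hugging Face's Inference API docs, sending the x-wait-for-model header asks the API to block until the model is loaded instead:

```shell
# Wait for a cold model to load rather than receiving an immediate 503.
curl -s https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3 \
  -H "Authorization: Bearer hf_your_token_here" \
  -H "x-wait-for-model: true" \
  -d '{"inputs": "Hello"}'
```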

Summary

Hugging Face provides a rich selection of open-source models. Connecting them to OpenClaw via the Inference API or Inference Endpoints gives you flexible model choices for different scenarios while avoiding the complexity of self-hosted inference servers.
