
Hugging Face Model Integration Guide


Hugging Face Introduction

Hugging Face is the world's largest open-source AI model community, hosting hundreds of thousands of pretrained models. Through the Hugging Face Inference API, you can easily call these models from OpenClaw without building your own inference infrastructure.

Get an API Token

  1. Register and log in at huggingface.co
  2. Go to Settings -> Access Tokens
  3. Click "New token"
  4. Select the "Read" role (sufficient for inference, including on private models you can access; "Write" is only needed if you also upload models)
  5. Copy the generated token
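Before wiring the token into OpenClaw, you can sanity-check it with a quick curl call. The `whoami-v2` endpoint is a public Hugging Face Hub API route that returns the account a token belongs to; replace the placeholder with your real token:

```shell
# A valid token returns your account details as JSON;
# an invalid one returns an "Invalid credentials" error.
curl -s https://huggingface.co/api/whoami-v2 \
  -H "Authorization: Bearer hf_your_token_here"
```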

Basic Configuration

{
  "providers": {
    "huggingface": {
      "type": "openai",
      "baseUrl": "https://api-inference.huggingface.co/models/",
      "apiKey": "{{HF_API_TOKEN}}",
      "models": ["mistralai/Mistral-7B-Instruct-v0.3"]
    }
  }
}
openclaw secrets set HF_API_TOKEN "hf_your_token_here"
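A malformed provider block is a common source of startup errors, so it is worth checking that the JSON parses before loading it. A minimal sketch (the file name providers.json is just an example):

```shell
# Write the provider block to a file and confirm it is well-formed JSON.
cat > providers.json <<'EOF'
{
  "providers": {
    "huggingface": {
      "type": "openai",
      "baseUrl": "https://api-inference.huggingface.co/models/",
      "apiKey": "{{HF_API_TOKEN}}",
      "models": ["mistralai/Mistral-7B-Instruct-v0.3"]
    }
  }
}
EOF
python3 -m json.tool providers.json > /dev/null && echo "config ok"
```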

Using Inference Endpoints

For production, consider Hugging Face Inference Endpoints for more stable performance and lower latency:

{
  "providers": {
    "hf-endpoint": {
      "type": "openai",
      "baseUrl": "https://your-endpoint-id.us-east-1.aws.endpoints.huggingface.cloud/v1",
      "apiKey": "{{HF_API_TOKEN}}",
      "models": ["tgi"]
    }
  }
}
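Because a dedicated endpoint speaks the OpenAI chat format, you can smoke-test it with curl before pointing OpenClaw at it. The URL below is the placeholder from the config above, and "tgi" is the model name TGI-backed endpoints accept:

```shell
# OpenAI-compatible chat completion against a dedicated Inference Endpoint.
curl -s https://your-endpoint-id.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions \
  -H "Authorization: Bearer hf_your_token_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "tgi", "messages": [{"role": "user", "content": "Hello"}]}'
```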

Recommended Models

Model                               Parameters   Use Case
mistralai/Mistral-7B-Instruct-v0.3  7B           General conversation
meta-llama/Llama-3.1-8B-Instruct    8B           General conversation
microsoft/Phi-3-mini-4k-instruct    3.8B         Lightweight conversation
Qwen/Qwen2.5-72B-Instruct           72B          Chinese language scenarios

Using TGI Format

Hugging Face's TGI (Text Generation Inference) service exposes an OpenAI-compatible API, so the same "openai" provider type works:

{
  "providers": {
    "hf-tgi": {
      "type": "openai",
      "baseUrl": "https://api-inference.huggingface.co/v1",
      "apiKey": "{{HF_API_TOKEN}}",
      "models": ["meta-llama/Llama-3.1-8B-Instruct"]
    }
  }
}
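You can exercise the same /v1 route directly with curl; unlike a dedicated endpoint, the model field here carries the full repo ID (an illustrative sketch, matching the model in the config above):

```shell
# OpenAI-format chat completion against the serverless /v1 route.
curl -s https://api-inference.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer hf_your_token_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}],
       "max_tokens": 64}'
```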

Common Questions

Q: Does the free API have limits? The free Inference API has rate limits of approximately 30 requests per minute. For production, consider a Pro subscription or Inference Endpoints.

Q: Why are model responses slow? Models on the free API may be cold: the first request can take tens of seconds while the model loads. Inference Endpoints keep the model loaded in memory and avoid this delay.
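On the serverless API, a cold model normally answers with a 503 and an "estimated_time" field; per Hugging Face's Inference API docs, sending the x-wait-for-model header asks the API to block until the model is loaded instead:

```shell
# Wait for a cold model to load rather than receiving an immediate 503.
curl -s https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3 \
  -H "Authorization: Bearer hf_your_token_here" \
  -H "x-wait-for-model: true" \
  -d '{"inputs": "Hello"}'
```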

Summary

Hugging Face provides a rich selection of open-source models. Connecting them to OpenClaw via the Inference API or Inference Endpoints gives you flexible model choices for different scenarios while avoiding the complexity of self-hosted inference servers.
