LLM Gateway Docs v2.0

LLM Providers

Configure one or more LLM provider backends. The gateway translates every request to the provider's native format — your application always uses the same OpenAI-compatible API.
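Concretely, a client sends the standard OpenAI chat-completions payload to the gateway no matter which backend ends up serving it. The gateway address below (localhost:8080) is a placeholder for your deployment; this sketch only constructs the request so it runs without a live gateway:

```python
import json
import urllib.request

# Placeholder address; point this at your gateway deployment.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

# The same OpenAI-format payload works for every configured backend.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it to a running gateway.
```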

OpenAI

Supports streaming, function calling, vision, and embeddings. Set orgId if your API key belongs to more than one organisation.

Supported Models

gpt-4o
gpt-4o-mini
gpt-4-turbo
gpt-3.5-turbo
o1-preview
o1-mini

Configuration

gateway.yaml — OpenAI
backends:
  - name: "openai-backend"
    type: openai
    config:
      apiKey: "${OPENAI_API_KEY}"
      orgId:  "${OPENAI_ORG_ID}"    # optional
      defaultModel: "gpt-4o"
      timeout: 30s
      maxRetries: 3

Anthropic

The gateway automatically translates OpenAI-format requests to Anthropic's Messages API. Note that Anthropic requires max_tokens on every request.
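The shape of that translation can be sketched as follows. Field names on the output side follow Anthropic's public Messages API (top-level system string, mandatory max_tokens); the gateway's actual internals are not documented here, and a real translation would also map tools, images, and stop sequences:

```python
def to_anthropic(openai_req, default_max_tokens=8192):
    """Sketch: convert an OpenAI-format chat request to Anthropic's
    Messages API shape. Illustrative only, not the gateway's code."""
    # Anthropic takes the system prompt as a top-level field, not a message.
    system = "\n".join(m["content"] for m in openai_req["messages"]
                       if m["role"] == "system")
    out = {
        "model": openai_req["model"],
        # Anthropic rejects requests without max_tokens, so one must
        # always be supplied; fall back to the configured default.
        "max_tokens": openai_req.get("max_tokens", default_max_tokens),
        "messages": [m for m in openai_req["messages"]
                     if m["role"] != "system"],
    }
    if system:
        out["system"] = system
    return out
```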

Supported Models

claude-3-5-sonnet-20241022
claude-3-5-haiku-20241022
claude-3-opus-20240229
claude-3-sonnet-20240229

Configuration

gateway.yaml — Anthropic
backends:
  - name: "anthropic-backend"
    type: anthropic
    config:
      apiKey: "${ANTHROPIC_API_KEY}"
      defaultModel: "claude-3-5-sonnet-20241022"
      maxTokens: 8192           # anthropic requires max_tokens
      timeout: 60s

Azure OpenAI

Azure OpenAI uses deployment names (not model names). The apiVersion must match the version available in your Azure region.
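The deployment name and api-version both end up in the request URL. Azure's documented URL pattern for chat completions can be sketched as:

```python
def azure_chat_url(endpoint, deployment_name, api_version):
    """Build the Azure OpenAI chat-completions URL. Azure routes by
    deployment name (not model name), and every request must carry an
    api-version query parameter."""
    return (f"{endpoint.rstrip('/')}/openai/deployments/"
            f"{deployment_name}/chat/completions"
            f"?api-version={api_version}")
```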

Supported Models

gpt-4o (deployment name)
gpt-4 (deployment name)
text-embedding-3-large

Configuration

gateway.yaml — Azure OpenAI
backends:
  - name: "azure-backend"
    type: azure-openai
    config:
      apiKey:      "${AZURE_OPENAI_API_KEY}"
      endpoint:    "${AZURE_OPENAI_ENDPOINT}"
      # e.g. https://myresource.openai.azure.com
      deploymentName: "gpt-4o-prod"
      apiVersion: "2024-08-01-preview"
      timeout: 30s

Google Gemini

Supports both Google AI Studio (API key) and Vertex AI (project + service account). For Vertex AI, set projectId and location instead of apiKey.
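A plausible reading of that rule, sketched in Python (this is an assumption about how the backend selects its auth mode, inferred from the config keys, not documented gateway behavior):

```python
def gemini_auth_mode(config):
    """Sketch: pick Vertex AI when projectId + location are set,
    otherwise fall back to AI Studio API-key auth."""
    if config.get("projectId") and config.get("location"):
        return "vertex"       # service-account / project credentials
    if config.get("apiKey"):
        return "ai-studio"    # simple API-key auth
    raise ValueError("set apiKey, or projectId + location")
```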

Supported Models

gemini-1.5-pro
gemini-1.5-flash
gemini-1.0-pro
text-embedding-004

Configuration

gateway.yaml — Google Gemini
backends:
  - name: "gemini-backend"
    type: google
    config:
      apiKey:  "${GOOGLE_AI_API_KEY}"
      defaultModel: "gemini-1.5-pro"
      # Optional: use Vertex AI instead of AI Studio
      # projectId: "${GCP_PROJECT_ID}"
      # location: "us-central1"

Meta Llama

Meta Llama can be self-hosted or accessed via third-party providers like Together AI, Groq, or Fireworks. Set endpoint to your hosting provider's base URL.
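Since hosting providers differ in whether the configured base URL already ends in /v1, a backend typically normalises the endpoint before appending the chat path. A sketch of that normalisation (an assumption about common OpenAI-compatible servers such as vLLM; some providers nest the path differently):

```python
def chat_completions_url(endpoint):
    """Join a provider base URL with the OpenAI-compatible chat path,
    tolerating base URLs with or without a trailing /v1."""
    base = endpoint.rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/chat/completions"
```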

Supported Models

llama-3.3-70b-instruct
llama-3.1-8b-instruct
llama-3.2-11b-vision

Configuration

gateway.yaml — Meta Llama
backends:
  - name: "llama-backend"
    type: meta
    config:
      # Self-hosted or via hosting providers (Together, Groq, etc.)
      endpoint: "${LLAMA_ENDPOINT_URL}"
      apiKey:   "${LLAMA_API_KEY}"
      defaultModel: "llama-3.3-70b-instruct"
      timeout: 60s

Mistral AI

Mistral supports the OpenAI-compatible API format natively, making it straightforward to integrate. Codestral is recommended for code generation tasks.

Supported Models

mistral-large-latest
mistral-nemo
codestral-latest
mistral-embed

Configuration

gateway.yaml — Mistral AI
backends:
  - name: "mistral-backend"
    type: mistral
    config:
      apiKey: "${MISTRAL_API_KEY}"
      defaultModel: "mistral-large-latest"
      timeout: 30s

Multi-Provider Pool

Combine multiple backends into a pool for load balancing and automatic failover:

gateway.yaml — pool example
backends:
  # Individual provider backends
  - name: "openai-backend"
    type: openai
    config:
      apiKey: "${OPENAI_API_KEY}"
      defaultModel: "gpt-4o"
  - name: "anthropic-backend"
    type: anthropic
    config:
      apiKey: "${ANTHROPIC_API_KEY}"
      defaultModel: "claude-3-5-sonnet-20241022"

  # Pool combining both backends
  - name: "llm-pool"
    type: pool
    strategy: latency          # picks the fastest responding backend
    fallback: true             # on error, try next backend
    backends:
      - name: openai-backend
        weight: 70
      - name: anthropic-backend
        weight: 30

routes:
  - name: "chat"
    path: "/v1/chat/completions"
    methods: ["POST"]
    backend: "llm-pool"        # route to the pool
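
To illustrate what the pool does, here is a minimal weighted-selection-with-failover sketch in Python. The weighted random choice stands in for the gateway's internal strategy logic (a latency strategy would rank backends by observed response time instead); none of this is the gateway's actual implementation:

```python
import random

def pick_backend(backends, unhealthy=frozenset()):
    """Sketch: choose a pool member by weight, skipping backends that
    are currently marked unhealthy (the failover half of fallback: true)."""
    candidates = [b for b in backends if b["name"] not in unhealthy]
    if not candidates:
        raise RuntimeError("all pool backends are unhealthy")
    weights = [b.get("weight", 1) for b in candidates]
    # Weighted random pick mirrors the 70/30 split in the config above.
    return random.choices(candidates, weights=weights, k=1)[0]
```

With the pool above, a failure of openai-backend would leave anthropic-backend as the only candidate, so traffic fails over to it automatically.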