LLM Providers
Configure one or more LLM provider backends. The gateway translates every request to the provider's native format — your application always uses the same OpenAI-compatible API.
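Because every backend is reached through the same OpenAI-compatible surface, client code never changes when you swap providers. A minimal sketch of the request shape (the gateway host and port here are assumptions for illustration; the path matches the `/v1/chat/completions` route configured later on this page):

```python
import json

# Hypothetical gateway address; substitute your deployment's host/port.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model, messages, stream=False):
    """Build the OpenAI-format JSON body the gateway accepts.

    The same body works whether the route ultimately hits OpenAI,
    Anthropic, Gemini, or any other configured backend."""
    return {"model": model, "messages": messages, "stream": stream}

body = build_chat_request("gpt-4o", [{"role": "user", "content": "Hello"}])
print(json.dumps(body))
```

Swapping `"gpt-4o"` for a Claude or Gemini model name is the only client-side change needed to target a different backend.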
OpenAI
Supports streaming, function calling, vision, and embeddings. Set orgId if using a specific organisation.
Supported Models
- gpt-4o
- gpt-4o-mini
- gpt-4-turbo
- gpt-3.5-turbo
- o1-preview
- o1-mini

Configuration
backends:
  - name: "openai-backend"
    type: openai
    config:
      apiKey: "${OPENAI_API_KEY}"
      orgId: "${OPENAI_ORG_ID}"  # optional
      defaultModel: "gpt-4o"
      timeout: 30s
      maxRetries: 3

Anthropic
The gateway translates OpenAI-format requests to Anthropic's messages API automatically. max_tokens is required for Anthropic.
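The core of that translation is moving system messages out of the messages array and guaranteeing `max_tokens`. A simplified sketch of the idea (the real gateway also maps tools, images, and stop sequences; function and default names here are illustrative):

```python
def openai_to_anthropic(request, default_max_tokens=8192):
    """Translate an OpenAI-format chat request into Anthropic's
    Messages API shape. Simplified sketch, not the full mapping."""
    system_parts = [m["content"] for m in request["messages"]
                    if m["role"] == "system"]
    # Anthropic takes system prompts as a top-level field, not a message.
    messages = [m for m in request["messages"] if m["role"] != "system"]
    out = {
        "model": request["model"],
        "messages": messages,
        # Anthropic rejects requests without max_tokens, so the gateway
        # injects the configured maxTokens when the client omits it.
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out
```

This is why the Anthropic backend config below requires a `maxTokens` value: it is the fallback injected for clients that never set one.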
Supported Models
- claude-3-5-sonnet-20241022
- claude-3-5-haiku-20241022
- claude-3-opus-20240229
- claude-3-sonnet-20240229

Configuration
backends:
  - name: "anthropic-backend"
    type: anthropic
    config:
      apiKey: "${ANTHROPIC_API_KEY}"
      defaultModel: "claude-3-5-sonnet-20241022"
      maxTokens: 8192  # Anthropic requires max_tokens
      timeout: 60s

Azure OpenAI
Azure OpenAI uses deployment names (not model names). The apiVersion must match the version available in your Azure region.
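The deployment name and API version end up in the request URL itself, which is why both fields are mandatory below. A sketch of how the Azure chat completions URL is assembled (URL shape per Azure's public API convention; verify against your region's documentation):

```python
def azure_chat_url(endpoint, deployment_name, api_version):
    """Assemble an Azure OpenAI chat completions URL.

    Azure addresses a *deployment*, not a model name, and requires an
    explicit api-version query parameter."""
    return (f"{endpoint.rstrip('/')}/openai/deployments/{deployment_name}"
            f"/chat/completions?api-version={api_version}")
```

Requesting an `apiVersion` that is not yet available in your region fails at this URL level, before any model is involved.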
Supported Models
- gpt-4o (deployment name)
- gpt-4 (deployment name)
- text-embedding-3-large

Configuration
backends:
  - name: "azure-backend"
    type: azure-openai
    config:
      apiKey: "${AZURE_OPENAI_API_KEY}"
      endpoint: "${AZURE_OPENAI_ENDPOINT}"  # e.g. https://myresource.openai.azure.com
      deploymentName: "gpt-4o-prod"
      apiVersion: "2024-08-01-preview"
      timeout: 30s

Google Gemini
Supports both Google AI Studio (API key) and Vertex AI (project + service account). For Vertex AI, set projectId and location instead of apiKey.
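The two modes target entirely different Google endpoints, selected from which config fields are present. A sketch of that selection logic (the URL shapes are illustrative of the two endpoint families, not an exhaustive spec):

```python
def gemini_base_url(api_key=None, project_id=None, location=None):
    """Choose the Gemini endpoint family from the backend config:
    apiKey -> AI Studio; projectId + location -> Vertex AI.
    URL shapes shown are illustrative."""
    if project_id and location:
        # Vertex AI: regional endpoint, scoped to a GCP project.
        return (f"https://{location}-aiplatform.googleapis.com/v1/projects/"
                f"{project_id}/locations/{location}/publishers/google/models")
    if api_key:
        # AI Studio: global endpoint, authenticated by API key.
        return "https://generativelanguage.googleapis.com/v1beta/models"
    raise ValueError("set either apiKey or projectId + location")
```

This is why `projectId` and `location` replace `apiKey` rather than supplement it: they switch the backend to a different service, not just a different credential.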
Supported Models
- gemini-1.5-pro
- gemini-1.5-flash
- gemini-1.0-pro
- text-embedding-004

Configuration
backends:
  - name: "gemini-backend"
    type: google
    config:
      apiKey: "${GOOGLE_AI_API_KEY}"
      defaultModel: "gemini-1.5-pro"
      # Optional: use Vertex AI instead of AI Studio
      # projectId: "${GCP_PROJECT_ID}"
      # location: "us-central1"

Meta Llama
Meta Llama can be self-hosted or accessed via third-party providers like Together AI, Groq, or Fireworks. Set endpoint to your hosting provider's base URL.
Supported Models
- llama-3.3-70b-instruct
- llama-3.1-8b-instruct
- llama-3.2-11b-vision

Configuration
backends:
  - name: "llama-backend"
    type: meta
    config:
      # Self-hosted or via hosting providers (Together, Groq, etc.)
      endpoint: "${LLAMA_ENDPOINT_URL}"
      apiKey: "${LLAMA_API_KEY}"
      defaultModel: "llama-3.3-70b-instruct"
      timeout: 60s

Mistral AI
Mistral supports the OpenAI-compatible API format natively, making it straightforward to integrate. Codestral is recommended for code generation tasks.
Supported Models
- mistral-large-latest
- mistral-nemo
- codestral-latest
- mistral-embed

Configuration
backends:
  - name: "mistral-backend"
    type: mistral
    config:
      apiKey: "${MISTRAL_API_KEY}"
      defaultModel: "mistral-large-latest"
      timeout: 30s

Multi-Provider Pool
Combine multiple backends into a pool for load balancing and automatic failover:
backends:
  # Individual provider backends
  - name: "openai-backend"
    type: openai
    config:
      apiKey: "${OPENAI_API_KEY}"
      defaultModel: "gpt-4o"
  - name: "anthropic-backend"
    type: anthropic
    config:
      apiKey: "${ANTHROPIC_API_KEY}"
      defaultModel: "claude-3-5-sonnet-20241022"
  # Pool combining both backends
  - name: "llm-pool"
    type: pool
    strategy: latency  # picks the fastest-responding backend
    fallback: true     # on error, try next backend
    backends:
      - name: openai-backend
        weight: 70
      - name: anthropic-backend
        weight: 30

routes:
  - name: "chat"
    path: "/v1/chat/completions"
    methods: ["POST"]
    backend: "llm-pool"  # route to the pool
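The weight and fallback mechanics above can be sketched as follows (a simplified illustration, assuming hypothetical `pick_backend`/`call_with_fallback` names; the actual `latency` strategy additionally tracks response times, which this sketch omits):

```python
import random

def pick_backend(backends, exclude=frozenset()):
    """Weighted random choice over backends not yet marked as failed."""
    candidates = [b for b in backends if b["name"] not in exclude]
    if not candidates:
        raise RuntimeError("all backends failed")
    total = sum(b["weight"] for b in candidates)
    r = random.uniform(0, total)
    for b in candidates:
        r -= b["weight"]
        if r <= 0:
            return b
    return candidates[-1]

def call_with_fallback(backends, send):
    """fallback: true -> on error, exclude the failed backend and retry
    on the next one until a request succeeds or the pool is exhausted."""
    tried = set()
    while True:
        backend = pick_backend(backends, exclude=tried)
        try:
            return send(backend)
        except Exception:
            tried.add(backend["name"])
```

With the 70/30 weights above, roughly 70% of traffic lands on `openai-backend` first, but every request can still complete via `anthropic-backend` if OpenAI errors.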