LLM Gateway Docs v2.0

Configuration

The gateway is fully configured via a single YAML file. This reference covers every top-level section and its available options.

Full Configuration Example

Below is a complete gateway.yaml showing every major section:

# ─── Gateway ────────────────────────────────────
gateway:
  name: "rtd-gateway-prod"
  version: "2.0"
  adminPort: 8081        # Optional admin API port

# ─── Listeners ──────────────────────────────────
listeners:
  - name: "http"
    port: 8080
    protocol: HTTP
  - name: "https"
    port: 8443
    protocol: HTTPS
    tls:
      cert: "/certs/server.crt"
      key:  "/certs/server.key"
      minVersion: "TLS1.2"

# ─── Routes ─────────────────────────────────────
routes:
  - name: "chat"
    path: "/v1/chat/completions"
    methods: ["POST"]
    backend: "primary-pool"
    policies:
      - auth: "jwt-auth"
      - rateLimit: "100rpm"
      - timeout: 30s
  - name: "embeddings"
    path: "/v1/embeddings"
    methods: ["POST"]
    backend: "openai-backend"
    policies:
      - auth: "jwt-auth"
      - timeout: 10s

# ─── Backends ────────────────────────────────────
backends:
  - name: "openai-backend"
    type: openai
    config:
      apiKey: "${OPENAI_API_KEY}"
      orgId:  "${OPENAI_ORG_ID}"     # optional
      defaultModel: "gpt-4o"
  - name: "anthropic-backend"
    type: anthropic
    config:
      apiKey: "${ANTHROPIC_API_KEY}"
      defaultModel: "claude-3-5-sonnet-20241022"
  - name: "primary-pool"
    type: pool
    strategy: latency           # round-robin | weighted | latency | cost
    backends:
      - name: openai-backend
        weight: 60
      - name: anthropic-backend
        weight: 40

# ─── Policies ────────────────────────────────────
policies:
  auth:
    jwt:
      - name: "jwt-auth"
        jwksUrl: "https://auth.mycompany.com/.well-known/jwks.json"
        audience: "rtd-gateway"
        issuer: "https://auth.mycompany.com"
  rateLimits:
    - name: "100rpm"
      requests: 100
      period: 1m
      key: api_key              # api_key | ip | user_id | header:<name>

# ─── Observability ───────────────────────────────
observability:
  metrics:
    prometheus:
      enabled: true
      port: 9090
      path: /metrics
  logging:
    level: info                 # debug | info | warn | error
    format: json
    output: stdout
  tracing:
    enabled: true
    provider: otel
    endpoint: "http://tempo:4317"
    sampleRate: 1.0
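
The weighted strategy shown for primary-pool above can be sketched as weighted random selection. This is an illustration in Python of how a 60/40 weight split would typically behave, not the gateway's actual routing code:

```python
import random

def pick_weighted(backends):
    """Choose a backend name with probability proportional to its weight."""
    names = [b["name"] for b in backends]
    weights = [b["weight"] for b in backends]
    return random.choices(names, weights=weights, k=1)[0]

# The primary-pool members from the example config above.
pool = [
    {"name": "openai-backend", "weight": 60},
    {"name": "anthropic-backend", "weight": 40},
]

counts = {"openai-backend": 0, "anthropic-backend": 0}
for _ in range(10_000):
    counts[pick_weighted(pool)] += 1
# openai-backend should receive roughly 60% of the picks.
print(counts)
```

The latency and cost strategies would replace the static weights with live measurements per backend; this reference does not specify how those are collected.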

Listeners

Listeners define where the gateway accepts incoming traffic. You can run multiple listeners simultaneously.

name (string): Unique identifier for the listener.
port (integer): TCP port to bind. Default: 8080 (HTTP), 8443 (HTTPS).
protocol (HTTP | HTTPS): Network protocol. HTTPS requires tls.cert and tls.key.
tls.cert (string): Path to the PEM certificate file.
tls.key (string): Path to the PEM private key file.
tls.minVersion (string): Minimum TLS version. Default: TLS1.2.
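
The rule above that an HTTPS listener must supply tls.cert and tls.key can be sketched as a small validation check. This is an illustration in Python, not the gateway's actual startup code:

```python
def validate_listener(listener: dict) -> list[str]:
    """Return a list of config errors for one listener entry."""
    errors = []
    if "name" not in listener:
        errors.append("listener missing required 'name'")
    # Per the reference: HTTPS requires tls.cert and tls.key.
    if listener.get("protocol") == "HTTPS":
        tls = listener.get("tls", {})
        for field in ("cert", "key"):
            if field not in tls:
                errors.append(
                    f"HTTPS listener '{listener.get('name')}' missing tls.{field}"
                )
    return errors

# An HTTPS listener with no tls block should be rejected with two errors.
bad = {"name": "https", "port": 8443, "protocol": "HTTPS"}
print(validate_listener(bad))
```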

Provider Backends

Each backend represents one LLM provider connection. The type field determines which adapter is used.

openai: OpenAI (GPT-4o, GPT-4, GPT-3.5)
anthropic: Anthropic (Claude 3 and 3.5 families)
azure-openai: Azure OpenAI Service
google: Google Gemini (1.5 Pro, Flash)
meta: Meta Llama (via API endpoint)
mistral: Mistral AI (Mistral Large, Nemo)
pool: Load-balanced group of backends
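
Backends for the other providers follow the same shape as the openai and anthropic entries in the full example. As a hedged sketch, an azure-openai backend would presumably look like the following; apiKey and endpoint map to the AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT variables listed under Environment Variables, but the exact config field names are assumptions patterned on the openai backend, not confirmed by this reference:

```yaml
# Sketch only: field names follow the pattern of the openai backend
# above and are assumptions, not confirmed options.
backends:
  - name: "azure-backend"
    type: azure-openai
    config:
      apiKey:   "${AZURE_OPENAI_API_KEY}"
      endpoint: "${AZURE_OPENAI_ENDPOINT}"
```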

Environment Variables

Use ${VAR_NAME} syntax in YAML to reference environment variables. Key variables:

OPENAI_API_KEY: OpenAI API key
ANTHROPIC_API_KEY: Anthropic API key
AZURE_OPENAI_API_KEY: Azure OpenAI key
AZURE_OPENAI_ENDPOINT: Azure OpenAI resource endpoint URL
GOOGLE_AI_API_KEY: Google AI / Gemini API key
MISTRAL_API_KEY: Mistral AI API key
RTD_ADMIN_TOKEN: Bearer token for admin API access
RTD_LOG_LEVEL: Override log level (debug | info | warn | error)
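
The gateway presumably performs this ${VAR_NAME} substitution when it loads the YAML file. The behavior can be illustrated with Python's os.path.expandvars, which implements the same ${VAR} syntax (the key value below is a stand-in, not a real credential):

```python
import os

# Stand-in value for illustration only.
os.environ["OPENAI_API_KEY"] = "test-key-123"

raw = 'apiKey: "${OPENAI_API_KEY}"'
expanded = os.path.expandvars(raw)
print(expanded)  # apiKey: "test-key-123"
```

If a referenced variable is unset, os.path.expandvars leaves the placeholder untouched; whether the gateway errors out or does the same in that case is not specified in this reference.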