LLM Gateway Docs v2.0

Observability

RTD LLM Gateway exports Prometheus metrics, structured JSON logs, and OpenTelemetry traces out of the box. Connect to Grafana, Loki, and Jaeger for a full observability stack.

Prometheus Metrics

Metrics are exposed at GET /metrics in Prometheus text format. A scrape interval of 15s is recommended.

gateway.yaml — metrics
gateway:
  observability:
    metrics:
      enabled: true
      path: "/metrics"
      port: 9090              # separate port from main listener (optional)
      namespace: "rtd"        # prefix for all metric names
      # Histogram buckets for latency (ms)
      latencyBuckets: [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000]
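To see why the latencyBuckets above matter, here is a sketch of how a quantile (e.g. P90) is interpolated from cumulative bucket counts. This is the same idea PromQL's histogram_quantile applies server-side, not the gateway's internal implementation; the sample counts are invented for illustration.

```python
import bisect

# Upper bounds (ms) copied from the latencyBuckets config above.
BUCKETS = [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000]

def quantile(q: float, cumulative: list) -> float:
    """Interpolate the q-quantile from cumulative bucket counts.

    cumulative[i] = number of requests with latency <= BUCKETS[i].
    Linear interpolation inside the bucket that crosses the target rank.
    """
    total = cumulative[-1]
    rank = q * total
    i = bisect.bisect_left(cumulative, rank)       # first bucket covering the rank
    lower = BUCKETS[i - 1] if i > 0 else 0.0        # bucket's lower bound
    prev = cumulative[i - 1] if i > 0 else 0.0      # count below this bucket
    width = BUCKETS[i] - lower
    return lower + width * (rank - prev) / (cumulative[i] - prev)

# Hypothetical distribution: 100 requests total.
cum = [10, 20, 50, 80, 95, 99, 100, 100, 100, 100]
p50 = quantile(0.5, cum)   # falls exactly on the 25 ms bucket boundary
p90 = quantile(0.9, cum)   # interpolated inside the 50-100 ms bucket
```

Wider buckets mean coarser interpolation, so place bucket boundaries near the latencies you care about (e.g. your SLO threshold).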

Available Metrics

Metric                     Labels                                  Description
rtd_requests_total         route, backend, status                  Total requests processed
rtd_request_duration_ms    route, backend, quantile                Request latency histogram (P50/P90/P99)
rtd_tokens_total           direction (prompt/completion), backend  Tokens consumed across all requests
rtd_backend_errors_total   backend, error_code                     Backend errors by code
rtd_rate_limit_hits_total  key_type, route                         Rate limit rejections
rtd_circuit_breaker_state  backend                                 Circuit state: 0=closed, 1=open, 2=half-open
rtd_cost_usd_total         backend, model                          Estimated cost in USD based on token pricing
rtd_active_connections     backend                                 Current in-flight connections per backend
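For a quick health check without a full Prometheus setup, the text format served at /metrics is easy to parse by hand. A minimal sketch, using an invented sample of the counters from the table above (in production, use a Prometheus client library instead):

```python
import re

# Illustrative /metrics output; values and label sets are made up.
SAMPLE = """\
# HELP rtd_requests_total Total requests processed
# TYPE rtd_requests_total counter
rtd_requests_total{route="chat-completions",backend="openai-backend",status="200"} 1042
rtd_requests_total{route="chat-completions",backend="openai-backend",status="502"} 7
"""

LINE = re.compile(r'^(\w+)\{(.*)\}\s+([0-9.eE+-]+)$')

def parse_counters(text):
    """Yield (metric, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        m = LINE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(
            (k, v.strip('"'))
            for k, v in (pair.split("=", 1) for pair in raw_labels.split(","))
        )
        yield name, labels, float(value)

total = sum(v for n, _, v in parse_counters(SAMPLE) if n == "rtd_requests_total")
errors = sum(v for n, l, v in parse_counters(SAMPLE)
             if n == "rtd_requests_total" and l["status"].startswith("5"))
error_rate = errors / total
```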

Sample Prometheus scrape config

prometheus.yml
scrape_configs:
  - job_name: "rtd-llm-gateway"
    static_configs:
      - targets: ["gateway:9090"]
    scrape_interval: 15s

Structured Logging

All access logs are emitted as structured JSON to stdout. Use any log aggregator (Loki, Fluentd, Datadog, CloudWatch) to collect and query them.

gateway.yaml — logging
gateway:
  observability:
    logging:
      level: info             # debug | info | warn | error
      format: json            # json | text
      # Include request/response bodies (careful with PII)
      includeRequestBody: false
      includeResponseBody: false
      # Redact sensitive headers
      redactHeaders: ["Authorization", "X-API-Key"]
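The redactHeaders behavior above can be mirrored in your own tooling when forwarding logs elsewhere. A sketch of the idea, not the gateway's actual implementation (the "[REDACTED]" placeholder is an assumption):

```python
# Headers to mask, matched case-insensitively (mirrors redactHeaders above).
REDACT = {"authorization", "x-api-key"}

def redact_headers(headers: dict) -> dict:
    """Return a copy of headers with sensitive values masked."""
    return {
        k: ("[REDACTED]" if k.lower() in REDACT else v)
        for k, v in headers.items()
    }
```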

Access log fields

stdout — example access log
{
  "level":     "info",
  "ts":        "2025-01-15T12:34:56.789Z",
  "msg":       "request",
  "method":    "POST",
  "path":      "/v1/chat/completions",
  "status":    200,
  "latency_ms": 423,
  "route":     "chat-completions",
  "backend":   "openai-backend",
  "model":     "gpt-4o",
  "prompt_tokens":     254,
  "completion_tokens": 183,
  "total_tokens":      437,
  "cost_usd":  0.0098,
  "client_id": "key_01j9...",
  "request_id": "req_01j9aabbccdd"
}
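Because each access log line is a self-contained JSON object, per-backend rollups are straightforward. A sketch aggregating spend from log lines, using the field names from the example above; the "anthropic-backend"/"claude" entries and all values are invented (in practice you would run an equivalent query in Loki or your aggregator):

```python
import json
from collections import defaultdict

# Hypothetical access log lines, trimmed to the fields we aggregate.
LOG_LINES = [
    '{"msg":"request","backend":"openai-backend","model":"gpt-4o","cost_usd":0.0098}',
    '{"msg":"request","backend":"openai-backend","model":"gpt-4o","cost_usd":0.0031}',
    '{"msg":"request","backend":"anthropic-backend","model":"claude","cost_usd":0.0045}',
]

def cost_by_backend(lines):
    """Sum cost_usd per backend across request log records."""
    totals = defaultdict(float)
    for line in lines:
        rec = json.loads(line)
        if rec.get("msg") == "request":
            totals[rec["backend"]] += rec.get("cost_usd", 0.0)
    return dict(totals)

totals = cost_by_backend(LOG_LINES)
```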

Distributed Tracing

The gateway emits OpenTelemetry traces via OTLP (gRPC or HTTP). Spans cover the full request lifecycle: inbound receive → auth → rate limit → backend call → response.

gateway.yaml — tracing
gateway:
  observability:
    tracing:
      enabled: true
      sampler: 1.0            # 1.0 = 100% sampling; reduce for high-traffic
      exporter: otlp-grpc     # otlp-grpc | otlp-http | jaeger | zipkin
      otlp:
        endpoint: "http://otel-collector:4317"
        insecure: true        # set false in production
      # Propagators for distributed context
      propagators: ["tracecontext", "baggage"]
      serviceName: "rtd-llm-gateway"

Tip: Set the X-Request-Id header from your client to correlate gateway traces with your application traces. The gateway will propagate it as the trace ID.
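The "tracecontext" propagator configured above expects the W3C Trace Context traceparent header on inbound requests. A hand-rolled sketch of the headers a client would send alongside X-Request-Id; in a real client, use an OpenTelemetry SDK to inject these rather than constructing them manually:

```python
import secrets

def make_trace_headers(request_id: str) -> dict:
    """Build W3C traceparent plus the gateway's correlation header."""
    trace_id = secrets.token_hex(16)   # 32 lowercase hex chars
    span_id = secrets.token_hex(8)     # 16 lowercase hex chars
    return {
        # format: version "00", trace-id, parent span-id, flags ("01" = sampled)
        "traceparent": f"00-{trace_id}-{span_id}-01",
        "X-Request-Id": request_id,    # correlates gateway and app traces
    }

headers = make_trace_headers("req_01j9aabbccdd")
```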

Pre-built Dashboards

Import the official Grafana dashboards from the RTD Dashboard Repository:

Gateway Overview (#rtd-overview)
  RPS, latency P99, error rate, active connections

Cost & Token Usage (#rtd-cost)
  Token consumption, cost per backend, model breakdown

Backend Health (#rtd-backends)
  Per-backend availability, circuit breaker state

Security & Rate Limits (#rtd-security)
  Auth failures, rate limit hits, blocked requests