# Observability
RTD LLM Gateway exports Prometheus metrics, structured JSON logs, and OpenTelemetry traces out of the box. Connect to Grafana, Loki, and Jaeger for a full observability stack.
## Prometheus Metrics
Metrics are exposed at `GET /metrics` in Prometheus text exposition format. A scrape interval of 15 s is recommended.
```yaml
gateway:
  observability:
    metrics:
      enabled: true
      path: "/metrics"
      port: 9090       # separate port from main listener (optional)
      namespace: "rtd" # prefix for all metric names
      # Histogram buckets for latency (ms)
      latencyBuckets: [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000]
```

### Available Metrics
| Metric | Labels | Description |
|---|---|---|
| rtd_requests_total | route, backend, status | Total requests processed |
| rtd_request_duration_ms | route, backend, le | Request latency histogram (derive P50/P90/P99 with histogram_quantile) |
| rtd_tokens_total | direction (prompt/completion), backend | Tokens consumed across all requests |
| rtd_backend_errors_total | backend, error_code | Backend errors by code |
| rtd_rate_limit_hits_total | key_type, route | Rate limit rejections |
| rtd_circuit_breaker_state | backend | Circuit state: 0=closed, 1=open, 2=half-open |
| rtd_cost_usd_total | backend, model | Estimated cost in USD based on token pricing |
| rtd_active_connections | backend | Current in-flight connections per backend |
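As an illustration, P99 latency per route and the per-backend error rate can be derived from the metrics above with PromQL (the 5-minute window and the `_bucket` series name follow Prometheus histogram conventions):

```promql
# P99 request latency per route over the last 5 minutes
histogram_quantile(0.99,
  sum by (route, le) (rate(rtd_request_duration_ms_bucket[5m])))

# Per-backend error rate as a fraction of all requests
sum by (backend) (rate(rtd_backend_errors_total[5m]))
  / sum by (backend) (rate(rtd_requests_total[5m]))
```

Queries like these are the building blocks for the Grafana dashboards and alerting rules described later in this page.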
### Sample Prometheus scrape config
```yaml
scrape_configs:
  - job_name: "rtd-llm-gateway"
    static_configs:
      - targets: ["gateway:9090"]
    scrape_interval: 15s
```

## Structured Logging
All access logs are emitted as structured JSON to stdout. Use any log aggregator (Loki, Fluentd, Datadog, CloudWatch) to collect and query them.
```yaml
gateway:
  observability:
    logging:
      level: info  # debug | info | warn | error
      format: json # json | text
      # Include request/response bodies (careful with PII)
      includeRequestBody: false
      includeResponseBody: false
      # Redact sensitive headers
      redactHeaders: ["Authorization", "X-API-Key"]
```

### Access log fields
```json
{
  "level": "info",
  "ts": "2025-01-15T12:34:56.789Z",
  "msg": "request",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": 200,
  "latency_ms": 423,
  "route": "chat-completions",
  "backend": "openai-backend",
  "model": "gpt-4o",
  "prompt_tokens": 254,
  "completion_tokens": 183,
  "total_tokens": 437,
  "cost_usd": 0.0098,
  "client_id": "key_01j9...",
  "request_id": "req_01j9aabbccdd"
}
```
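If you ship these logs to Loki, the JSON fields above can be filtered directly with LogQL. A sketch, assuming your collector attaches a `job="rtd-llm-gateway"` label (the label name is an assumption of your log pipeline, not something the gateway sets):

```logql
# Rate of error responses per backend over 5 minutes,
# parsed from the JSON access log fields
sum by (backend) (
  rate({job="rtd-llm-gateway"} | json | status >= 500 [5m])
)
```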
## Distributed Tracing

The gateway emits OpenTelemetry traces via OTLP (gRPC or HTTP). Spans cover the full request lifecycle: inbound receive → auth → rate limit → backend call → response.
gateway:
observability:
tracing:
enabled: true
sampler: 1.0 # 1.0 = 100% sampling; reduce for high-traffic
exporter: otlp-grpc # otlp-grpc | otlp-http | jaeger | zipkin
otlp:
endpoint: "http://otel-collector:4317"
insecure: true # set false in production
# Propagators for distributed context
propagators: ["tracecontext", "baggage"]
serviceName: "rtd-llm-gateway"Tip: Set the X-Request-Id header from your client to correlate gateway traces with your application traces. The gateway will propagate it as the trace ID.
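On the receiving side, a minimal OpenTelemetry Collector configuration that accepts the gateway's OTLP/gRPC spans and forwards them to Jaeger might look like the following sketch (the `jaeger:4317` endpoint is an assumption about your deployment; recent Jaeger versions accept OTLP natively on that port):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

processors:
  batch: {}

exporters:
  # Forward spans to Jaeger over OTLP
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```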
## Pre-built Dashboards

Import the official Grafana dashboards from the RTD Dashboard Repository:

- **Gateway Overview** (`#rtd-overview`): RPS, latency P99, error rate, active connections
- **Cost & Token Usage** (`#rtd-cost`): token consumption, cost per backend, model breakdown
- **Backend Health** (`#rtd-backends`): per-backend availability, circuit breaker state
- **Security & Rate Limits** (`#rtd-security`): auth failures, rate limit hits, blocked requests