Architecture & Concepts
Understand how RealTimeDetect LLM Gateway works, what its core components are, and how requests flow through the system.
What is LLM Gateway?
RealTimeDetect LLM Gateway is an intelligent reverse proxy that sits between your applications and one or more LLM providers (OpenAI, Anthropic, Azure OpenAI, etc.). It abstracts the provider layer so your apps make a single, unified API call — while the gateway handles routing, authentication, rate limiting, cost tracking, and observability.
Think of it as a control plane for AI: all LLM traffic flows through a single point, giving you complete visibility and governance over your AI infrastructure.
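The "single, unified API call" above can be sketched as a small client helper. This is a minimal sketch, not the documented client API: the endpoint path, port, and `Authorization` header format are assumptions made for illustration — check your gateway configuration for the real values.

```python
# Sketch of a client that calls the gateway instead of a provider directly.
# The URL, path, and API-key header below are illustrative assumptions.

GATEWAY_URL = "http://localhost:8080"  # hypothetical local gateway listener


def build_chat_request(model: str, prompt: str) -> tuple[str, dict, dict]:
    """Build an OpenAI-compatible chat request aimed at the gateway."""
    url = f"{GATEWAY_URL}/v1/chat/completions"  # assumed OpenAI-style path
    headers = {
        "Authorization": "Bearer YOUR_GATEWAY_KEY",  # gateway key, not a provider key
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,  # the gateway routes this to the right backend
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, payload


# To actually send it (requires a running gateway):
#   import requests
#   url, headers, payload = build_chat_request("gpt-4o", "Hello")
#   resp = requests.post(url, headers=headers, json=payload)
```

Because the payload is the same regardless of which provider ultimately serves it, swapping backends is a gateway-side config change, not an application change.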
Architecture Overview
The gateway consists of two main planes — the data plane (handles live request traffic) and the control plane (manages config, policies, and admin operations).
```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Your App   │    │  Your App   │    │  Admin UI   │
│  (backend)  │    │ (frontend)  │    │ / CLI / API │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └────────┬─────────┘                  │
                │ HTTP/gRPC                  │ Config API
                ▼                            ▼
┌─────────────────────────────────────────────┐
│         RealTimeDetect LLM Gateway          │
│  ┌──────────┐   ┌───────────┐  ┌──────────┐ │
│  │ Listener │ → │  Router   │→ │ Backend  │ │
│  │  :8080   │   │ /policies │  │ Selector │ │
│  └──────────┘   └───────────┘  └──────────┘ │
│        ↕              ↕             ↕       │
│  ┌──────────────────────────────────────┐   │
│  │           Middleware Chain           │   │
│  │  Auth → Rate Limiter → Logger → ...  │   │
│  └──────────────────────────────────────┘   │
└──────────────────────┬──────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
┌──────────┐     ┌──────────┐    ┌───────────┐
│  OpenAI  │     │Anthropic │    │ Azure OAI │
│          │     │ (Claude) │    │  Gemini   │
│          │     │          │    │  Llama    │
└──────────┘     └──────────┘    └───────────┘
```

Core Concepts
- Gateway instance: A running instance of the LLM Gateway process, configured via a YAML file. One instance can serve multiple applications with different routing rules.
- Listener: A network endpoint (port + protocol) where the gateway accepts incoming requests. You can define multiple listeners — e.g., HTTP on 8080 and HTTPS on 8443 simultaneously.
- Route: A matching rule that maps incoming requests (by path, method, or headers) to a backend provider. Routes are evaluated in order; the first match wins.
- Backend: A configured LLM provider endpoint (OpenAI, Anthropic, etc.) with its credentials and default settings. Multiple backends enable failover and load balancing.
- Policy: A reusable behaviour rule attached to a route: rate limits, authentication requirements, retry logic, or timeout settings. Policies are defined once and referenced by name.
- Middleware: Request/response interceptors that run in a chain before and after the backend call. Examples: JWT validator, cost calculator, request logger, response transformer.
- Provider adapter: A built-in translation layer that normalises provider-specific APIs into the OpenAI-compatible format used by the gateway. Adapters handle request/response transformation, streaming, and error mapping.
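First-match-wins route evaluation can be sketched in a few lines. The route fields below (path prefix, method, backend name) are illustrative assumptions for the sketch, not the gateway's actual configuration schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    # Illustrative fields; the real route schema may differ.
    path_prefix: str
    backend: str
    method: Optional[str] = None  # None matches any HTTP method


def match_route(routes: list[Route], path: str, method: str) -> Optional[Route]:
    """Evaluate routes top-to-bottom; the first match wins."""
    for route in routes:
        if not path.startswith(route.path_prefix):
            continue
        if route.method is not None and route.method != method:
            continue
        return route
    return None  # no route matched; the gateway would reject the request


routes = [
    Route("/v1/chat", "anthropic-primary", method="POST"),
    Route("/v1", "openai-default"),  # catch-all for other /v1 paths
]
```

Because evaluation stops at the first match, the ordering of routes in the config is significant: specific prefixes must be listed before broad catch-alls.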
Request Lifecycle
For every incoming request, the gateway performs the following steps in order:
- 01 Accept connection: The listener accepts the incoming TCP connection on the configured port and protocol.
- 02 Parse & validate: The HTTP request is parsed. Missing or malformed fields return a 400 immediately, before any upstream call is made.
- 03 Route match: Routes are evaluated top-to-bottom. The first matching route determines which backend and policy set to apply.
- 04 Run middleware chain: Pre-request middleware runs: authentication, rate limiter check, request logging, token budget enforcement.
- 05 Backend selection: The backend selector picks a provider based on the configured strategy (round-robin, weighted, latency-based, or cost-based).
- 06 Provider adapter translation: The request is translated from the canonical format into the provider-specific format (e.g. Anthropic's Messages API).
- 07 Forward to LLM: The translated request is sent to the upstream provider. Streaming responses are forwarded as Server-Sent Events.
- 08 Post-process response: The response is normalised back to the OpenAI format. Usage metadata, cost, and latency are injected into the _rtd field.
- 09 Return to caller: The final response is sent back to the original caller. Metrics are flushed to Prometheus and the trace span is closed.
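Steps 06 and 08 are the adapter's job, and can be sketched as a pair of translation functions. The Anthropic-style request shape below (top-level `system` field, required `max_tokens`) follows Anthropic's public Messages API, but the function names, the default token budget, and the exact keys inside `_rtd` are illustrative assumptions, not the gateway's documented behaviour.

```python
def to_anthropic(canonical: dict) -> dict:
    """Translate an OpenAI-style chat request into an Anthropic-style one (step 06).

    Anthropic's Messages API takes the system prompt as a top-level field
    and requires max_tokens, so the adapter hoists one and defaults the other.
    """
    system_parts = [m["content"] for m in canonical["messages"] if m["role"] == "system"]
    out = {
        "model": canonical["model"],
        "max_tokens": canonical.get("max_tokens", 1024),  # assumed default
        "messages": [m for m in canonical["messages"] if m["role"] != "system"],
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out


def normalise_response(provider_resp: dict, cost_usd: float, latency_ms: float) -> dict:
    """Inject gateway metadata into the normalised response (step 08)."""
    resp = dict(provider_resp)
    resp["_rtd"] = {  # field name from the docs above; the keys are assumptions
        "cost_usd": cost_usd,
        "latency_ms": latency_ms,
    }
    return resp
```

The same pattern — one translation in, one translation out — is what lets every provider behind the gateway look OpenAI-compatible to the caller.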
Supported Protocols
| Protocol | Default port | Notes |
|---|---|---|
| HTTP/1.1 | 8080 | Default. Supports streaming via chunked transfer encoding. |
| HTTPS (TLS) | 8443 | Requires cert + key in config. TLS 1.2 minimum. |
| HTTP/2 | 8443 | Enabled automatically with the HTTPS listener. |
| gRPC | 50051 | For internal service-to-gateway communication. |