LLM Gateway Docs v2.0

Architecture & Concepts

Understand how RealTimeDetect LLM Gateway works, what its core components are, and how requests flow through the system.

What is LLM Gateway?

RealTimeDetect LLM Gateway is an intelligent reverse proxy that sits between your applications and one or more LLM providers (OpenAI, Anthropic, Azure OpenAI, etc.). It abstracts the provider layer so your apps make a single, unified API call — while the gateway handles routing, authentication, rate limiting, cost tracking, and observability.

Think of it as a control plane for AI: all LLM traffic flows through a single point, giving you complete visibility and governance over your AI infrastructure.
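From the application's point of view, the unified call is just an OpenAI-style request pointed at the gateway instead of a provider. The sketch below illustrates that idea; the gateway URL, port, path, and model name are illustrative assumptions, not documented values.

```python
import json

# Hypothetical example: the app builds a standard OpenAI-style chat
# payload; the only gateway-specific part is the base URL it is sent to.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"  # assumed address

payload = {
    "model": "gpt-4o",  # the gateway routes this to whichever backend matches
    "messages": [
        {"role": "user", "content": "Summarise our Q3 incident report."}
    ],
}

body = json.dumps(payload).encode("utf-8")
# An HTTP client would POST `body` to GATEWAY_URL with the app's gateway
# API key; switching providers requires no change to this payload.
print(sorted(payload))  # → ['messages', 'model']
```

Because the payload stays provider-agnostic, moving traffic from one backend to another is a gateway configuration change, not an application change.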

Architecture Overview

The gateway consists of two main planes — the data plane (handles live request traffic) and the control plane (manages config, policies, and admin operations).

┌─────────────┐   ┌─────────────┐   ┌──────────────┐
│  Your App   │   │  Your App   │   │   Admin UI   │
│  (backend)  │   │  (frontend) │   │  / CLI / API │
└──────┬──────┘   └──────┬──────┘   └──────┬───────┘
       │                 │                 │
       └────────┬────────┘                 │
                │ HTTP/gRPC                │ Config API
                ▼                          ▼
┌──────────────────────────────────────────────────┐
│           RealTimeDetect LLM Gateway             │
│  ┌──────────┐   ┌───────────┐   ┌──────────┐     │
│  │ Listener │ → │  Router   │ → │ Backend  │     │
│  │  :8080   │   │ /policies │   │ Selector │     │
│  └──────────┘   └───────────┘   └──────────┘     │
│       ↕              ↕              ↕            │
│  ┌────────────────────────────────────────────┐  │
│  │  Middleware Chain                          │  │
│  │  Auth → Rate Limiter → Logger → ...        │  │
│  └────────────────────────────────────────────┘  │
└────────────────────────┬─────────────────────────┘
                         │
        ┌────────────────┼────────────────┐
        ▼                ▼                ▼
  ┌──────────┐     ┌───────────┐    ┌───────────┐
  │  OpenAI  │     │ Anthropic │    │ Azure OAI │
  └──────────┘     │  Claude   │    │  Gemini   │
                   └───────────┘    │  Llama    │
                                    └───────────┘

Core Concepts

Gateway Instance

A running instance of the LLM Gateway process, configured via a YAML file. One instance can serve multiple applications with different routing rules.
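A minimal configuration might look like the sketch below. The key names (`listeners`, `backends`, `routes`) and values are illustrative assumptions, not the gateway's documented schema.

```yaml
# gateway.yaml — illustrative sketch; key names are assumptions,
# not the gateway's documented schema.
listeners:
  - protocol: http
    port: 8080

backends:
  - name: openai-primary
    provider: openai
    api_key: ${OPENAI_API_KEY}

routes:
  - match:
      path: /v1/chat/completions
    backend: openai-primary
```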

Listener

A network endpoint (port + protocol) where the gateway accepts incoming requests. You can define multiple listeners — e.g., HTTP on 8080 and HTTPS on 8443 simultaneously.

Route

A matching rule that maps incoming requests (by path, method, or headers) to a backend provider. Routes are evaluated in order; the first match wins.

Backend

A configured LLM provider endpoint (OpenAI, Anthropic, etc.) with its credentials and default settings. Multiple backends enable failover and load balancing.

Policy

A reusable behaviour rule attached to a route: rate limits, authentication requirements, retry logic, or timeout settings. Policies are defined once and referenced by name.
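A policy definition and its reference from a route might look like this sketch; the policy keys shown are illustrative assumptions, not a documented schema.

```yaml
# Illustrative sketch — policy keys are assumptions, not documented schema.
policies:
  - name: standard-limits
    rate_limit:
      requests_per_minute: 600
    timeout_seconds: 30
    retries: 2

routes:
  - match:
      path: /v1/chat/completions
    backend: openai-primary
    policies: [standard-limits]   # defined once, referenced by name
```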

Middleware

Request/response interceptors that run in a chain before and after the backend call. Examples: JWT validator, cost calculator, request logger, response transformer.
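The chain pattern can be sketched as nested wrappers: each middleware receives the next handler, can act before and after calling it, or short-circuit entirely. The middleware names and request shape below are illustrative, not the gateway's actual middleware API.

```python
# Minimal sketch of a middleware chain. Each middleware wraps the next
# handler; names and request/response shapes are illustrative assumptions.
from typing import Callable

Handler = Callable[[dict], dict]

def logger(next_handler: Handler) -> Handler:
    def handle(request: dict) -> dict:
        request.setdefault("trace", []).append("log:request")
        response = next_handler(request)            # pre-request step done
        response.setdefault("trace", []).append("log:response")
        return response                             # post-response step
    return handle

def auth(next_handler: Handler) -> Handler:
    def handle(request: dict) -> dict:
        if "api_key" not in request:
            return {"status": 401}                  # short-circuit: no backend call
        return next_handler(request)
    return handle

def backend_call(request: dict) -> dict:
    """Stand-in for the upstream provider call."""
    return {"status": 200, "trace": list(request.get("trace", []))}

# Chain order: auth runs first, then logger, then the backend call.
chain = auth(logger(backend_call))

print(chain({"api_key": "k"})["status"])  # → 200
print(chain({})["status"])                # → 401
```

The same structure lets post-processing middleware (cost calculator, response transformer) see the response on the way back out.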

Provider Adapter

A built-in translation layer that normalises provider-specific APIs into the OpenAI-compatible format used by the gateway. Adapters handle request/response transformation, streaming, and error mapping.
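The kind of transformation an adapter performs can be sketched as below: an OpenAI-style chat request is reshaped into Anthropic's Messages layout, where the system prompt is a top-level field rather than a message. The field handling is deliberately simplified; this is not the gateway's adapter code.

```python
# Illustrative sketch of a provider adapter's request translation.
# Field handling is simplified; real adapters also cover streaming,
# tool calls, and error mapping.
def to_anthropic(openai_req: dict) -> dict:
    """Map an OpenAI-style chat request onto Anthropic's Messages layout."""
    system = [m["content"] for m in openai_req["messages"]
              if m["role"] == "system"]
    return {
        "model": openai_req["model"],
        # Anthropic takes the system prompt as a top-level field,
        # not as an entry in the messages list.
        "system": "\n".join(system) if system else None,
        "messages": [m for m in openai_req["messages"]
                     if m["role"] != "system"],
        "max_tokens": openai_req.get("max_tokens", 1024),
    }

req = {"model": "claude-sonnet-4", "messages": [
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "Hello"},
]}
print(to_anthropic(req)["system"])  # → Be terse.
```

The reverse mapping (provider response back to OpenAI format) follows the same principle in the post-processing step.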

Request Lifecycle

For every incoming request, the gateway performs the following steps in order:

  1. Accept connection — The listener accepts the incoming TCP connection on the configured port and protocol.

  2. Parse & validate — The HTTP request is parsed. Missing or malformed fields return a 400 immediately, before any upstream call is made.

  3. Route match — Routes are evaluated top-to-bottom. The first matching route determines which backend and policy set to apply.

  4. Run middleware chain — Pre-request middleware runs: authentication, rate limiter check, request logging, token budget enforcement.

  5. Backend selection — The backend selector picks a provider based on the configured strategy (round-robin, weighted, latency-based, or cost-based).

  6. Provider adapter translation — The request is translated from the canonical format into the provider-specific format (e.g. Anthropic's messages API).

  7. Forward to LLM — The translated request is sent to the upstream provider. Streaming responses are forwarded as Server-Sent Events.

  8. Post-process response — The response is normalised back to OpenAI format. Usage metadata, cost, and latency are injected into the _rtd field.

  9. Return to caller — The final response is sent back to the original caller. Metrics are flushed to Prometheus and the trace span is closed.
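One plausible sketch of the weighted selection strategy used in the backend-selection step is shown below; the backend names and weights are illustrative assumptions, not shipped defaults.

```python
import random

# Illustrative sketch of weighted backend selection. Backend names and
# weights are assumptions for the example, not shipped defaults.
BACKENDS = [("openai-primary", 70), ("anthropic-fallback", 30)]

def pick_backend(rng: random.Random) -> str:
    """Choose a backend with probability proportional to its weight."""
    names, weights = zip(*BACKENDS)
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)          # seeded for reproducibility
picks = [pick_backend(rng) for _ in range(1000)]
# Roughly 70% of picks should land on the primary backend.
print(picks.count("openai-primary") > picks.count("anthropic-fallback"))  # → True
```

Latency- and cost-based strategies replace the static weights with live measurements, but the selection step sits at the same point in the lifecycle.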

Supported Protocols

Protocol       Default port   Notes
HTTP/1.1       8080           Default. Supports streaming via chunked transfer.
HTTPS (TLS)    8443           Requires cert + key in config. TLS 1.2 minimum.
HTTP/2         8443           Enabled automatically with the HTTPS listener.
gRPC           50051          For internal service-to-gateway communication.