2026

routerx — LLM API Gateway

A self-hosted API gateway for LLM providers with key management, rate limiting, semantic caching, usage tracking, and a real-time admin dashboard.

Technology Stack

Go, Next.js, PostgreSQL, Redis

Overview

Core Features

  • Per-client API keys — issue, revoke, and track keys independently. Keys are stored as SHA-256 hashes so a database leak doesn't expose client credentials.
  • Token bucket rate limiting — Redis-backed and per-key, with a configurable window. An atomic Lua script updates each bucket, and because the bucket state lives in Redis, limits survive gateway restarts and apply across instances.
  • Semantic caching — two-tier cache: exact hash match first (fast), then pgvector cosine similarity matching (smart). Saves tokens on repeated or similar queries.
  • Usage logging — per-request logging with model, tokens, cost, and latency. Async writes so logging never blocks the response path.
  • Automatic cost tracking — fetches live pricing from OpenRouter and calculates per-request cost in the background.
  • Admin dashboard — real-time Next.js UI for managing keys and monitoring usage across all clients.
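
The hashed-key storage described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names and the "rx_" key prefix are assumptions made for the example.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashKey returns the hex-encoded SHA-256 digest of a raw API key.
// Only this digest would be stored; the raw key is shown to the client once.
func hashKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// newKey generates a random key. The "rx_" prefix is a hypothetical convention.
func newKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "rx_" + hex.EncodeToString(buf), nil
}

// matches hashes the presented key and compares it to a stored digest
// in constant time.
func matches(presented, storedHash string) bool {
	got := hashKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	key, err := newKey()
	if err != nil {
		panic(err)
	}
	stored := hashKey(key) // what a row in the keys table would hold
	fmt.Println("valid key accepted:", matches(key, stored))
	fmt.Println("wrong key accepted:", matches("rx_wrong", stored))
}
```

Because the keys are long random strings rather than user-chosen passwords, a fast unsalted hash is a reasonable fit here, and it allows a direct indexed lookup by digest.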

System Architecture

[Figure: routerx architecture]
  • Chi Router — HTTP entry point with structured logging and CORS.
  • Auth Middleware — extracts the Bearer token, hashes it with SHA-256, and looks the hash up in the Postgres api_keys table.
  • Rate Limiter — Redis token bucket enforces per-key rate limits before the request reaches the proxy.
  • Semantic Cache — checks for exact hash match, then pgvector similarity. On a cache hit, the response is returned immediately without hitting the upstream provider.
  • Proxy Handler — on cache miss, the request is relayed to OpenRouter with body size limits and the centralized provider key.
  • Usage Logger — asynchronously logs the request (model, tokens, cost, latency) to Postgres after the response is sent.
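
The stage ordering above can be sketched with the standard library alone. The Middleware type below has the same func(http.Handler) http.Handler shape that Chi middleware uses, but the chain helper, the placeholder stages, and the request path are assumptions for illustration, not routerx's actual handlers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Middleware wraps a handler with extra behavior.
type Middleware func(http.Handler) http.Handler

// chain applies middlewares so that the first one listed runs first.
func chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// stage is a placeholder that records its name in a response header.
// A real stage would rate limit or consult the cache at this point.
func stage(name string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Header().Add("X-Stage", name)
			next.ServeHTTP(w, r)
		})
	}
}

// requireBearer stops the pipeline early when no bearer token is present.
// The hash lookup against the key table is omitted in this sketch.
func requireBearer(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ") {
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// run sends one request through the pipeline and reports the status code
// and the stages that were visited.
func run(authHeader string) (int, []string) {
	proxy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Add("X-Stage", "proxy")
		w.WriteHeader(http.StatusOK)
	})
	h := chain(proxy, requireBearer, stage("ratelimit"), stage("cache"))
	// The path is a hypothetical example.
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Values("X-Stage")
}

func main() {
	code, stages := run("Bearer demo-key")
	fmt.Println(code, stages)
	code, stages = run("")
	fmt.Println(code, stages)
}
```

An unauthenticated request is rejected by the first stage and never reaches the rate limiter, cache, or proxy, which is the property the ordering is meant to provide.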

Design Decisions

  • SHA-256 hashed keys: a database leak doesn't expose client credentials.
  • Redis token bucket: Lua-scripted atomic rate limiting; survives restarts, shared across instances.
  • Two-tier cache: exact hash match first (fast), then pgvector cosine similarity (smart).
  • Async usage logging: writes don't block the response path; cost calculation happens in background.
  • Single upstream key: gateway pattern — clients get isolated keys, billing stays centralized.
  • No external deps for tests: fake SQL drivers let all tests run without Postgres/Redis.

Key Challenges

  • Designing a middleware pipeline that composes cleanly (auth, rate limit, cache, proxy, log).
  • Implementing atomic rate limiting in Redis with Lua scripts for correctness under concurrency.
  • Building a two-tier semantic cache that balances speed (exact hash) with intelligence (vector similarity).
  • Ensuring async usage logging doesn't lose data on crashes while keeping the response path fast.
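
One common way to approach the last challenge is a buffered channel drained by a single writer goroutine, with a Close method that flushes on shutdown. The sketch below is an assumption about how this could look, not the project's implementation: UsageRecord's field names are invented, and the write callback stands in for the Postgres insert.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// UsageRecord mirrors the logged fields named in the text.
// The field names are assumptions.
type UsageRecord struct {
	Model   string
	Tokens  int
	CostUSD float64
	Latency time.Duration
}

// AsyncLogger queues records on a buffered channel and writes them
// from one background goroutine.
type AsyncLogger struct {
	ch   chan UsageRecord
	done sync.WaitGroup
}

// NewAsyncLogger starts the writer goroutine.
// write stands in for the database insert.
func NewAsyncLogger(buffer int, write func(UsageRecord)) *AsyncLogger {
	l := &AsyncLogger{ch: make(chan UsageRecord, buffer)}
	l.done.Add(1)
	go func() {
		defer l.done.Done()
		for rec := range l.ch {
			write(rec)
		}
	}()
	return l
}

// Log never blocks the caller. It returns false when the buffer is full
// so the caller can count the dropped record.
func (l *AsyncLogger) Log(rec UsageRecord) bool {
	select {
	case l.ch <- rec:
		return true
	default:
		return false
	}
}

// Close stops intake and waits until every queued record has been written.
// Log must not be called after Close.
func (l *AsyncLogger) Close() {
	close(l.ch)
	l.done.Wait()
}

func main() {
	var written []UsageRecord
	logger := NewAsyncLogger(16, func(r UsageRecord) {
		written = append(written, r)
	})
	for i := 1; i <= 3; i++ {
		logger.Log(UsageRecord{Model: "example-model", Tokens: i * 100})
	}
	logger.Close() // flush before exit
	fmt.Println("records written:", len(written))
}
```

This protects against loss on a graceful shutdown only. Records still in the buffer would be lost on a hard crash, so stronger guarantees would need a durable queue or a write-ahead step.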

Key Learnings

  • Chi's middleware composition model maps well to API gateway pipelines.
  • Redis Lua scripts are essential for correct atomic operations like token buckets.
  • pgvector similarity search is surprisingly effective for caching semantically equivalent LLM queries.
  • Separating the admin API from the proxy path simplifies auth and routing concerns.
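
The two-tier lookup can be illustrated in memory. This is a sketch under assumptions: the 0.9 similarity threshold, the type names, and the tiny two-dimensional embeddings are invented for the example, and the linear scan stands in for a pgvector query (pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
)

// entry pairs a stored embedding with its cached response.
type entry struct {
	embedding []float64
	response  string
}

// cache checks an exact-hash map first, then falls back to similarity.
type cache struct {
	exact     map[string]string
	entries   []entry
	threshold float64 // minimum cosine similarity for a semantic hit (assumed value)
}

func promptHash(prompt string) string {
	sum := sha256.Sum256([]byte(prompt))
	return hex.EncodeToString(sum[:])
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when the lengths differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func (c *cache) put(prompt string, emb []float64, resp string) {
	c.exact[promptHash(prompt)] = resp
	c.entries = append(c.entries, entry{embedding: emb, response: resp})
}

// get returns the response, the tier that answered, and whether it was a hit.
func (c *cache) get(prompt string, emb []float64) (string, string, bool) {
	if resp, ok := c.exact[promptHash(prompt)]; ok {
		return resp, "exact", true
	}
	best, bestSim := -1, c.threshold
	for i, e := range c.entries {
		if sim := cosine(emb, e.embedding); sim >= bestSim {
			best, bestSim = i, sim
		}
	}
	if best < 0 {
		return "", "miss", false
	}
	return c.entries[best].response, "semantic", true
}

func main() {
	c := &cache{exact: map[string]string{}, threshold: 0.9}
	c.put("What is Go?", []float64{1, 0}, "Go is a programming language.")

	_, tier, _ := c.get("What is Go?", []float64{1, 0})
	fmt.Println(tier)
	_, tier, _ = c.get("what's Go?", []float64{0.9, 0.1})
	fmt.Println(tier)
	_, tier, _ = c.get("Best pasta recipe?", []float64{0, 1})
	fmt.Println(tier)
}
```

The threshold is the main tuning knob: set too low, it returns answers to questions that only look similar; set too high, the semantic tier rarely fires.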

Impact and Results

  • Enabled multi-team LLM access through a centralized Go-based gateway with API-key isolation, rate limiting, observability, and usage tracking.
  • Reduced redundant LLM API calls through a combined exact and semantic caching architecture, achieving a ~90% effective cache hit rate during mixed-workload benchmark testing.
  • Limited upstream provider fallbacks to <0.3% across 40k+ benchmark requests using Redis-backed exact caching and pgvector semantic similarity search.
  • Provided real-time latency, error-rate, throughput, goroutine, and cache-efficiency visibility using Prometheus and Grafana observability pipelines.
  • Load-tested the gateway under concurrent AI inference workloads, sustaining ~62 requests/sec across 40k+ requests using k6-based traffic simulation.