2026
routerx — LLM API Gateway
A self-hosted API gateway for LLM providers with key management, rate limiting, semantic caching, usage tracking, and a real-time admin dashboard.
Technology Stack
GoNext.jsPostgreSQLRedis
Overview
Core Features
- •Per-client API keys — issue, revoke, and track keys independently. Keys are stored as SHA-256 hashes so a database leak doesn't expose client credentials.
- •Token bucket rate limiting — Redis-backed, per-key, configurable window. Implemented as an atomic Lua script that survives restarts and works across instances.
- •Semantic caching — two-tier cache: exact hash match first (fast), then pgvector cosine similarity matching (smart). Saves tokens on repeated or similar queries.
- •Usage logging — per-request logging with model, tokens, cost, and latency. Async writes so logging never blocks the response path.
- •Automatic cost tracking — fetches live pricing from OpenRouter and calculates per-request cost in the background.
- •Admin dashboard — real-time Next.js UI for managing keys and monitoring usage across all clients.
System Architecture
- •Chi Router — HTTP entry point with structured logging and CORS.
- •Auth Middleware — Bearer token to SHA-256 hash to Postgres lookup against the api_keys table.
- •Rate Limiter — Redis token bucket enforces per-key rate limits before the request reaches the proxy.
- •Semantic Cache — checks for exact hash match, then pgvector similarity. On a cache hit, the response is returned immediately without hitting the upstream provider.
- •Proxy Handler — on cache miss, the request is relayed to OpenRouter with body size limits and the centralized provider key.
- •Usage Logger — asynchronously logs the request (model, tokens, cost, latency) to Postgres after the response is sent.
Design Decisions
- •SHA-256 hashed keys: a database leak doesn't expose client credentials.
- •Redis token bucket: Lua-scripted atomic rate limiting; survives restarts, shared across instances.
- •Two-tier cache: exact hash match first (fast), then pgvector cosine similarity (smart).
- •Async usage logging: writes don't block the response path; cost calculation happens in background.
- •Single upstream key: gateway pattern — clients get isolated keys, billing stays centralized.
- •No external deps for tests: fake SQL drivers let all tests run without Postgres/Redis.
Key Challenges
- •Designing a middleware pipeline that composes cleanly (auth, rate limit, cache, proxy, log).
- •Implementing atomic rate limiting in Redis with Lua scripts for correctness under concurrency.
- •Building a two-tier semantic cache that balances speed (exact hash) with intelligence (vector similarity).
- •Ensuring async usage logging doesn't lose data on crashes while keeping the response path fast.
Key Learnings
- •Chi's middleware composition model maps well to API gateway pipelines.
- •Redis Lua scripts are essential for correct atomic operations like token buckets.
- •pgvector similarity search is surprisingly effective for caching semantically equivalent LLM queries.
- •Separating the admin API from the proxy path simplifies auth and routing concerns.
Impact and Results
- •Enabled multi-team LLM access through a centralized Go-based gateway with API-key isolation, rate limiting, observability, and usage tracking.
- •Reduced redundant LLM API calls through exact + semantic caching architecture, achieving ~90% effective cache hit rate during mixed-workload benchmark testing.
- •Limited upstream provider fallbacks to <0.3% across 40k+ benchmark requests using Redis-backed exact caching and pgvector semantic similarity search.
- •Provided real-time latency, error-rate, throughput, goroutine, and cache-efficiency visibility using Prometheus and Grafana observability pipelines.
- •Load-tested the gateway under concurrent AI inference workloads, sustaining ~62 requests/sec across 40k+ requests using k6-based traffic simulation.