Sarthak Srivastav — Software Developer (TypeScript, Go, Next.js, Node.js, PostgreSQL)

Overview

•Per-client API keys — issue, revoke, and track keys independently. Keys are stored as SHA-256 hashes so a database leak doesn't expose client credentials.
•Token bucket rate limiting — Redis-backed, per-key, configurable window. The refill-and-consume step runs as an atomic Lua script, and bucket state lives in Redis, so limits persist across gateway restarts and are shared across instances.
•Semantic caching — two-tier cache: exact hash match first (fast), then pgvector cosine similarity matching (smart). Saves tokens on repeated or similar queries.
•Usage logging — per-request logging with model, tokens, cost, and latency. Async writes so logging never blocks the response path.
•Automatic cost tracking — fetches live pricing from OpenRouter and calculates per-request cost in the background.
•Admin dashboard — real-time Next.js UI for managing keys and monitoring usage across all clients.
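
The hashed-key storage described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (NewAPIKey, HashKey, Matches) and the "gw_" key prefix are hypothetical, and the database write is left out.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// HashKey returns the lowercase hex SHA-256 digest of a key.
// Only this digest would be stored in the database.
func HashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// NewAPIKey returns a random plaintext key (shown to the client once)
// together with the digest to store. The "gw_" prefix is hypothetical.
func NewAPIKey() (plaintext, hash string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext = "gw_" + hex.EncodeToString(buf)
	return plaintext, HashKey(plaintext), nil
}

// Matches reports whether a presented key hashes to the stored digest,
// comparing the two digests in constant time.
func Matches(presented, storedHash string) bool {
	got := HashKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	key, hash, err := NewAPIKey()
	if err != nil {
		panic(err)
	}
	fmt.Println("give to client:", key)
	fmt.Println("store in db:   ", hash)
	fmt.Println("valid:", Matches(key, hash))
}
```

Because only the digest is persisted, revoking a key amounts to deleting or flagging its row; the plaintext never needs to be recoverable.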

•Chi Router — HTTP entry point with structured logging and CORS.
•Auth Middleware — extracts the Bearer token, hashes it with SHA-256, and looks the hash up in the Postgres api_keys table.
•Rate Limiter — Redis token bucket enforces per-key rate limits before the request reaches the proxy.
•Semantic Cache — checks for exact hash match, then pgvector similarity. On a cache hit, the response is returned immediately without hitting the upstream provider.
•Proxy Handler — on cache miss, the request is relayed to OpenRouter with body size limits and the centralized provider key.
•Usage Logger — asynchronously logs the request (model, tokens, cost, latency) to Postgres after the response is sent.
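
The stage ordering above (auth, rate limit, cache, proxy) can be sketched with plain net/http middleware. This is an illustration under assumptions: it uses only the standard library rather than Chi, the stage bodies are stubs (a map instead of Postgres, a counter instead of Redis, a path check instead of the cache), the usage-logging stage is omitted, and every name here (Chain, Auth, RateLimit, Cache, DemoPipeline, Call) is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Middleware wraps a handler with one pipeline stage.
type Middleware func(http.Handler) http.Handler

// Chain applies stages so that the first one listed runs first.
func Chain(h http.Handler, stages ...Middleware) http.Handler {
	for i := len(stages) - 1; i >= 0; i-- {
		h = stages[i](h)
	}
	return h
}

// Auth rejects requests without a known Bearer token.
// The map stands in for the hashed-key lookup in Postgres.
func Auth(valid map[string]bool) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			header := r.Header.Get("Authorization")
			token := strings.TrimPrefix(header, "Bearer ")
			if !strings.HasPrefix(header, "Bearer ") || !valid[token] {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// RateLimit answers 429 when allow returns false.
func RateLimit(allow func(*http.Request) bool) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if !allow(r) {
				http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// Cache answers directly on a hit and skips the rest of the chain.
func Cache(lookup func(*http.Request) (string, bool)) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if body, hit := lookup(r); hit {
				fmt.Fprint(w, body)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// DemoPipeline wires the stages around a stub upstream handler.
func DemoPipeline() http.Handler {
	budget := 2 // requests allowed before the stub limiter rejects (not goroutine-safe)
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "from upstream")
	})
	return Chain(upstream,
		Auth(map[string]bool{"secret": true}),
		RateLimit(func(*http.Request) bool { budget--; return budget >= 0 }),
		Cache(func(r *http.Request) (string, bool) {
			if r.URL.Path == "/cached" {
				return "from cache", true
			}
			return "", false
		}),
	)
}

// Call sends one request through h and returns the status code and body.
func Call(h http.Handler, token, path string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	h := DemoPipeline()
	fmt.Println(Call(h, "", "/v1/chat"))
	fmt.Println(Call(h, "secret", "/cached"))
	fmt.Println(Call(h, "secret", "/v1/chat"))
}
```

Because each stage either short-circuits or calls the next handler, a rejected or cached request never reaches the upstream handler.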

•SHA-256 hashed keys: a database leak doesn't expose client credentials.
•Redis token bucket: Lua-scripted atomic rate limiting; survives restarts, shared across instances.
•Two-tier cache: exact hash match first (fast), then pgvector cosine similarity (smart).
•Async usage logging: writes don't block the response path; cost calculation happens in the background.
•Single upstream key: gateway pattern — clients get isolated keys, billing stays centralized.
•No external deps for tests: fake SQL drivers let all tests run without Postgres/Redis.
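
To make the token bucket decision concrete, here is a minimal in-memory sketch of the refill-and-consume logic that a Lua script of this kind typically performs. It is not the project's script: the Bucket type, the Take function, and the capacity and refill parameters are assumptions for illustration. In Redis, the same read, refill, and write sequence has to run as a single script so that two concurrent requests cannot both spend the last token.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is the per-key state that would be kept in Redis.
type Bucket struct {
	Tokens   float64 // tokens currently available
	LastUnix float64 // time of the last update, in seconds
}

// Take refills the bucket for the time elapsed since the last update,
// caps it at capacity, then tries to spend one token. It returns the
// updated state and whether the request is allowed.
func Take(b Bucket, now, capacity, refillPerSec float64) (Bucket, bool) {
	elapsed := math.Max(0, now-b.LastUnix)
	b.Tokens = math.Min(capacity, b.Tokens+elapsed*refillPerSec)
	b.LastUnix = now
	if b.Tokens < 1 {
		return b, false
	}
	b.Tokens--
	return b, true
}

func main() {
	// A full bucket of 2 tokens that refills at 1 token per second.
	b := Bucket{Tokens: 2, LastUnix: 0}
	for _, now := range []float64{0, 0, 0, 1} {
		var ok bool
		b, ok = Take(b, now, 2, 1)
		fmt.Printf("t=%.0fs allowed=%v tokens left=%.0f\n", now, ok, b.Tokens)
	}
}
```

Keeping the function pure (state in, state out) also makes the logic easy to unit-test without a Redis instance.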

•Designing a middleware pipeline that composes cleanly (auth, rate limit, cache, proxy, log).
•Implementing atomic rate limiting in Redis with Lua scripts for correctness under concurrency.
•Building a two-tier semantic cache that balances speed (exact hash) with intelligence (vector similarity).
•Ensuring async usage logging doesn't lose data on crashes while keeping the response path fast.
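
One common way to approach the logging trade-off above is a buffered channel drained by a background goroutine, with an explicit drain on shutdown. The sketch below is an assumption about the general shape, not the project's implementation: UsageRecord, AsyncLogger, and the write callback are hypothetical names, and it only protects against data loss on graceful shutdown. Records still queued during a hard crash would be lost unless they were persisted somewhere durable first.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// UsageRecord holds the per-request fields named in this write-up.
type UsageRecord struct {
	Model     string
	Tokens    int
	CostUSD   float64
	LatencyMS int64
}

// AsyncLogger queues records and writes them from one background goroutine.
type AsyncLogger struct {
	dropped int64 // kept first so atomic access is aligned on 32-bit platforms
	queue   chan UsageRecord
	done    chan struct{}
}

// NewAsyncLogger starts the background writer. write stands in for the
// Postgres insert.
func NewAsyncLogger(buffer int, write func(UsageRecord)) *AsyncLogger {
	l := &AsyncLogger{
		queue: make(chan UsageRecord, buffer),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(l.done)
		for rec := range l.queue {
			write(rec)
		}
	}()
	return l
}

// Log never blocks the caller. If the queue is full, the record is counted
// as dropped and false is returned. Log must not be called after Close.
func (l *AsyncLogger) Log(rec UsageRecord) bool {
	select {
	case l.queue <- rec:
		return true
	default:
		atomic.AddInt64(&l.dropped, 1)
		return false
	}
}

// Dropped returns how many records were rejected because the queue was full.
func (l *AsyncLogger) Dropped() int64 { return atomic.LoadInt64(&l.dropped) }

// Close stops accepting records and waits until queued ones are written.
func (l *AsyncLogger) Close() {
	close(l.queue)
	<-l.done
}

func main() {
	l := NewAsyncLogger(16, func(r UsageRecord) {
		fmt.Printf("model=%s tokens=%d cost=$%.4f latency=%dms\n",
			r.Model, r.Tokens, r.CostUSD, r.LatencyMS)
	})
	l.Log(UsageRecord{Model: "example-model", Tokens: 120, CostUSD: 0.0006, LatencyMS: 840})
	l.Close() // drain before exit so the queued record is written
	fmt.Println("dropped:", l.Dropped())
}
```

Dropping on a full queue keeps the response path fast at the cost of completeness; exposing the dropped count at least makes that loss visible.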

•Chi's middleware composition model maps well to API gateway pipelines.
•Redis Lua scripts are essential for correct atomic operations like token buckets.
•pgvector similarity search is surprisingly effective for caching semantically equivalent LLM queries.
•Separating the admin API from the proxy path simplifies auth and routing concerns.
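
The two-tier lookup mentioned above can be illustrated in memory. This sketch is an assumption about the general technique rather than the project's code: TwoTierCache, its methods, and the 0.9 threshold are hypothetical, the embeddings are hand-written two-dimensional vectors instead of model output, and a linear scan replaces the pgvector query. In pgvector, the `<=>` operator returns cosine distance, which is 1 minus the similarity computed here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
)

type entry struct {
	embedding []float64
	response  string
}

// TwoTierCache checks an exact-hash map first, then falls back to a
// similarity scan over stored embeddings.
type TwoTierCache struct {
	exact     map[string]string
	entries   []entry
	threshold float64 // minimum cosine similarity for a semantic hit
}

func NewTwoTierCache(threshold float64) *TwoTierCache {
	return &TwoTierCache{exact: map[string]string{}, threshold: threshold}
}

func hashPrompt(prompt string) string {
	sum := sha256.Sum256([]byte(prompt))
	return hex.EncodeToString(sum[:])
}

// Cosine returns the cosine similarity of a and b, or 0 when the vectors
// are empty, differ in length, or have zero magnitude.
func Cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Put stores a response under both the exact hash and the embedding.
func (c *TwoTierCache) Put(prompt string, embedding []float64, response string) {
	c.exact[hashPrompt(prompt)] = response
	c.entries = append(c.entries, entry{embedding, response})
}

// Get returns the cached response and which tier served it:
// "exact", "semantic", or "miss".
func (c *TwoTierCache) Get(prompt string, embedding []float64) (string, string, bool) {
	if resp, ok := c.exact[hashPrompt(prompt)]; ok {
		return resp, "exact", true
	}
	best, resp, found := c.threshold, "", false
	for _, e := range c.entries {
		if sim := Cosine(embedding, e.embedding); sim >= best {
			best, resp, found = sim, e.response, true
		}
	}
	if found {
		return resp, "semantic", true
	}
	return "", "miss", false
}

func main() {
	c := NewTwoTierCache(0.9)
	c.Put("what is go?", []float64{1, 0}, "Go is a programming language.")
	_, tier, _ := c.Get("what is go?", nil)
	fmt.Println(tier)
	_, tier, _ = c.Get("tell me about golang", []float64{0.99, 0.1})
	fmt.Println(tier)
	_, tier, _ = c.Get("best pasta recipe", []float64{0, 1})
	fmt.Println(tier)
}
```

The threshold is the main tuning knob: set too low, unrelated prompts share answers; set too high, the semantic tier rarely hits.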

•Enabled multi-team LLM access through a centralized Go-based gateway with API-key isolation, rate limiting, observability, and usage tracking.
•Reduced redundant LLM API calls through exact + semantic caching architecture, achieving ~90% effective cache hit rate during mixed-workload benchmark testing.
•Limited upstream provider fallbacks to <0.3% across 40k+ benchmark requests using Redis-backed exact caching and pgvector semantic similarity search.
•Provided real-time latency, error-rate, throughput, goroutine, and cache-efficiency visibility using Prometheus and Grafana observability pipelines.
•Load-tested the gateway under concurrent AI inference workloads, sustaining ~62 requests/sec across 40k+ requests using k6-based traffic simulation.