Control how many requests a client can make in a time window. Rate limiting protects services from abuse and DDoS, and ensures fair usage across tenants.
A rate limiter sits in front of your API (usually at the gateway) and tracks how many requests each client makes. When a client exceeds the limit, the limiter rejects requests with HTTP 429 (Too Many Requests) until tokens replenish.
With multiple API servers, you need shared state. Redis is the standard choice: use a Lua script (atomic INCR + EXPIRE, or token bucket logic) to avoid race conditions. A single Redis instance handles 100K+ ops/sec, more than enough for rate limiting.
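A minimal sketch of the fixed-window INCR + EXPIRE check described above, using an in-memory dict as a stand-in for Redis (the dict, function name, and injectable clock are illustrative). In production the whole read-check-write sequence runs as a single Lua script inside Redis, which is what makes it atomic under concurrent callers:

```python
import time

# Stand-in for Redis: key -> (count, window_expiry). In production this whole
# function body is one Lua script, so concurrent requests cannot race between
# reading the counter and incrementing it.
store = {}

def allow(key, limit, window_s, now=None):
    """Fixed-window check: increment a counter, reset it when the window rolls over."""
    now = time.time() if now is None else now
    count, expiry = store.get(key, (0, now + window_s))
    if now >= expiry:                      # window expired: start a fresh one
        count, expiry = 0, now + window_s
    if count >= limit:
        return False                       # over the limit -> respond with HTTP 429
    store[key] = (count + 1, expiry)
    return True
```

The `now` parameter exists only to make the window logic easy to exercise deterministically; a real caller would let it default to the clock.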
Where to enforce (gateway vs service): Gateway (Kong, AWS API Gateway) catches abuse early and is centralized. But service-level limits allow fine-grained rules (e.g., "only 100 repo creations/hour"). Best practice: coarse limit at gateway, fine-grained in services.
Token Bucket vs Fixed Window: Token Bucket allows bursts while maintaining the average rate, which makes for better UX. Fixed Window is simpler, but the boundary burst problem (2× the nominal rate at the window edge) can overwhelm services. Token Bucket wins for most use cases.
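The token bucket just described can be sketched in a few lines (class and parameter names are illustrative): tokens refill continuously at `rate` per second up to `capacity`, so a client can burst up to `capacity` requests at once but sustains only `rate` on average:

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity` while enforcing `rate` tokens/sec on average."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate                    # refill rate, tokens per second
        self.capacity = capacity            # maximum burst size
        self.tokens = capacity              # start full
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1                # spend one token for this request
            return True
        return False                        # bucket empty -> reject with 429
```

Note how this avoids the fixed-window edge case: there is no window boundary at which a client can double its effective rate, because refill is continuous.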
Rate limit key: By API key (per-developer), by user ID (per-account), by IP (anonymous). Tiered limits: free tier = 100/hr, paid = 5000/hr. Consider: authenticated vs anonymous, read vs write operations.
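The keying and tiering above reduce to a small lookup. The free/paid quotas come from the text; the precedence order (API key, then user ID, then IP) and the anonymous per-IP limit are assumptions for illustration:

```python
# Quotas from the text: free tier = 100/hr, paid = 5000/hr.
TIER_LIMITS = {"free": 100, "paid": 5000}
ANON_LIMIT = 20  # assumed: a stricter per-IP quota for unauthenticated traffic

def rate_limit_key_and_quota(api_key=None, user_id=None, ip=None, tier="free"):
    """Pick the most specific identity available, then return (key, hourly quota)."""
    if api_key:
        return f"apikey:{api_key}", TIER_LIMITS[tier]
    if user_id:
        return f"user:{user_id}", TIER_LIMITS[tier]
    return f"ip:{ip}", ANON_LIMIT
```

Separate quotas for read vs write operations would just add another dimension to this lookup.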
Hard vs soft limits: Hard: reject immediately at the limit. Soft: allow some overflow, log it, maybe degrade quality (serve cached response). Soft limits are friendlier but harder to enforce fairness.
Interview signal: Lead with the algorithm choice and justify it. "I'd use token bucket because it handles bursts while maintaining average rate" is much stronger than just saying "I'd add rate limiting."
Return X-RateLimit-Remaining headers so clients can self-throttle, and a Retry-After header on 429 responses. Bots that ignore limits get globally rate-limited, then banned.

| Metric | Value |
|---|---|
| Redis memory per user (token bucket state) | ~64 bytes × 2M active users = ~128 MB |
| Redis throughput for rate limiting | ~100K+ ops/sec (single instance) |
| Lua script latency (atomic check) | ~0.1 ms |
| GitHub rate limit | 5,000 req/hour per user |
| Stripe rate limit | 100 req/sec per API key |
| Fixed window boundary burst | 2× nominal rate in worst case |
| Cost of NOT rate limiting | One script at 10K req/sec can take down a service |
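The self-throttling headers mentioned earlier can be built like this (the X-RateLimit-* names are the common de-facto convention; the function shape is illustrative):

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Headers that let well-behaved clients back off before hitting 429."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),  # Unix time when the quota refills
    }
    if retry_after is not None:                 # only sent on 429 responses
        headers["Retry-After"] = str(retry_after)
    return headers
```

Clients that watch X-RateLimit-Remaining can slow down before they ever see a 429; Retry-After tells the ones that didn't exactly when to come back.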