โš ๏ธ This guide is AI-generated and may contain inaccuracies. Always verify against authoritative sources and real-world documentation.

Architecture Diagram โ€” Token Bucket

[Diagram: a token bucket with capacity 10, refilling at 5 tokens/sec. Requests ①, ②, ③ each take a token; bursts are fine while tokens remain. Request ⑪ finds the bucket empty and receives HTTP 429. Steady state: ≤5 req/sec sustained. Responses carry X-RateLimit-Remaining (here 7) and X-RateLimit-Reset headers.]

How It Works

A rate limiter sits in front of your API (usually at the gateway) and tracks how many requests each client makes. When a client exceeds its limit, the limiter rejects further requests with HTTP 429 (Too Many Requests) until the allowance recovers, e.g. tokens refill or the window resets.
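Well-behaved clients use the response headers to self-throttle instead of hammering the API after a 429. A minimal sketch of that client-side decision, using the Retry-After and X-RateLimit-Reset conventions mentioned in this guide (the helper name and default backoff are illustrative):

```python
import time

def seconds_to_wait(headers, now=None):
    """Decide how long a client should back off after an HTTP 429.

    Prefers Retry-After (a delay in seconds) and falls back to
    X-RateLimit-Reset (a Unix timestamp of when the limit resets).
    """
    now = time.time() if now is None else now
    if "Retry-After" in headers:
        return max(0.0, float(headers["Retry-After"]))
    if "X-RateLimit-Reset" in headers:
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    return 1.0  # conservative default when the server gives no hint

# Example: the server says the window resets 30 seconds from "now"
print(seconds_to_wait({"X-RateLimit-Reset": 1680000030}, now=1680000000))  # 30.0
```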

Rate Limiting Algorithms

  1. Token Bucket โ€” Bucket holds max N tokens, refills at rate R/sec. Each request consumes one token. Allows bursts up to N while maintaining average rate R. Most common in practice (GitHub, Stripe).
  2. Leaky Bucket โ€” Requests enter a FIFO queue, processed at a fixed rate. Smooths output completely โ€” no bursts. Good for rate shaping but adds latency.
  3. Fixed Window โ€” Count requests per fixed time window (e.g., per minute). Simple but allows 2ร— rate at window boundaries (100 req at 0:59 + 100 at 1:01).
  4. Sliding Window Log โ€” Store timestamp of each request, count in last T seconds. Precise but memory-intensive (stores every timestamp).
  5. Sliding Window Counter โ€” Hybrid: weighted combination of current and previous window counts. Good balance of precision and efficiency.
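Algorithm 1 (token bucket) is compact enough to sketch in full. This is a single-process toy with lazy refill, not a production implementation; the injectable clock exists only to make the demo deterministic:

```python
import time

class TokenBucket:
    """Token bucket: capacity N, refilling at R tokens/sec (algorithm 1 above)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start full: permits an initial burst
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Lazy refill: credit tokens for elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request admitted
        return False      # out of tokens -> caller returns HTTP 429

# Fake clock so the demo is deterministic
t = [0.0]
bucket = TokenBucket(capacity=10, refill_rate=5, clock=lambda: t[0])
print(sum(bucket.allow() for _ in range(12)))  # 10 -- burst up to capacity
t[0] += 1.0  # one second later, 5 tokens have refilled
print(sum(bucket.allow() for _ in range(12)))  # 5 -- back to the average rate
```

Note how the burst-then-average behavior described above falls directly out of the cap-at-capacity refill.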

Distributed Rate Limiting

With multiple API servers, you need shared state. Redis is the standard choice: use a Lua script (atomic INCR + EXPIRE, or token bucket logic) to avoid race conditions. A single Redis instance handles 100K+ ops/sec โ€” more than enough for rate limiting.
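The logic such a Lua script encapsulates is small: increment a per-client, per-window counter and compare it to the limit. Here is that logic modeled in plain Python, with a dict standing in for Redis (the key scheme and function names are illustrative, not the redis-py API); in real Redis the whole body runs as one atomic script, so concurrent API servers cannot race:

```python
# Model of the fixed-window check a Redis Lua script performs atomically
# (INCR the per-window key; EXPIRE would evict old windows).
store = {}  # key -> count; stands in for Redis

def allow(client_id, limit, window_sec, now):
    window = int(now // window_sec)
    key = f"rl:{client_id}:{window}"   # one counter per client per window
    count = store.get(key, 0) + 1      # INCR
    store[key] = count                 # EXPIRE is implied by the window in the key
    return count <= limit

# 100 req/min limit: the 101st request in the same window is rejected
results = [allow("alice", limit=100, window_sec=60, now=30.0) for _ in range(101)]
print(results.count(True))  # 100
print(results[-1])          # False
```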

Key Design Decisions

๐Ÿ“

Where to enforce โ€” gateway vs service: Gateway (Kong, AWS API Gateway) catches abuse early and is centralized. But service-level limits allow fine-grained rules (e.g., "only 100 repo creations/hour"). Best practice: coarse limit at gateway, fine-grained in services.

๐Ÿชฃ

Token Bucket vs Fixed Window: Token Bucket allows bursts while maintaining average rate โ€” better UX. Fixed Window is simpler but the boundary burst problem (2ร— rate at window edge) can overwhelm services. Token Bucket wins for most use cases.
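The boundary burst problem is easy to demonstrate numerically. With a limit of 100/minute, a client can send 100 requests at 0:59 and 100 more at 1:01, and a fixed-window counter accepts all 200 within about two seconds (this toy function is illustrative, keyed only by window index):

```python
def fixed_window_allow(counts, limit, window_sec, now):
    """Fixed-window check: count requests per window, reject above the limit."""
    window = int(now // window_sec)
    counts[window] = counts.get(window, 0) + 1
    return counts[window] <= limit

counts = {}
LIMIT, WINDOW = 100, 60
# 100 requests at t=59s (end of window 0), 100 more at t=61s (start of window 1)
late = sum(fixed_window_allow(counts, LIMIT, WINDOW, 59.0) for _ in range(100))
early = sum(fixed_window_allow(counts, LIMIT, WINDOW, 61.0) for _ in range(100))
print(late + early)  # 200 accepted in ~2 seconds -- 2x the nominal rate
```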

๐Ÿ”‘

Rate limit key: By API key (per-developer), by user ID (per-account), by IP (anonymous). Tiered limits: free tier = 100/hr, paid = 5000/hr. Consider: authenticated vs anonymous, read vs write operations.
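One way to sketch this decision: pick the most specific identity available, then look up the tier's limit. The tier table and key prefixes below are assumptions that mirror the free = 100/hr, paid = 5,000/hr example above:

```python
# Illustrative tier table; values match the example tiers in the text.
TIER_LIMITS = {"free": 100, "paid": 5000}  # requests per hour

def rate_limit_key_and_limit(api_key=None, user_id=None, ip=None, tier="free"):
    """Pick the most specific identity available, then its tier's limit."""
    if api_key:
        key = f"key:{api_key}"    # per-developer
    elif user_id:
        key = f"user:{user_id}"   # per-account
    else:
        key = f"ip:{ip}"          # anonymous fallback
    return key, TIER_LIMITS[tier]

print(rate_limit_key_and_limit(api_key="abc123", tier="paid"))  # ('key:abc123', 5000)
print(rate_limit_key_and_limit(ip="203.0.113.7"))               # ('ip:203.0.113.7', 100)
```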

๐Ÿ“Š

Hard vs soft limits: Hard: reject immediately at the limit. Soft: allow some overflow, log it, maybe degrade quality (serve cached response). Soft limits are friendlier but harder to enforce fairness.
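A soft-limit policy can be expressed as three bands. This is a hypothetical policy function (the 20% overflow band and the action names are made up for illustration):

```python
# Hypothetical soft-limit policy: normal service under the limit, degraded
# (e.g., cached) responses in an overflow band, hard rejection beyond that.
def soft_limit_decision(used, limit, overflow_ratio=0.2):
    if used < limit:
        return "serve"            # under the limit: normal response
    if used < limit * (1 + overflow_ratio):
        return "serve_degraded"   # overflow band: log it, serve cached/cheaper
    return "reject_429"           # hard ceiling

print(soft_limit_decision(90, 100))   # serve
print(soft_limit_decision(110, 100))  # serve_degraded
print(soft_limit_decision(130, 100))  # reject_429
```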

When to Use

  • "How do you prevent abuse?" โ€” Rate limiting is the first line of defense for any public API.
  • "How do you handle a DDoS?" โ€” Rate limiting + circuit breaker at the edge.
  • "Design an API for multi-tenant SaaS" โ€” Fair usage per tenant requires per-tenant rate limits.
  • "Design a chat app" โ€” Rate limit messages per user (e.g., 5 msg/sec, burst of 10) to prevent spam.

Interview signal: Lead with the algorithm choice and justify it. "I'd use token bucket because it handles bursts while maintaining average rate" is much stronger than just saying "I'd add rate limiting."

Real-World Examples

  • GitHub API โ€” 5,000 requests/hour per authenticated user. Token bucket with Redis. Returns X-RateLimit-Remaining headers so clients can self-throttle.
  • Stripe โ€” Rate limits per API key with different tiers. 100 req/sec for most endpoints, lower for resource-intensive operations. Graduated enforcement: warn, then throttle.
  • Cloudflare โ€” Edge rate limiting at 300+ PoPs worldwide. Rules based on URL path, IP, headers. Can block millions of req/sec at the edge before traffic reaches origin.
  • Discord โ€” Per-route rate limits (e.g., 5 msg/5sec per channel). Returns Retry-After header. Bots that ignore limits get globally rate-limited, then banned.

Back-of-Envelope Numbers

Metric                                       Value
-------------------------------------------  ----------------------------------------------
Redis memory per user (token bucket state)   ~64 bytes; × 2M active users ≈ 128 MB
Redis throughput for rate limiting           ~100K+ ops/sec (single instance)
Lua script latency (atomic check)            ~0.1 ms
GitHub rate limit                            5,000 req/hour per user
Stripe rate limit                            100 req/sec per API key
Fixed window boundary burst                  2× nominal rate in worst case
Cost of NOT rate limiting                    One script at 10K req/sec can down a service