Distribute incoming traffic across multiple servers to maximize throughput, minimize response time, and ensure no single server bears too much load.
Difficulty: Easy · Frequency: Very High

A load balancer sits between clients and a pool of backend servers. Every incoming request is routed to one of the servers by a chosen algorithm (round robin, least connections, IP hash, etc.). The load balancer continuously monitors server health and removes unhealthy instances from the rotation.
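The selection step above can be sketched as a minimal round-robin balancer. This is an illustrative toy (server addresses are made up), not a production implementation:

```python
class RoundRobinBalancer:
    """Pick servers in order, wrapping around the pool."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._i = 0

    def next_server(self):
        # Wrap around the current pool in order.
        server = self.servers[self._i % len(self.servers)]
        self._i += 1
        return server

    def remove(self, server):
        # A health checker would call this when a server fails its checks.
        self.servers.remove(server)


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
picks = [lb.next_server() for _ in range(4)]
# The fourth pick wraps back to the first server.
```

Real balancers also weight servers and track in-flight connections (least-connections), but the core loop is this simple.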
Layer 4 (L4): Operates on TCP/UDP. Routes based on IP + port. Very fast, since it never inspects packet contents, but it can't make content-based routing decisions. Examples: AWS NLB, HAProxy (TCP mode).
Layer 7 (L7): Operates on HTTP/HTTPS. Can route based on URL path, headers, or cookies. Enables sticky sessions, A/B testing, and canary deployments. Examples: AWS ALB, Nginx, Envoy.
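L7 path-based routing boils down to matching the request path against a prefix table and choosing a backend pool. A minimal sketch (the route prefixes and pool names are hypothetical):

```python
# Prefix -> backend pool. First match wins in this toy version;
# real L7 balancers typically use longest-prefix or regex rules.
ROUTES = [
    ("/api/", ["api-1:8080", "api-2:8080"]),
    ("/static/", ["static-1:80"]),
]
DEFAULT_POOL = ["web-1:8080", "web-2:8080"]


def pick_pool(path):
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

An L4 balancer cannot do this at all: by the time you know the path, you have already parsed HTTP.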
The LB periodically probes each server (TCP connect, HTTP GET /health, or gRPC health check). If a server fails N consecutive checks, it's removed from the pool; once it passes again, it's re-added. This health checking is what lets load balancing deliver high availability, not just throughput.
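The eject-after-N-consecutive-failures logic can be sketched in a few lines. This only models the bookkeeping; the actual probe (TCP connect, HTTP GET, etc.) is left out:

```python
class HealthTracker:
    """Eject a server after N consecutive failed checks; re-admit on success."""

    def __init__(self, servers, fail_threshold=3):
        self.fail_threshold = fail_threshold
        self.failures = {s: 0 for s in servers}  # consecutive-failure counts

    def record(self, server, healthy):
        # One success resets the streak; a failure extends it.
        self.failures[server] = 0 if healthy else self.failures[server] + 1

    def in_rotation(self, server):
        return self.failures[server] < self.fail_threshold


ht = HealthTracker(["a", "b"], fail_threshold=3)
for _ in range(3):
    ht.record("a", healthy=False)
# "a" is now out of rotation; "b" still receives traffic.
```

Note that requiring consecutive failures (rather than any single failure) is what prevents one dropped probe from flapping a healthy server out of the pool.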
L4 vs L7: L4 is faster (no packet inspection, ~μs overhead) but blind to HTTP semantics. L7 adds latency (~1ms) but enables content routing, SSL termination, and request transformation. Most web apps need L7.
Sticky sessions vs Stateless: Sticky sessions (via cookies) pin a user to one server — simpler app code but kills even distribution and complicates scaling. Better approach: externalize state to Redis/DB and go fully stateless.
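One common way to implement stickiness without per-connection state on the LB is to hash the session cookie to a server. A sketch under that assumption:

```python
import hashlib


def sticky_server(session_id, servers):
    """Deterministically map a session cookie value to one backend."""
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]


servers = ["app-1", "app-2", "app-3"]  # hypothetical backend names
# The same session id always lands on the same server.
```

The modulo step is also why stickiness "complicates scaling": resizing the pool remaps most sessions to different servers, which is one reason to externalize state instead (or to use consistent hashing if stickiness is unavoidable).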
Single LB vs Multiple: A single LB is itself a single point of failure (SPOF). Use active-passive or active-active pairs with a floating IP (VRRP/keepalived) or DNS failover. Cloud LBs (ALB, GCP LB) handle this for you.
SSL termination at LB vs passthrough: Terminating SSL at the LB simplifies cert management and offloads crypto from app servers. But traffic between the LB and backends is then unencrypted unless you re-encrypt to the backend (optionally with mutual TLS, mTLS).
If an interviewer asks you to design any scalable web service, load balancing is step one. Mention it early.
Example L7 rule: route /api and /static to different services. Interview signal: the interviewer wants to see that you can separate traffic distribution from application logic and explain the tradeoffs of different algorithms.
| Metric | Value |
|---|---|
| Nginx max concurrent connections | ~10K–100K (event-driven) |
| HAProxy throughput | ~2M HTTP req/s (modern hardware) |
| AWS ALB latency overhead | ~1–5 ms |
| AWS NLB latency overhead | ~100 μs |
| Health check interval (typical) | 5–30 seconds |
| Failover detection time | 15–90 seconds (3 consecutive failures) |
| Google Maglev throughput | ~10M packets/s per machine |