โš ๏ธ This guide is AI-generated and may contain inaccuracies. Always verify against authoritative sources and real-world documentation.

Architecture Diagram: Cache-Aside Pattern

[Architecture diagram: client → app server (business logic + cache client) → Redis cache (in-memory, μs latency) and database (PostgreSQL/MySQL). Cache-aside flow: ① app checks the cache; ② on a miss, query the DB; ③ the DB returns the result; ④ the app populates the cache.]

How It Works

The application sits between the cache and the database. On every read, it first checks the cache. On a hit, data is returned in microseconds. On a miss, the app queries the database, stores the result in the cache, and returns it to the client.
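The read path above can be sketched in a few lines of Python. This is a minimal illustration, not a production client: a plain dict stands in for Redis, and `db_query` is a hypothetical stand-in for the real database call.

```python
# Minimal cache-aside read path. A plain dict stands in for Redis;
# db_query is a stand-in for an expensive database lookup.
cache = {}

def db_query(user_id):
    # Pretend this is a slow SQL lookup (~ms).
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    value = cache.get(key)      # 1. check cache
    if value is not None:
        return value            # HIT: return immediately
    value = db_query(user_id)   # 2. MISS: query the database
    cache[key] = value          # 3-4. populate cache, then return
    return value
```

The second call for the same key never touches `db_query`; that is the entire point of the pattern.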

Caching Strategies

Cache-Aside (Lazy Loading)

App manages the cache explicitly. Read: check cache → miss → read DB → write cache. Write: update DB → invalidate cache. Most common strategy. Risk: stale data if invalidation fails.
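The write path deserves its own sketch, under the same stand-in assumptions (dicts for the cache and DB, hypothetical `update_user`). Note that the cached entry is deleted, not updated; the next read repopulates it from the source of truth.

```python
# Cache-aside write path: update the source of truth first,
# then invalidate (not update) the cached copy.
cache = {"user:7": {"id": 7, "name": "old"}}
db = {7: {"id": 7, "name": "old"}}

def update_user(user_id, fields):
    db[user_id].update(fields)           # 1. write to the database
    cache.pop(f"user:{user_id}", None)   # 2. invalidate; next read repopulates
```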

Write-Through

Every write goes to cache AND database synchronously. Guarantees cache consistency. Downside: write latency increases (two writes per operation). Good for read-heavy workloads.
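The two synchronous writes can be made concrete with a toy sketch; `cache` and `db` are again dict stand-ins for the real stores.

```python
# Write-through: every write hits cache AND database synchronously,
# so the cache never holds a value the database doesn't have.
cache, db = {}, {}

def write_through(key, value):
    db[key] = value      # write 1: the durable store
    cache[key] = value   # write 2: the cache stays consistent

def read(key):
    return cache.get(key, db.get(key))
```

Each `write_through` call pays for both writes, which is exactly the latency cost the text describes.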

Write-Behind (Write-Back)

Write to cache immediately, async flush to DB. Ultra-low write latency. Risk: data loss if cache node dies before flush. Used in hardware (CPU caches) and disk controllers.
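The deferred flush, and the data-loss window it creates, can be shown with a queue of dirty keys. This is a single-threaded illustration (a real write-back cache would flush asynchronously on a timer or on pressure).

```python
# Write-behind: acknowledge after the cache write, flush to the DB later.
# Any keys still in `dirty` when the process dies are lost.
from collections import deque

cache, db = {}, {}
dirty = deque()          # keys written but not yet persisted

def write_behind(key, value):
    cache[key] = value
    dirty.append(key)    # returns immediately; the DB write is deferred

def flush():
    while dirty:
        key = dirty.popleft()
        db[key] = cache[key]   # the deferred, batchable DB write
```

Between `write_behind` and `flush`, the database is stale; that gap is the durability risk.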

Read-Through

Cache itself handles DB reads on a miss (cache sits in front of DB). Simplifies app code โ€” it only talks to cache. Requires cache library support (e.g., NCache, Hazelcast).

Eviction Policies

  • LRU (Least Recently Used): Evict the item accessed longest ago. Redis offers approximated LRU via allkeys-lru (its out-of-the-box policy, noeviction, rejects writes instead of evicting). Good general-purpose choice.
  • LFU (Least Frequently Used): Evict the item accessed fewest times. Better for skewed access patterns; available in Redis as allkeys-lfu.
  • TTL (Time to Live): Expire entries after a fixed duration. Strictly an expiration policy rather than eviction, but simple, and it bounds staleness.
  • Random: Surprisingly close to LRU performance with zero tracking overhead.
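LRU is simple enough to implement from scratch, which interviewers sometimes ask for. A compact sketch using Python's OrderedDict (a common approach, not the only one):

```python
# LRU eviction in a few lines: OrderedDict remembers insertion order,
# and move_to_end marks an entry as most recently used.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # touched: now most recent
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
```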

Cache Stampede Prevention

When a popular cache key expires, hundreds of requests simultaneously hit the DB. Solutions:

  • Locking: Only one request fetches from DB; others wait for the cache to be repopulated.
  • Early expiration (jitter): Randomly refresh before the TTL expires. Each key gets TTL ± a random offset.
  • Stale-while-revalidate: Serve stale data while one background thread refreshes.

Key Design Decisions

🧩 Local cache vs distributed cache: Local (in-process) cache is fastest (~ns) but not shared across instances, which leads to inconsistency. Distributed cache (Redis, Memcached) adds a ~1 ms network hop but is shared by all instances. Most systems use both: L1 local + L2 distributed.
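The L1 + L2 combination is just a layered lookup with backfill on the way up. A dict-based sketch (`l1` is the in-process cache, `l2` stands in for Redis, `db` for the database):

```python
# Two-tier lookup: check the in-process L1 first (~ns), then the shared
# L2 (~1 ms network hop), then the database; backfill each layer.
l1, l2, db = {}, {}, {"k": "v"}

def get(key):
    if key in l1:
        return l1[key]        # fastest path, per-instance
    if key in l2:
        l1[key] = l2[key]     # backfill L1 from the shared cache
        return l1[key]
    value = db[key]           # last resort: the database
    l2[key] = value           # populate shared L2 for other instances
    l1[key] = value           # and the local L1
    return value
```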

โฑ๏ธ

TTL tuning: Too short = high miss rate, defeating the purpose. Too long = stale data. Match TTL to how often the underlying data changes. User profiles: minutes. Stock prices: seconds. Static assets: hours/days.

🗑️ Invalidation vs TTL: Active invalidation (delete on write) gives freshness but adds complexity. TTL is simpler but allows staleness windows. As Phil Karlton put it: "There are only two hard things in Computer Science: cache invalidation and naming things."

💾 Redis vs Memcached: Redis offers data structures (lists, sets, sorted sets), persistence, pub/sub, and Lua scripting. Memcached is simpler, multi-threaded, and slightly faster for plain key-value workloads. Redis wins for 90% of use cases.

When to Use

Caching is relevant in almost every system design interview. Mention it whenever you see:

  • Read-heavy workloads: "Design a news feed" → cache pre-computed feeds
  • Expensive computations: "Design a search engine" → cache query results
  • Hot data: "Design a URL shortener" → cache popular short URLs
  • Rate limiting โ€” Redis as a counter store with TTL
  • Session storage โ€” User sessions in Redis instead of server memory
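The rate-limiting bullet maps to the classic Redis INCR-plus-EXPIRE pattern. A fixed-window sketch simulated with a dict (in real Redis, the window key would simply expire via EXPIRE instead of accumulating):

```python
# Fixed-window rate limiter in the style of Redis INCR + EXPIRE,
# simulated with a dict keyed by (client, window number).
import time

counters = {}

def allow(client_id, limit=5, window_s=60, now=None):
    now = time.time() if now is None else now
    window = int(now // window_s)             # which fixed window we're in
    key = (client_id, window)                 # in Redis: INCR key; EXPIRE key window_s
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= limit
```

The `now` parameter exists only to make the sketch deterministic for testing; production code would use the clock directly.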

Interview signal: The interviewer wants to hear you discuss what to cache, where to cache it, when to invalidate, and how to handle failures (cache down ≠ system down).

Real-World Examples

  • Facebook (TAO): Custom distributed cache for the social graph. Billions of reads/sec with ~1 ms p99. Write-through to MySQL.
  • Twitter: Redis for timeline caching. Fan-out on write: precompute home timelines into per-user caches.
  • Netflix: EVCache (Memcached-based) for session data, personalization, and video metadata. 30M+ ops/sec.
  • Stack Overflow: Serves 1B+ page views/month with only 9 web servers, thanks to aggressive Redis + in-memory caching.

Back-of-Envelope Numbers

| Metric | Value |
| --- | --- |
| Redis GET latency | ~0.1–0.5 ms |
| Redis throughput (single node) | ~100K–200K ops/sec |
| Memcached throughput | ~200K–700K ops/sec |
| L1 CPU cache access | ~1 ns |
| In-process cache (HashMap) | ~50–100 ns |
| Redis (same AZ, network) | ~0.1–1 ms |
| Database query (indexed) | ~1–10 ms |
| Database query (full scan) | ~100–1000 ms |
| Typical cache hit ratio (healthy) | 95–99% |
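These numbers combine into a useful back-of-envelope formula: effective read latency is the hit-ratio-weighted average of cache and DB latency. A quick check using mid-range values from the table:

```python
# Effective read latency = hit_ratio * cache_latency + miss_ratio * db_latency.
def effective_latency_ms(hit_ratio, cache_ms, db_ms):
    return hit_ratio * cache_ms + (1 - hit_ratio) * db_ms

# 99% hits at 0.5 ms Redis reads vs 5 ms indexed DB queries:
fast = effective_latency_ms(0.99, 0.5, 5.0)   # 0.545 ms
# Dropping to a 90% hit ratio nearly doubles the average:
slow = effective_latency_ms(0.90, 0.5, 5.0)   # 0.95 ms
```

This is why the healthy-hit-ratio row matters: each point of hit ratio lost adds a full miss-latency's worth of weight to the average.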