Maintain copies of data across multiple database instances for availability, fault tolerance, and read scalability. The fundamental tradeoff: consistency vs performance.
The leader handles all writes and propagates changes to followers; clients read from followers to spread the read load.
Single-leader: one leader handles writes, followers replicate and serve reads. Simple and well understood; the default choice for most applications. Examples: PostgreSQL streaming replication, MySQL replication.
Multi-leader: multiple nodes accept writes and replicate to each other. Used for multi-datacenter setups (each DC has a leader). Must handle write conflicts → last-write-wins, merge, or app-level resolution.
Leaderless: any node accepts reads/writes. Uses quorums: with N replicas, W write acks, and R read replicas, W + R > N guarantees a read overlaps the latest write. Cassandra, DynamoDB. More available but harder to reason about consistency.
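The quorum overlap rule above can be sketched as a one-line check:

```python
def quorum_consistent(n, w, r):
    """Leaderless quorum rule: a read is guaranteed to overlap at least
    one replica holding the latest write exactly when W + R > N."""
    return w + r > n

# Typical Cassandra-style setting: N=3, W=2, R=2 -> reads see latest write.
assert quorum_consistent(3, 2, 2)
# N=3, W=1, R=1 -> a read may land on a replica that missed the write.
assert not quorum_consistent(3, 1, 1)
```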
Synchronous: write waits for follower ACK → strong consistency, slower. Asynchronous: write returns immediately → fast, but followers may lag. Semi-sync: wait for at least one replica (best of both).
With async replication, followers may be seconds behind. A user updates their profile and immediately refreshes, but reads from a lagging follower and sees old data. Fix: route the user's own reads to the leader for 5 seconds after a write (read-your-writes consistency).
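A minimal sketch of that fix, assuming an application-side router and a hypothetical 5-second pin window:

```python
import time

PIN_SECONDS = 5  # illustrative window, matching the text above

class ReadRouter:
    """Route a user's reads to the leader briefly after their own write
    (read-your-writes); otherwise spread reads across followers."""
    def __init__(self):
        self.last_write_at = {}  # user_id -> monotonic timestamp of last write

    def record_write(self, user_id):
        self.last_write_at[user_id] = time.monotonic()

    def choose_replica(self, user_id):
        wrote = self.last_write_at.get(user_id)
        if wrote is not None and time.monotonic() - wrote < PIN_SECONDS:
            return "leader"   # their write may not have replicated yet
        return "follower"     # safe to serve from a (possibly lagging) replica
```

Only the writing user is pinned; everyone else keeps reading from followers, so the leader sees almost no extra read traffic.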
Synchronous vs Asynchronous replication: Sync gives zero lag (strong consistency) but every write waits for replica ACK, doubling write latency. If a replica is slow or down, the leader is stuck. Async is fast but risks data loss on failover (the leader had writes the replicas didn't receive). Answer: semi-synchronous → wait for at least 1 replica, not all.
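The semi-synchronous compromise can be sketched as: ship the write to every replica, but acknowledge the client after the first ACK rather than all of them. This is a simplified in-process model, not a real MySQL semi-sync implementation:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def replicate(replica, entry):
    # stand-in for shipping the log entry over the network
    replica["log"].append(entry)
    return replica["name"]

def semi_sync_write(replicas, entry):
    """Semi-synchronous commit: send the entry to all replicas, but return
    to the client as soon as the FIRST replica ACKs (not all of them)."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(replicate, r, entry) for r in replicas]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()  # name of the first acking replica
```

One slow replica no longer blocks commits, yet at least one replica is guaranteed to have the write before the client sees success.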
Leader failover: When the leader dies, a follower must be promoted. With async replication, the promoted follower may lack recent writes (data loss). Split-brain: two nodes both think they're leader → data divergence. Use consensus-based failover (Orchestrator, Patroni) with fencing.
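Fencing can be sketched with monotonically increasing tokens: each newly promoted leader gets a higher token, and the storage layer rejects writes carrying an older one, so a deposed leader cannot clobber data. A hypothetical `FencedStore`, not a real Orchestrator/Patroni API:

```python
class FencedStore:
    """Reject writes carrying a fencing token older than the newest seen,
    so a deposed leader (split-brain survivor) cannot overwrite data."""
    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            raise PermissionError("stale leader: fencing token too old")
        self.highest_token = token
        self.data[key] = value
```

The old leader may still believe it is in charge, but its writes fail the token check the moment the new leader (with a higher token) touches the store.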
Read-your-writes consistency: After a write, route that user's reads to the leader (or a sync replica) for a brief period. ProxySQL can check SHOW SLAVE STATUS: if replica lag > 1s, route to the leader. This is the #1 operational headache of replication.
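The lag-based routing decision can be sketched as follows. This is a simplified stand-in for proxy-level rules, taking a hypothetical map of replica lags rather than issuing a real SHOW SLAVE STATUS query:

```python
MAX_LAG_SECONDS = 1.0  # threshold from the text above

def route_read(replica_lags):
    """Pick the least-lagged follower; fall back to the leader when every
    follower exceeds the lag threshold. replica_lags: name -> seconds."""
    fresh = {name: lag for name, lag in replica_lags.items()
             if lag <= MAX_LAG_SECONDS}
    if not fresh:
        return "leader"           # all followers too stale; eat the leader load
    return min(fresh, key=fresh.get)  # follower with the smallest lag
```

Falling back to the leader trades read capacity for freshness, which is exactly the consistency-vs-performance tradeoff stated at the top.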
Multi-region replication: For global availability, replicate across data centers. Leader-follower works with a single leader (all writes go to one region). Multi-leader gives local writes everywhere, but requires conflict resolution.
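One common (if lossy) resolution strategy is the last-write-wins approach mentioned earlier. A sketch, with hypothetical version records carrying a timestamp and node id:

```python
def last_write_wins(versions):
    """Resolve a multi-leader write conflict by keeping the version with
    the highest timestamp (ties broken by node id for determinism).
    Note: LWW silently discards the losing writes."""
    return max(versions, key=lambda v: (v["ts"], v["node"]))
```

Deterministic tie-breaking matters: every replica must pick the same winner independently, or the conflict resurfaces as divergence.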
Interview signal: When you mention read replicas, immediately address replication lag and read-after-write consistency. This single concern shows you understand the #1 operational headache.
| Metric | Value |
|---|---|
| Shopify read/write ratio | ~50:1 (5 replicas handle it) |
| Replication lag (async) p50 / p99 | 10ms / 200ms |
| Max acceptable lag | 1 second (route to leader beyond this) |
| Leader write capacity (MySQL, 64-core) | ~10K writes/sec |
| Read replica capacity | ~40K QPS each |
| Failover time (automated) | 10–30 seconds |
| Semi-sync write latency overhead | +2–5ms per write |