โš ๏ธ This guide is AI-generated and may contain inaccuracies. Always verify against authoritative sources and real-world documentation.

The CAP Triangle

  • C — Consistency: every read gets the latest write.
  • A — Availability: every request gets a response.
  • P — Partition Tolerance: the system keeps working despite network splits.

CP systems (consistent + partition-tolerant) — Google Spanner, CockroachDB, HBase, MongoDB. May reject requests during a partition.

AP systems (available + partition-tolerant) — Cassandra, DynamoDB, CouchDB, Riak. May return stale data.

The real choice: partitions are inevitable, so pick CP or AP per operation. PACELC: during a Partition → C or A; Else (normal operation) → Latency or Consistency.

How It Works

The CAP theorem (Brewer, 2000) states that when a network partition occurs, a distributed system must choose between consistency and availability. In practice, consistency and availability exist on a spectrum โ€” it's not a binary toggle.

The Consistency Spectrum

  1. Strong Consistency (Linearizability) โ€” Every read returns the latest write. Spanner, CockroachDB. Slowest but safest. Required for payments, balances.
  2. Causal Consistency โ€” Respects cause-and-effect ordering. Faster than strong, stronger than eventual. "If I see reply B, I must also see the original message A."
  3. Read-Your-Writes โ€” A user always sees their own recent writes (but others may see stale data). Good enough for most user-facing apps.
  4. Eventual Consistency โ€” Reads may return stale data, but all replicas will converge "eventually." DynamoDB, Cassandra. Fastest, most available.
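The spectrum is easiest to see with a toy leader/follower pair. This is a minimal Python sketch under stated assumptions — `Replica` and its timestamped store are illustrative, not any real database's API:

```python
class Replica:
    """A toy replica: stores (value, timestamp) per key."""
    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Keep the newer write (per-replica last-write-wins).
        cur = self.data.get(key)
        if cur is None or ts > cur[1]:
            self.data[key] = (value, ts)

    def read(self, key):
        v = self.data.get(key)
        return v[0] if v else None

leader = Replica()
follower = Replica()  # replicates asynchronously, so it lags the leader

# A client writes to the leader; replication hasn't reached the follower yet.
leader.write("bio", "hello", ts=1)

# Eventual consistency: a read from the lagging follower is stale.
assert follower.read("bio") is None

# Read-your-writes: route this user's reads to the replica they wrote to.
assert leader.read("bio") == "hello"

# "Eventually", replication catches up and the replicas converge.
follower.write("bio", "hello", ts=1)
assert follower.read("bio") == leader.read("bio") == "hello"
```

The same mechanics explain why read-your-writes is cheap to offer (sticky routing to the writer's replica) while strong consistency for all readers requires coordination on every write.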

PACELC โ€” The Extended Model

CAP only covers the partition case. PACELC extends it: during a Partition, choose A or C; Else (normal operation), choose Latency or Consistency. This better reflects real-world tradeoffs: even without partitions, strong consistency costs latency.

Key Design Decisions

💰

Payments (CP) vs Timeline (AP): The real choice isn't "CP or AP for the whole system" โ€” it's per operation. Payments = CP with Raft consensus (wrong balance is catastrophic). Activity feed = AP with eventual consistency (stale data for 2s is fine). Good systems mix both.
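One way to make the per-operation choice explicit in code is a policy map consulted at request time. This is a hypothetical sketch — the operation names and guarantee labels are illustrative, not a real framework's API:

```python
# Per-operation consistency policy: the system is neither "CP" nor "AP"
# as a whole; each operation picks its own point on the spectrum.
CONSISTENCY_POLICY = {
    "payment.debit":   "strong",    # CP: consensus write, may reject during a partition
    "payment.balance": "strong",    # a wrong balance is catastrophic
    "feed.read":       "eventual",  # AP: any replica, a few seconds stale is fine
    "cart.update":     "eventual",  # AP: accept the write, merge conflicts later
}

def consistency_for(operation: str) -> str:
    # Default to strong: rejecting a request fails safer than
    # silently serving stale data for an unclassified operation.
    return CONSISTENCY_POLICY.get(operation, "strong")

assert consistency_for("payment.debit") == "strong"
assert consistency_for("feed.read") == "eventual"
assert consistency_for("unknown.op") == "strong"
```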

🌐

Single-region vs Multi-region: Single region avoids most partition issues (network within a DC is reliable). Multi-region introduces real partitions โ€” cross-region consensus adds ~100-200ms latency. This is why most teams start single-region.

🔄

Conflict resolution strategy: In AP systems, concurrent writes can conflict. Last-write-wins (simple, but silently discards data). Merge/CRDT (complex, preserves all updates). Application-level merge (most correct, most work). Amazon's Dynamo shopping cart used application-level merging — a union of diverged cart versions — so a deleted item could occasionally reappear, but added items were never lost.
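The tradeoff between the strategies shows up clearly on a toy cart. A minimal sketch, assuming a simplified cart of `{ts, items}` (the representation is illustrative):

```python
def lww_merge(a, b):
    """Last-write-wins: keep whichever version has the newer timestamp.
    Simple, but silently discards everything in the losing version."""
    return a if a["ts"] >= b["ts"] else b

def union_merge(a, b):
    """Set-union merge (Dynamo-cart style): never loses an added item,
    but an item deleted in one version can resurface after the merge."""
    return {"ts": max(a["ts"], b["ts"]), "items": a["items"] | b["items"]}

# Two replicas accepted writes on opposite sides of a partition.
cart_a = {"ts": 1, "items": {"book"}}
cart_b = {"ts": 2, "items": {"lamp"}}

assert lww_merge(cart_a, cart_b)["items"] == {"lamp"}            # the book is lost
assert union_merge(cart_a, cart_b)["items"] == {"book", "lamp"}  # both survive
```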

📊

Quorum tuning: In leaderless systems (Cassandra), you control the tradeoff with W (write quorum) and R (read quorum) over N replicas. W=3, R=1 = slow, durable writes and fast, current reads. W=1, R=3 = fast writes, slow reads. W+R > N means every read quorum overlaps every write quorum, so a read sees at least one replica with the latest completed write — near-linearizable in practice, though edge cases (e.g. partially failed writes) fall short of strict linearizability. Tune per query pattern.
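The overlap condition itself is one line. A sketch of the arithmetic, not Cassandra's implementation:

```python
def quorum_overlap(n: int, w: int, r: int) -> bool:
    """W + R > N guarantees every read quorum intersects every write
    quorum, so a read contacts at least one up-to-date replica."""
    return w + r > n

N = 3
assert quorum_overlap(N, w=3, r=1)      # write everywhere, read anywhere
assert quorum_overlap(N, w=1, r=3)      # fast writes, expensive reads
assert quorum_overlap(N, w=2, r=2)      # the balanced "QUORUM" setting
assert not quorum_overlap(N, w=1, r=1)  # fast both ways; stale reads possible
```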

When to Use

CAP comes up in every distributed system design. The interviewer wants to see you map specific operations to consistency levels.

  • "Design a payment system" โ€” CP: Raft/Paxos consensus, strong consistency. Can't risk double-charging.
  • "Design a social media feed" โ€” AP: eventual consistency. Seeing a post 5 seconds late is fine.
  • "Design a shopping cart" โ€” AP: always available, merge conflicts later (Amazon Dynamo's original use case).
  • "What database would you use?" โ€” Drives the SQL vs NoSQL decision: SQL = generally CP, NoSQL = often AP (configurable).

Interview signal: Don't just say "I'd use CP." Map specific operations: "payment ledger is CP with Raft consensus, user's activity feed is AP with eventual consistency, shopping cart is AP with LWW merge."

Real-World Examples

  • Google Spanner โ€” Technically CP but with 99.999% availability. Uses TrueTime (atomic clocks + GPS) for globally consistent timestamps. ~100ms cross-continent writes.
  • Amazon DynamoDB โ€” AP by default (eventual consistency). Optional "strongly consistent reads" at 2ร— cost. Shopping cart was the original Dynamo use case.
  • Cassandra โ€” AP with tunable consistency. Per-query: CONSISTENCY QUORUM for strong reads, CONSISTENCY ONE for fast reads. Same cluster, different guarantees.
  • CockroachDB โ€” CP with Raft consensus. Serializable isolation by default. Designed for global transactions with ~10ms writes within a region.

Back-of-Envelope Numbers

Metric                                        | Value
Strong consistency overhead per write (Raft)  | +5–10 ms (within region)
Cross-region consensus latency                | ~100–200 ms (speed of light)
Eventual consistency convergence (p99)        | <3 seconds typical
Network partition frequency                   | ~2–3 per year per region
Stripe API requests/day                       | ~1B+ (strong consistency for payments)
DynamoDB eventually consistent read           | ~1 ms (half the cost of a strong read)
Spanner global write latency                  | ~100 ms (TrueTime consensus)