โš ๏ธ This guide is AI-generated and may contain inaccuracies. Always verify against authoritative sources and real-world documentation.

The CAP Triangle

  • C — Consistency: every read gets the latest write.
  • A — Availability: every request gets a response.
  • P — Partition Tolerance: the system keeps working despite network splits.

CP systems (consistent + partition-tolerant) — Google Spanner, CockroachDB, HBase, MongoDB. May reject requests during a partition.

AP systems (available + partition-tolerant) — Cassandra, DynamoDB, CouchDB, Riak. May return stale data.

The real choice: partitions are inevitable, so pick CP or AP per operation. PACELC: during a Partition → C or A; Else (normal operation) → Latency or Consistency.

How It Works

The CAP theorem (Brewer, 2000) states that when a network partition occurs, a distributed system must choose between consistency and availability. In practice, consistency and availability exist on a spectrum โ€” it's not a binary toggle.

The Consistency Spectrum

  1. Strong Consistency (Linearizability) โ€” Every read returns the latest write. Spanner, CockroachDB. Slowest but safest. Required for payments, balances.
  2. Causal Consistency โ€” Respects cause-and-effect ordering. Faster than strong, stronger than eventual. "If I see reply B, I must also see the original message A."
  3. Read-Your-Writes โ€” A user always sees their own recent writes (but others may see stale data). Good enough for most user-facing apps.
  4. Eventual Consistency โ€” Reads may return stale data, but all replicas will converge "eventually." DynamoDB, Cassandra. Fastest, most available.
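The spectrum is easiest to see with a toy leader/follower pair. This is a minimal Python sketch under stated assumptions — `Replica` and its timestamped store are illustrative, not any real database's API:

```python
class Replica:
    """A toy replica: stores (value, timestamp) per key."""
    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Keep the newer write (per-replica last-write-wins).
        cur = self.data.get(key)
        if cur is None or ts > cur[1]:
            self.data[key] = (value, ts)

    def read(self, key):
        v = self.data.get(key)
        return v[0] if v else None

leader = Replica()
follower = Replica()  # replicates asynchronously, so it lags the leader

# A client writes to the leader; replication hasn't reached the follower yet.
leader.write("bio", "hello", ts=1)

# Eventual consistency: a read from the lagging follower is stale.
assert follower.read("bio") is None

# Read-your-writes: route this user's reads to the replica they wrote to.
assert leader.read("bio") == "hello"

# "Eventually", replication catches up and the replicas converge.
follower.write("bio", "hello", ts=1)
assert follower.read("bio") == leader.read("bio") == "hello"
```

The same mechanics explain why read-your-writes is cheap to offer (sticky routing to the writer's replica) while strong consistency for all readers requires coordination on every write.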

PACELC โ€” The Extended Model

CAP only covers the partition case. PACELC extends it: during a Partition, choose A or C; Else (normal operation), choose Latency or Consistency. This better reflects real-world tradeoffs: even without partitions, strong consistency costs latency.

Key Design Decisions

💰

Payments (CP) vs Timeline (AP): The real choice isn't "CP or AP for the whole system" โ€” it's per operation. Payments = CP with Raft consensus (wrong balance is catastrophic). Activity feed = AP with eventual consistency (stale data for 2s is fine). Good systems mix both.
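One way to make the per-operation choice explicit in code is a policy map consulted at request time. This is a hypothetical sketch — the operation names and guarantee labels are illustrative, not a real framework's API:

```python
# Per-operation consistency policy: the system is neither "CP" nor "AP"
# as a whole; each operation picks its own point on the spectrum.
CONSISTENCY_POLICY = {
    "payment.debit":   "strong",    # CP: consensus write, may reject during a partition
    "payment.balance": "strong",    # a wrong balance is catastrophic
    "feed.read":       "eventual",  # AP: any replica, a few seconds stale is fine
    "cart.update":     "eventual",  # AP: accept the write, merge conflicts later
}

def consistency_for(operation: str) -> str:
    # Default to strong: rejecting a request fails safer than
    # silently serving stale data for an unclassified operation.
    return CONSISTENCY_POLICY.get(operation, "strong")

assert consistency_for("payment.debit") == "strong"
assert consistency_for("feed.read") == "eventual"
assert consistency_for("unknown.op") == "strong"
```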

🌐

Single-region vs Multi-region: Single region avoids most partition issues (network within a DC is reliable). Multi-region introduces real partitions โ€” cross-region consensus adds ~100-200ms latency. This is why most teams start single-region.

🔄

Conflict resolution strategy: In AP systems, concurrent writes can conflict. Last-write-wins (simple, but silently discards data). Merge/CRDT (complex, preserves all updates). Application-level merge (most correct, most work). Amazon's Dynamo shopping cart used application-level merging — a union of diverged cart versions — so a deleted item could occasionally reappear, but added items were never lost.
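The tradeoff between the strategies shows up clearly on a toy cart. A minimal sketch, assuming a simplified cart of `{ts, items}` (the representation is illustrative):

```python
def lww_merge(a, b):
    """Last-write-wins: keep whichever version has the newer timestamp.
    Simple, but silently discards everything in the losing version."""
    return a if a["ts"] >= b["ts"] else b

def union_merge(a, b):
    """Set-union merge (Dynamo-cart style): never loses an added item,
    but an item deleted in one version can resurface after the merge."""
    return {"ts": max(a["ts"], b["ts"]), "items": a["items"] | b["items"]}

# Two replicas accepted writes on opposite sides of a partition.
cart_a = {"ts": 1, "items": {"book"}}
cart_b = {"ts": 2, "items": {"lamp"}}

assert lww_merge(cart_a, cart_b)["items"] == {"lamp"}            # the book is lost
assert union_merge(cart_a, cart_b)["items"] == {"book", "lamp"}  # both survive
```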

📊

Quorum tuning: In leaderless systems (Cassandra), you control the tradeoff with W (write quorum) and R (read quorum) over N replicas. W=3, R=1 = slow, durable writes and fast, current reads. W=1, R=3 = fast writes, slow reads. W+R > N means every read quorum overlaps every write quorum, so a read sees at least one replica with the latest completed write — near-linearizable in practice, though edge cases (e.g. partially failed writes) fall short of strict linearizability. Tune per query pattern.
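The overlap condition itself is one line. A sketch of the arithmetic, not Cassandra's implementation:

```python
def quorum_overlap(n: int, w: int, r: int) -> bool:
    """W + R > N guarantees every read quorum intersects every write
    quorum, so a read contacts at least one up-to-date replica."""
    return w + r > n

N = 3
assert quorum_overlap(N, w=3, r=1)      # write everywhere, read anywhere
assert quorum_overlap(N, w=1, r=3)      # fast writes, expensive reads
assert quorum_overlap(N, w=2, r=2)      # the balanced "QUORUM" setting
assert not quorum_overlap(N, w=1, r=1)  # fast both ways; stale reads possible
```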

When to Use

CAP comes up in every distributed system design. The interviewer wants to see you map specific operations to consistency levels.

  • "Design a payment system" โ€” CP: Raft/Paxos consensus, strong consistency. Can't risk double-charging.
  • "Design a social media feed" โ€” AP: eventual consistency. Seeing a post 5 seconds late is fine.
  • "Design a shopping cart" โ€” AP: always available, merge conflicts later (Amazon Dynamo's original use case).
  • "What database would you use?" โ€” Drives the SQL vs NoSQL decision: SQL = generally CP, NoSQL = often AP (configurable).

Interview signal: Don't just say "I'd use CP." Map specific operations: "payment ledger is CP with Raft consensus, user's activity feed is AP with eventual consistency, shopping cart is AP with LWW merge."

Real-World Examples

  • Google Spanner โ€” Technically CP but with 99.999% availability. Uses TrueTime (atomic clocks + GPS) for globally consistent timestamps. ~100ms cross-continent writes.
  • Amazon DynamoDB โ€” AP by default (eventual consistency). Optional "strongly consistent reads" at 2ร— cost. Shopping cart was the original Dynamo use case.
  • Cassandra โ€” AP with tunable consistency. Per-query: CONSISTENCY QUORUM for strong reads, CONSISTENCY ONE for fast reads. Same cluster, different guarantees.
  • CockroachDB โ€” CP with Raft consensus. Serializable isolation by default. Designed for global transactions with ~10ms writes within a region.

Back-of-Envelope Numbers

Metric                                        | Value
Strong consistency overhead per write (Raft)  | +5–10 ms (within region)
Cross-region consensus latency                | ~100–200 ms (speed of light)
Eventual consistency convergence (p99)        | <3 seconds typical
Network partition frequency                   | ~2–3 per year per region
Stripe API requests/day                       | ~1B+ (strong consistency for payments)
DynamoDB eventually consistent read           | ~1 ms (half the cost of a strong read)
Spanner global write latency                  | ~100 ms (TrueTime consensus)