โš ๏ธ This guide is AI-generated and may contain inaccuracies. Always verify against authoritative sources and real-world documentation.

Architecture Diagram โ€” Hash Ring

[Diagram: a hash ring spanning [0, 2³²). Before: three nodes A, B, C hold keys k1–k4. After adding node D: k1 → D (moved); k2 → C, k3 → A, k4 → B (unchanged). Only k1 is remapped, 1/4 of the keys.]

How It Works

Imagine a ring (circle) representing the entire hash space [0, 2ยณยฒ). Both nodes and keys are hashed onto this ring. Each key is assigned to the first node encountered clockwise from the key's position.
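The clockwise lookup can be sketched as a sorted list of positions plus a binary search. This is a minimal illustration, not a production implementation: the `HashRing` name is hypothetical, and MD5 truncated to 32 bits stands in for any stable, uniform hash.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # MD5 truncated to 32 bits: a stable stand-in for any uniform hash function.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring; positions alone feed bisect.
        self._entries = sorted((ring_hash(n), n) for n in nodes)
        self._positions = [p for p, _ in self._entries]

    def lookup(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping past 2**32 back to 0.
        i = bisect.bisect_right(self._positions, ring_hash(key)) % len(self._entries)
        return self._entries[i][1]

ring = HashRing(["A", "B", "C"])
node = ring.lookup("k1")  # deterministic: the same key always lands on the same node
```

Note that the wrap-around modulo is what makes the structure a ring rather than a line: a key hashed past the last node position belongs to the first node.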

The Problem with Naive Hashing

With simple hash(key) % N, adding or removing a server changes N, causing nearly all keys to be remapped. For a cache cluster, this means a near-100% miss rate โ€” catastrophic.
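To see the failure concretely, here is a quick simulation (key names are hypothetical; MD5 again serves only as a stable stand-in hash) counting how many keys change servers when a 10-node cluster grows to 11 under modulo placement:

```python
import hashlib

def stable_hash(key: str) -> int:
    # MD5 truncated to 64 bits: a stable stand-in for any uniform hash.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

keys = [f"key-{i}" for i in range(10_000)]
moved = sum(1 for k in keys if stable_hash(k) % 10 != stable_hash(k) % 11)
fraction = moved / len(keys)
print(f"{fraction:.0%} of keys remapped")  # lands near the ~90% predicted above
```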

Consistent Hashing Solution

When a node is added, only keys between the new node and its predecessor (counter-clockwise) need to move โ€” roughly K/N keys (where K = total keys, N = total nodes). When a node is removed, only its keys move to the next node clockwise.
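That property can be checked directly in a sketch (single token per node, hypothetical node names): after adding a node, every key whose owner changed must have moved to the new node, and no other key changes hands.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # MD5 truncated to 32 bits: a stable stand-in for any uniform hash.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

def build(nodes):
    entries = sorted((ring_hash(n), n) for n in nodes)
    return [p for p, _ in entries], [n for _, n in entries]

def owner(positions, nodes_by_pos, key):
    i = bisect.bisect_right(positions, ring_hash(key)) % len(positions)
    return nodes_by_pos[i]

keys = [f"key-{i}" for i in range(10_000)]
before = build(["node-1", "node-2", "node-3"])
after = build(["node-1", "node-2", "node-3", "node-4"])

# Every key whose owner changed must now live on the new node; nothing else moves.
moved = [k for k in keys if owner(*before, k) != owner(*after, k)]
```

With a single token per node the moved fraction varies a lot around 1/N, which is exactly the imbalance problem virtual nodes address below.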

Virtual Nodes (Vnodes)

With few physical nodes, the ring can be unbalanced (one node owns a huge arc). Virtual nodes solve this: each physical node creates 100โ€“200 virtual nodes spread around the ring. This gives near-uniform distribution.

  • Physical Node A โ†’ vnode_A_0, vnode_A_1, ..., vnode_A_149 (spread evenly)
  • More vnodes = better balance but more memory for the ring map
  • Heterogeneous servers: give more vnodes to more powerful nodes
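A vnode ring can be sketched by hashing a per-vnode label and mapping each position back to its physical node (labels like `A#vnode-0` are an illustrative convention, not a fixed format). Spreading hypothetical keys over it shows the near-uniform balance:

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    # MD5 truncated to 32 bits: a stable stand-in for any uniform hash.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

def build_vnode_ring(nodes, vnodes_per_node=150):
    # Each physical node contributes many ring positions; each maps back to it.
    entries = sorted(
        (ring_hash(f"{node}#vnode-{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )
    return [p for p, _ in entries], [n for _, n in entries]

positions, owners = build_vnode_ring(["A", "B", "C"])

def lookup(key: str) -> str:
    return owners[bisect.bisect_right(positions, ring_hash(key)) % len(positions)]

# Spread 30,000 hypothetical keys; with 150 vnodes each, the three
# per-node loads land close to 10,000 apiece.
load = Counter(lookup(f"key-{i}") for i in range(30_000))
```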

Redistribution Math

With N nodes and a new node added:

  • Naive hashing: ~(N-1)/N keys remapped โ†’ for N=10, 90% move
  • Consistent hashing: ~K/N keys remapped โ†’ for N=10, 10% move
  • With virtual nodes (150 per node): standard deviation of load โ‰ˆ 5โ€“10% of mean

Key Design Decisions

๐Ÿ”ข

Number of virtual nodes: More vnodes = better load distribution but more memory and lookup time. 100โ€“200 vnodes per physical node is the sweet spot. Cassandra uses 256 by default (configurable).

๐Ÿ”‘

Hash function choice: Must be fast and uniform. MD5 works but is overkill for non-crypto use. MurmurHash3 or xxHash are better โ€” faster and well-distributed. Avoid CRC32 (poor distribution).
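One Python-specific pitfall worth noting: the built-in `hash()` is salted per process (`PYTHONHASHSEED`), so two clients computing ring positions with it would disagree on key placement. A stable digest avoids that; `blake2b` from the standard library stands in here, while MurmurHash3 and xxHash are available as third-party packages.

```python
import hashlib

# Built-in hash() changes across processes, so it cannot drive a shared ring.
# A fixed digest gives every client the same 32-bit position for a value.
def stable_hash32(value: str) -> int:
    return int.from_bytes(hashlib.blake2b(value.encode(), digest_size=4).digest(), "big")
```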

๐Ÿ“‹

Ring metadata storage: The ring map (sorted list of hash โ†’ node) must be available to all clients. Options: gossip protocol (Cassandra), centralized config (ZooKeeper), or client library with seed nodes.

๐Ÿ”„

Replication with consistent hashing: Store each key on N successive nodes clockwise (replication factor). Skip virtual nodes of the same physical node to ensure replicas are on different machines.
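Replica selection can be sketched as a clockwise walk that collects distinct physical nodes, skipping further vnodes of a machine already chosen (helper names and the `A#v0` label format are hypothetical):

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # MD5 truncated to 32 bits: a stable stand-in for any uniform hash.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

def build_vnode_ring(nodes, vnodes_per_node=8):
    entries = sorted(
        (ring_hash(f"{node}#v{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )
    return [p for p, _ in entries], [n for _, n in entries]

def replicas(positions, owners, key, rf=3):
    """Walk clockwise from the key's position, collecting rf distinct
    physical nodes; repeat vnodes of an already-chosen machine are skipped."""
    start = bisect.bisect_right(positions, ring_hash(key))
    chosen = []
    for step in range(len(owners)):
        node = owners[(start + step) % len(owners)]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == rf:
                break
    return chosen

positions, owners = build_vnode_ring(["A", "B", "C", "D"])
primary, *backups = replicas(positions, owners, "user:42")
```

Without the skip, two or three "replicas" could land on vnodes of the same physical machine, defeating the purpose of replication.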

When to Use

Consistent hashing comes up whenever the interviewer asks about distributed data partitioning or scaling a cluster.

  • "Design a distributed cache" โ€” How do you decide which cache node holds which key?
  • "Design a key-value store" โ€” Data partitioning across nodes
  • "How does Cassandra/DynamoDB partition data?" โ€” Consistent hashing with vnodes
  • "Scale a CDN" โ€” Map content to edge servers
  • "Design a load balancer with session affinity" โ€” Hash client IP to backend server

Interview signal: The interviewer wants to see you understand why naive modulo hashing fails and how consistent hashing minimizes disruption during scaling events.

Real-World Examples

  • Amazon DynamoDB โ€” Uses consistent hashing for partition assignment. Described in the Dynamo paper (2007) that popularized the concept in industry.
  • Apache Cassandra โ€” Each node owns token ranges on the ring. Vnodes (default 256) allow automatic rebalancing when nodes join/leave.
  • Akamai CDN โ€” Invented consistent hashing (Karger et al., 1997). Used to map URLs to edge servers with minimal disruption.
  • Discord โ€” Uses consistent hashing to route messages to the correct guild (server) process across thousands of Elixir nodes.
  • Memcached clients โ€” libmemcached and most client libraries use consistent hashing (ketama algorithm) for server selection.

Back-of-Envelope Numbers

  • Keys remapped on node add (naive hash): ~(N−1)/N of keys ≈ 90% for N=10
  • Keys remapped on node add (consistent): ~K/N keys ≈ 10% for N=10
  • Recommended virtual nodes per physical node: 100–200
  • Cassandra default vnodes: 256
  • Ring lookup time (sorted array + binary search): O(log V), where V = total vnodes
  • Ring memory (10 nodes × 200 vnodes): ~2,000 entries ≈ negligible
  • Load standard deviation (150 vnodes): ~5–10% of mean
  • MurmurHash3 throughput: ~3–5 GB/s