Generating unique, sortable IDs across distributed nodes without coordination. Snowflake, ULID, and UUID each make different tradeoffs in size, sortability, and collision probability.
Medium · Medium-High Frequency
In distributed systems, auto-increment IDs from a single database don't work: that database becomes a bottleneck and a single point of failure. You need IDs that are unique across all nodes, roughly sortable by time, and generated without coordination.
Snowflake: [1-bit unused][41-bit timestamp][10-bit machine ID][12-bit sequence]
Timestamp = current_time_ms - custom_epoch. Twitter uses 2010-11-04 as its epoch, which gives about 69 years of IDs.
64-bit, time-sortable, compact. 4,096 IDs/ms/machine. Needs machine ID coordination. Clock skew risk. Used by Twitter, Discord, Instagram (variant).
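To make the bit layout concrete, here is a minimal Python sketch of packing the three fields into one integer and reading them back. The names `pack` and `unpack` are illustrative and not taken from any library; the default epoch is Twitter's published value in milliseconds.

```python
from typing import Tuple

# Field widths from the layout: 1 unused + 41 timestamp + 10 machine + 12 sequence.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

# Twitter's epoch (2010-11-04 01:42:54.657 UTC) in milliseconds.
TWITTER_EPOCH_MS = 1288834974657


def pack(timestamp_ms: int, machine_id: int, sequence: int,
         epoch_ms: int = TWITTER_EPOCH_MS) -> int:
    """Combine the fields into a single 64-bit Snowflake-style ID."""
    elapsed = timestamp_ms - epoch_ms
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id outside the 10-bit range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence outside the 12-bit range")
    return (elapsed << (MACHINE_BITS + SEQUENCE_BITS)) | (machine_id << SEQUENCE_BITS) | sequence


def unpack(snowflake: int, epoch_ms: int = TWITTER_EPOCH_MS) -> Tuple[int, int, int]:
    """Return (timestamp_ms, machine_id, sequence) for an ID built by pack()."""
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    elapsed = snowflake >> (MACHINE_BITS + SEQUENCE_BITS)
    return elapsed + epoch_ms, machine_id, sequence
```

Because the timestamp occupies the high bits, an ID from a later millisecond always compares greater than any ID from an earlier one, regardless of machine ID or sequence.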
UUID v4: 128-bit, of which 122 bits are random. Universally unique in practice, zero coordination. But: not sortable, 36 chars as a string, and random insert positions cause B-tree page splits and poor index performance.
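A quick check of those properties with Python's standard `uuid` module (a small illustration, not a benchmark):

```python
import uuid

u = uuid.uuid4()

print(u)               # canonical hyphenated string form
print(len(str(u)))     # 36 characters as a string
print(len(u.bytes))    # 16 bytes (128 bits) in binary form
print(u.version)       # 4

# Generation order carries no meaning: sorting a batch will usually reorder it.
batch = [uuid.uuid4() for _ in range(5)]
print(batch == sorted(batch))
```

Storing the 16-byte binary form instead of the 36-character string reduces the size cost, but the insert positions in an index remain random.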
ULID: 128-bit = 48-bit timestamp + 80-bit random. Sortable, because lexicographic order of the string matches time order. Encoded as 26 characters of Crockford Base32. String-friendly. Good UUID replacement when you need sortability.
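The encoding can be sketched in a few lines of Python. This is a simplified illustration rather than a full ULID library: `encode_ulid` and `new_ulid` are hypothetical names, and the sketch does not implement the optional monotonic mode, so IDs created within the same millisecond sort in random order.

```python
import os
import time

# Crockford Base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms: int, randomness: bytes) -> str:
    """Encode a 48-bit timestamp and 80 random bits as a 26-character string."""
    if not 0 <= timestamp_ms < (1 << 48):
        raise ValueError("timestamp outside the 48-bit range")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes (80 bits)")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):          # 26 characters x 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_ulid() -> str:
    """Generate a ULID-style ID from the wall clock and the OS random source."""
    return encode_ulid(int(time.time() * 1000), os.urandom(10))
```

The alphabet is in ascending ASCII order and the timestamp sits in the leading characters, which is why plain string comparison sorts these IDs by creation time.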
Range allocation (ticket server): A central server allocates ID blocks (e.g., Server A gets 1-1000, B gets 1001-2000). Simple, but adds coordination. Flickr used two MySQL auto-increment servers with odd/even IDs.
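A minimal in-process sketch of block allocation follows. `BlockAllocator` stands in for the central server and `Node` for an application server; both names are hypothetical, and a real deployment would persist the counter and reach the allocator over the network.

```python
import threading
from typing import Tuple


class BlockAllocator:
    """Stand-in for the central server: hands out non-overlapping ID blocks."""

    def __init__(self, block_size: int = 1000, start: int = 1):
        self._next = start
        self._block_size = block_size
        self._lock = threading.Lock()

    def allocate(self) -> Tuple[int, int]:
        """Return the (first, last) IDs of a fresh block."""
        with self._lock:
            first = self._next
            self._next += self._block_size
            return first, first + self._block_size - 1


class Node:
    """Application server that only contacts the allocator when its block runs out."""

    def __init__(self, allocator: BlockAllocator):
        self._allocator = allocator
        self._current = 0
        self._last = -1          # empty block forces an allocation on first use

    def next_id(self) -> int:
        if self._current > self._last:
            self._current, self._last = self._allocator.allocate()
        value = self._current
        self._current += 1
        return value
```

Each node coordinates once per block rather than once per ID. The cost is that IDs are no longer ordered by time across nodes, and a node that crashes leaves the rest of its block unused.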
Snowflake vs UUID: Snowflake: 64-bit (8 bytes), sortable, sequential index writes. UUID v4: 128-bit (16 bytes), random, causes index fragmentation. For databases with B-tree indexes, Snowflake keys are half the size and insert near the end of the index, so they avoid most page splits. Use UUID only when you need zero coordination and don't care about sortability.
Clock skew risk: Snowflake depends on monotonic time. If a server's clock jumps backward (NTP correction), it could generate duplicate IDs or IDs out of order. Mitigations: wait until clock catches up, use monotonic clock, or add sequence bits. Twitter's Snowflake refuses to generate IDs if clock moves backward.
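The "refuse on backward clock" behaviour can be sketched as follows. This is an illustrative generator, not Twitter's code: the class name, the `ClockMovedBackwards` exception, and the injectable `clock` parameter are assumptions made so the behaviour can be exercised with a fake clock.

```python
import threading
import time
from typing import Callable

DEFAULT_EPOCH_MS = 1288834974657  # Twitter's epoch; any fixed past instant works


class ClockMovedBackwards(Exception):
    """Raised instead of risking a duplicate or out-of-order ID."""


def wall_clock_ms() -> int:
    return int(time.time() * 1000)


class SnowflakeGenerator:
    def __init__(self, machine_id: int, epoch_ms: int = DEFAULT_EPOCH_MS,
                 clock: Callable[[], int] = wall_clock_ms):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock jumped back (for example an NTP correction): refuse to generate.
                raise ClockMovedBackwards(f"clock is {self._last_ms - now} ms behind")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # All 4,096 sequence values used this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - self._epoch_ms) << 22) | (self._machine_id << 12) | self._sequence
```

Raising is the strictest option. A caller could instead catch the exception and retry after a short sleep, which corresponds to the "wait until the clock catches up" mitigation.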
Embedded vs service: Separate Snowflake service = one more network hop (~1ms) but centralized machine ID management. Embedded in app = zero latency but needs a machine ID assignment mechanism. Most companies embed it now (Instagram's PL/pgSQL function, Discord's in-process generator).
ID as security risk: Sequential IDs reveal volume. A competitor can estimate your order count by placing two orders days apart and comparing the IDs. For public-facing IDs (order numbers, invoice IDs), use obfuscated or random-looking IDs. Keep Snowflake IDs for internal use.
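One way to get random-looking public IDs is to hand out an unrelated random token and keep the mapping server-side. The sketch below uses Python's `secrets` module; `PublicIdMap` is a hypothetical in-memory stand-in for what would normally be an indexed database column.

```python
import secrets
from typing import Dict


class PublicIdMap:
    """Maps internal IDs to random public tokens that reveal no ordering or volume."""

    def __init__(self) -> None:
        self._by_token: Dict[str, int] = {}
        self._by_internal: Dict[int, str] = {}

    def public_id_for(self, internal_id: int) -> str:
        """Return the existing token for an internal ID, or mint a new one."""
        token = self._by_internal.get(internal_id)
        if token is None:
            token = secrets.token_urlsafe(16)   # 128 random bits, 22 URL-safe characters
            self._by_token[token] = internal_id
            self._by_internal[internal_id] = token
        return token

    def resolve(self, token: str) -> int:
        """Look up the internal ID for a public token (KeyError if unknown)."""
        return self._by_token[token]
```

The token carries no timestamp, machine ID, or counter, so two tokens issued days apart say nothing about how many records were created in between.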
Distributed ID generation is usually a sub-component of a larger system design, not the main problem. But getting it right matters for performance and correctness.
Interview signal: Sketch the 64-bit layout on the whiteboard and calculate the limits. This shows you understand the design constraints, not just the name.
1234567890123456789) is a Snowflake ID. Open-sourced in 2010, now the industry standard pattern.

| Metric | Value |
|---|---|
| Snowflake IDs per ms per machine | 4,096 |
| Max machines (10-bit) | 1,024 |
| Theoretical max throughput (all machines) | ~4.2 billion IDs/sec |
| Snowflake epoch lifespan (41-bit ms) | ~69.7 years |
| Snowflake ID size | 64-bit (8 bytes) |
| UUID v4 size | 128-bit (16 bytes) |
| UUID v4 collision probability (1B IDs) | ~10⁻¹⁹ (negligible) |
| ID generation latency (embedded) | <0.1 ms (no network call) |
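
The Snowflake limits in the table follow directly from the field widths, and can be reproduced with a few lines of arithmetic. The collision estimate uses the birthday approximation n² / (2 · 2¹²²), which assumes 122 random bits per UUID v4; that formula is an assumption about how such an estimate is usually derived, not something stated in the table.

```python
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12

ids_per_ms = 2 ** SEQUENCE_BITS                        # sequence values per millisecond
max_machines = 2 ** MACHINE_BITS                       # distinct machine IDs
ids_per_sec_total = ids_per_ms * 1000 * max_machines   # every machine at full rate

ms_per_year = 1000 * 60 * 60 * 24 * 365.25
lifespan_years = 2 ** TIMESTAMP_BITS / ms_per_year     # how long 41 bits of ms last

# Birthday approximation for at least one collision among n random 122-bit values.
n = 10 ** 9
uuid4_collision = n * n / (2 * 2 ** 122)

print(ids_per_ms, max_machines, ids_per_sec_total)
print(round(lifespan_years, 1))
print(f"{uuid4_collision:.1e}")
```

Changing the split between machine and sequence bits trades the number of machines against per-machine throughput, while the 41-bit timestamp fixes the lifespan.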