โš ๏ธ This guide is AI-generated and may contain inaccuracies. Always verify against authoritative sources and real-world documentation.

Architecture Diagram: Snowflake ID

Snowflake ID: 64-bit layout

  Field        Bits   Contents
  Unused       1      Always 0
  Timestamp    41     Milliseconds since custom epoch (~69.7 years)
  Machine ID   10     Data center + worker (1,024 machines)
  Sequence     12     Counter within the millisecond (4,096 IDs/ms)

  Example: 0 | 10110010110011001010011010110101001001010 | 0000000101 | 000000000001

Independent ID generation, no coordination

  Server A (ID: 001)   ts: 1711929600001   machine: 001   seq: 0001 → 0002 → 0003
  Server B (ID: 002)   ts: 1711929600001   machine: 002   seq: 0001 → 0002 → 0003
  Server C (ID: 003)   ts: 1711929600001   machine: 003   seq: 0001 → 0002 → 0003

  Example IDs (illustrative): 7180578201600001025, 7180578201600002049, 7180578201600003073

✓ All unique: different machine IDs guarantee no collisions. The same timestamp and the same sequence with a different machine ID still yield a different ID.

How It Works

In distributed systems, auto-increment IDs from a single database scale poorly: they create a bottleneck and a single point of failure. You need IDs that are unique across all nodes, roughly sortable by time, and generated without coordination.

Snowflake ID Generation

  1. API server needs an ID → calls its local Snowflake generator (embedded library or sidecar service)
  2. Generator composes a 64-bit ID: [1-bit unused][41-bit timestamp][10-bit machine ID][12-bit sequence]
  3. Timestamp = current_time_ms - custom_epoch. Twitter's epoch is 2010-11-04, which gives ~69.7 years of IDs from that date
  4. Machine ID is pre-assigned per server (from ZooKeeper, config, or Kubernetes pod ordinal)
  5. Sequence increments within the same millisecond (0–4095) and resets to 0 when the millisecond changes
  6. If all 4,096 sequence values are used within one ms: wait until the next millisecond (extremely rare per machine)
  7. ID is generated locally with no network call → <0.1 ms per ID
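The steps above can be sketched in a few dozen lines of Python. This is a minimal illustration, not Twitter's implementation: the class name, the `clock` parameter, and the `decode` helper are assumptions made for the example, and the epoch constant is the commonly cited millisecond value for Twitter's 2010-11-04 epoch.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04 UTC in Unix milliseconds
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4095


def wall_clock_ms():
    """Current Unix time in milliseconds."""
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Illustrative generator; names and structure are hypothetical."""

    def __init__(self, machine_id, epoch_ms=TWITTER_EPOCH_MS, clock=wall_clock_ms):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits (0-1023)")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Refuse rather than risk issuing a duplicate ID.
                raise RuntimeError("clock moved backward")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Step 6: all 4,096 values used, spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # Step 5: new millisecond, the sequence restarts at 0.
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.epoch_ms) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode(snowflake, epoch_ms=TWITTER_EPOCH_MS):
    """Split an ID back into (unix_ms, machine_id, sequence)."""
    sequence = snowflake & MAX_SEQUENCE
    machine_id = (snowflake >> SEQUENCE_BITS) & MAX_MACHINE_ID
    unix_ms = (snowflake >> (MACHINE_BITS + SEQUENCE_BITS)) + epoch_ms
    return unix_ms, machine_id, sequence
```

Injecting the clock keeps the sketch testable: a fake clock can replay a repeated millisecond to exercise the sequence rollover without waiting on real time.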

ID Generation Approaches

Snowflake (Twitter)

64-bit, time-sortable, compact. 4,096 IDs/ms/machine. Needs machine ID coordination. Clock skew risk. Used by Twitter, Discord, Instagram (variant).

UUID v4

128-bit, of which 122 bits are random. Unique with overwhelming probability, zero coordination. But: not sortable, 36 chars as a string, and random insert positions cause B-tree page splits → poor index performance.

ULID

128-bit = 48-bit timestamp + 80-bit random. Sortable, lexicographic ordering, Crockford Base32 encoded. String-friendly. Good UUID replacement when you need sortability.
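To make the ULID layout concrete, here is a minimal sketch that packs a 48-bit millisecond timestamp and 80 random bits into one 128-bit integer and encodes it as 26 Crockford Base32 characters. The function name `new_ulid` and its parameters are assumptions for illustration; a production system would more likely use an established ULID library.

```python
import os
import time

# Crockford Base32 alphabet: digits plus uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None, randomness=None):
    """Return a 26-character ULID string (hypothetical helper)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = os.urandom(10)  # 80 random bits
    if timestamp_ms >= 1 << 48 or len(randomness) != 10:
        raise ValueError("timestamp must fit in 48 bits and randomness must be 10 bytes")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):  # 26 characters x 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

Because the timestamp occupies the most significant bits and the alphabet is in ascending ASCII order, plain string comparison orders ULIDs by creation millisecond.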

Database Ticket Server

Central server allocates ID blocks (e.g., Server A gets 1–1000, B gets 1001–2000). Simple but adds coordination. Flickr used two MySQL auto-increment servers with odd/even IDs.
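Block allocation can be sketched as follows. This is an in-memory illustration under stated assumptions: `TicketServer` and `BlockClient` are hypothetical names, and a real ticket server would persist its counter (for example in a database row) so that blocks are never reissued after a restart.

```python
import threading


class TicketServer:
    """Hypothetical central allocator that hands out contiguous ID blocks."""

    def __init__(self, block_size=1000, start=1):
        self._block_size = block_size
        self._next_start = start
        self._lock = threading.Lock()

    def allocate_block(self):
        # The only coordinated step: one call per block, not per ID.
        with self._lock:
            first = self._next_start
            self._next_start += self._block_size
        return range(first, first + self._block_size)


class BlockClient:
    """Hands out IDs from its current block and fetches a new one when empty."""

    def __init__(self, server):
        self._server = server
        self._ids = iter(())

    def next_id(self):
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._server.allocate_block())
            return next(self._ids)
```

With a block size of 1,000, each application server contacts the allocator once per 1,000 IDs. The cost is that IDs are no longer ordered by time across servers.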

Key Design Decisions

Snowflake vs UUID: Snowflake: 64-bit (8 bytes), time-sortable, so index writes are mostly sequential. UUID v4: 128-bit (16 bytes), random, so inserts land all over the index and fragment it. For databases with B-tree indexes, a Snowflake key is half the size and new rows append near the right edge of the index instead of splitting random pages. Use UUID only when you need zero coordination and don't care about sortability.

โฐ

Clock skew risk: Snowflake assumes time never moves backward. If a server's clock jumps backward (for example after an NTP correction), it could generate duplicate or out-of-order IDs. Mitigations: wait until the clock catches up, use a monotonic clock, or reserve extra bits for a rollback counter. Twitter's Snowflake refuses to generate IDs while the clock is behind the last timestamp it used.
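One way to combine "wait until the clock catches up" with "refuse to generate" is to tolerate small rollbacks by sleeping and reject large ones. The sketch below is an assumption-laden illustration: `safe_timestamp`, `ClockMovedBackward`, and the 5 ms threshold are hypothetical, not taken from any particular implementation.

```python
import time


class ClockMovedBackward(Exception):
    """Raised when the clock is too far behind the last issued timestamp."""


def current_ms():
    return time.time_ns() // 1_000_000


def safe_timestamp(last_ms, clock=current_ms, sleep=time.sleep, max_wait_ms=5):
    """Return a timestamp >= last_ms, waiting out small backward jumps."""
    now = clock()
    if now >= last_ms:
        return now
    drift_ms = last_ms - now
    if drift_ms > max_wait_ms:
        # Large rollback: fail fast instead of stalling the caller.
        raise ClockMovedBackward(f"clock is {drift_ms} ms behind the last timestamp")
    sleep(drift_ms / 1000)  # small rollback: wait for the clock to catch up
    now = clock()
    if now < last_ms:
        raise ClockMovedBackward("clock is still behind after waiting")
    return now
```

A generator would call `safe_timestamp(self._last_ms)` wherever it currently reads the clock. The threshold trades brief latency spikes against hard failures.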

๐Ÿ—๏ธ

Embedded vs service: A separate Snowflake service adds one network hop (~1 ms) but centralizes machine ID management. Embedding the generator in the app removes the network hop but requires a machine ID assignment mechanism. Embedding is common (Instagram's PL/pgSQL function, Discord's in-process generator).
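For the embedded approach, one simple machine ID assignment mechanism is to derive the ID from a Kubernetes StatefulSet pod name, which ends in a stable ordinal (`<name>-0`, `<name>-1`, and so on). The helper below is a hypothetical sketch; the pod name `idgen-7` is made up for the example.

```python
import re

MAX_MACHINE_ID = 1023  # largest value that fits the 10-bit machine ID field


def machine_id_from_hostname(hostname):
    """Extract the trailing ordinal from a name like 'idgen-7' (hypothetical helper)."""
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no ordinal suffix in hostname {hostname!r}")
    ordinal = int(match.group(1))
    if ordinal > MAX_MACHINE_ID:
        raise ValueError(f"ordinal {ordinal} does not fit in 10 bits")
    return ordinal
```

Inside a pod this would typically be called as `machine_id_from_hostname(socket.gethostname())`. The scheme only guarantees uniqueness within one StatefulSet, so multiple sets or clusters would need an additional prefix.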

ID as security risk: Sequential IDs reveal volume: a competitor can infer your order count by creating orders days apart and comparing the IDs. Snowflake IDs are not strictly sequential, but they still expose creation time and machine ID. For public-facing IDs (order numbers, invoice IDs), use obfuscated or random-looking IDs. Keep Snowflake IDs for internal use.
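One straightforward way to expose random-looking IDs is to keep the Snowflake ID as the internal key and map it to a random public token. The sketch below uses an in-memory dictionary purely for illustration; `PublicIdRegistry` is a hypothetical name, and a real system would store the mapping in a table with a unique index on the token.

```python
import secrets


class PublicIdRegistry:
    """Hypothetical two-way mapping between internal IDs and random public tokens."""

    def __init__(self):
        self._public_by_internal = {}
        self._internal_by_public = {}

    def public_id_for(self, internal_id):
        public_id = self._public_by_internal.get(internal_id)
        if public_id is None:
            # 12 random bytes -> 16 URL-safe characters; retry on the unlikely collision.
            public_id = secrets.token_urlsafe(12)
            while public_id in self._internal_by_public:
                public_id = secrets.token_urlsafe(12)
            self._public_by_internal[internal_id] = public_id
            self._internal_by_public[public_id] = internal_id
        return public_id

    def resolve(self, public_id):
        """Return the internal ID, or None if the token is unknown."""
        return self._internal_by_public.get(public_id)
```

The token carries no timestamp or counter, so two tokens created days apart reveal nothing about how many records were created in between.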

When to Use

Distributed ID generation is usually a sub-component of a larger system design, not the main problem. But getting it right matters for performance and correctness.

  • "Design a URL shortener": Need globally unique short codes. Snowflake ID → Base62 encode
  • "Design Twitter / Instagram": Every post needs a unique, time-sortable ID across all servers
  • "Design a distributed database": Sharded tables need IDs that don't collide across shards
  • "How do you generate 10K IDs/sec across 50 servers?": Snowflake. Each server generates independently, and the machine ID ensures uniqueness
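For the URL shortener case, the Base62 step might look like the sketch below. The alphabet order (digits, then uppercase, then lowercase) is an assumption, and any fixed ordering of the 62 characters works as long as encoder and decoder agree.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number):
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text):
    """Decode a Base62 string back to an integer."""
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    return number
```

A positive 64-bit ID encodes to at most 11 Base62 characters, which is the practical lower bound on short-code length if the full Snowflake ID is encoded.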

Interview signal: Sketch the 64-bit layout on the whiteboard and calculate the limits. This shows you understand the design constraints, not just the name.

Real-World Examples

  • Twitter Snowflake: Created to generate ~10K unique tweet IDs per second per server. Tweet IDs (like 1234567890123456789) are Snowflake IDs. Open-sourced in 2010 and now a widely copied pattern.
  • Instagram sharded IDs: Similar scheme: 41 bits timestamp + 13 bits shard ID + 10 bits auto-increment. Each Postgres shard generates its own IDs independently using a PL/pgSQL function. No external service needed.
  • Discord Snowflakes: Discord uses Snowflake IDs for messages, users, channels, guilds. The timestamp component lets them efficiently query "messages in this channel after time T" using the ID as a time-based filter.
  • Sony's Sonyflake: Variant optimized for longer lifespan: 39-bit timestamp (in 10 ms units, ~174 years) + 8-bit sequence + 16-bit machine ID (65,536 machines). Tradeoff: lower throughput per machine (256 IDs per 10 ms = 25.6K/sec) in exchange for more machines and a longer lifespan.
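The time-based filter described for Discord follows directly from the layout: shifting a timestamp into the top bits gives the smallest possible ID for that millisecond. The sketch below assumes Discord's documented epoch (the first second of 2015) and a 22-bit shift; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DISCORD_EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)
TIMESTAMP_SHIFT = 22  # bits below the timestamp field


def min_id_at(moment):
    """Smallest ID that could have been generated at `moment` (timezone-aware)."""
    elapsed_ms = (moment - DISCORD_EPOCH) // timedelta(milliseconds=1)
    return elapsed_ms << TIMESTAMP_SHIFT


def created_at(snowflake):
    """Recover the creation time (millisecond precision) from an ID."""
    return DISCORD_EPOCH + timedelta(milliseconds=snowflake >> TIMESTAMP_SHIFT)
```

A query such as "messages in this channel after time T" then becomes a range scan on the primary key, for example `id >= min_id_at(T)`, with no separate timestamp index.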

Back-of-Envelope Numbers

Metric                                      Value
Snowflake IDs per ms per machine            4,096
Max machines (10-bit)                       1,024
Theoretical max throughput (all machines)   ~4.2 billion IDs/sec
Snowflake epoch lifespan (41-bit ms)        ~69.7 years
Snowflake ID size                           64-bit (8 bytes)
UUID v4 size                                128-bit (16 bytes)
UUID v4 collision probability (1B IDs)      ~10⁻¹⁹ (negligible)
ID generation latency (embedded)            <0.1 ms (no network call)
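Most of these numbers follow from the bit widths, so they can be re-derived rather than memorized. The short script below is a sketch of that arithmetic. The UUID estimate uses the birthday approximation p ≈ n² / (2N) with N = 2¹²², since 122 of a UUID v4's 128 bits are random.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

ids_per_ms = 2 ** 12                               # 12-bit sequence
max_machines = 2 ** 10                             # 10-bit machine ID
max_ids_per_sec = ids_per_ms * 1000 * max_machines
lifespan_years = 2 ** 41 / MS_PER_YEAR             # 41-bit millisecond timestamp

# Birthday approximation for one billion UUID v4 values.
uuid_count = 10 ** 9
uuid_collision_probability = uuid_count ** 2 / (2 * 2 ** 122)

print(f"IDs per ms per machine: {ids_per_ms:,}")
print(f"Max machines:           {max_machines:,}")
print(f"Max IDs per second:     {max_ids_per_sec:,}")
print(f"Lifespan:               {lifespan_years:.1f} years")
print(f"UUID v4 collision odds: {uuid_collision_probability:.1e}")
```

Changing the exponents shows the tradeoff in any variant: every bit moved from the sequence to the machine ID doubles the machine count and halves per-machine throughput.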