System Design: Distributed ID Generation — Snowflake, UUID, ULID, Twitter, Auto-Increment, Database Sequences

Generating unique identifiers in a distributed system is a surprisingly complex problem. A simple auto-incrementing integer does not work when you have multiple database servers, microservices creating records independently, or requirements for time-ordered IDs. This guide covers the major approaches to distributed ID generation — from UUIDs to Twitter Snowflake — with tradeoffs and implementation details for system design interviews.

Requirements for Distributed IDs

A good distributed ID system must provide: (1) Global uniqueness — no two IDs are the same across all services and databases. Collisions cause data corruption. (2) High throughput — generate millions of IDs per second without becoming a bottleneck. (3) Low latency — ID generation should be local (no network round-trip) and sub-millisecond. (4) Sortability (often desired) — IDs should be roughly time-ordered so that sorting by ID approximates sorting by creation time. This enables efficient range queries and pagination using the ID as a cursor. (5) Compactness — shorter IDs use less storage and bandwidth. A 64-bit integer is more efficient than a 128-bit UUID. Additional considerations: no coordination required (each node generates IDs independently), no single point of failure (centralized ID services are availability risks), and information leakage (sequential IDs reveal business metrics like order volume).
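The cursor-based pagination mentioned in requirement (4) can be sketched as follows. This is a minimal illustration that assumes an in-memory SQLite database; the `posts` table and the `fetch_page` helper are hypothetical names chosen for this example.

```python
import sqlite3


def fetch_page(conn, after_id, page_size):
    """Return rows with id greater than the cursor, plus the next cursor."""
    rows = conn.execute(
        "SELECT id, body FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    # The last ID on the page becomes the cursor for the next request.
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (id, body) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 11)],
)

page1, cursor = fetch_page(conn, after_id=0, page_size=3)
page2, cursor = fetch_page(conn, after_id=cursor, page_size=3)
print([row[0] for row in page1], [row[0] for row in page2])
```

Because the query seeks directly to the cursor position in the primary-key index, the cost of a page does not grow with how deep the client has paged, which is the usual argument for cursors over OFFSET.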

UUID v4: Random IDs

UUID v4 generates 128-bit identifiers with 122 random bits (the remaining 6 bits encode the version and variant). Example: 550e8400-e29b-41d4-a716-446655440000. Pros: no coordination required (each node generates independently), no single point of failure, and an extremely low collision probability. With 2^122 possible values, the birthday bound puts a 50% chance of a single collision at roughly 2.7 × 10^18 IDs, which takes about 86 years at 1 billion UUIDs per second. Cons: (1) Not sortable: UUIDs are random, so sorting by UUID does not approximate sorting by creation time. Pagination in creation order requires a separate timestamp column. (2) Poor index performance: random UUIDs scatter inserts across the B-tree index, causing random I/O and page splits. This can degrade write performance on B-tree indexed tables severalfold compared to sequential IDs (2-5x is a commonly quoted range; the actual figure depends on the database, the table size, and how much of the index fits in memory). (3) Large size: 128 bits (16 bytes) is 2x the size of a 64-bit integer. In high-volume tables with multiple foreign keys, this adds significant storage overhead. (4) Not human-readable: debugging with UUIDs is harder than with sequential integers. UUID v4 is a reasonable default when sortability and index performance are not critical (low-to-medium write volume, no pagination by ID).
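The collision arithmetic can be checked with the standard birthday approximation. The sketch below uses Python's built-in `uuid` module; the helper names and the rate of 1 billion IDs per second are illustrative assumptions, not part of any library.

```python
import math
import uuid

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def collision_probability(n, bits=RANDOM_BITS):
    """Birthday approximation: P(collision) ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n * n) / (2.0 * 2.0 ** bits))


def ids_for_probability(p, bits=RANDOM_BITS):
    """Approximate number of IDs needed to reach collision probability p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))


new_id = uuid.uuid4()
print(new_id, "version", new_id.version)

rate = 1_000_000_000  # IDs per second (illustrative assumption)
seconds_per_year = 365.25 * 24 * 3600
years_to_half = ids_for_probability(0.5) / rate / seconds_per_year
one_year_risk = collision_probability(int(rate * seconds_per_year))

print(f"years until a 50% collision chance: {years_to_half:.0f}")
print(f"collision chance after one year: {one_year_risk:.1e}")
```

At realistic generation rates, which are many orders of magnitude below 1 billion per second, the risk is far smaller still, so collisions are rarely the deciding factor when choosing UUID v4.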

Twitter Snowflake: Time-Sorted 64-Bit IDs

Twitter Snowflake generates 64-bit IDs that are roughly time-ordered and globally unique. Bit layout: 1 bit (unused/sign) + 41 bits (timestamp in milliseconds since a custom epoch) + 10 bits (machine/worker ID) + 12 bits (sequence number). The 41-bit timestamp supports approximately 69 years from the epoch. The 10-bit machine ID supports 1024 unique workers. The 12-bit sequence supports 4096 IDs per millisecond per worker, or about 4.1 million IDs per second per worker. Properties: IDs are time-sorted (a higher ID means later creation, to within clock skew between workers), fit in a 64-bit integer (efficient storage and indexing), need no coordination at generation time (each worker generates independently using its assigned machine ID), and increase monotonically within a single worker as long as that worker's clock never moves backwards. A generator typically refuses to issue IDs, or waits, when it detects a clock rollback. Machine ID assignment: assign a unique ID to each worker at startup using ZooKeeper, etcd, or a configuration file. The machine ID is the only coordination point; once it is assigned, ID generation is fully local. Snowflake is used by Twitter and, with modifications, by Discord; Instagram uses a Snowflake-inspired layout generated inside PostgreSQL. It is the standard approach for high-throughput systems requiring sortable IDs.
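The bit layout above can be sketched as a small generator. This is an illustrative sketch rather than Twitter's implementation: the class name, the injectable `clock` parameter, and the choice of 2020-01-01 UTC as the custom epoch are all assumptions made for this example.

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC, an arbitrary choice
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        # The clock is injectable so the logic can be tested deterministically.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Issuing an ID now could duplicate or reorder earlier IDs.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            shift = WORKER_BITS + SEQUENCE_BITS
            return (timestamp << shift) | (self._worker_id << SEQUENCE_BITS) | self._sequence


def decode(snowflake_id):
    """Split an ID back into its three fields."""
    return {
        "timestamp_ms": (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS,
        "worker_id": (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER_ID,
        "sequence": snowflake_id & MAX_SEQUENCE,
    }


generator = SnowflakeGenerator(worker_id=42)
sample = generator.next_id()
print(sample, decode(sample))
```

Raising on a backwards clock is one possible policy; waiting until the clock catches up is another. Either way, the worker must never reuse a (timestamp, sequence) pair.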

ULID: Universally Unique Lexicographically Sortable Identifier

ULID combines the benefits of UUID (no coordination) with sortability. Format: 128 bits = 48-bit timestamp (millisecond precision) + 80 bits of randomness. String representation: 26 characters in Crockford Base32 (e.g., 01ARZ3NDEKTSV4RRFFQ69G5FAV). The timestamp prefix makes ULIDs lexicographically sortable: sorting ULID strings sorts by creation time. By default, ULIDs generated in the same millisecond are not ordered relative to each other (the random suffix determines their order); the ULID spec describes an optional monotonic mode in which a generator increments the random component for successive IDs within the same millisecond. Pros: sortable (unlike UUID v4), no coordination (unlike Snowflake, there is no machine ID assignment), compatible with UUID storage columns (same 128-bit size), and a more compact string form (26 characters versus 36 for a UUID, case-insensitive). Cons: 128 bits (twice the size of a 64-bit Snowflake ID), the random component still causes some B-tree fragmentation (though less than UUID v4 because the time prefix provides partial ordering), and no guaranteed creation order within a millisecond unless the generator implements the monotonic mode. ULID is a good default for applications that need sortable IDs without the operational complexity of Snowflake machine ID assignment.
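The 48-bit timestamp plus 80-bit randomness layout can be sketched with the standard library alone. The function names below are hypothetical and this is not a published ULID package; it implements only the default (non-monotonic) behaviour.

```python
import os
import time

# Crockford Base32 alphabet: digits plus letters, excluding I, L, O, and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms, randomness):
    """Pack a 48-bit timestamp and 10 random bytes into a 26-character string."""
    if not 0 <= timestamp_ms < 2 ** 48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 80 bits (10 bytes)")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):  # 26 characters x 5 bits = 130 bits; the top 2 are zero
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


def decode_timestamp(ulid):
    """Recover the millisecond timestamp from the first 10 characters."""
    value = 0
    for char in ulid[:10].upper():
        value = (value << 5) | CROCKFORD.index(char)
    return value


def new_ulid():
    return encode_ulid(time.time_ns() // 1_000_000, os.urandom(10))


example = new_ulid()
print(example, decode_timestamp(example))
```

Because the alphabet is in ascending ASCII order and the timestamp occupies the leading characters, plain string comparison orders ULIDs by millisecond of creation.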

Database Sequences and Auto-Increment

Auto-increment (MySQL) and sequences (PostgreSQL) generate sequential integers from a single database. They are simple and efficient for single-database systems. Problems in distributed systems: (1) Single point of failure: the sequence generator is on one database server. If it fails, no IDs can be generated. (2) Bottleneck: all ID generation funnels through one server. At very high throughput (100K+ inserts/sec), this becomes a bottleneck. (3) Sharding: if you shard the database, each shard needs its own sequence, and two shards may generate the same ID. Solution: use different starting points and a common step size. Shard 1: start=1, step=2 (1, 3, 5, …). Shard 2: start=2, step=2 (2, 4, 6, …). In general, with N shards, shard k uses start=k and step=N. This works, but adding shards requires reconfiguring all existing shards. (4) Information leakage: sequential IDs reveal business metrics. If your latest order ID is 1,000,000, competitors know your total order count. Flickr approach (ticket servers): use two MySQL servers with different auto-increment offsets as a dedicated ID service. Server A generates odd IDs, server B generates even IDs. The application round-robins between them. This provides high availability (either server can generate IDs) with simple infrastructure. The IDs stay unique but are only approximately ordered, because the two servers advance independently.
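The odd/even scheme with round-robin failover can be simulated in memory. The sketch below stands in for the two database servers with simple counters; `TicketServer` and `RoundRobinClient` are hypothetical names, and a real deployment would issue the IDs from MySQL rather than from Python objects.

```python
from itertools import count


class TicketServer:
    """Stand-in for one database server with its own offset and increment."""

    def __init__(self, offset, increment):
        self._ids = count(start=offset, step=increment)
        self.healthy = True

    def next_id(self):
        if not self.healthy:
            raise ConnectionError("ticket server unavailable")
        return next(self._ids)


class RoundRobinClient:
    """Alternates between servers and skips any that are down."""

    def __init__(self, servers):
        self._servers = servers
        self._turn = 0

    def next_id(self):
        for _ in range(len(self._servers)):
            server = self._servers[self._turn % len(self._servers)]
            self._turn += 1
            try:
                return server.next_id()
            except ConnectionError:
                continue  # try the next server in the rotation
        raise RuntimeError("no ticket server available")


server_a = TicketServer(offset=1, increment=2)  # 1, 3, 5, ...
server_b = TicketServer(offset=2, increment=2)  # 2, 4, 6, ...
client = RoundRobinClient([server_a, server_b])

before_failure = [client.next_id() for _ in range(4)]
server_b.healthy = False  # simulate an outage of the even-ID server
during_failure = [client.next_id() for _ in range(3)]
print(before_failure, during_failure)
```

During the outage only odd IDs are issued, which shows why this scheme guarantees uniqueness but not a gap-free or strictly ordered sequence.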

Choosing the Right ID Strategy

Decision framework: (1) Single database, low-to-medium volume: use auto-increment or database sequences. This is the simplest option. (2) Need sortable IDs at high throughput (>100K/sec): use Snowflake or a Snowflake variant. 64-bit, time-sorted, no coordination after machine ID assignment. (3) Need globally unique IDs without coordination, and sortability is nice-to-have: use ULID. 128-bit, sortable, no machine ID management. (4) Need globally unique IDs and do not care about sortability: use UUID v4. No coordination, well-supported in every language and database. (5) Multi-region with strict ordering requirements: use a centralized ID service with batching. Each application server requests a batch of 1000 IDs, uses them locally, and requests a new batch when exhausted. The centralized service is the coordination point. Note the tradeoff: batches keep IDs unique and ordered within each server, but IDs from different servers interleave, so a truly strict global order requires a batch size of 1, at the cost of one round-trip per ID. In system design interviews: mention Snowflake for high-throughput systems (URL shorteners, social feeds), UUID v4 for general-purpose use, and explain the tradeoffs between coordination, sortability, and size.
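The batching approach in option (5) can be sketched as follows. `CentralAllocator` stands in for the centralized ID service and `BatchedIdClient` for the logic inside each application server; both names are hypothetical, and a real service would persist its counter and be reached over the network.

```python
import threading


class CentralAllocator:
    """Stand-in for the central service: hands out disjoint ID ranges."""

    def __init__(self, first_id=1):
        self._next = first_id
        self._lock = threading.Lock()

    def reserve(self, size):
        with self._lock:
            start = self._next
            self._next += size
            return start, start + size  # half-open range [start, end)


class BatchedIdClient:
    """Per-server client: one call to the allocator per batch, not per ID."""

    def __init__(self, allocator, batch_size=1000):
        self._allocator = allocator
        self._batch_size = batch_size
        self._next = 0
        self._end = 0  # empty range forces a reservation on first use
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                self._next, self._end = self._allocator.reserve(self._batch_size)
            value = self._next
            self._next += 1
            return value


allocator = CentralAllocator()
server_1 = BatchedIdClient(allocator, batch_size=3)
server_2 = BatchedIdClient(allocator, batch_size=3)

issued = [
    server_1.next_id(),  # server 1 reserves 1..3
    server_2.next_id(),  # server 2 reserves 4..6
    server_1.next_id(),
    server_1.next_id(),
    server_1.next_id(),  # server 1 exhausts its batch and reserves 7..9
]
print(issued)
```

The second ID issued (4) is larger than the third (2), which illustrates that batching preserves uniqueness but not global creation order. Any IDs left unused in a batch when a server restarts are simply skipped, so gaps are expected.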
