Question 1

How does Twitter Snowflake generate unique sortable IDs?

Accepted Answer

Twitter Snowflake generates 64-bit IDs with an embedded timestamp. The bit layout, from most to least significant, is: 1 bit (unused sign bit) + 41 bits (millisecond timestamp since a custom epoch) + 10 bits (machine ID, split in the original implementation into 5 datacenter bits and 5 worker bits) + 12 bits (per-millisecond sequence counter).

The 41-bit timestamp gives approximately 69 years of range from the custom epoch. The 10-bit machine ID allows 1024 ID generators to run simultaneously with no coordination between them. The 12-bit sequence counter allows 4096 unique IDs per millisecond per machine, which works out to 4096 x 1000 = 4,096,000 IDs per second per generator. If a generator exhausts the sequence within one millisecond, it waits for the next millisecond before issuing more IDs.

IDs are naturally time-sorted because the timestamp occupies the most significant bits. Sorting by ID only approximates sorting by creation time: clocks on different machines can differ slightly, and IDs created on different machines in the same millisecond are ordered by machine ID. The approximation is good enough for efficient cursor-based pagination (WHERE id > last_seen_id).

The machine ID is the only coordination point: each generator must have a unique machine ID assigned at startup (via ZooKeeper, a configuration file, or an environment variable). Once it is assigned, ID generation is entirely local with no network calls.

Clock synchronization: if the system clock moves backward (for example, after an NTP adjustment), the generator must stop issuing IDs until the clock catches up to the last timestamp it used. Otherwise it could reuse a timestamp and sequence pair and produce duplicate IDs.
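The layout and rules described in this answer can be sketched in Python as follows. This is a minimal illustration, not Twitter's implementation: the class name SnowflakeGenerator, the decode helper, the injectable clock parameter, and the EPOCH_MS value are all assumptions made for the example.

```python
import threading
import time

EPOCH_MS = 1288834974657      # assumed custom epoch in Unix milliseconds
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1    # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1     # 4095


class SnowflakeGenerator:
    """Hypothetical Snowflake-style generator: timestamp | machine | sequence."""

    def __init__(self, machine_id, clock=None):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        # The clock is injectable so the logic can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            # Clock moved backward: wait until it catches up, so that no
            # (timestamp, sequence) pair is ever issued twice.
            while now < self._last_ms:
                time.sleep((self._last_ms - now) / 1000)
                now = self._clock()
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 values used in this millisecond: spin until
                    # the next millisecond begins.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode(snowflake_id):
    """Split an ID back into (unix_timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, machine_id, sequence
```

Calling SnowflakeGenerator(machine_id=5).next_id() repeatedly yields strictly increasing integers on one machine, and decode recovers the three fields from any of them. A production generator would probably reject requests or raise an alert on a large backward clock jump instead of sleeping for an unbounded time.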

Question 2

Why are random UUIDs bad for database index performance?

Accepted Answer

B-tree indexes store keys in sorted order across pages. With sequential IDs (auto-increment), new inserts always go to the rightmost leaf page. The page fills up, splits, and the pattern continues: sequential, predictable I/O. Only the rightmost pages need to be in memory.

With random UUIDs, each new insert goes to a random position in the B-tree. This causes:

(1) Random I/O. Inserting into a random page may require reading that page from disk if it is not in the buffer cache. Sequential IDs only write to pages already in memory.

(2) Page splits. Random inserts cause frequent page splits throughout the tree (not just at the end), leading to fragmented storage and wasted space.

(3) Cache inefficiency. With sequential inserts, only the active pages need to be cached. With random inserts, any leaf page is equally likely to be the next target, so the entire index must be cached for good performance. For a 1-billion-row table, the B-tree index might be 20GB, which can exceed the buffer pool.

Performance impact: random UUID inserts are often cited as 2-5x slower than sequential inserts for write-heavy workloads. The actual penalty depends on the database engine, the storage, and how much of the index fits in memory.

Alternatives that preserve sortability: ULID (time-prefixed random), Snowflake IDs (time-based 64-bit), or UUID v7 (time-ordered UUID, standardized in RFC 9562).
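A rough way to see the cache effect is to count how many distinct leaf pages a batch of inserts would land on. The sketch below is a toy model, not a real B-tree: it treats the leaf level as a sorted key list cut into fixed-size pages and ignores splits. The names pages_touched and compare and the PAGE_SIZE of 100 keys are assumptions made for the example.

```python
import bisect
import random

PAGE_SIZE = 100  # assumed number of keys per leaf page


def pages_touched(existing_keys, new_keys, page_size=PAGE_SIZE):
    """Count the distinct leaf pages that would receive at least one new key."""
    existing = sorted(existing_keys)
    last_slot = max(len(existing) - 1, 0)
    touched = set()
    for key in new_keys:
        # Position the key would occupy in sorted order; keys beyond the
        # current maximum land on the rightmost page.
        position = min(bisect.bisect_left(existing, key), last_slot)
        touched.add(position // page_size)
    return len(touched)


def compare(n_existing=100_000, n_new=1_000, seed=42):
    """Return (pages touched by sequential keys, pages touched by random keys)."""
    rng = random.Random(seed)
    sequential = pages_touched(
        range(n_existing), range(n_existing, n_existing + n_new)
    )
    random_existing = [rng.getrandbits(128) for _ in range(n_existing)]
    random_new = [rng.getrandbits(128) for _ in range(n_new)]
    return sequential, pages_touched(random_existing, random_new)
```

In this model the sequential batch touches a single page (the rightmost one), while the random 128-bit keys spread across a large share of the 1,000 leaf pages. Those scattered pages are what a real database would have to keep in its buffer cache or read from disk.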

Question 3

What is ULID and how does it compare to UUID and Snowflake?

Accepted Answer

ULID (Universally Unique Lexicographically Sortable Identifier) is a 128-bit identifier: a 48-bit millisecond timestamp followed by 80 bits of randomness, drawn from a cryptographically secure source where one is available. It is encoded as 26 Crockford Base32 characters (e.g., 01ARZ3NDEKTSV4RRFFQ69G5FAV).

Compared to UUID v4: ULIDs are sortable (the timestamp prefix orders them chronologically). UUID v4 is fully random and not sortable. Both are 128 bits and need no coordination. ULID has better B-tree index performance due to the time-ordered prefix (inserts are roughly sequential, not random).

Compared to Snowflake: Snowflake is 64 bits (smaller, more efficient storage). ULID is 128 bits, the same size as a UUID, so its binary form fits in UUID database columns; the 26-character text form is not a valid UUID string and must be converted first. Snowflake requires machine ID coordination. ULID needs no coordination (randomness provides uniqueness). Snowflake guarantees monotonic ordering within a machine. ULIDs generated in the same millisecond are randomly ordered by default; the ULID spec defines an optional monotonic mode in which a generator increments the random component for IDs created within the same millisecond.

Recommendation: use ULID when you want sortable IDs without the operational complexity of machine ID assignment. Use Snowflake when you need compact 64-bit IDs at very high throughput. Use UUID v4 when sortability does not matter and you want maximum compatibility with existing systems.
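The encoding described in this answer can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names encode_ulid, new_ulid, and ulid_timestamp_ms are made up for the example, and it does not implement the spec's monotonic mode.

```python
import os
import time

# Crockford Base32 alphabet: digits plus letters, excluding I, L, O, and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms, randomness):
    """Pack a 48-bit timestamp and 10 random bytes into 26 Base32 characters."""
    if not 0 <= timestamp_ms < (1 << 48):
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 80 bits (10 bytes)")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):
        # Emit 5 bits per character, least significant group first.
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_ulid():
    """Generate a ULID from the current time and OS-provided randomness."""
    return encode_ulid(int(time.time() * 1000), os.urandom(10))


def ulid_timestamp_ms(ulid):
    """Recover the millisecond timestamp from the first 10 characters."""
    value = 0
    for ch in ulid[:10]:
        value = (value << 5) | CROCKFORD.index(ch)
    return value
```

Because the timestamp occupies the leading characters and the alphabet is in ascending ASCII order, comparing two ULID strings as plain text orders them by creation millisecond. That property is what makes them friendly to B-tree indexes.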

Question 4

How do you choose the right ID generation strategy for a system design interview?

Accepted Answer

Match the strategy to the requirements:

(1) Simple application with a single database: use auto-increment (PostgreSQL SERIAL or BIGSERIAL, MySQL AUTO_INCREMENT). No additional infrastructure is needed.

(2) Distributed system needing sortable IDs at high throughput: use Snowflake or a variant. It is 64-bit, time-sorted, and generates millions of IDs per second with no network calls after machine ID assignment. Mention this for URL shorteners (short, sortable IDs), social media feeds (chronological ordering by ID), and high-volume event systems.

(3) Distributed system needing sortable IDs without coordination infrastructure: use ULID or UUID v7. Both are 128-bit, sortable, and require no coordination. This is a good default when you want sortability without managing Snowflake machine IDs.

(4) System where IDs must not reveal information: use UUID v4 (random, no timestamp embedded). Sequential IDs reveal volume and time-based IDs reveal creation time; UUID v4 reveals neither.

(5) Legacy system with UUID columns: use ULID (same 128-bit size, compatible with binary UUID storage) or UUID v7, which is itself a valid UUID.

In the interview: state the requirements (sortable? compact? no coordination?), choose the strategy, and explain the tradeoffs. Mentioning the B-tree performance impact of random UUIDs demonstrates deep understanding.
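The checklist in this answer can be condensed into a small decision helper. This is only a sketch of one reasonable precedence (privacy first, then topology, then size). The Requirements fields and the choose_id_strategy function are hypothetical names, and real designs weigh more factors than these flags.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    distributed: bool                      # more than one node generates IDs
    sortable: bool                         # IDs should order by creation time
    must_hide_creation_time: bool = False  # IDs must not leak time or volume
    needs_compact_64_bit: bool = False     # ID must fit in a 64-bit integer
    can_assign_machine_ids: bool = False   # able to manage unique machine IDs


def choose_id_strategy(req):
    """Map requirements to a strategy, following the checklist's cases."""
    if req.must_hide_creation_time:
        return "UUID v4"            # random, no embedded timestamp
    if not req.distributed:
        return "auto-increment"     # single database, no extra infrastructure
    if not req.sortable:
        return "UUID v4"            # maximum compatibility, no ordering needed
    if req.needs_compact_64_bit and req.can_assign_machine_ids:
        return "Snowflake"          # compact and sortable, needs machine IDs
    # Sortable with no coordination. Note this is 128-bit even if a compact
    # ID was requested but machine IDs cannot be managed.
    return "ULID or UUID v7"
```

For example, choose_id_strategy(Requirements(distributed=True, sortable=True)) returns "ULID or UUID v7", which matches case (3) in the list.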

System Design: Distributed ID Generation — Snowflake, UUID, ULID, Twitter, Auto-Increment, Database Sequences

Requirements for Distributed IDs

UUID v4: Random IDs

Twitter Snowflake: Time-Sorted 64-Bit IDs

ULID: Universally Unique Lexicographically Sortable Identifier

Database Sequences and Auto-Increment

Choosing the Right ID Strategy