System Design Article

Design a Unique ID Generator

Difficulty: Medium

Design a service that generates globally unique, roughly time-sortable 64-bit IDs at 1M IDs per second across hundreds of application servers, without coordination on the hot path. The interview centerpiece is the trade-off between uniqueness, ordering, size, and coordination cost. We compare UUIDv4 (random, no coordination, 128 bits, no ordering), database AUTOINCREMENT (single point of contention), Twitter Snowflake (64 bits, time-ordered, requires worker_id assignment and clock discipline), Instagram's per-shard hybrid, and ULID/KSUID. We deep-dive into Snowflake: bit layout, clock skew handling, leader election for worker IDs, and the dreaded clock-rollback bug.

System Design
/

Design a Unique ID Generator

Design a Unique ID Generator

Design a service that generates globally unique, roughly time-sortable 64-bit IDs at 1M IDs per second across hundreds of application servers, without coordination on the hot path. The interview centerpiece is the trade-off between uniqueness, ordering, size, and coordination cost. We compare UUIDv4 (random, no coordination, 128 bits, no ordering), database AUTOINCREMENT (single point of contention), Twitter Snowflake (64 bits, time-ordered, requires worker_id assignment and clock discipline), Instagram's per-shard hybrid, and ULID/KSUID. We deep-dive into Snowflake: bit layout, clock skew handling, leader election for worker IDs, and the dreaded clock-rollback bug.

System Design
Medium
design-unique-id-generator
case-study
unique-specialized
snowflake-id
uuid
ulid
ksuid
distributed-id
clock-skew
leader-election
zookeeper
instagram
twitter
system-design
intermediate
premium

184 views

5

Requirements

Functional Requirements

  1. Generate unique IDs at high throughput (millions/sec across the fleet).
  2. 64-bit IDs preferred (compact, fits a database BIGINT, half the size of a UUID).
  3. Roughly time-sortable: sorting by ID approximates sorting by creation time.
  4. Decentralized: each application instance can generate IDs without a coordinator on the hot path.
  5. Configurable epoch: choose a service-specific zero point so timestamp bits last long enough.

Out of Scope (state explicitly)

  • ID-based sharding strategy (covered in design-key-value-store).
  • ID-based access control / capability tokens.
  • Public-facing hashes / short URLs (covered in design-url-shortener).

Non-Functional Requirements

  1. Throughput: 1M IDs/sec aggregate across the fleet; 4096 IDs/ms per worker (Snowflake-style).
  2. Uniqueness: never duplicate, even across worker restarts and clock skew within tolerance.
  3. Latency: ID generation must be local (no network call); <1 microsecond per ID.
  4. Service lifetime: ID space lasts >50 years from epoch.
  5. Operability: detect clock rollback, refuse to issue colliding IDs.

Back-of-the-Envelope Estimation

Throughput per Worker (Snowflake)

Text
---------- Snowflake bit budget ----------
Layout (64 bits, 1 reserved):
  41 bits = timestamp (ms since custom epoch)
  10 bits = worker_id (1024 workers max)
  12 bits = sequence (per ms per worker)
   1 bit  = sign (always 0 to keep IDs positive)

Max IDs per worker per ms: 2^12 = 4096
Max IDs per worker per sec: 4,096,000
Max IDs across 1024 workers per sec: ~4.2 billion

4096 IDs per worker per millisecond is plenty for almost any application; if a worker exceeds it, the algorithm waits for the next ms.

Service Lifetime

Text
---------- Lifetime calculation ----------
41 bits of milliseconds = 2^41 ms = ~69.7 years
With a custom epoch (e.g., 2020-01-01), service runs until ~2089.
UUIDv4 has no time bits; UUIDv7 uses 48 bits of ms = ~8900 years.

Storage Comparison

Text
---------- ID size ----------
UUIDv4:        128 bits (16 bytes)
Snowflake:     64 bits (8 bytes)
ULID:          128 bits (16 bytes; sortable)
KSUID:         160 bits (20 bytes; sortable)
Mongo ObjectID:96 bits (12 bytes)

For a high-volume table (10B rows), 64-bit IDs save ~80 GB vs UUIDs in the primary key column alone, and far more in indexes.

High-Level Design

There are two patterns to consider, and the right answer depends on your operational model.

Pattern A: Embedded Library (no service)

Each application process imports a Snowflake library; on startup, it gets a worker_id from a config store (Zookeeper, etcd, AWS Parameter Store) and generates IDs locally. There is no ID service in the request path.

Text
---------- Embedded library architecture ----------
  [ App Server 1 ]   [ App Server 2 ]   [ App Server 3 ]
        |                  |                  |
   has worker_id=42   has worker_id=51   has worker_id=104
        |                  |                  |
        v                  v                  v
   generate IDs locally; no network call
        |                  |                  |
        +--------+---------+---------+
                          |
                          v (only once at startup, to claim worker_id)
               +--------------------+
               | Zookeeper / etcd   |
               | (worker_id assign) |
               +--------------------+

Pattern B: Centralized ID Service

A dedicated ID-Gen Service exposes GET /id. App servers call it to obtain an ID. Simpler operationally (one fleet to monitor) but adds a network round trip per ID, which is a hard sell at high throughput.

Production systems almost always pick Pattern A. We design it.

API Design (Embedded Library)

// Library on each app process
class SnowflakeGenerator {
    constructor(workerId, epoch = 1577836800000) {
        this.workerId = workerId;
        this.epoch = epoch;
        this.sequence = 0n;
        this.lastTs = -1n;
    }
    nextId() {
        let ts = BigInt(Date.now()) - BigInt(this.epoch);
        if (ts < this.lastTs) {
            throw new Error('Clock moved backwards; refusing to issue ID.');
        }
        if (ts === this.lastTs) {
            this.sequence = (this.sequence + 1n) & 0xFFFn;
            if (this.sequence === 0n) {
                ts = this.waitNextMs(this.lastTs);
            }
        } else {
            this.sequence = 0n;
        }
        this.lastTs = ts;
        return (ts << 22n) | (BigInt(this.workerId) << 12n) | this.sequence;
    }
    waitNextMs(prev) {
        let ts;
        do {
            ts = BigInt(Date.now()) - BigInt(this.epoch);
        } while (ts <= prev);
        return ts;
    }
}

Detailed Design

The two interesting topics are the Snowflake bit layout choices and the clock skew / worker_id assignment problem.

Why 41 + 10 + 12?

The bit budget is a three-way trade among:

  • Service lifetime (timestamp bits): more bits = longer life, but constrains the others.
  • Worker count (worker_id bits): more bits = more workers, but constrains the others.
  • Per-worker throughput (sequence bits): more bits = higher per-worker IDs/ms.
Text
---------- Bit budget choices ----------
Twitter (default):    41 ts + 10 worker + 12 seq    = 4096 IDs/ms/worker, 1024 workers, 69 yr
Discord:              42 ts + 5 dc + 5 worker + 12 seq = same per-worker, 32 datacenters * 32 workers
Mongo ObjectID:       32 s ts + 40 random + 24 counter (96 bits, post-3.4 spec)
UUIDv7:               48 ts (ms) + 12 ver/var + 62 random  (128 bits)

10 worker bits gives you 1024 worker IDs - enough for almost any deployment. 12 sequence bits gives 4096 IDs/ms/worker, which is 4M/s/worker - more than any single process needs. The remaining 41 bits give a 69-year lifetime, comfortably past any service's planning horizon.

Clock Skew: The Production Killer

Snowflake assumes time moves forward. NTP can step the clock backwards (e.g., a leap second adjustment, a misconfigured clock fixing itself). If now() < last_ts, the next ID could collide with one already issued.

Text
---------- Three failure scenarios ----------
1. Small backward jump (a few ms):
   - Detect: ts < last_ts.
   - Action: pause until clock catches up (sleep then retry).

2. Large backward jump (>1 s):
   - Detect: ts < last_ts by more than tolerance.
   - Action: refuse to issue IDs; raise alert; engineer intervention.

3. Forward leap (clock jumps minutes ahead):
   - Detect: not directly harmful (IDs continue to be unique).
   - Action: reduces effective lifetime; usually self-corrects.

The canonical fix is to disable NTP step adjustments and only allow slew (gradual). On Linux: ntpd -x slews instead of stepping; with chrony the equivalent is configuring makestep (e.g., makestep 0 -1 to disable stepping entirely, or set a strict slew bound in chrony.conf). The Snowflake process refuses to issue IDs if the clock moves backwards regardless.

Worker ID Assignment

With 1024 worker IDs, you cannot manually configure each app server. Options:

  1. Zookeeper / etcd ephemeral nodes: the process registers, gets the next available worker_id, holds it for the process lifetime. On crash, the node disappears and the ID is reusable.
  2. Config service (e.g., AWS Parameter Store): centrally allocate worker_id ranges per service.
  3. Hash from instance ID: derive worker_id from hash(hostname) mod 1024. Risk of collision; fine for small fleets.

Production systems use Zookeeper for ephemeral assignment. The complication is reuse: if worker_id 42 was assigned to a process that crashed, can we reuse it immediately? Only if we are sure the dead process is dead (and not generating IDs we missed). Use a 'cool-down' (e.g., 30 s) before reassignment.

Why Not UUIDv4?

UUIDv4 is 128 bits of randomness with no coordination needed. It has two big problems for high-throughput databases:

  • Index fragmentation: random keys insert into random pages of a B-tree, causing page splits and poor cache locality. UUID-keyed inserts can be 5-10x slower than sequential keys at scale.
  • Size: 128 bits doubles index storage and eats RAM for cache.

UUIDv7 (drafted 2022) fixes the first problem by putting 48 bits of millisecond timestamp in the high bits, making them roughly sortable. It is the modern alternative to Snowflake when you want decentralized + roughly ordered + standardized format. Cost: still 128 bits.

Comparison Matrix

StrategySize (bits)CoordinationOrderedThroughputUse case
DB AUTOINCREMENT64Strong (DB)YesDB write QPSSingle-DB apps
UUIDv4128NoneNoUnlimitedDistributed, no ordering needed
UUIDv7128NoneRoughlyUnlimitedModern decentralized + ordered
Snowflake64Worker assignmentRoughly4M/s/workerDistributed, ordered, compact
ULID128NoneRoughlyUnlimitedUUID drop-in with ordering
KSUID160NoneRoughlyUnlimitedLarger time range than ULID
Mongo ObjectID96Process IDRoughlyHighMongoDB primary keys
Range allocator64PeriodicYesVery highPre-allocate ID blocks per worker

Range Allocator (Instagram Pattern)

Instagram needed time-ordered IDs but did not want a single Snowflake fleet. Their solution: each shard has its own auto-increment sequence; the high bits of the ID encode the shard. A central service hands out 'ID blocks' (e.g., 1000 IDs at a time) to each app server, which then consumes them locally without coordination.

Text
---------- Range allocator ----------
App server requests block:    GET /allocate?count=1000
Response:                     start=1000000, end=1000999
App server consumes locally;  uses the block for the next ~5 s
When block depletes:          request next block
Advantage:                    no per-ID coordination; resilient to brief allocator outages
Disadvantage:                 IDs may be skipped if the worker dies mid-block

This is the Snowflake idea inverted: amortize the coordination cost over many IDs. Used by Postgres cache_size on sequences and many internal systems.

Data Model

Worker Registry (Zookeeper / etcd)

Text
---------- ZK node layout ----------
/idgen/workers/0         ephemeral, owner=app-server-23
/idgen/workers/1         ephemeral, owner=app-server-77
...
/idgen/workers/1023      ephemeral

On process startup, attempt to create the lowest-numbered absent worker node. The first to succeed gets that worker_id. On process crash, the ephemeral node disappears (after session timeout, ~10 s) and the ID is available for reassignment after a cool-down.

Optional Audit Trail

SQL
CREATE TABLE id_workers (
    worker_id       INT PRIMARY KEY,
    last_owner      VARCHAR(64),
    last_assigned   TIMESTAMPTZ,
    last_released   TIMESTAMPTZ,
    cool_down_until TIMESTAMPTZ
);

Tracks history for ops; not on the hot path.

Persisted Last Timestamp (Defensive)

For extra safety against clock rollback, the worker can persist last_ts periodically (every 100 ms) to local disk. On restart, refuse to issue IDs until now() > last_ts_persisted + safety_margin.

SQL
-- Local SQLite per worker (optional)
CREATE TABLE snowflake_state (
    worker_id INT PRIMARY KEY,
    last_ts   BIGINT NOT NULL
);

This catches the case where a VM reboots after a clock change; the process refuses to issue IDs until time has moved forward.

Scaling and Bottlenecks

Single Worker Throughput

4096 IDs/ms = 4M/s. A single worker exceeding this is rare; if it does, the algorithm waits for the next ms (effectively 4M/s ceiling). Most app processes generate 100-10K IDs/s, far below the ceiling.

Fleet Throughput

1024 workers * 4M/s = ~4 billion IDs/s. More than any system needs.

Worker ID Exhaustion

If you have >1024 worker processes, you outgrow standard Snowflake. Options:

  • Repurpose datacenter bits (e.g., 5 dc + 5 worker = 32 dc * 32 = 1024) but you can have many fewer total IDs/ms.
  • Reduce sequence bits and add worker bits.
  • Use UUIDv7 instead.

Discord hit this at 5+ datacenters and split worker_id into datacenter + worker_in_dc.

Clock Skew Recovery

If a worker detects clock rollback >1 s, it refuses to issue IDs and alerts. Operations:

  1. Page on-call.
  2. Drain the affected worker (move traffic to others).
  3. Investigate NTP / hardware clock.
  4. Restart the worker once time is monotonic.

Most incidents resolve in <30 minutes; capacity is sized so 1-2 workers offline is invisible.

Adoption Path

Most systems start with database autoincrement. When you outgrow a single DB or need decentralized writes, switch to Snowflake. For greenfield modern systems, UUIDv7 is the simplest path: standard format, decentralized, roughly ordered; you give up the 64-bit compactness.

Trade-offs and Alternatives

Snowflake vs UUIDv4

UUIDv4 wins on simplicity (no infra, no coordination, just call uuid4()). Snowflake wins on size (8 vs 16 bytes), ordering (better index locality, easier 'recent' queries), and the ability to extract the timestamp from an ID for debugging. For high-throughput systems, the ordering and size benefits compound.

Snowflake vs UUIDv7

UUIDv7 is the modern alternative: 128 bits, standardized, roughly ordered, no coordination needed. Snowflake is more compact (64 bits) but requires worker_id assignment infrastructure. For new projects, UUIDv7 is simpler; for systems where every byte matters (high-volume primary keys), Snowflake is still worth it.

Snowflake vs database AUTOINCREMENT

AUTOINCREMENT is fine when you have one logical database. It becomes a bottleneck when you scale writes to multiple primaries (sharding) because each shard has its own counter and IDs collide. Snowflake works across sharded writes by construction.

Strict ordering vs roughly ordered

Snowflake IDs from the same worker are strictly ordered. Across workers, they are only roughly ordered (within the same ms, worker_id determines order, not arrival order). For most use cases (timeline of recent posts) roughly ordered is fine. For strict ordering you need a single counter (or Lamport timestamps), which is hard at scale.

Time-ordering vs randomness for security

If IDs are guessable (Snowflake's timestamp is visible in the high bits), an attacker can enumerate recent IDs to scrape user data or guess pending records. For public-facing IDs (e.g., URLs), use a separate slug or hash; reserve Snowflake for internal IDs only.

Centralized ID service vs embedded library

Centralized: one fleet to operate, but adds a network call per ID (1-5 ms latency, single point of failure). Embedded library: zero call overhead but every app must include the library and handle worker_id assignment. Embedded wins for any high-throughput system.

Range allocator vs Snowflake

Range allocator works when you have a centralized authority that hands out blocks (e.g., one service per shard). Snowflake works when you do not (any worker can generate without external state once it has its worker_id). For Postgres-style sequences, range allocator is the standard implementation; for distributed app fleets, Snowflake is the standard.

Real-World Examples

How real systems implement this in production

Twitter Snowflake

Twitter open-sourced Snowflake in 2010 to generate tweet IDs across thousands of servers without coordination on the hot path. The bit layout (41 ts + 10 worker + 12 seq) became the industry default and is the inspiration for Discord, Sony, and Mastodon ID schemes.

Trade-off: Snowflake gives compact, time-ordered IDs at the cost of running infrastructure for worker_id assignment; Twitter eventually retired the standalone Snowflake service in favor of an embedded-library variant for lower latency.

Instagram sharded sequences

Instagram uses a hybrid: each Postgres shard has an auto-increment sequence; the high bits of the ID encode the shard, allowing globally unique IDs without a central service. They allocate 'ID blocks' per shard via PL/pgSQL functions.

Trade-off: Per-shard sequences are simpler to reason about than Snowflake but tie ID generation to database availability; if the shard is down, you cannot issue IDs for that shard's writes.

Discord Snowflake

Discord uses Snowflake-style IDs but split the 10 worker bits into 5 datacenter + 5 worker to support multi-DC deployment (32 datacenters, 32 workers each). Their snowflake epoch is 2015-01-01 (the year Discord launched).

Trade-off: Splitting worker bits into datacenter + worker lets Discord scale horizontally across DCs but caps each DC at 32 ID-generating workers; they offset by running multiple ID-generating processes per host.

MongoDB ObjectID

MongoDB ObjectIDs are 96 bits: 32-bit timestamp (seconds) + 24-bit machine ID + 16-bit process ID + 24-bit counter. Generated client-side by the driver, requiring no server round trip.

Trade-off: 96 bits gives more headroom than Snowflake (8 thousand years from 1970) but is 50% larger than 64-bit IDs in storage and indexes; the second-resolution timestamp also gives weaker ordering than Snowflake's millisecond resolution.

Quick Interview Phrases

Key terms to use in your answer

Snowflake bit layout: timestamp + worker_id + sequence
Refuse to issue IDs on clock rollback
Zookeeper ephemeral nodes for worker_id assignment
UUIDv7 for decentralized roughly-ordered alternative
Range allocator amortizes coordination over a block
64 bits beats 128 for index locality and storage

Common Interview Questions

Questions you might be asked about this topic

App process starts; library connects to Zookeeper, creates the lowest-numbered ephemeral node under /idgen/workers/, gets worker_id (e.g., 42). Library initializes with epoch (e.g., 2020-01-01), worker_id=42, sequence=0, last_ts=-1. On nextId(): compute ts = now() - epoch in ms. If ts < last_ts: throw (clock moved backwards). If ts == last_ts: increment sequence; if sequence overflows 12 bits, busy-wait until next ms. If ts > last_ts: reset sequence to 0. Compose: id = (ts << 22) | (worker_id << 12) | sequence. Update last_ts = ts. Return 64-bit ID. Total wall time: <1 microsecond when not waiting on clock.

Interview Tips

How to discuss this topic effectively

1

Open with the trade-off triangle: uniqueness, ordering, coordination cost. Saying 'we want all three; the cheapest path is local generation with a small per-process worker_id' frames the design.

2

Explain the Snowflake bit layout from first principles. Walk through 41 ts + 10 worker + 12 seq = 64 bits total, what each gives you, and what happens if you reallocate. This shows you can reason about bit budgets, not just memorize the layout.

3

Bring up clock skew explicitly. Saying 'NTP can step the clock backwards; we detect and refuse to issue IDs, configure NTP slew-only mode' is the operational signal that distinguishes a senior answer.

4

Mention UUIDv7 as the modern alternative. Many candidates only know UUIDv4 vs Snowflake; knowing UUIDv7 (timestamp-prefixed UUID, standardized 2022) shows you keep current.

5

Quantify storage. 'For 10 billion rows, 64-bit IDs save ~80 GB in just the primary key column vs UUID' is the kind of concrete number that lands.

Common Mistakes

Pitfalls to avoid in interviews

Suggesting UUIDv4 as the primary key for a high-throughput sharded database

UUIDv4 is random with no time component, so inserts hit random pages of a B-tree, causing page splits and poor cache locality. Postgres benchmarks show 5-10x slower insert rates with UUIDv4 keys vs sequential. Use Snowflake or UUIDv7 (time-prefixed) for high-throughput tables; use UUIDv4 only when ordering is irrelevant or for security-sensitive public IDs.

Ignoring clock skew in the Snowflake design

NTP can step the clock backwards (leap seconds, misconfigured clock fixing itself, VM time drift). If now() < last_ts, the next ID could collide with one already issued. The fix is to detect rollback and refuse to issue IDs (raise an alert), and configure the OS to use NTP slew-only mode (no step adjustments). Many production Snowflake bugs are clock-skew related.

Trying to make Snowflake IDs strictly globally ordered

Snowflake guarantees strict ordering only WITHIN a single worker. Across workers, two IDs in the same ms compare by worker_id, not by physical wall-clock arrival. For most use cases this is fine; if you need strict global ordering you need a single counter or hybrid logical clocks, which has fundamental scalability limits.

Using a centralized ID service in the request hot path

A centralized service adds a network call per ID. At 1M IDs/sec across 100 servers, that is 1M extra RPCs/sec to one service. Use an embedded library that generates locally with a worker_id obtained at startup; the centralized component (Zookeeper) is consulted only when a worker starts, not per ID.

Hardcoding worker IDs in config

Hardcoding worker_id per app server forces you to manage 1024 unique IDs by hand and breaks blue-green deploys (two processes claim the same ID transiently). Use Zookeeper or etcd ephemeral nodes: the process claims the lowest unused ID at startup, releases on shutdown, with a cool-down before reassignment.