System Design Article

Design a URL Shortener (TinyURL)

Difficulty: Easy

Design a URL shortening service like TinyURL or bit.ly that maps a long URL to a 7 character code, redirects clicks in under 50 ms, and survives a 100:1 read-to-write ratio. This lesson walks through capacity estimation, the choice between counter based and hash based key generation, the database split between a key store and an analytics store, and the caching strategy that lets a single mid-tier service handle 10K redirects per second on commodity hardware.

Design a URL Shortener (TinyURL)

System Design

Easy

design-tinyurl

url-shortener

case-study

social-content-platforms

base62-encoding

read-heavy

consistent-hashing

caching

cdn

system-design

beginner

free

690 views

Requirements

Before writing code, pin down what the system must and must not do. In a real interview, ask clarifying questions; here we assume the standard product.

Functional Requirements

Shorten a long URL into a short URL of the form https://tiny.url/aZbK9eR.
Redirect any short URL to the original long URL with HTTP 301 (or 302).
Custom alias support: a user can request tiny.url/my-talk if it is free.
Expiration support: links can have a TTL (default: never expire).
Basic analytics: count of clicks per short URL, plus optional country and referrer breakdowns.

Non-Functional Requirements

Highly available - 99.99% (about 53 minutes of downtime per year). A broken redirect kills user trust.
Low latency - p99 redirect under 50 ms globally. Users notice anything slower than 100 ms.
Scalable - 100M new URLs per month, 10B existing URLs after 5 years.
Read-heavy - 100:1 read to write ratio. Optimize for reads.
Unguessable short codes - codes should not be sequential, otherwise scraping is trivial.
Eventual consistency on analytics is fine; the redirect itself must be correct.

Out of Scope (state explicitly in the interview)

Full user auth and dashboards (assume an API key per account).
Spam and malware detection (real services use Google Safe Browsing; we'd integrate but not design it here).
Link previews / Open Graph scraping.

Back-of-the-Envelope Estimation

Numbers are the difference between a junior and senior interview. Always do the math out loud.

Traffic

Text

---------- Traffic estimation ----------
New URLs per month:    100M
Days per month:        ~30
Writes per second:     100M / (30 * 86400) ~= 38 writes/sec
Reads per second:      38 * 100         ~= 3,800 reads/sec

Peak (3x average):     ~11,500 reads/sec

Writing at 40 QPS is trivial. The story is about the read path: 12K QPS sustained, with bursts higher when a popular link goes viral.

Storage

Assume:

Long URL: average 500 bytes (URLs can be huge with query strings).
Short code: 7 bytes.
Metadata (created_at, expires_at, owner_id, click_count): ~50 bytes.
Total per row: ~600 bytes.

Text

---------- Storage over 5 years ----------
New URLs per year:     100M * 12      = 1.2B
New URLs in 5 years:   1.2B * 5       = 6B
Storage:               6B * 600 bytes = 3.6 TB

Plus replicas (3x):                   = 10.8 TB
Plus indexes (~30%):                  = 14 TB

Fits comfortably in a few sharded databases. Not the bottleneck.

Bandwidth

Text

---------- Bandwidth ----------
Write bandwidth:   40 * 600 bytes = 24 KB/s   (negligible)
Read bandwidth:    12,000 * 500 bytes = 6 MB/s (just URL metadata, the redirect is a header)

The redirect is a 301 with a Location: header. The browser then fetches the long URL directly from the destination. We never proxy content.

Cache Memory

Reads follow a power law (a few links get millions of clicks; most get few). 80% of reads hit the top 20% of URLs. Cache the top 1% in memory:

Text

---------- Cache sizing ----------
Top 1% of 6B URLs:        60M URLs
Per-entry size:           ~600 bytes (long URL + key)
Total cache memory:       60M * 600 = ~36 GB

This fits in a single Redis cluster (or 2-3 nodes for headroom and HA). With a 95% cache hit rate, the database sees only ~600 QPS of reads, which a single replica can serve.

Short Code Length

With base62 (a-z, A-Z, 0-9 = 62 characters):

6 chars: 62^6 = ~57 billion
7 chars: 62^7 = ~3.5 trillion
8 chars: 62^8 = ~218 trillion

For a 5 year horizon at 100M URLs/month (= 6B total) with collision room, 7 chars is the sweet spot.

High-Level Design

Text

---------- High-level architecture ----------
             +-------------+
             |   Client    |
             +-------------+
                    |
                    v
             +--------------+
             |  Anycast DNS |
             +--------------+
                    |
                    v
         +----------------------+
         |    CDN / Edge POP    |  <- caches popular 301s
         +----------------------+
                    |
                    v
         +----------------------+
         |    Load Balancer     |
         +----------------------+
              |             |
              v             v
   +-----------------+  +------------------+
   | Write Service   |  |  Read Service    |
   |  (POST /shorten)|  |  (GET /:code)    |
   +-----------------+  +------------------+
              |             |        |
              v             v        v
   +-----------------+  +-----------+ +----------+
   | ID Generator    |  |   Redis   | | Kafka    |
   | (Snowflake)     |  |   Cache   | | (clicks) |
   +-----------------+  +-----------+ +----------+
              \           /              |
               v         v               v
           +------------------+  +-------------------+
           |  Key-Value Store |  |  Analytics Store  |
           |  (Cassandra)     |  |  (Clickhouse)     |
           +------------------+  +-------------------+

API Design

Keep the API surface tiny. Two endpoints do 99% of the work.

Jsonc

// Create a short URL
POST /api/v1/shorten
Authorization: Bearer <api_key>
Content-Type: application/json

{
    "long_url": "https://example.com/very/long/path?with=params",
    "custom_alias": "my-talk",          // optional
    "expires_at": "2026-12-31T23:59:59Z" // optional
}

// Response 201 Created
{
    "short_url": "https://tiny.url/aZbK9eR",
    "short_code": "aZbK9eR",
    "long_url": "https://example.com/very/long/path?with=params",
    "created_at": "2026-04-26T10:00:00Z",
    "expires_at": null
}

Jsonc

// Redirect (the hot path)
GET /:short_code

// Response 301 Moved Permanently
Location: https://example.com/very/long/path?with=params
Cache-Control: public, max-age=3600

Jsonc

// Optional: get analytics
GET /api/v1/links/:short_code/stats

// Response 200 OK
{
    "short_code": "aZbK9eR",
    "total_clicks": 12345,
    "clicks_last_24h": 612,
    "top_countries": ["US", "DE", "BR"]
}

The Redirect Path (Hot Path)

This runs ~12K times per second. Make it boring and fast.

CDN edge checks its own cache for the short code; if hit, return 301 immediately. (~5 ms RTT.)
CDN miss -> request hits a regional Read Service.
Read Service checks Redis (cache-aside). 95%+ hit rate.
Cache miss -> read from Cassandra by primary key (the short code), fill cache with TTL = 1 hour.
Return 301 to client. Asynchronously publish a click event to Kafka (fire-and-forget; the redirect does NOT wait).

Text

---------- Redirect latency budget (p99 < 50 ms) ----------
Network (client -> edge):     ~10 ms
CDN cache hit:                ~2 ms
  -> total:                   ~12 ms (best case)

If CDN miss:
Network (edge -> region):     ~10 ms
Redis hit:                    ~1 ms
Return:                       ~2 ms
  -> total:                   ~25 ms (typical)

If Redis miss:
Cassandra read:               ~5 ms
Cache fill + return:          ~2 ms
  -> total:                   ~32 ms (worst common case)

Detailed Design

The two interesting components are key generation and the cache layer. Both have non-obvious failure modes.

Key Generation: Three Strategies

This is the design decision interviewers love to drill on. There is no universally right answer; defend your choice.

Strategy 1: Hash the long URL (e.g., MD5/SHA, take first 7 chars)

JavaScript

Python

function shortenByHash(longUrl) {
    const hash = sha256(longUrl + salt); // 256-bit
    const code = base62Encode(hash).slice(0, 7);
    // collision check: if code exists with a different long URL, retry with longer slice
    return code;
}

Pros: Idempotent. Same URL always gets the same code (saves storage, cute deduplication).

Cons: Collisions are real at scale. With 6B URLs in a 3.5T space, the birthday-problem collision probability is non-trivial; you must SELECT before INSERT for every write, doubling DB ops on the write path. Custom aliases also need a separate code path.

Strategy 2: Counter + base62 (Twitter Snowflake style)

A central or distributed counter generates a monotonically increasing 64-bit ID. Encode it in base62.

JavaScript

Python

function shortenByCounter(longUrl) {
    const id = snowflake.nextId();      // 64-bit unique ID
    const code = base62Encode(id);      // 11 chars max for 64-bit
    db.put(code, longUrl);              // never collides
    return code;
}

Pros: No collisions, ever. Single insert per URL. Simple.

Cons: Codes are guessable and incrementing; scrapers can walk the keyspace. Fix: XOR with a per-deploy secret, or shuffle bits, before base62 encoding. Now the codes look random while the underlying counter is still unique.

Strategy 3: Pre-generate a pool of random codes

A background job generates millions of random base62 strings into a unused_codes table. The Write Service pops one at insert time.

JavaScript

Python

function shorten(longUrl) {
    const code = db.popUnusedCode();    // atomic SELECT + DELETE
    db.put(code, longUrl);
    return code;
}

Pros: Codes are unguessable. No collisions (the generator dedupes when filling the pool). Predictable write latency.

Cons: Requires a background generator that never falls behind. The popUnusedCode operation must be atomic (SELECT FOR UPDATE in SQL, or atomic dequeue from Redis).

The Decision

For an interview, Strategy 2 with a secret XOR is the strongest answer:

Constant-time generation (no collision check).
Unguessable codes after the XOR shuffle.
64-bit IDs are already unique by construction.
Snowflake gives you free distributed generation across regions.

The Cache Layer

The cache is what lets a single mid-tier service handle 12K reads/sec without melting Cassandra. Two questions matter: what to cache, and how to invalidate.

What to Cache

Cache the short_code to long_url mapping with an LRU eviction policy. Set TTL = 1 hour (long enough for hot URLs to stay hot, short enough that an updated/expired URL isn't served stale forever).

Cache-Aside Pattern

JavaScript

Python

async function lookup(shortCode) {
    const cached = await redis.get(`url:${shortCode}`);
    if (cached) return cached;                       // 95% of requests stop here

    const row = await db.get(shortCode);             // 5% miss
    if (!row) return null;                           // 404

    await redis.setex(`url:${shortCode}`, 3600, row.long_url);
    return row.long_url;
}

Hot Key Problem

A single short URL going viral (1M QPS) can saturate the Redis shard that holds its key. Mitigation:

Edge cache the 301 at the CDN with Cache-Control: public, max-age=3600. The CDN absorbs the burst geographically.
Consistent hashing across multiple Redis nodes so the hot key isn't isolated to one node by misfortune (modulo sharding can pile multiple hot keys on one node).
Local in-process cache with 60-second TTL on the Read Service. Even with 100 servers each caching the hot key, that's only 100 DB reads per minute even if Redis loses it.

Cache Invalidation on Delete/Update

If a user deletes a link or it expires, the cache may serve a stale 301. Strategies:

Write-through: on delete, immediately DEL url:<code> from Redis. Simple but doesn't help the CDN.
Short TTL: 1 hour cap on staleness; acceptable for most use cases.
Soft delete + 410 Gone: store a deleted_at flag; the read path checks it and returns 410, which CDNs cache too.

Data Model

We split the workload across two storage systems because they have very different access patterns.

Primary URL Store (Cassandra)

The redirect path is a simple key lookup with no joins. NoSQL wins.

Text

---------- Cassandra schema ----------
CREATE TABLE urls (
    short_code   text PRIMARY KEY,    -- partition key, also lookup key
    long_url     text,
    created_at   timestamp,
    expires_at   timestamp,            -- nullable
    owner_id     uuid,
    deleted_at   timestamp             -- soft delete
) WITH compaction = {'class': 'LeveledCompactionStrategy'};

Partition key = short_code. This gives perfect read distribution (every key hashes uniformly), and reads are O(1). Writes are O(1) too. No secondary indexes needed for the hot path.

Why not a SQL database? PostgreSQL handles a few billion rows fine, but at 6B rows we'd need to shard manually. Cassandra shards transparently and gives 5 ms p99 reads at this scale. We don't need joins or transactions on the URL table.

Custom Alias Conflict Check

Custom aliases (tiny.url/my-talk) need a uniqueness check before insert. In Cassandra:

SQL

INSERT INTO urls (short_code, long_url, created_at, owner_id)
VALUES ('my-talk', 'https://...', toTimestamp(now()), <uuid>)
IF NOT EXISTS;

The IF NOT EXISTS clause uses a Paxos-based lightweight transaction (about 4x slower than a normal write). That's fine because custom aliases are a small fraction of writes.

Click Events (Kafka -> Clickhouse)

Every redirect publishes a fire-and-forget event:

Jsonc

{
    "short_code": "aZbK9eR",
    "timestamp": 1714134000000,
    "ip_country": "US",
    "referrer": "https://twitter.com",
    "user_agent": "Mozilla/5.0..."
}

Kafka buffers bursts (a viral link can do 100K events/sec for a minute). A consumer batches into Clickhouse, which is a columnar OLAP store optimized for aggregations like SELECT short_code, COUNT(*) FROM clicks WHERE timestamp > now() - INTERVAL 24 HOUR GROUP BY short_code.

Why not increment a counter on the URL row? Two reasons. (1) Hot keys would melt the Cassandra row (10K writes/sec to one partition). (2) We want richer analytics later (geo, referrer, time bucket); a single counter throws all that away.

Scaling and Bottlenecks

The design above handles 12K reads/sec comfortably. What breaks at 10x and 100x?

10x growth (120K reads/sec)

CDN cache hit ratio becomes the dominant factor. Tune Cache-Control to maximize edge hits. At 95% CDN hit rate, only 6K reads/sec touch your origin.
Add Redis nodes with consistent hashing. Memory grows linearly with hot URL count, not total reads.
Read replicas on Cassandra; the write replica is fine for 400 writes/sec.

100x growth (1.2M reads/sec)

Multi-region deployment. Geo-DNS routes users to the nearest region. Each region has its own Cassandra cluster (sharded by short_code) replicated across regions for durability.
Active-active replication on Cassandra is built in (Cassandra is AP). Conflicts on the same key are extremely rare because keys are unique by construction.
CDN tiered caching: edge -> regional shield -> origin. Reduces origin load by another 10x.
Bypass the application for hot redirects: program the CDN (Cloudflare Workers, Lambda@Edge) to read from a global KV store like Cloudflare KV or DynamoDB Global Tables. The 301 never reaches your service.

Failure Modes to Anticipate

Failure	Impact	Mitigation
Redis cluster down	All reads hit Cassandra; latency spikes from 25 ms to 50 ms	Cassandra can absorb it for short periods; auto-recover on Redis restart
Cassandra region partition	Some keys unreadable in that region	Multi-region replication, fail over reads to nearest region
Snowflake clock skew	Duplicate IDs possible	NTP sync + per-node sequence bits; reject writes if clock goes backward
Viral link hot shard	One Redis node at 100% CPU	CDN absorbs it; in-process LRU absorbs the rest
Kafka consumer lag	Analytics delayed, redirects unaffected	Add consumers; redirects don't depend on analytics path

Trade-offs and Alternatives

Why Cassandra over MySQL/Postgres?

MySQL with sharding works but you build the sharding logic yourself. Cassandra ships sharding, replication, and tunable consistency. The downside is no joins or transactions, which we don't need here. Verdict: Cassandra for greenfield; Postgres + Vitess if you already run Postgres.

Why a separate analytics store?

We could put a counter on the URL row, but writes to a single hot row don't scale, and we lose dimensional analytics. Splitting also lets analytics fail (Kafka consumer dies, Clickhouse rebuilds) without affecting the redirect SLO. Verdict: always split the hot transactional path from analytics in any system that does both.

301 vs 302 redirect?

301 (permanent): browsers and CDNs cache aggressively. Lower load on your service, but if the underlying URL changes you cannot 'unredirect' a cached 301.
302 (temporary): browsers do not cache. Every click hits your service. Higher load, more accurate analytics.

Verdict: 301 with a Cache-Control: max-age=3600 is the correct production choice. Analytics from your service catch most of the traffic; the 1 hour cache window is acceptable staleness.

Counter-based vs hash-based keys (revisited)

Many interview blogs recommend hash-based because it 'deduplicates'. In practice, dedup saves 5-10% of storage at the cost of a SELECT on every write. For a 100M writes/month system, skipping that SELECT is worth more than the storage. Use counter-based with bit shuffling.

Pre-generated key pool: when does it win?

When you need codes that look truly random and cannot leak any ordering information (e.g., access tokens, share links for sensitive content). For URL shorteners, a XOR-shuffled snowflake ID looks random enough.

What if we needed strong consistency on click counts?

We wouldn't, for ad-tech style 'show clicks now' use cases. If we did (e.g., billing per click), we'd add an exactly-once Kafka consumer with idempotency keys and a transactional write to the analytics store. The redirect path remains unchanged - that's the point of decoupling.

Real-World Examples

How real systems implement this in production

bit.ly

The original commercial URL shortener. Uses a custom counter-based ID generator, MySQL for the link store, and a separate event pipeline for analytics. Serves billions of clicks per month with sub-50 ms global p99 redirects via Fastly CDN.

Trade-off: Bit.ly historically used MySQL with sharding because they started before NoSQL was mature. The lesson: pick the database you already operate well over the theoretically optimal one; sharding MySQL is harder than picking Cassandra fresh.

TinyURL

The pioneer of URL shortening, founded in 2002. Uses hash-based code generation with collision retry, and historically allowed user-chosen aliases. Much simpler architecture than bit.ly; proves that for moderate scale you do not need elaborate distributed systems.

Trade-off: Hash-based generation works fine when your scale is hundreds of QPS, not tens of thousands. The right design depends on your scale; do not over-engineer a hobby-scale service.

Twitter t.co

Twitter wraps every link posted on its platform with a t.co short URL. It runs at hundreds of thousands of QPS, integrates spam and phishing checks, and is colocated with Twitter's main infrastructure. Heavy use of edge caching and pre-resolved redirects in the Twitter app itself.

Trade-off: When the URL shortener is a feature inside a larger product, you can pre-resolve the link in the client and skip the redirect entirely for the most common cases. This is how Twitter handles 'click latency' on mobile.

Firebase Dynamic Links

Google's URL shortener with deep-linking (open the right app on the right OS, fall back to web). Adds a routing layer on top of the basic redirect to detect platform and cookie state. Sunset in 2025 in favor of standard App Links and Universal Links.

Trade-off: Adding intelligence (platform detection, A/B testing, conditional redirects) to the redirect path increases latency and operational complexity. Dynamic Links shows the limit: when redirects need to be smart, you sacrifice the simplicity that makes URL shorteners cheap to operate.

Quick Interview Phrases

Key terms to use in your answer

base62 encoding

read-heavy workload

cache-aside pattern

hot key problem

snowflake ID generation

fire-and-forget analytics

Common Interview Questions

Questions you might be asked about this topic

How would you handle a custom alias request when two users ask for the same alias at the same time?

Use a conditional insert (Cassandra IF NOT EXISTS, Postgres unique constraint). The first commit wins; the second gets a 409 Conflict. Lightweight transactions are slow but fine for the small fraction of writes that are custom aliases. Pre-validate against the cache for friendly UX, then rely on the database for authoritative uniqueness.

A short URL goes viral and gets 100K requests per second. Walk me through what happens.

How do you prevent abuse: bots creating millions of short URLs?

Your design has 95% cache hit rate. How would you reach 99%?

How would you design link expiration without scanning the whole table?

Interview Tips

How to discuss this topic effectively

Lead with the read:write ratio (100:1). Every later decision (caching, replicas, CDN) flows from this single number.

Defend your key generation strategy explicitly. Saying 'I'd use a counter with XOR shuffle' is much stronger than 'I'd use a hash'.

Show the latency budget out loud. 'p99 50 ms means 10 ms network + 5 ms compute + 35 ms headroom.' Interviewers love numbers.

Always separate the redirect hot path from analytics. The interviewer is looking for the words 'fire-and-forget' or 'asynchronous' here.

Mention edge caching the 301 at the CDN. This is the single highest-leverage optimization and most candidates miss it.

Common Mistakes

Pitfalls to avoid in interviews

Using only MD5/SHA hash for key generation without considering collision overhead

Hash-based generation requires a SELECT before every INSERT to detect collisions. At scale this doubles your write load. Counter-based with bit shuffling avoids the collision check entirely while keeping codes unguessable.

Storing click counts as a column on the URL row

Hot URLs would generate thousands of writes per second to a single row, which kills any database. Decouple via a Kafka topic and aggregate in a columnar OLAP store. The redirect path never increments a counter.

Forgetting CDN edge caching of the 301 response

A 301 with Cache-Control: max-age=3600 lets the CDN absorb 90%+ of traffic geographically. Without it your origin handles every click, and a viral link can DDoS your own service.

Picking SQL with a single primary database 'because URLs need transactions'

URL lookups are independent key-value reads; there are no joins or multi-row transactions. Cassandra (or DynamoDB) shards transparently and gives 5 ms p99 reads at billions of rows. Use SQL only if you also need rich querying you cannot offload to analytics.

Treating the design as 'just a hash table' and skipping availability discussion

A URL shortener is a 99.99% availability service used as a critical dependency by other systems. You must discuss multi-region, replication, what happens during a regional outage, and how the CDN keeps redirects flowing even if your origin is down.

Back to System Design