System Design Article
Design a URL Shortener (TinyURL)
Difficulty: Easy
Design a URL shortening service like TinyURL or bit.ly that maps a long URL to a 7 character code, redirects clicks in under 50 ms, and survives a 100:1 read-to-write ratio. This lesson walks through capacity estimation, the choice between counter based and hash based key generation, the database split between a key store and an analytics store, and the caching strategy that lets a single mid-tier service handle 10K redirects per second on commodity hardware.
Design a URL Shortener (TinyURL)
Design a URL shortening service like TinyURL or bit.ly that maps a long URL to a 7 character code, redirects clicks in under 50 ms, and survives a 100:1 read-to-write ratio. This lesson walks through capacity estimation, the choice between counter based and hash based key generation, the database split between a key store and an analytics store, and the caching strategy that lets a single mid-tier service handle 10K redirects per second on commodity hardware.
690 views
4
Requirements
Before writing code, pin down what the system must and must not do. In a real interview, ask clarifying questions; here we assume the standard product.
Functional Requirements
- Shorten a long URL into a short URL of the form
https://tiny.url/aZbK9eR. - Redirect any short URL to the original long URL with HTTP 301 (or 302).
- Custom alias support: a user can request
tiny.url/my-talkif it is free. - Expiration support: links can have a TTL (default: never expire).
- Basic analytics: count of clicks per short URL, plus optional country and referrer breakdowns.
Non-Functional Requirements
- Highly available - 99.99% (about 53 minutes of downtime per year). A broken redirect kills user trust.
- Low latency - p99 redirect under 50 ms globally. Users notice anything slower than 100 ms.
- Scalable - 100M new URLs per month, 10B existing URLs after 5 years.
- Read-heavy - 100:1 read to write ratio. Optimize for reads.
- Unguessable short codes - codes should not be sequential, otherwise scraping is trivial.
- Eventual consistency on analytics is fine; the redirect itself must be correct.
Out of Scope (state explicitly in the interview)
- Full user auth and dashboards (assume an API key per account).
- Spam and malware detection (real services use Google Safe Browsing; we'd integrate but not design it here).
- Link previews / Open Graph scraping.
Back-of-the-Envelope Estimation
Numbers are the difference between a junior and senior interview. Always do the math out loud.
Traffic
---------- Traffic estimation ----------
New URLs per month: 100M
Days per month: ~30
Writes per second: 100M / (30 * 86400) ~= 38 writes/sec
Reads per second: 38 * 100 ~= 3,800 reads/sec
Peak (3x average): ~11,500 reads/secWriting at 40 QPS is trivial. The story is about the read path: 12K QPS sustained, with bursts higher when a popular link goes viral.
Storage
Assume:
- Long URL: average 500 bytes (URLs can be huge with query strings).
- Short code: 7 bytes.
- Metadata (created_at, expires_at, owner_id, click_count): ~50 bytes.
- Total per row: ~600 bytes.
---------- Storage over 5 years ----------
New URLs per year: 100M * 12 = 1.2B
New URLs in 5 years: 1.2B * 5 = 6B
Storage: 6B * 600 bytes = 3.6 TB
Plus replicas (3x): = 10.8 TB
Plus indexes (~30%): = 14 TBFits comfortably in a few sharded databases. Not the bottleneck.
Bandwidth
---------- Bandwidth ----------
Write bandwidth: 40 * 600 bytes = 24 KB/s (negligible)
Read bandwidth: 12,000 * 500 bytes = 6 MB/s (just URL metadata, the redirect is a header)The redirect is a 301 with a Location: header. The browser then fetches the long URL directly from the destination. We never proxy content.
Cache Memory
Reads follow a power law (a few links get millions of clicks; most get few). 80% of reads hit the top 20% of URLs. Cache the top 1% in memory:
---------- Cache sizing ----------
Top 1% of 6B URLs: 60M URLs
Per-entry size: ~600 bytes (long URL + key)
Total cache memory: 60M * 600 = ~36 GBThis fits in a single Redis cluster (or 2-3 nodes for headroom and HA). With a 95% cache hit rate, the database sees only ~600 QPS of reads, which a single replica can serve.
Short Code Length
With base62 (a-z, A-Z, 0-9 = 62 characters):
- 6 chars: 62^6 = ~57 billion
- 7 chars: 62^7 = ~3.5 trillion
- 8 chars: 62^8 = ~218 trillion
For a 5 year horizon at 100M URLs/month (= 6B total) with collision room, 7 chars is the sweet spot.
High-Level Design
---------- High-level architecture ----------
+-------------+
| Client |
+-------------+
|
v
+--------------+
| Anycast DNS |
+--------------+
|
v
+----------------------+
| CDN / Edge POP | <- caches popular 301s
+----------------------+
|
v
+----------------------+
| Load Balancer |
+----------------------+
| |
v v
+-----------------+ +------------------+
| Write Service | | Read Service |
| (POST /shorten)| | (GET /:code) |
+-----------------+ +------------------+
| | |
v v v
+-----------------+ +-----------+ +----------+
| ID Generator | | Redis | | Kafka |
| (Snowflake) | | Cache | | (clicks) |
+-----------------+ +-----------+ +----------+
\ / |
v v v
+------------------+ +-------------------+
| Key-Value Store | | Analytics Store |
| (Cassandra) | | (Clickhouse) |
+------------------+ +-------------------+API Design
Keep the API surface tiny. Two endpoints do 99% of the work.
// Create a short URL
POST /api/v1/shorten
Authorization: Bearer <api_key>
Content-Type: application/json
{
"long_url": "https://example.com/very/long/path?with=params",
"custom_alias": "my-talk", // optional
"expires_at": "2026-12-31T23:59:59Z" // optional
}
// Response 201 Created
{
"short_url": "https://tiny.url/aZbK9eR",
"short_code": "aZbK9eR",
"long_url": "https://example.com/very/long/path?with=params",
"created_at": "2026-04-26T10:00:00Z",
"expires_at": null
}// Redirect (the hot path)
GET /:short_code
// Response 301 Moved Permanently
Location: https://example.com/very/long/path?with=params
Cache-Control: public, max-age=3600// Optional: get analytics
GET /api/v1/links/:short_code/stats
// Response 200 OK
{
"short_code": "aZbK9eR",
"total_clicks": 12345,
"clicks_last_24h": 612,
"top_countries": ["US", "DE", "BR"]
}The Redirect Path (Hot Path)
This runs ~12K times per second. Make it boring and fast.
- CDN edge checks its own cache for the short code; if hit, return 301 immediately. (~5 ms RTT.)
- CDN miss -> request hits a regional Read Service.
- Read Service checks Redis (cache-aside). 95%+ hit rate.
- Cache miss -> read from Cassandra by primary key (the short code), fill cache with TTL = 1 hour.
- Return 301 to client. Asynchronously publish a click event to Kafka (fire-and-forget; the redirect does NOT wait).
---------- Redirect latency budget (p99 < 50 ms) ----------
Network (client -> edge): ~10 ms
CDN cache hit: ~2 ms
-> total: ~12 ms (best case)
If CDN miss:
Network (edge -> region): ~10 ms
Redis hit: ~1 ms
Return: ~2 ms
-> total: ~25 ms (typical)
If Redis miss:
Cassandra read: ~5 ms
Cache fill + return: ~2 ms
-> total: ~32 ms (worst common case)Detailed Design
The two interesting components are key generation and the cache layer. Both have non-obvious failure modes.
Key Generation: Three Strategies
This is the design decision interviewers love to drill on. There is no universally right answer; defend your choice.
Strategy 1: Hash the long URL (e.g., MD5/SHA, take first 7 chars)
function shortenByHash(longUrl) {
const hash = sha256(longUrl + salt); // 256-bit
const code = base62Encode(hash).slice(0, 7);
// collision check: if code exists with a different long URL, retry with longer slice
return code;
}Pros: Idempotent. Same URL always gets the same code (saves storage, cute deduplication).
Cons: Collisions are real at scale. With 6B URLs in a 3.5T space, the birthday-problem collision probability is non-trivial; you must SELECT before INSERT for every write, doubling DB ops on the write path. Custom aliases also need a separate code path.
Strategy 2: Counter + base62 (Twitter Snowflake style)
A central or distributed counter generates a monotonically increasing 64-bit ID. Encode it in base62.
function shortenByCounter(longUrl) {
const id = snowflake.nextId(); // 64-bit unique ID
const code = base62Encode(id); // 11 chars max for 64-bit
db.put(code, longUrl); // never collides
return code;
}Pros: No collisions, ever. Single insert per URL. Simple.
Cons: Codes are guessable and incrementing; scrapers can walk the keyspace. Fix: XOR with a per-deploy secret, or shuffle bits, before base62 encoding. Now the codes look random while the underlying counter is still unique.
Strategy 3: Pre-generate a pool of random codes
A background job generates millions of random base62 strings into a unused_codes table. The Write Service pops one at insert time.
function shorten(longUrl) {
const code = db.popUnusedCode(); // atomic SELECT + DELETE
db.put(code, longUrl);
return code;
}Pros: Codes are unguessable. No collisions (the generator dedupes when filling the pool). Predictable write latency.
Cons: Requires a background generator that never falls behind. The popUnusedCode operation must be atomic (SELECT FOR UPDATE in SQL, or atomic dequeue from Redis).
The Decision
For an interview, Strategy 2 with a secret XOR is the strongest answer:
- Constant-time generation (no collision check).
- Unguessable codes after the XOR shuffle.
- 64-bit IDs are already unique by construction.
- Snowflake gives you free distributed generation across regions.
The Cache Layer
The cache is what lets a single mid-tier service handle 12K reads/sec without melting Cassandra. Two questions matter: what to cache, and how to invalidate.
What to Cache
Cache the short_code to long_url mapping with an LRU eviction policy. Set TTL = 1 hour (long enough for hot URLs to stay hot, short enough that an updated/expired URL isn't served stale forever).
Cache-Aside Pattern
async function lookup(shortCode) {
const cached = await redis.get(`url:${shortCode}`);
if (cached) return cached; // 95% of requests stop here
const row = await db.get(shortCode); // 5% miss
if (!row) return null; // 404
await redis.setex(`url:${shortCode}`, 3600, row.long_url);
return row.long_url;
}Hot Key Problem
A single short URL going viral (1M QPS) can saturate the Redis shard that holds its key. Mitigation:
- Edge cache the 301 at the CDN with
Cache-Control: public, max-age=3600. The CDN absorbs the burst geographically. - Consistent hashing across multiple Redis nodes so the hot key isn't isolated to one node by misfortune (modulo sharding can pile multiple hot keys on one node).
- Local in-process cache with 60-second TTL on the Read Service. Even with 100 servers each caching the hot key, that's only 100 DB reads per minute even if Redis loses it.
Cache Invalidation on Delete/Update
If a user deletes a link or it expires, the cache may serve a stale 301. Strategies:
- Write-through: on delete, immediately
DEL url:<code>from Redis. Simple but doesn't help the CDN. - Short TTL: 1 hour cap on staleness; acceptable for most use cases.
- Soft delete + 410 Gone: store a
deleted_atflag; the read path checks it and returns 410, which CDNs cache too.
Data Model
We split the workload across two storage systems because they have very different access patterns.
Primary URL Store (Cassandra)
The redirect path is a simple key lookup with no joins. NoSQL wins.
---------- Cassandra schema ----------
CREATE TABLE urls (
short_code text PRIMARY KEY, -- partition key, also lookup key
long_url text,
created_at timestamp,
expires_at timestamp, -- nullable
owner_id uuid,
deleted_at timestamp -- soft delete
) WITH compaction = {'class': 'LeveledCompactionStrategy'};Partition key = short_code. This gives perfect read distribution (every key hashes uniformly), and reads are O(1). Writes are O(1) too. No secondary indexes needed for the hot path.
Why not a SQL database? PostgreSQL handles a few billion rows fine, but at 6B rows we'd need to shard manually. Cassandra shards transparently and gives 5 ms p99 reads at this scale. We don't need joins or transactions on the URL table.
Custom Alias Conflict Check
Custom aliases (tiny.url/my-talk) need a uniqueness check before insert. In Cassandra:
INSERT INTO urls (short_code, long_url, created_at, owner_id)
VALUES ('my-talk', 'https://...', toTimestamp(now()), <uuid>)
IF NOT EXISTS;The IF NOT EXISTS clause uses a Paxos-based lightweight transaction (about 4x slower than a normal write). That's fine because custom aliases are a small fraction of writes.
Click Events (Kafka -> Clickhouse)
Every redirect publishes a fire-and-forget event:
{
"short_code": "aZbK9eR",
"timestamp": 1714134000000,
"ip_country": "US",
"referrer": "https://twitter.com",
"user_agent": "Mozilla/5.0..."
}Kafka buffers bursts (a viral link can do 100K events/sec for a minute). A consumer batches into Clickhouse, which is a columnar OLAP store optimized for aggregations like SELECT short_code, COUNT(*) FROM clicks WHERE timestamp > now() - INTERVAL 24 HOUR GROUP BY short_code.
Why not increment a counter on the URL row? Two reasons. (1) Hot keys would melt the Cassandra row (10K writes/sec to one partition). (2) We want richer analytics later (geo, referrer, time bucket); a single counter throws all that away.
Scaling and Bottlenecks
The design above handles 12K reads/sec comfortably. What breaks at 10x and 100x?
10x growth (120K reads/sec)
- CDN cache hit ratio becomes the dominant factor. Tune
Cache-Controlto maximize edge hits. At 95% CDN hit rate, only 6K reads/sec touch your origin. - Add Redis nodes with consistent hashing. Memory grows linearly with hot URL count, not total reads.
- Read replicas on Cassandra; the write replica is fine for 400 writes/sec.
100x growth (1.2M reads/sec)
- Multi-region deployment. Geo-DNS routes users to the nearest region. Each region has its own Cassandra cluster (sharded by short_code) replicated across regions for durability.
- Active-active replication on Cassandra is built in (Cassandra is AP). Conflicts on the same key are extremely rare because keys are unique by construction.
- CDN tiered caching: edge -> regional shield -> origin. Reduces origin load by another 10x.
- Bypass the application for hot redirects: program the CDN (Cloudflare Workers, Lambda@Edge) to read from a global KV store like Cloudflare KV or DynamoDB Global Tables. The 301 never reaches your service.
Failure Modes to Anticipate
| Failure | Impact | Mitigation |
|---|---|---|
| Redis cluster down | All reads hit Cassandra; latency spikes from 25 ms to 50 ms | Cassandra can absorb it for short periods; auto-recover on Redis restart |
| Cassandra region partition | Some keys unreadable in that region | Multi-region replication, fail over reads to nearest region |
| Snowflake clock skew | Duplicate IDs possible | NTP sync + per-node sequence bits; reject writes if clock goes backward |
| Viral link hot shard | One Redis node at 100% CPU | CDN absorbs it; in-process LRU absorbs the rest |
| Kafka consumer lag | Analytics delayed, redirects unaffected | Add consumers; redirects don't depend on analytics path |
Trade-offs and Alternatives
Why Cassandra over MySQL/Postgres?
MySQL with sharding works but you build the sharding logic yourself. Cassandra ships sharding, replication, and tunable consistency. The downside is no joins or transactions, which we don't need here. Verdict: Cassandra for greenfield; Postgres + Vitess if you already run Postgres.
Why a separate analytics store?
We could put a counter on the URL row, but writes to a single hot row don't scale, and we lose dimensional analytics. Splitting also lets analytics fail (Kafka consumer dies, Clickhouse rebuilds) without affecting the redirect SLO. Verdict: always split the hot transactional path from analytics in any system that does both.
301 vs 302 redirect?
- 301 (permanent): browsers and CDNs cache aggressively. Lower load on your service, but if the underlying URL changes you cannot 'unredirect' a cached 301.
- 302 (temporary): browsers do not cache. Every click hits your service. Higher load, more accurate analytics.
Verdict: 301 with a Cache-Control: max-age=3600 is the correct production choice. Analytics from your service catch most of the traffic; the 1 hour cache window is acceptable staleness.
Counter-based vs hash-based keys (revisited)
Many interview blogs recommend hash-based because it 'deduplicates'. In practice, dedup saves 5-10% of storage at the cost of a SELECT on every write. For a 100M writes/month system, skipping that SELECT is worth more than the storage. Use counter-based with bit shuffling.
Pre-generated key pool: when does it win?
When you need codes that look truly random and cannot leak any ordering information (e.g., access tokens, share links for sensitive content). For URL shorteners, a XOR-shuffled snowflake ID looks random enough.
What if we needed strong consistency on click counts?
We wouldn't, for ad-tech style 'show clicks now' use cases. If we did (e.g., billing per click), we'd add an exactly-once Kafka consumer with idempotency keys and a transactional write to the analytics store. The redirect path remains unchanged - that's the point of decoupling.
Real-World Examples
How real systems implement this in production
The original commercial URL shortener. Uses a custom counter-based ID generator, MySQL for the link store, and a separate event pipeline for analytics. Serves billions of clicks per month with sub-50 ms global p99 redirects via Fastly CDN.
Trade-off: Bit.ly historically used MySQL with sharding because they started before NoSQL was mature. The lesson: pick the database you already operate well over the theoretically optimal one; sharding MySQL is harder than picking Cassandra fresh.
The pioneer of URL shortening, founded in 2002. Uses hash-based code generation with collision retry, and historically allowed user-chosen aliases. Much simpler architecture than bit.ly; proves that for moderate scale you do not need elaborate distributed systems.
Trade-off: Hash-based generation works fine when your scale is hundreds of QPS, not tens of thousands. The right design depends on your scale; do not over-engineer a hobby-scale service.
Twitter wraps every link posted on its platform with a t.co short URL. It runs at hundreds of thousands of QPS, integrates spam and phishing checks, and is colocated with Twitter's main infrastructure. Heavy use of edge caching and pre-resolved redirects in the Twitter app itself.
Trade-off: When the URL shortener is a feature inside a larger product, you can pre-resolve the link in the client and skip the redirect entirely for the most common cases. This is how Twitter handles 'click latency' on mobile.
Google's URL shortener with deep-linking (open the right app on the right OS, fall back to web). Adds a routing layer on top of the basic redirect to detect platform and cookie state. Sunset in 2025 in favor of standard App Links and Universal Links.
Trade-off: Adding intelligence (platform detection, A/B testing, conditional redirects) to the redirect path increases latency and operational complexity. Dynamic Links shows the limit: when redirects need to be smart, you sacrifice the simplicity that makes URL shorteners cheap to operate.
Quick Interview Phrases
Key terms to use in your answer
Common Interview Questions
Questions you might be asked about this topic
Use a conditional insert (Cassandra IF NOT EXISTS, Postgres unique constraint). The first commit wins; the second gets a 409 Conflict. Lightweight transactions are slow but fine for the small fraction of writes that are custom aliases. Pre-validate against the cache for friendly UX, then rely on the database for authoritative uniqueness.
First the CDN absorbs ~95% via the cached 301. The remaining 5K reach the regional load balancer, which routes to Read Services. Each Read Service has an in-process LRU with the hot key, so most are served from local memory. The few that miss go to Redis (one shard handles the key by consistent hashing). If Redis melts, fall back to Cassandra read replicas. Write a runbook: enable 'super hot key' mode that pins the entry in every Read Service's local cache.
Rate limit by API key at the load balancer (token bucket: 100 writes/min per key for free tier, higher for paid). Add CAPTCHA on the public 'shorten' form. Track per-account write velocity and auto-suspend accounts that exceed thresholds. Asynchronously scan new URLs against Google Safe Browsing API and quarantine matches.
Three levers. (1) Increase TTL from 1 hour to 24 hours; staleness is acceptable for non-deleted URLs. (2) Pre-warm cache: when a new URL is created, push it to the cache so its first read is a hit. (3) Use a Bloom filter on Read Services to skip Redis lookups for known-nonexistent codes (404s); this isn't a hit rate improvement but it removes one network round trip per miss.
Two parts. (1) Lazy expiry: on read, check expires_at and return 410 Gone if expired; this catches most cases. (2) Background sweeper: a daily job scans an expires_at index for keys to hard-delete and to invalidate in cache. In Cassandra you can use TTL on the row itself and let it expire automatically (no sweeper needed), but you lose the ability to render a friendly 'expired link' page; the read just returns 404.
Interview Tips
How to discuss this topic effectively
Lead with the read:write ratio (100:1). Every later decision (caching, replicas, CDN) flows from this single number.
Defend your key generation strategy explicitly. Saying 'I'd use a counter with XOR shuffle' is much stronger than 'I'd use a hash'.
Show the latency budget out loud. 'p99 50 ms means 10 ms network + 5 ms compute + 35 ms headroom.' Interviewers love numbers.
Always separate the redirect hot path from analytics. The interviewer is looking for the words 'fire-and-forget' or 'asynchronous' here.
Mention edge caching the 301 at the CDN. This is the single highest-leverage optimization and most candidates miss it.
Common Mistakes
Pitfalls to avoid in interviews
Using only MD5/SHA hash for key generation without considering collision overhead
Hash-based generation requires a SELECT before every INSERT to detect collisions. At scale this doubles your write load. Counter-based with bit shuffling avoids the collision check entirely while keeping codes unguessable.
Storing click counts as a column on the URL row
Hot URLs would generate thousands of writes per second to a single row, which kills any database. Decouple via a Kafka topic and aggregate in a columnar OLAP store. The redirect path never increments a counter.
Forgetting CDN edge caching of the 301 response
A 301 with Cache-Control: max-age=3600 lets the CDN absorb 90%+ of traffic geographically. Without it your origin handles every click, and a viral link can DDoS your own service.
Picking SQL with a single primary database 'because URLs need transactions'
URL lookups are independent key-value reads; there are no joins or multi-row transactions. Cassandra (or DynamoDB) shards transparently and gives 5 ms p99 reads at billions of rows. Use SQL only if you also need rich querying you cannot offload to analytics.
Treating the design as 'just a hash table' and skipping availability discussion
A URL shortener is a 99.99% availability service used as a critical dependency by other systems. You must discuss multi-region, replication, what happens during a regional outage, and how the CDN keeps redirects flowing even if your origin is down.
