System Design Article

Design a Content Delivery Network

Difficulty: Medium

Design a Cloudflare/Akamai/Fastly-style content delivery network that offloads 95%+ of static traffic from origin servers, brings latency from hundreds of milliseconds down to single digits, and absorbs DDoS attacks at the edge. The interview centerpiece is the cache hierarchy and routing: hundreds of edge POPs anycast-routed to the user's nearest location, a regional shield layer that consolidates fetches, and the origin only seeing the long tail of misses. We cover cache key design with Vary headers, the TTL lifecycle and purge model, stale-while-revalidate for resilience under origin outages, and the moves CDNs make to keep dynamic content fast (programmable edge functions, smart routing).

Requirements

Functional Requirements

Cache static assets (images, JS, CSS, fonts, videos, downloads) at edge POPs near the user.
Pull-on-demand: on cache miss, fetch from origin, populate cache, deliver to user (lazy population).
Cache control via headers: respect Cache-Control: max-age, s-maxage, Vary, ETag, and Last-Modified.
Purge: customer can invalidate cached objects within seconds (e.g., after deploying a new version).
Stale-while-revalidate: serve stale content while asynchronously refreshing from origin.
TLS termination at edge: clients connect over HTTPS to the nearest POP; origin connection can be HTTP or HTTPS internally.
Compression and image optimization: gzip/brotli on text, modern image formats negotiated per client.
Geo / device routing: serve different content based on user country, device type.
DDoS absorption: large traffic floods are absorbed at the edge without reaching origin.
Edge functions: customer code runs at the edge to customize requests/responses (Cloudflare Workers, Lambda@Edge).

Out of Scope (state explicitly)

Origin storage (origin is the customer's own server or object storage; CDN doesn't host).
Live video transcoding (separate live-streaming product).
DNS authoritative service (we offer caching DNS, not zone hosting).
Application logic beyond simple edge functions.

Non-Functional Requirements

Cache hit ratio: 95%+ for static assets (well-tuned). Each additional percentage point of hit ratio directly reduces origin egress cost.
Edge latency: p99 < 30 ms first-byte for cached content; ideally < 10 ms.
Throughput: sustained 10 Tbps globally; bursts to 100+ Tbps during events (sport finals, product launches).
Availability: 99.99% globally; 99.9% per POP (POPs fail; traffic re-routes).
Purge latency: customer purge propagates to all POPs within ~1 minute (best case under 10 seconds).
DDoS capacity: absorb 10+ Tbps attacks at the edge without backend impact.

Back-of-the-Envelope Estimation

Footprint

---------- Global footprint ----------
POPs:                       ~300 worldwide (top tier covers most of internet)
Servers per POP:            ~50 to 500 (depends on POP class)
Total edge servers:         ~50,000
Per server cache:           ~2 TB SSD
Total edge cache:           ~100 PB across the fleet

Traffic

---------- Traffic ----------
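Monthly egress:             100 PB (averages ~0.3 Tbps, so read this as one large customer's volume rather than the fleet total)
Daily peak throughput:      ~30 Tbps (fleet-wide)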
Objects served per day:     ~5 trillion
Unique objects in active rotation: ~10B
Most-popular object cached at every POP (300 copies)
Long-tail object cached at one regional shield only

Hit Ratio Math

---------- Hit ratio impact ----------
Without CDN: 100% origin egress at $80/TB cloud egress = $8M/month
At 95% cache hit: 5% origin = $400K/month  (95% savings)
At 99% cache hit: 1% origin = $80K/month   (99% savings)
Every 1% of hit-rate gain saves $80K/month at this scale.
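
The same arithmetic can be written as a small function. This is only a sketch of the estimate above: the constant names are made up for illustration, and the $80/TB price and 100 PB volume are the assumptions already stated in the block.

```javascript
// Illustrative arithmetic only; the figures mirror the estimate above.
const MONTHLY_EGRESS_TB = 100_000; // 100 PB expressed in TB
const EGRESS_PRICE_PER_TB = 80;    // USD, assumed cloud egress price

// Monthly origin egress cost in USD for a given cache hit ratio (0..1).
function originCostUsd(hitRatio) {
    const missRatio = 1 - hitRatio;
    return Math.round(MONTHLY_EGRESS_TB * missRatio * EGRESS_PRICE_PER_TB);
}

console.log(originCostUsd(0));    // 8000000 (no CDN)
console.log(originCostUsd(0.95)); // 400000
console.log(originCostUsd(0.99)); // 80000
```

Because cost is linear in the miss ratio, each percentage point of hit ratio is worth the same $80K at this volume, whether it is the jump from 94% to 95% or from 98% to 99%.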

Cache Memory and Disk

---------- Per-POP cache ----------
A Tier-1 POP (e.g., NYC) handles ~100 Gbps with 50 servers, 100 TB cache total.
Hot objects stay in RAM (~500 GB across the POP), warm on SSD (~100 TB).
Cold misses go to regional shield, then origin.

High-Level Design

---------- Architecture overview ----------
   +-----------+
   |  User     |  (DNS resolves CDN domain to Anycast IP)
   +-----------+
        |
        v (BGP routes to nearest POP via Anycast)
   +------------------+
   |  Edge POP        | (TLS, cache check, edge functions)
   +------------------+
        |  miss
        v
   +-------------------+
   |  Regional Shield  | (one or two per geographic region; consolidates POP misses)
   +-------------------+
        |  miss
        v
   +-------------------+
   |  Origin Server    | (customer's S3 / load balancer)
   +-------------------+

Client -> Edge -> Regional Shield -> Origin. The Shield collapses fan-out so the origin sees one fetch per object even if 50 POPs miss simultaneously.

Routing: How Users Reach the Nearest POP

Anycast: the same IP is announced from every POP via BGP. The internet's routing tables deliver each user's packets to the BGP-closest POP, which is the network-closest one and usually, though not always, the geographically closest. No per-user routing logic in DNS is required.

---------- Anycast in action ----------
DNS lookup for cdn.example.com: returns 1.2.3.4 to ALL users globally.
User in Tokyo: their ISP's BGP table says '1.2.3.4 is reachable via Tokyo POP, 5 ms away'.
User in NYC: their ISP's BGP table says '1.2.3.4 is reachable via NYC POP, 5 ms away'.
No CDN-side routing decision; the global BGP table does the work.

Alternative: DNS-based routing (returns different IPs based on user location). It is coarser (the decision is made per DNS resolver, not per user) and slower to react to failures (bounded by DNS TTLs). Anycast is the common modern default.

Cache Key

The cache key is what determines whether a request matches a cached object. Default: full URL.

---------- Cache key with Vary ----------
URL: GET https://cdn.example.com/image.jpg
Key base: GET|cdn.example.com/image.jpg
If origin returns Vary: Accept-Encoding (gzip vs brotli):
  Key becomes: GET|cdn.example.com/image.jpg|Accept-Encoding=gzip
             or GET|cdn.example.com/image.jpg|Accept-Encoding=br
  (One cache entry per encoding)
If Vary: User-Agent:
  THOUSANDS of cache entries per object (every distinct UA string).
  This destroys cache hit rate.

Common misuse: customers set Vary: User-Agent to serve different content to different browsers; cache hit rate collapses because every UA string creates a separate entry. CDN config typically lets customers override Vary or normalize headers (e.g., 'classify UA into mobile/desktop/tablet' to keep the cache key small).
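
A minimal sketch of that normalization, assuming a request object with lowercase header names. The function names, the request shape, and the User-Agent regexes are all hypothetical, and the key lowercases header names, unlike the hand-written example above.

```javascript
// Collapse a raw User-Agent into a few device classes (crude, assumed heuristic).
function classifyUserAgent(ua = '') {
    if (/ipad|tablet/i.test(ua)) return 'tablet';
    if (/mobi|iphone|android/i.test(ua)) return 'mobile';
    return 'desktop';
}

// Pick the one encoding the edge would store, ignoring q-values and ordering.
function normalizeEncoding(acceptEncoding = '') {
    const tokens = acceptEncoding.toLowerCase().split(',').map(t => t.split(';')[0].trim());
    if (tokens.includes('br')) return 'br';
    if (tokens.includes('gzip')) return 'gzip';
    return 'identity';
}

// Build the key from method, host, path, and the normalized value of each Vary header.
// req: { method, host, path, headers } with lowercase header names (assumed shape).
function buildCacheKey(req, varyHeaders = []) {
    const parts = [req.method, req.host + req.path];
    for (const name of varyHeaders.map(h => h.toLowerCase()).sort()) {
        const raw = req.headers[name] ?? '';
        let value = raw;
        if (name === 'user-agent') value = classifyUserAgent(raw);
        if (name === 'accept-encoding') value = normalizeEncoding(raw);
        parts.push(`${name}=${value}`);
    }
    return parts.join('|');
}
```

With this scheme, `Vary: User-Agent` yields at most three entries per object instead of one per distinct UA string, and `Accept-Encoding: gzip, deflate, br` and `Accept-Encoding: br` share one entry.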

Detailed Design

The two interesting components are the cache hierarchy with origin shielding and purge / invalidation.

Cache Hierarchy: Edge, Regional Shield, Origin

---------- Three-tier cache ----------
Tier 1: Edge POP cache
  - hit ratio for popular objects: 95%+
  - miss ratio: ~5%
  - misses go to Regional Shield

Tier 2: Regional Shield
  - one or two per region (US-East shield, EU-West shield, APAC shield)
  - consolidates misses from many edges
  - hit ratio for warm objects: ~80% (objects already pulled by another edge)
  - misses go to Origin

Tier 3: Origin
  - customer's server / S3 / load balancer
  - sees only the long tail of cold objects
  - typical origin offload: 99%+ at well-tuned setups

Why a shield? Without it, every edge POP independently fetches from origin on its own first miss. For a popular object, 300 POPs each fetch once = 300 origin requests for a single new object. With a shield, only the shield fetches; subsequent edge misses pull from the shield. Origin sees one request per object per region.

---------- Cache fan-in math ----------
New object goes viral; 300 POPs all see misses simultaneously.
Without shield: origin gets 300 GET requests for the same object.
With shield: 300 edges request from regional shields (~3 of them); 3 shields request from origin; origin gets 3 requests.
Reduction: 100x.
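
The fan-in numbers and the tier hit ratios quoted earlier fit together, as the sketch below shows. It assumes each shield coalesces concurrent misses into a single origin fetch; the function names are illustrative.

```javascript
// Origin fetches for one new object that misses at every POP at the same time.
function originFetches(popCount, shieldCount) {
    // With no shield tier, every POP goes straight to origin.
    return shieldCount > 0 ? shieldCount : popCount;
}

// Fraction of requests that never reach origin when edge misses pass through a shield.
function originOffload(edgeHitRatio, shieldHitRatio) {
    return 1 - (1 - edgeHitRatio) * (1 - shieldHitRatio);
}

console.log(originFetches(300, 0)); // 300
console.log(originFetches(300, 3)); // 3
console.log(originOffload(0.95, 0.80).toFixed(2)); // 0.99
```

A 95% edge hit ratio combined with an 80% shield hit ratio leaves 5% x 20% = 1% of requests for the origin, which matches the 99%+ origin offload figure in the three-tier summary.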

Cache Lifecycle

An object enters the cache when an edge fetches it from origin (or shield). It stays cached until:

TTL expires (Cache-Control: max-age, or s-maxage, which takes precedence for shared caches such as a CDN).
Eviction (LRU when SSD is full).
Explicit purge by the customer.

On TTL expiry, the next request triggers a revalidation: edge sends a conditional request to origin (If-None-Match: <etag>); origin returns 304 (still fresh, refresh TTL) or 200 (changed, replace cache).
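
The revalidation step might look like the following sketch. The entry shape and the `fetchOrigin` callback, including its `maxAgeSeconds` field, are assumptions made for illustration, not any CDN's real interface.

```javascript
// entry: { body, etag, expiresAt } (expiresAt in ms since epoch).
// fetchOrigin(headers) resolves to { status, body, etag, maxAgeSeconds }.
// Both shapes are hypothetical.
async function revalidate(entry, fetchOrigin, now = Date.now()) {
    if (now < entry.expiresAt) return entry; // still fresh, no origin contact

    // Conditional request: ask origin whether the cached version is still current.
    const res = await fetchOrigin({ 'If-None-Match': entry.etag });
    const expiresAt = now + res.maxAgeSeconds * 1000;

    if (res.status === 304) {
        // Unchanged: keep the cached body and only extend freshness.
        return { ...entry, expiresAt };
    }
    // 200: the object changed, so replace body and validator.
    return { body: res.body, etag: res.etag, expiresAt };
}
```

The 304 path matters for large objects: the origin sends only headers, so revalidating a multi-megabyte asset costs one small round trip rather than a full transfer.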

Stale-While-Revalidate

A powerful pattern: serve stale content immediately while asynchronously fetching the fresh version.

---------- SWR flow ----------
Cache-Control: max-age=3600, stale-while-revalidate=86400

Request at t=0:           cache hit, 100% fresh
Request at t=4000s:       cache hit (400s past max-age but within SWR window)
                          - serve stale immediately
                          - asynchronously refetch from origin
                          - update cache with fresh response
Request at t=4001s:       cache hit, freshly updated

Benefits:

Tail latency stays low: every request is a hit (no origin wait).
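Origin outage tolerance: even if origin is down, the edge can keep serving stale until the SWR window expires (strictly, RFC 5861 defines a companion stale-if-error directive for this case, which CDNs commonly support alongside SWR).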
Reduces thundering herd: only one revalidate per object even under high concurrent miss.
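
The decision the edge makes on each request can be sketched as a pure function of the object's age and its Cache-Control header. The parser below handles only the two directives used in the flow above, and the function names and return labels are illustrative.

```javascript
// Parse the two directives used here; all other directives are ignored in this sketch.
function parseCacheControl(header) {
    const out = { maxAge: 0, swr: 0 };
    for (const part of header.split(',')) {
        const [name, value] = part.trim().split('=');
        if (name === 'max-age') out.maxAge = Number(value);
        if (name === 'stale-while-revalidate') out.swr = Number(value);
    }
    return out;
}

// Decide how to answer a request for an object that was cached ageSeconds ago.
function freshness(ageSeconds, cacheControl) {
    const { maxAge, swr } = parseCacheControl(cacheControl);
    if (ageSeconds < maxAge) return 'fresh';                  // serve from cache
    if (ageSeconds < maxAge + swr) return 'stale-revalidate'; // serve stale, refresh in background
    return 'expired';                                         // must wait for origin
}

const cc = 'max-age=3600, stale-while-revalidate=86400';
console.log(freshness(0, cc));     // fresh
console.log(freshness(4000, cc));  // stale-revalidate
console.log(freshness(90001, cc)); // expired
```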

Purge: Hard and Soft

When a customer needs to evict a cached object before TTL (deployed new version), they call the purge API.

Hard Purge

A purge message propagates to every POP via the CDN's control plane (typically a global pub/sub or replicated store, such as Fastly's purge layer around Varnish or Cloudflare's Quicksilver). Each POP deletes the cache entry. Time to global propagation: typically 100 ms to 60 s depending on the CDN's design.

---------- Hard purge propagation ----------
1. Customer: POST /purge with URL or tag.
2. Control plane writes the purge to a global event log.
3. Each POP subscribes to the log; on receipt, deletes the matching cache entries.
4. Subsequent requests at any POP miss and refetch.

Fastly's Varnish-based system claims ~150ms global purge; Cloudflare's Quicksilver is similar.

Soft Purge (Mark Stale)

More graceful: mark the object stale but don't delete. Subsequent requests serve stale-while-revalidate, fetching the fresh version asynchronously. No hit on the origin from the purge itself; clients never see slow first-byte.

Most customers use soft purge by default; hard purge is reserved for sensitive content (security fixes, takedowns).
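
On a single POP, the difference between the two purge modes comes down to a few lines. The sketch below assumes the cache is a `Map` whose entries carry an `expiresAt` timestamp; the `applyPurge` name and its return labels are hypothetical.

```javascript
// cache: Map of cache_key -> { body, expiresAt, ... } (assumed entry shape).
function applyPurge(cache, cacheKey, { soft = true, now = Date.now() } = {}) {
    const entry = cache.get(cacheKey);
    if (!entry) return 'not-cached';
    if (soft) {
        // Soft purge: keep the bytes but force the entry to be treated as stale,
        // so the next request can be served stale while a refetch runs.
        entry.expiresAt = Math.min(entry.expiresAt, now);
        return 'marked-stale';
    }
    // Hard purge: drop the entry so the next request is a true miss.
    cache.delete(cacheKey);
    return 'deleted';
}
```

A soft-purged entry still occupies disk until it is replaced or evicted, which is part of why hard purge remains the right tool for takedowns.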

Cache Tag Purge

Objects can be tagged at cache time (Surrogate-Key: product:42 product:hot). A tag purge invalidates every object with that tag. Useful for batch invalidations: 'purge everything related to product 42' rather than enumerating URLs.

---------- Tag purge example ----------
When ingesting a response with header Surrogate-Key: product:42:
  index_entry: cache_key -> tags: ['product:42']
  reverse_index: tag 'product:42' -> [cache_key1, cache_key2, ...]
On purge tag 'product:42':
  walk reverse_index, delete or mark stale all cache_keys.
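
A minimal in-memory version of the two indexes could look like this. The `TagIndex` class and its method names are hypothetical; a real edge would also have to persist the index and keep it consistent with the on-disk cache.

```javascript
class TagIndex {
    constructor() {
        this.tagsByKey = new Map(); // cache_key -> Set of tags
        this.keysByTag = new Map(); // tag -> Set of cache_keys (reverse index)
    }

    // Record tags from a Surrogate-Key header value (space-separated).
    add(cacheKey, surrogateKeyHeader) {
        this.remove(cacheKey); // drop tags from any earlier version of this object
        const tags = new Set(surrogateKeyHeader.split(/\s+/).filter(Boolean));
        this.tagsByKey.set(cacheKey, tags);
        for (const tag of tags) {
            if (!this.keysByTag.has(tag)) this.keysByTag.set(tag, new Set());
            this.keysByTag.get(tag).add(cacheKey);
        }
    }

    // Remove a key from both indexes (called on eviction or purge).
    remove(cacheKey) {
        for (const tag of this.tagsByKey.get(cacheKey) ?? []) {
            const keys = this.keysByTag.get(tag);
            keys.delete(cacheKey);
            if (keys.size === 0) this.keysByTag.delete(tag);
        }
        this.tagsByKey.delete(cacheKey);
    }

    // Return every cache key carrying the tag and drop them from the index.
    purgeTag(tag) {
        const keys = [...(this.keysByTag.get(tag) ?? [])];
        keys.forEach(key => this.remove(key));
        return keys;
    }
}
```

The caller then deletes or marks stale each returned key, so the cost of a tag purge scales with the number of tagged objects rather than with the size of the whole cache.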

TLS Termination at Edge

Clients connect to the edge over TLS. The edge holds the customer's certificate (or a CDN-managed cert via SNI). After decryption, the cache logic operates on cleartext; the edge fetches origin over a separate connection (HTTP/2, HTTP/3, or HTTP/1.1 keep-alive pooled).

---------- Connection pooling savings ----------
Without pooling: 1 TLS handshake per origin fetch (50ms each).
With pooling at the edge: 1 long-lived connection per origin per POP, reused for all misses.
Reduces origin handshake load by orders of magnitude (e.g., ~1000x if each pooled connection carries ~1000 fetches).

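For mTLS (origin requires client certs), the edge presents its own client certificate. The origin thereby authenticates the CDN, not the end user, and can reject traffic that bypasses the CDN; any end-user authentication still has to be done by the edge or by the application.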

DDoS Absorption

A 5 Tbps DDoS attack hitting a single origin would obliterate it. Hitting a CDN, the attack is spread across hundreds of POPs: if it were distributed evenly over 300 POPs, each would absorb ~17 Gbps, well within a POP's capacity. In practice Anycast spreads attack traffic according to where the attackers are, so POPs near large botnet concentrations take more. The cache hit ratio for legitimate traffic stays high; the attack traffic (typically static patterns, e.g., 'GET /' from millions of IPs) is filtered at the edge.

Defenses:

Per-IP rate limiting at the edge.
WAF rules (block known attack signatures).
Challenge for suspicious traffic (CAPTCHA, JS challenge).
Origin shielding ensures origin never sees the flood directly.
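
Per-IP rate limiting, the first defense listed, is often implemented as a token bucket. The sketch below is one such limiter under simplifying assumptions: the class name and parameters are illustrative, state lives in a single process, and idle buckets are never expired, which a real edge would need to handle.

```javascript
class PerIpRateLimiter {
    // ratePerSec: sustained requests/sec allowed per IP; burst: bucket capacity.
    constructor(ratePerSec, burst) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.buckets = new Map(); // ip -> { tokens, lastMs }
    }

    // Returns true if the request may proceed, false if it should be dropped or challenged.
    allow(ip, nowMs = Date.now()) {
        const bucket = this.buckets.get(ip) ?? { tokens: this.burst, lastMs: nowMs };
        // Refill in proportion to elapsed time, capped at the burst size.
        const elapsedSec = (nowMs - bucket.lastMs) / 1000;
        bucket.tokens = Math.min(this.burst, bucket.tokens + elapsedSec * this.ratePerSec);
        bucket.lastMs = nowMs;
        this.buckets.set(ip, bucket);
        if (bucket.tokens < 1) return false;
        bucket.tokens -= 1;
        return true;
    }
}
```

Per-IP limits alone do little against floods spread across millions of source addresses, which is why they are paired with WAF signatures and challenges in the list above.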

Programmatic Use

JavaScript

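// Customer purges a CDN URL after deploying a new version.
// Note: api.cdn.com and the { urls, soft } body are illustrative, not a real vendor API.
async function purgeCdn(url, apiToken) {
    const response = await fetch('https://api.cdn.com/v1/purge', {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${apiToken}`,
            'Content-Type': 'application/json'
        },
        // soft: true marks the object stale instead of deleting it
        body: JSON.stringify({ urls: [url], soft: true })
    });
    // fetch only rejects on network errors, so surface HTTP failures explicitly.
    if (!response.ok) {
        throw new Error(`Purge failed: HTTP ${response.status}`);
    }
    return response;
}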

Data Model

Per-Edge Cache Index

---------- Edge cache layout ----------
In-memory hash table:
  cache_key -> { offset_on_disk, size, expires_at, etag, last_revalidated }
On-disk circular log:
  large append-only file; cache entries written sequentially
Tag reverse-index:
  tag -> set of cache_keys for fast tag purges

LRU eviction when disk fills (or LFU on some CDNs); evicted entries can be refetched from shield or origin.
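
As a rough illustration of byte-bounded LRU eviction over the in-memory index, the sketch below relies on JavaScript's `Map` preserving insertion order. The `LruIndex` class is hypothetical and tracks only metadata; disk layout and the tag index are left out.

```javascript
class LruIndex {
    constructor(capacityBytes) {
        this.capacityBytes = capacityBytes;
        this.usedBytes = 0;
        this.entries = new Map(); // cache_key -> { size, ...metadata }, least recent first
    }

    // Look up an entry and mark it most recently used.
    get(cacheKey) {
        const entry = this.entries.get(cacheKey);
        if (!entry) return undefined;
        this.entries.delete(cacheKey);
        this.entries.set(cacheKey, entry); // re-insert at the most-recent end
        return entry;
    }

    // Insert or replace an entry, evicting least recently used keys until it fits.
    // Returns the evicted keys so the caller can free disk space and tag-index rows.
    put(cacheKey, entry) {
        if (entry.size > this.capacityBytes) return []; // too large to cache at all
        const old = this.entries.get(cacheKey);
        if (old) {
            this.usedBytes -= old.size;
            this.entries.delete(cacheKey);
        }
        const evicted = [];
        while (this.usedBytes + entry.size > this.capacityBytes) {
            const [oldestKey, oldest] = this.entries.entries().next().value;
            this.entries.delete(oldestKey);
            this.usedBytes -= oldest.size;
            evicted.push(oldestKey);
        }
        this.entries.set(cacheKey, entry);
        this.usedBytes += entry.size;
        return evicted;
    }
}
```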

Control Plane: Configuration

SQL

CREATE TABLE cdn_configs (
    customer_id      VARCHAR(64),
    domain           VARCHAR(255),
    origin_url       VARCHAR(512),
    cache_rules      JSONB,         -- per-path TTLs, behaviors
    purge_tokens     VARCHAR[],     -- API tokens for purge
    edge_functions   TEXT,          -- customer code
    cert_id          VARCHAR(64),
    PRIMARY KEY (customer_id, domain)
);

Config is pushed to all POPs via a global config distribution system (typically a dedicated event log with sub-second propagation, e.g., Cloudflare's Quicksilver).

Logs and Analytics

Every request is logged at the edge (request URL, status, bytes, cache hit/miss, latency). Logs aggregate centrally for analytics dashboards. At 5 trillion requests/day, log volume is ~50-500 TB/day (which assumes roughly 10-100 bytes per request after compaction or compression); aggressive sampling and aggregation keep this affordable.
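
The log volume range follows from simple multiplication. In the sketch below, the 10 and 100 bytes per record are assumed sizes chosen to reproduce the range above, not measured values, and the function name is illustrative.

```javascript
// Daily log volume in TB (decimal) for a given request count and per-record size.
function logVolumeTbPerDay(requestsPerDay, bytesPerRecord) {
    return (requestsPerDay * bytesPerRecord) / 1e12;
}

const REQUESTS_PER_DAY = 5e12; // 5 trillion, from the traffic estimate

console.log(logVolumeTbPerDay(REQUESTS_PER_DAY, 10));  // 50
console.log(logVolumeTbPerDay(REQUESTS_PER_DAY, 100)); // 500
```

An uncompressed record with a full URL and headers would be several times larger than 100 bytes, which is why sampling and aggregation are needed to stay in this range.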

Scaling and Bottlenecks

Cache Disk vs Memory

Hot objects must serve from RAM (sub-millisecond). The active working set is typically the top few hundred GB per POP; the long tail goes to SSD (a few ms). Cold misses go upstream.

Cache Stampede on Cold Object

A suddenly viral object that's not yet cached: 1M requests/sec hit one POP, all miss, all go upstream. Without coordination, the regional shield gets 1M requests, the origin gets 1M.

Protection: request coalescing. The first miss is forwarded; subsequent misses wait for the in-flight request to complete and share its response. This is the cache equivalent of singleflight: one fetch, many waiters.

---------- Request coalescing ----------
Edge gets request 1 for /viral.jpg: miss; start fetch from upstream; record key in 'in-flight' map.
Edge gets request 2 for /viral.jpg: miss; key is in-flight; subscribe to result.
Edge gets request 100,000 for /viral.jpg: same.
Upstream fetch completes, response written to cache, all subscribers get the response.
Result: 1 upstream request instead of 100,000.
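
In JavaScript, the in-flight map can simply hold the pending promise, so every concurrent caller awaits the same fetch. This is a sketch of the idea: `CoalescingFetcher` and the `fetchUpstream` callback are hypothetical names, and writing the response into the cache is left to the caller.

```javascript
class CoalescingFetcher {
    // fetchUpstream(cacheKey) returns a Promise of the response (assumed signature).
    constructor(fetchUpstream) {
        this.fetchUpstream = fetchUpstream;
        this.inFlight = new Map(); // cache_key -> Promise shared by all waiters
    }

    fetch(cacheKey) {
        const pending = this.inFlight.get(cacheKey);
        if (pending) return pending; // join the fetch that is already running

        const promise = Promise.resolve()
            .then(() => this.fetchUpstream(cacheKey))
            .finally(() => this.inFlight.delete(cacheKey)); // clear on success or failure
        this.inFlight.set(cacheKey, promise);
        return promise;
    }
}
```

If the upstream fetch fails, every waiter sees the same error and the key is cleared, so the next request retries instead of waiting on a dead entry.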

Origin Capacity Planning

Origins are commonly oversized to handle cache misses, but the long tail of misses can still saturate them. CDNs offer 'always cache' modes that override origin TTLs for specific paths, such as cache-once-keep-forever rules for versioned static assets. Some workloads use prefetching: the CDN fetches new assets at deploy time so the first user request is a hit.

Multi-Region Origin

Origin can be multi-region (one in US, one in EU). The CDN selects the regional origin based on POP location (US POPs fetch from US origin). Reduces origin RTT and costs.

TLS Connection Limits

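TLS handshakes are CPU-intensive (on the order of 1-10 ms of CPU each on commodity hardware, depending on key type and whether the session is resumed). At 100K new HTTPS connections/sec to one POP, TLS CPU dominates. Mitigations: session resumption (skips the expensive asymmetric operations), TLS 1.3 (1-RTT, or 0-RTT on resumption, which mainly cuts latency), and connection reuse at the edge so fewer handshakes happen at all.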

POP Failure

If a POP goes down (power, fiber cut), its BGP announcement is withdrawn and Anycast traffic re-routes within seconds to neighboring POPs. Cache contents at the failed POP are lost; neighbors absorb the additional load with cache misses for objects they don't have. Within an hour, cache rebalances.

Streaming and Live Video

Live video adds segments every few seconds; CDN must cache new segments, expire old ones, and deliver chunks to many viewers. Origin pull through shield + edge works fine for VOD; live needs lower-latency strategies (multicast within POP, regional origin shields close to encoders).

Edge Compute Scaling

Customer code at the edge (Cloudflare Workers, Lambda@Edge) runs in lightweight isolates per request (V8, Wasm). Per-request cost ~1ms; throughput per CPU ~1000 requests/sec. Each POP must provision CPU for peak load, but an idle customer function costs almost nothing because no per-customer VM or container is kept warm.

Trade-offs and Alternatives

Anycast vs DNS-Based Routing

Anycast: one IP, BGP picks the path; clients reach the network-closest POP automatically. Failover happens at the routing layer rather than waiting on DNS TTLs. Cons: BGP changes can cause transient routing anomalies (a connection unexpectedly shifts to a different POP mid-flight); this mostly affects long-lived connections (rare for short HTTP requests, problematic for WebSocket).

DNS-based: returns different IPs based on DNS-resolver location (GeoDNS). Coarser (the decision is made for the resolver, not the user). Widely used historically and still used by some large CDNs; Anycast is now a common default.

Hard vs Soft Purge

Hard purge instantly removes; soft purge marks stale and refetches asynchronously. Soft is gentler on origin and user latency; hard guarantees no stale content (necessary for legal takedowns, security patches). Most customer workflows default to soft.

Pull CDN vs Push CDN

Pull (lazy): edge fetches on first request. Easy to set up; cold starts impact first user. Default for most CDNs.

Push: customer pushes objects to all POPs upfront. Eliminates cold-start latency for scheduled launches (Black Friday, product release). More setup; usually used as a 'pre-warm' on top of pull.

Single Edge Tier vs Hierarchy with Shield

Flat edge (no shield) is simpler; every POP miss goes to origin. Hierarchy adds shield latency (~5-30ms per shield hop) but reduces origin load 10x-100x. For most customers, the latency cost is invisible (still under 100ms total), and the origin savings are huge.

Cache Key Granularity

Including query strings in the cache key respects per-query content (?lang=fr differs from ?lang=en); excluding them ignores parameters (every query string serves the same cached version). Get this wrong and either cache busts on irrelevant query strings (UTM tags) or serves wrong content per query. Modern CDNs let you specify which query keys are significant.
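
One way to implement "significant query keys" is to rebuild the URL from an allowlist before computing the cache key. The sketch below uses the standard `URL` and `URLSearchParams` APIs; the function name and the allowlist parameter are illustrative.

```javascript
// Keep only the query parameters that change the response, sorted by name so
// that ?a=1&b=2 and ?b=2&a=1 map to the same cache key.
function normalizeUrlForCache(rawUrl, significantParams) {
    const url = new URL(rawUrl);
    const kept = [...url.searchParams.entries()]
        .filter(([name]) => significantParams.includes(name))
        .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
    const query = new URLSearchParams(kept).toString();
    return url.host + url.pathname + (query ? `?${query}` : '');
}

console.log(normalizeUrlForCache('https://cdn.example.com/page?utm_source=x&lang=fr', ['lang']));
// cdn.example.com/page?lang=fr
```

Tracking parameters such as UTM tags then no longer fragment the cache, while `?lang=fr` and `?lang=en` still get separate entries.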

Stateful Edge vs Pure Cache

Pure caching CDNs serve cached content; some CDNs (Cloudflare, Fastly) added programmable edge runtimes for dynamic content. The trade-off: more flexibility (rewrite responses, A/B test at edge) vs operational complexity (customer code at the edge can introduce bugs that take down many POPs).

When to NOT Use a CDN

For low-volume workloads that are not latency-sensitive and where cache hit rate is poor (every request unique), a CDN adds a hop without offloading anything. Examples: highly personalized API responses, long-tail enterprise apps with only a few thousand users. Direct origin is fine; a CDN is overkill.

Real-World Examples

How real systems implement this in production

Cloudflare

Cloudflare runs ~300 POPs and uses Anycast for routing plus their custom Quicksilver control plane for sub-second config and purge propagation. They serve significant fractions of global web traffic and pioneered pricing the CDN as part of a security/perf bundle. Workers (V8 isolates) provide programmable edge compute.

Trade-off: Cloudflare's Workers run in V8 isolates rather than full containers, getting per-request startup near zero but limiting language choice and the ability to use native libraries. The lesson: edge compute economics force novel runtime choices; the trade is flexibility for cold-start latency and density.

Akamai

Akamai is the original CDN, founded in 1998 and operating edge servers in roughly 4,000 locations, many of them small deployments inside ISPs. Their tiered distribution architecture (edge, parent, origin) was the canonical hierarchy that newer CDNs copied. Heavy use for high-bitrate video and downloads.

Trade-off: Akamai's deep ISP integration gives unmatched proximity to users but the operational complexity (4000 deployment sites with different network configurations) is enormous. The lesson: deeper edge presence improves performance but the operational tax compounds with each location; modern CDNs trade some proximity for simpler ops with fewer, larger POPs.

Fastly

Fastly built their CDN on Varnish with custom extensions, exposing the VCL configuration language to customers for fine-grained edge logic. Known for sub-second purge (~150 ms) and aggressive image optimization. Powers parts of GitHub, Stripe, Pinterest.

Trade-off: Fastly's VCL exposes more power than competitors but requires customers to learn a domain-specific language; mistakes can take down their site. The lesson: more powerful edge configuration unlocks more value but raises the floor of operational expertise required from customers.

AWS CloudFront

CloudFront is AWS's CDN, integrated with S3 (origin) and Lambda@Edge (compute). Smaller POP count than Cloudflare/Akamai but tight integration with the AWS ecosystem. Used heavily for delivering S3-backed assets, video, and software downloads.

Trade-off: CloudFront's tight AWS integration is convenient for AWS-centric architectures but creates lock-in; companies that are multi-cloud or not on AWS often pick a third-party CDN for portability. The lesson: vendor integration is a feature for committed users and a liability for the portability-conscious; picking a CDN is partly an architectural commitment to a cloud.

Quick Interview Phrases

Key terms to use in your answer

Anycast routes users to the nearest POP

regional shield consolidates edge misses

stale-while-revalidate for graceful invalidation

soft purge by default, hard purge for takedowns

request coalescing prevents cache stampede

Vary header normalization preserves hit rate

Common Interview Questions

Questions you might be asked about this topic

Walk me through what happens when a user in Tokyo loads an image hosted on your CDN.

User's browser resolves cdn.example.com via DNS, gets a single Anycast IP (same for everyone globally). The user's ISP's BGP table directs the packets to the network-closest POP, which happens to be Tokyo (5 ms RTT). The Tokyo POP receives the HTTPS request, terminates TLS using the customer's cert (or a CDN-managed SNI cert). Edge cache lookup: hit on /image.jpg with a 1-hour TTL? Stream the bytes from local SSD; total time from Anycast routing to first byte: ~10 ms. If miss: edge sends request to its regional shield (Asia-Pacific shield, maybe Singapore, +30 ms). Shield checks its cache; if hit, returns to edge; if miss, shield fetches from origin (in US East, +150 ms one-way), populates shield cache, returns to edge, edge populates its cache and returns to user. Subsequent requests to the same Tokyo POP for the same URL serve from cache in 5-10 ms. Subsequent requests at OTHER Asian POPs miss locally but hit the shield, never bothering the origin.

How do you handle a customer who needs to invalidate a cached page within seconds after a deploy?

How do you keep cache hit rate high with HTTPS and varied client behaviors?

How does a CDN absorb a 5 Tbps DDoS attack?

Why use a regional shield instead of just edge POPs?

Interview Tips

How to discuss this topic effectively

Lead with hit ratio economics. Saying 'every 1% of cache hit rate at 100 PB scale saves $80K/month' frames why the architecture is worth all this engineering.

Explain Anycast routing explicitly. Many candidates default to GeoDNS; Anycast fails over faster, is simpler to operate, and is what many modern CDNs use (some still combine it with DNS-based mapping).

Bring up the regional shield. Without it, a viral object fetches origin once per POP; with it, once per region. The math (300 origin fetches down to 3) is the senior insight.

Default to soft purge. Hard purge has its place but most workflows benefit from stale-while-revalidate; mentioning the difference shows operational maturity.

Talk about cache key Vary explicitly. Mismanaged Vary headers (`Vary: User-Agent`) destroy hit rate; mentioning normalization is a senior signal.

Common Mistakes

Pitfalls to avoid in interviews

Using `Vary: User-Agent` without normalization

Every distinct User-Agent string becomes a separate cache entry; hit rate collapses to near zero. Either normalize User-Agent at the edge into a few classes (mobile/desktop/tablet) or omit Vary entirely and serve one canonical version.

Designing without a regional shield

Without a shield, a viral object missing at every edge POP simultaneously sends one origin request per POP; the origin gets crushed. A shield consolidates these into one fetch per region, cutting origin load by 100x for popular content.

Using hard purge for routine deploys

Hard purge instantly drops cached objects; the next user request waits for the origin fetch (slow first-byte). Soft purge marks stale and serves stale-while-revalidate, so users see no latency hit and origin load stays smooth. Reserve hard purge for security/takedown scenarios.

Ignoring the cache stampede problem on cold popular objects

When a viral object first appears, every edge sees a miss simultaneously. Without request coalescing, all those misses go upstream. The fix is singleflight-style request coalescing: the first miss kicks off the upstream fetch; subsequent concurrent misses wait for that fetch to complete and share its response.

Serving HTTPS without connection pooling to origin

Each origin fetch over a fresh TCP+TLS connection costs ~50-100ms in handshakes. At a POP doing 10K origin fetches/second, that's hundreds of millions of handshakes/day (10K x 86,400 s = 864M). Pool persistent HTTP/2 connections from edge to origin so handshakes amortize to near-zero per request.
