System Design Article
Design a Content Delivery Network
Difficulty: Medium
Design a Cloudflare/Akamai/Fastly-style content delivery network that offloads 95%+ of static traffic from origin servers, brings latency from hundreds of milliseconds down to single digits, and absorbs DDoS attacks at the edge. The interview centerpiece is the cache hierarchy and routing: hundreds of edge POPs anycast-routed to the user's nearest location, a regional shield layer that consolidates fetches, and the origin only seeing the long tail of misses. We cover cache key design with Vary headers, the TTL lifecycle and purge model, stale-while-revalidate for resilience under origin outages, and the moves CDNs make to keep dynamic content fast (programmable edge functions, smart routing).
Requirements
Functional Requirements
- Cache static assets (images, JS, CSS, fonts, videos, downloads) at edge POPs near the user.
- Pull-on-demand: on cache miss, fetch from origin, populate cache, deliver to user (lazy population).
- Cache control via headers: respect Cache-Control: max-age, s-maxage, Vary, ETag, and Last-Modified.
- Purge: customer can invalidate cached objects within seconds (e.g., after deploying a new version).
- Stale-while-revalidate: serve stale content while asynchronously refreshing from origin.
- TLS termination at edge: clients connect over HTTPS to the nearest POP; origin connection can be HTTP or HTTPS internally.
- Compression and image optimization: gzip/brotli on text, modern image formats negotiated per client.
- Geo / device routing: serve different content based on user country, device type.
- DDoS absorption: large traffic floods are absorbed at the edge without reaching origin.
- Edge functions: customer code runs at the edge to customize requests/responses (Cloudflare Workers, Lambda@Edge).
Out of Scope (state explicitly)
- Origin storage (origin is the customer's own server or object storage; CDN doesn't host).
- Live video transcoding (separate live-streaming product).
- DNS authoritative service (we offer caching DNS, not zone hosting).
- Application logic beyond simple edge functions.
Non-Functional Requirements
- Cache hit ratio: 95%+ for static assets (well-tuned). Each percentage point gained reduces origin egress cost.
- Edge latency: p99 < 30 ms first-byte for cached content; ideally < 10 ms.
- Throughput: sustained 10 Tbps globally; bursts to 100+ Tbps during events (sport finals, product launches).
- Availability: 99.99% globally; 99.9% per POP (POPs fail; traffic re-routes).
- Purge latency: customer purge propagates to all POPs within ~1 minute (best case under 10 seconds).
- DDoS capacity: absorb 10+ Tbps attacks at the edge without backend impact.
Back-of-the-Envelope Estimation
Footprint
---------- Global footprint ----------
POPs: ~300 worldwide (top tier covers most of internet)
Servers per POP: ~50 to 500 (depends on POP class)
Total edge servers: ~50,000
Per server cache: ~2 TB SSD
Total edge cache: ~100 PB across the fleet
Traffic
---------- Traffic ----------
Monthly egress: 100 PB
Daily peak throughput: ~30 Tbps
Objects served per day: ~5 trillion
Unique objects in active rotation: ~10B
Most-popular object cached at every POP (300 copies)
Long-tail object cached at one regional shield only
Hit Ratio Math
---------- Hit ratio impact ----------
Without CDN: 100% origin egress at $80/TB cloud egress = $8M/month
At 95% cache hit: 5% origin = $400K/month (95% savings)
At 99% cache hit: 1% origin = $80K/month (99% savings)
Every 1% of hit-rate gain saves $80K/month at this scale.
Cache Memory and Disk
---------- Per-POP cache ----------
A Tier-1 POP (e.g., NYC) handles ~100 Gbps with 50 servers, 100 TB cache total.
Hot objects stay in RAM (~500 GB across the POP), warm on SSD (~100 TB).
Cold misses go to regional shield, then origin.
High-Level Design
---------- Architecture overview ----------
+-----------+
| User | (DNS resolves CDN domain to Anycast IP)
+-----------+
|
v (BGP routes to nearest POP via Anycast)
+------------------+
| Edge POP | (TLS, cache check, edge functions)
+------------------+
| miss
v
+-------------------+
| Regional Shield | (one or two per geographic region; consolidates POP misses)
+-------------------+
| miss
v
+-------------------+
| Origin Server | (customer's S3 / load balancer)
+-------------------+
Client -> Edge -> Regional Shield -> Origin. The Shield collapses fan-out so the origin sees one fetch per object even if 50 POPs miss simultaneously.
Routing: How Users Reach the Nearest POP
Anycast: the same IP is announced from every POP via BGP. The internet's routing table delivers each user's packets to the BGP-closest POP (network-closest, which is usually also geographically closest). No DNS magic required.
---------- Anycast in action ----------
DNS lookup for cdn.example.com: returns 1.2.3.4 to ALL users globally.
User in Tokyo: their ISP's BGP table says '1.2.3.4 is reachable via Tokyo POP, 5 ms away'.
User in NYC: their ISP's BGP table says '1.2.3.4 is reachable via NYC POP, 5 ms away'.
No CDN-side routing decision; the global BGP table does the work.
Alternative: DNS-based routing (returns different IPs based on user location). It is less granular and slower to react, because the decision is made per DNS resolver and cached for the record's TTL. Anycast is the modern default.
Cache Key
The cache key is what determines whether a request matches a cached object. Default: full URL.
---------- Cache key with Vary ----------
URL: GET https://cdn.example.com/image.jpg
Key base: GET|cdn.example.com/image.jpg
If origin returns Vary: Accept-Encoding (gzip vs brotli):
Key becomes: GET|cdn.example.com/image.jpg|Accept-Encoding=gzip
or GET|cdn.example.com/image.jpg|Accept-Encoding=br
(One cache entry per encoding)
If Vary: User-Agent:
THOUSANDS of cache entries per object (every distinct UA string).
This destroys cache hit rate.
Common misuse: customers set Vary: User-Agent to serve different content to different browsers; cache hit rate collapses because every UA string creates a separate entry. CDN config typically lets customers override Vary or normalize headers (e.g., 'classify UA into mobile/desktop/tablet' to keep the cache key small).
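The normalization described above can be sketched as a small key builder. This is a minimal illustration, not any CDN's actual implementation: the function names, the crude User-Agent regexes, and the assumption that request header names arrive lowercased are all hypothetical.

```javascript
// Collapse a raw User-Agent string into a small device class so that
// "Vary: User-Agent" yields a handful of variants instead of thousands.
// The regexes are a deliberately crude, illustrative heuristic.
function classifyUserAgent(ua) {
  if (/ipad|tablet/i.test(ua)) return "tablet";
  if (/mobi|iphone|android/i.test(ua)) return "mobile";
  return "desktop";
}

// Reduce Accept-Encoding to the single encoding the edge would serve.
function normalizeEncoding(acceptEncoding) {
  const offered = acceptEncoding
    .toLowerCase()
    .split(",")
    .map((token) => token.trim().split(";")[0]);
  if (offered.includes("br")) return "br";
  if (offered.includes("gzip")) return "gzip";
  return "identity";
}

// Build the cache key: method, host and path, plus one normalized
// component per header named in the origin's Vary response header.
// Assumes requestHeaders uses lowercase header names.
function buildCacheKey(method, host, path, requestHeaders, varyHeader) {
  const parts = [method.toUpperCase(), host.toLowerCase() + path];
  const varyNames = (varyHeader || "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .sort();
  for (const name of varyNames) {
    const raw = requestHeaders[name] || "";
    let value = raw;
    if (name === "user-agent") value = classifyUserAgent(raw);
    if (name === "accept-encoding") value = normalizeEncoding(raw);
    parts.push(`${name}=${value}`);
  }
  return parts.join("|");
}
```

With this shape, two different iPhone User-Agent strings map to the same key, so `Vary: User-Agent` costs at most three variants per object. Header names appear lowercased in the key, which differs cosmetically from the notation used earlier.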
Detailed Design
The two interesting components are the cache hierarchy with origin shielding and purge / invalidation.
Cache Hierarchy: Edge, Regional Shield, Origin
---------- Three-tier cache ----------
Tier 1: Edge POP cache
- hit ratio for popular objects: 95%+
- miss ratio: ~5%
- misses go to Regional Shield
Tier 2: Regional Shield
- one or two per region (US-East shield, EU-West shield, APAC shield)
- consolidates misses from many edges
- hit ratio for warm objects: ~80% (objects already pulled by another edge)
- misses go to Origin
Tier 3: Origin
- customer's server / S3 / load balancer
- sees only the long tail of cold objects
- typical origin offload: 99%+ at well-tuned setups
Why a shield? Without it, every edge POP independently fetches from origin on its own first miss. For a popular object, 300 POPs each fetch once = 300 origin requests for a single new object. With a shield, only the shield fetches; subsequent edge misses pull from the shield. Origin sees one request per object per region.
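The tier numbers combine multiplicatively, which a few lines of arithmetic make concrete. The sketch below uses the article's figures (95% edge hit ratio, ~80% shield hit ratio, 100 PB/month at $80/TB, 300 POPs, 3 shields). The function names are hypothetical, and it assumes bytes are distributed like requests, so request hit ratio stands in for byte hit ratio.

```javascript
// Fraction of requests reaching the origin when edge misses are
// filtered through a shield tier first.
function originFraction(edgeHitRatio, shieldHitRatio) {
  return (1 - edgeHitRatio) * (1 - shieldHitRatio);
}

// Monthly origin egress cost for a given share of traffic reaching origin.
function monthlyOriginCost(totalEgressTB, dollarsPerTB, fractionToOrigin) {
  return totalEgressTB * dollarsPerTB * fractionToOrigin;
}

// Origin fetches for one new object that misses at every POP at once:
// one per shield if a shield tier exists, otherwise one per POP.
function originFetchesForNewObject(popCount, shieldCount) {
  return shieldCount > 0 ? shieldCount : popCount;
}

const fraction = originFraction(0.95, 0.8); // roughly 0.01, i.e. ~99% offload
const cost = monthlyOriginCost(100000, 80, fraction); // 100 PB = 100,000 TB

console.log(`origin sees ${(fraction * 100).toFixed(1)}% of requests`);
console.log(`origin egress: $${Math.round(cost)}/month`);
console.log(
  `new-object fetches: ${originFetchesForNewObject(300, 0)} without shield, ` +
    `${originFetchesForNewObject(300, 3)} with shield`
);
```

A 5% edge miss rate multiplied by a 20% shield miss rate leaves about 1% of traffic for the origin, which is where the 99%+ offload figure comes from.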
---------- Cache fan-in math ----------
New object goes viral; 300 POPs all see misses simultaneously.
Without shield: origin gets 300 GET requests for the same object.
With shield: 300 edges request from regional shields (~3 of them); 3 shields request from origin; origin gets 3 requests.
Reduction: 100x.
Cache Lifecycle
An object enters the cache when an edge fetches it from origin (or shield). It stays cached until:
- TTL expires (Cache-Control: max-age).
- Eviction (LRU when SSD is full).
- Explicit purge by the customer.
On TTL expiry, the next request triggers a revalidation: edge sends a conditional request to origin (If-None-Match: <etag>); origin returns 304 (still fresh, refresh TTL) or 200 (changed, replace cache).
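A minimal sketch of that revalidation step, assuming cache entries and origin responses are plain objects with the hypothetical fields shown in the comments. It covers only the 304 and 200 cases described above and leaves any other status to the caller.

```javascript
// Build the conditional request headers for an expired cache entry.
function conditionalHeaders(entry) {
  const headers = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// Apply the origin's answer to the cached entry and return the new entry.
// `response` is a plain object: { status, body, etag, lastModified, maxAgeSeconds }.
function applyRevalidation(entry, response, nowMs) {
  if (response.status === 304) {
    // Not modified: keep the cached body, only extend freshness.
    const maxAge = response.maxAgeSeconds ?? entry.maxAgeSeconds;
    return { ...entry, maxAgeSeconds: maxAge, expiresAt: nowMs + maxAge * 1000 };
  }
  if (response.status === 200) {
    // Changed: replace the body and validators.
    return {
      body: response.body,
      etag: response.etag,
      lastModified: response.lastModified,
      maxAgeSeconds: response.maxAgeSeconds,
      expiresAt: nowMs + response.maxAgeSeconds * 1000,
    };
  }
  // Any other status: leave the entry untouched and let the caller decide.
  return entry;
}
```

The 304 path is what makes revalidation cheap: the origin sends headers only, and the edge keeps serving the bytes it already holds.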
Stale-While-Revalidate
A powerful pattern: serve stale content immediately while asynchronously fetching the fresh version.
---------- SWR flow ----------
Cache-Control: max-age=3600, stale-while-revalidate=86400
Request at t=0: cache hit, 100% fresh
Request at t=4000s: cache hit (400s past max-age but within SWR window)
- serve stale immediately
- asynchronously refetch from origin
- update cache with fresh response
Request at t=4001s: cache hit, freshly updated
Benefits:
- Tail latency stays low: every request is a hit (no origin wait).
- Origin outage tolerance: even if origin is down, serve stale until SWR window expires.
- Reduces thundering herd: only one revalidate per object even under high concurrent miss.
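The decision the edge makes on each request can be sketched as a three-way classification on the object's age. The parser below is a simplified assumption that handles only the two directives used in the flow above; the function names and return labels are hypothetical.

```javascript
// Parse max-age and stale-while-revalidate from a Cache-Control value.
// Simplified: ignores quoting and every other directive.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(",")) {
    const [name, arg] = part.trim().split("=");
    directives[name.toLowerCase()] = arg === undefined ? true : Number(arg);
  }
  return {
    maxAge: directives["max-age"] || 0,
    swr: directives["stale-while-revalidate"] || 0,
  };
}

// Decide how to answer a request for an object stored `ageSeconds` ago.
function classify(ageSeconds, cacheControl) {
  const { maxAge, swr } = parseCacheControl(cacheControl);
  // Within max-age: serve straight from cache.
  if (ageSeconds < maxAge) return "fresh";
  // Past max-age but inside the SWR window: serve stale now and
  // refresh from origin in the background.
  if (ageSeconds < maxAge + swr) return "stale-revalidate";
  // Past both windows: the request has to wait for the origin.
  return "expired";
}
```

For the header in the flow above, an age of 4000 seconds lands in the middle branch: the user gets the stale copy immediately and the refresh happens off the request path.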
Purge: Hard and Soft
When a customer needs to evict a cached object before TTL (deployed new version), they call the purge API.
Hard Purge
Message propagates to every POP via the CDN's control plane (typically a global pub/sub system, such as Fastly's custom purge layer around Varnish or Cloudflare's Quicksilver). Each POP deletes the cache entry. Time to global propagation: typically 100 ms to 60 s depending on the CDN's design.
---------- Hard purge propagation ----------
1. Customer: POST /purge with URL or tag.
2. Control plane writes the purge to a global event log.
3. Each POP subscribes to the log; on receipt, deletes the matching cache entries.
4. Subsequent requests at any POP miss and refetch.
Fastly's Varnish-based system claims ~150 ms global purge; Cloudflare's Quicksilver is similar.
Soft Purge (Mark Stale)
More graceful: mark the object stale but don't delete. Subsequent requests serve stale-while-revalidate, fetching the fresh version asynchronously. No hit on the origin from the purge itself; clients never see slow first-byte.
Most customers use soft purge by default; hard purge is reserved for sensitive content (security fixes, takedowns).
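The difference between the two purge modes is small in code. The sketch below assumes the POP's cache is a plain `Map` from cache key to an entry with an `expiresAt` timestamp. Both the shape and the `purge` function are hypothetical simplifications.

```javascript
// Apply a purge to an in-memory cache (a Map of cacheKey -> entry).
// Hard purge deletes the entry. Soft purge keeps the body but moves the
// expiry to "now", so the next request can be served stale while a
// background refresh runs.
function purge(cache, cacheKey, { soft, nowMs }) {
  const entry = cache.get(cacheKey);
  if (!entry) return "not-cached";
  if (soft) {
    entry.expiresAt = Math.min(entry.expiresAt, nowMs);
    return "marked-stale";
  }
  cache.delete(cacheKey);
  return "deleted";
}
```

After a soft purge the bytes are still on the POP, so the next reader sees a fast (stale) response. After a hard purge the next reader pays for a full upstream fetch.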
Cache Tag Purge
Objects can be tagged at cache time (Surrogate-Key: product:42 product:hot). A tag purge invalidates every object with that tag. Useful for batch invalidations: 'purge everything related to product 42' rather than enumerating URLs.
---------- Tag purge example ----------
When ingesting a response with header Surrogate-Key: product:42:
index_entry: cache_key -> tags: ['product:42']
reverse_index: tag 'product:42' -> [cache_key1, cache_key2, ...]
On purge tag 'product:42':
walk reverse_index, delete or mark stale all cache_keys.
TLS Termination at Edge
Clients connect to the edge over TLS. The edge holds the customer's certificate (or a CDN-managed cert via SNI). After decryption, the cache logic operates on cleartext; the edge fetches origin over a separate connection (HTTP/2, HTTP/3, or HTTP/1.1 keep-alive pooled).
---------- Connection pooling savings ----------
Without pooling: 1 TLS handshake per origin fetch (50ms each).
With pooling at the edge: 1 long-lived connection per origin per POP, reused for all misses.
Reduces origin handshake load 1000x.
For mTLS (origin requires client certs), the edge presents its own client cert; the origin trusts the CDN's cert and trusts that the CDN authenticated the user.
DDoS Absorption
A 5 Tbps DDoS attack hitting a single origin would obliterate it. Hitting a CDN, the attack is absorbed across hundreds of POPs simultaneously: each POP absorbs ~17 Gbps, well within the POP's NIC capacity. The cache hit ratio for legitimate traffic stays high; the attack traffic (typically static patterns, e.g., 'GET /' from millions of IPs) is filtered at the edge.
Defenses:
- Per-IP rate limiting at the edge.
- WAF rules (block known attack signatures).
- Challenge for suspicious traffic (CAPTCHA, JS challenge).
- Origin shielding ensures origin never sees the flood directly.
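Per-IP rate limiting, the first defense in the list, is commonly built as a token bucket. The sketch below is one illustrative way to do it, not a description of any specific CDN's limiter. The class name and parameters are hypothetical, and it never evicts idle buckets, which a real edge would need to do to bound memory.

```javascript
// Per-IP token bucket: each IP may burst up to `capacity` requests and
// is refilled at `ratePerSec` tokens per second.
class PerIpRateLimiter {
  constructor(ratePerSec, capacity) {
    this.ratePerSec = ratePerSec;
    this.capacity = capacity;
    this.buckets = new Map(); // ip -> { tokens, lastMs }
  }

  // Returns true if the request is allowed, false if it should be dropped.
  allow(ip, nowMs) {
    let bucket = this.buckets.get(ip);
    if (!bucket) {
      bucket = { tokens: this.capacity, lastMs: nowMs };
      this.buckets.set(ip, bucket);
    }
    // Refill in proportion to the time elapsed since the last request.
    const elapsedSec = Math.max(0, nowMs - bucket.lastMs) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSec * this.ratePerSec
    );
    bucket.lastMs = nowMs;
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Because each IP has its own bucket, a flooding source exhausts only its own tokens while other clients are unaffected.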
Programmatic Use
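// Customer purges a CDN URL after deploying a new version.
// soft: true marks the object stale instead of deleting it.
async function purgeCdn(url, apiToken) {
  const response = await fetch('https://api.cdn.com/v1/purge', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ urls: [url], soft: true })
  });
  // fetch only rejects on network errors, so check the HTTP status too.
  if (!response.ok) {
    throw new Error(`Purge failed with HTTP ${response.status}`);
  }
  return response;
}
Data Model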
Per-Edge Cache Index
---------- Edge cache layout ----------
In-memory hash table:
cache_key -> { offset_on_disk, size, expires_at, etag, last_revalidated }
On-disk circular log:
large append-only file; cache entries written sequentially
Tag reverse-index:
tag -> set of cache_keys for fast tag purges
LRU eviction when disk fills (or LFU on some CDNs); evicted entries can be refetched from shield or origin.
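To make the layout concrete, here is a minimal in-memory sketch of the index, the tag reverse-index, and LRU eviction. It is a simplification under stated assumptions: it evicts by entry count rather than disk bytes, omits the on-disk log entirely, and the class and method names are hypothetical.

```javascript
// In-memory index for one edge server: key -> metadata, plus a tag
// reverse-index. A Map iterates in insertion order, so re-inserting a
// key on every hit keeps the least recently used key at the front.
class EdgeCacheIndex {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map(); // cacheKey -> { tags, expiresAt, stale }
    this.byTag = new Map(); // tag -> Set of cacheKeys
  }

  put(cacheKey, tags, expiresAt) {
    this.remove(cacheKey); // drop any old version and its tag links
    this.entries.set(cacheKey, { tags, expiresAt, stale: false });
    for (const tag of tags) {
      if (!this.byTag.has(tag)) this.byTag.set(tag, new Set());
      this.byTag.get(tag).add(cacheKey);
    }
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value; // LRU victim
      this.remove(oldest);
    }
  }

  get(cacheKey) {
    const entry = this.entries.get(cacheKey);
    if (!entry) return undefined;
    this.entries.delete(cacheKey); // move to most-recently-used position
    this.entries.set(cacheKey, entry);
    return entry;
  }

  remove(cacheKey) {
    const entry = this.entries.get(cacheKey);
    if (!entry) return;
    for (const tag of entry.tags) {
      const keys = this.byTag.get(tag);
      if (!keys) continue;
      keys.delete(cacheKey);
      if (keys.size === 0) this.byTag.delete(tag);
    }
    this.entries.delete(cacheKey);
  }

  // Tag purge: delete (hard) or mark stale (soft) every key with the tag.
  // Returns the number of affected keys.
  purgeTag(tag, soft) {
    const keys = [...(this.byTag.get(tag) || [])];
    for (const key of keys) {
      if (soft) this.entries.get(key).stale = true;
      else this.remove(key);
    }
    return keys.length;
  }
}
```

The reverse-index is what keeps a tag purge proportional to the number of tagged objects rather than to the size of the whole cache.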
Control Plane: Configuration
CREATE TABLE cdn_configs (
customer_id VARCHAR(64),
domain VARCHAR(255),
origin_url VARCHAR(512),
cache_rules JSONB, -- per-path TTLs, behaviors
purge_tokens VARCHAR[], -- API tokens for purge
edge_functions TEXT, -- customer code
cert_id VARCHAR(64),
PRIMARY KEY (customer_id, domain)
);Config is pushed to all POPs via a global config distribution system (typically a dedicated event log with sub-second propagation, e.g., Cloudflare's Quicksilver).
Logs and Analytics
Every request is logged at the edge (request URL, status, bytes, cache hit/miss, latency). Logs aggregate centrally for analytics dashboards. At 5 trillion requests/day, log volume is ~50-500 TB/day; aggressive sampling and aggregation keep this affordable.
Scaling and Bottlenecks
Cache Disk vs Memory
Hot objects must serve from RAM (sub-millisecond). The active working set is typically the top few hundred GB per POP; the long tail goes to SSD (a few ms). Cold misses go upstream.
Cache Stampede on Cold Object
A suddenly viral object that's not yet cached: 1M requests/sec hit one POP, all miss, all go upstream. Without coordination, the regional shield gets 1M requests, the origin gets 1M.
Protection: request coalescing. The first miss is forwarded; subsequent misses wait for the in-flight request to complete and share its response. This is the cache equivalent of singleflight: one fetch, many waiters.
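A minimal sketch of this singleflight-style coalescing, assuming the caller supplies a `fetchUpstream` function that returns a Promise. The `Coalescer` class and its method names are hypothetical.

```javascript
// Coalesce concurrent misses for the same key into one upstream fetch.
class Coalescer {
  constructor() {
    this.inFlight = new Map(); // cacheKey -> Promise of the upstream response
  }

  // `fetchUpstream` is a caller-supplied function returning a Promise.
  fetch(cacheKey, fetchUpstream) {
    const pending = this.inFlight.get(cacheKey);
    if (pending) return pending; // join the fetch already in progress

    // Start the upstream fetch immediately; a synchronous throw inside
    // fetchUpstream becomes a rejection shared by all waiters.
    const promise = new Promise((resolve) => resolve(fetchUpstream(cacheKey)))
      // Clear the in-flight marker on success or failure so that later
      // requests can trigger a fresh fetch.
      .finally(() => this.inFlight.delete(cacheKey));
    this.inFlight.set(cacheKey, promise);
    return promise;
  }
}
```

Every concurrent caller receives the same Promise, so the upstream sees one request no matter how many misses arrive while it is outstanding.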
---------- Request coalescing ----------
Edge gets request 1 for /viral.jpg: miss; start fetch from upstream; record key in 'in-flight' map.
Edge gets request 2 for /viral.jpg: miss; key is in-flight; subscribe to result.
Edge gets request 100,000 for /viral.jpg: same.
Upstream fetch completes, response written to cache, all subscribers get the response.
Result: 1 upstream request instead of 100,000.
Origin Capacity Planning
Origins are commonly over-provisioned to handle cache misses, but the long tail of misses can still saturate them. CDNs offer 'always cache' modes that override the origin's TTL for specific paths (cache once, keep forever), which suits static assets. Some workloads use prefetching: the CDN fetches new assets at deploy time so the first user request is a hit.
Multi-Region Origin
Origin can be multi-region (one in US, one in EU). The CDN selects the regional origin based on POP location (US POPs fetch from US origin). Reduces origin RTT and costs.
TLS Connection Limits
TLS handshakes are CPU-intensive (~10 ms each on commodity hardware). At 100K new HTTPS connections/sec to one POP, TLS CPU dominates. Mitigations: session resumption, TLS 1.3 (1-RTT or 0-RTT), connection pooling at edge.
POP Failure
If a POP goes down (power, fiber cut), its BGP announcement is withdrawn and Anycast re-routes traffic within seconds to neighboring POPs. Cache contents at the failed POP are lost; neighbors absorb the additional load with cache misses for objects they don't have. Within an hour, cache rebalances.
Streaming and Live Video
Live video adds segments every few seconds; CDN must cache new segments, expire old ones, and deliver chunks to many viewers. Origin pull through shield + edge works fine for VOD; live needs lower-latency strategies (multicast within POP, regional origin shields close to encoders).
Edge Compute Scaling
Customer code at the edge (Cloudflare Workers, Lambda@Edge) runs in lightweight isolates per request (V8, Wasm). Per-request cost ~1ms; throughput per CPU ~1000 requests/sec. Each POP must provision compute for peak load; idle compute is essentially free.
Trade-offs and Alternatives
Anycast vs DNS-Based Routing
Anycast: one IP, BGP picks the path; clients reach the network-closest POP automatically, with no DNS-based steering step and no dependence on DNS TTLs for failover. Cons: BGP changes can cause transient routing anomalies (a connection's packets unexpectedly shift to a different POP mid-flight); this is rare for short HTTP requests but problematic for long-lived connections such as WebSockets.
DNS-based: returns different IPs based on DNS-resolver location (GeoDNS). Coarser, because the decision is keyed on the resolver's location rather than the user's. It predates Anycast-based designs and is still used by some large CDNs, sometimes in combination with Anycast.
Hard vs Soft Purge
Hard purge instantly removes; soft purge marks stale and refetches asynchronously. Soft is gentler on origin and user latency; hard guarantees no stale content (necessary for legal takedowns, security patches). Most customer workflows default to soft.
Pull CDN vs Push CDN
Pull (lazy): edge fetches on first request. Easy to set up; cold starts impact first user. Default for most CDNs.
Push: customer pushes objects to all POPs upfront. Eliminates cold-start latency for scheduled launches (Black Friday, product release). More setup; usually used as a 'pre-warm' on top of pull.
Single Edge Tier vs Hierarchy with Shield
Flat edge (no shield) is simpler; every POP miss goes to origin. Hierarchy adds shield latency (~5-30ms per shield hop) but reduces origin load 10x-100x. For most customers, the latency cost is invisible (still under 100ms total), and the origin savings are huge.
Cache Key Granularity
Including query strings in the cache key respects per-query content (?lang=fr differs from ?lang=en); excluding them ignores parameters (every query string serves the same cached version). Get this wrong and either cache busts on irrelevant query strings (UTM tags) or serves wrong content per query. Modern CDNs let you specify which query keys are significant.
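One way to express "which query keys are significant" is an allowlist applied before the cache key is computed. The sketch below uses the standard `URL` and `URLSearchParams` classes. The function name and the allowlist parameter are hypothetical, and real CDNs expose this as configuration rather than code.

```javascript
// Keep only the query parameters that change the response, sorted so
// that parameter order does not fragment the cache.
function normalizeUrlForCache(rawUrl, significantParams) {
  const url = new URL(rawUrl);
  const kept = [];
  for (const [name, value] of url.searchParams) {
    if (significantParams.includes(name)) kept.push([name, value]);
  }
  kept.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const query = new URLSearchParams(kept).toString();
  return url.host + url.pathname + (query ? `?${query}` : "");
}
```

With `["lang"]` as the allowlist, `?utm_source=x&lang=fr` and `?lang=fr&utm_campaign=y` collapse to the same key, while `?lang=en` stays distinct.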
Stateful Edge vs Pure Cache
Pure caching CDNs serve cached content; some CDNs (Cloudflare, Fastly) added programmable edge runtimes for dynamic content. The trade-off: more flexibility (rewrite responses, A/B test at edge) vs operational complexity (customer code at the edge can introduce bugs that take down many POPs).
When to NOT Use a CDN
For low-volume workloads that are not latency-sensitive and where the cache hit rate is poor (every request unique), a CDN adds a hop without offloading the origin. Examples: highly personalized API responses, long-tail enterprise apps with only a few thousand users. Serving directly from origin is fine; a CDN is overkill.
Real-World Examples
How real systems implement this in production
Cloudflare runs ~300 POPs and uses Anycast for routing plus their custom Quicksilver control plane for sub-second config and purge propagation. They serve significant fractions of global web traffic and pioneered pricing the CDN as part of a security/perf bundle. Workers (V8 isolates) provide programmable edge compute.
Trade-off: Cloudflare's Workers run in V8 isolates rather than full containers, getting per-request startup near zero but limiting language choice and the ability to use native libraries. The lesson: edge compute economics force novel runtime choices; the trade gives up flexibility in exchange for low cold-start latency and high density.
Akamai is the original CDN, founded in 1998 and operating ~4000 edge locations, many of them small POPs inside ISPs. Their tiered distribution architecture (edge, parent, origin) was the canonical hierarchy that newer CDNs copied. Heavy use for high-bitrate video and downloads.
Trade-off: Akamai's deep ISP integration gives unmatched proximity to users but the operational complexity (4000 deployment sites with different network configurations) is enormous. The lesson: deeper edge presence improves performance but the operational tax compounds with each location; modern CDNs trade some proximity for simpler ops with fewer, larger POPs.
Fastly built their CDN on Varnish with custom extensions, exposing the VCL configuration language to customers for fine-grained edge logic. Known for sub-second purge (~150 ms) and aggressive image optimization. Powers parts of GitHub, Stripe, Pinterest.
Trade-off: Fastly's VCL exposes more power than competitors but requires customers to learn a domain-specific language; mistakes can take down their site. The lesson: more powerful edge configuration unlocks more value but raises the floor of operational expertise required from customers.
CloudFront is AWS's CDN, integrated with S3 (origin) and Lambda@Edge (compute). Smaller POP count than Cloudflare/Akamai but tight integration with the AWS ecosystem. Used heavily for delivering S3-backed assets, video, and software downloads.
Trade-off: CloudFront's tight AWS integration is convenient for AWS-centric architectures but creates lock-in; companies multi-cloud or off AWS often pick a third-party CDN for portability. The lesson: vendor integration is a feature for committed users and a liability for the portability-conscious; picking a CDN is partly an architectural commitment to a cloud.
Common Interview Questions
Questions you might be asked about this topic
User's browser resolves cdn.example.com via DNS, gets a single Anycast IP (same for everyone globally). The user's ISP's BGP table directs the packets to the network-closest POP, which happens to be Tokyo (5 ms RTT). The Tokyo POP receives the HTTPS request, terminates TLS using the customer's cert (or a CDN-managed SNI cert). Edge cache lookup: hit on /image.jpg with a 1-hour TTL? Stream the bytes from local SSD; total time from Anycast routing to first byte: ~10 ms. If miss: edge sends request to its regional shield (Asia-Pacific shield, maybe Singapore, +30 ms). Shield checks its cache; if hit, returns to edge; if miss, shield fetches from origin (in US East, +150 ms one-way), populates shield cache, returns to edge, edge populates its cache and returns to user. Subsequent requests to the same Tokyo POP for the same URL serve from cache in 5-10 ms. Subsequent requests at OTHER Asian POPs miss locally but hit the shield, never bothering the origin.
Customer calls purge API with the URL or tag. The CDN's control plane (e.g., Cloudflare's Quicksilver, Fastly's purge fabric on top of Varnish) writes the purge command to a global event log replicated to every POP within ~150 ms. Each POP processes the command: removes the cache entry (hard purge) or marks it stale (soft purge). Default is soft purge: subsequent requests serve stale-while-revalidate, asynchronously fetch fresh from origin, and update the cache. This avoids a miss-storm against the origin. For hard purge (security takedown), the next request misses everywhere and goes upstream; rate-limit how many concurrent hard purges a customer can issue to protect origin capacity. End-to-end: from purge API call to global propagation, ~1 second; users see fresh content immediately on soft purge (might briefly see stale during the short window) or after refresh on hard purge.
Several practices. (1) Cache key normalization: strip query string parameters that don't affect content (UTM tags, session ids), keep ones that do (?lang=fr). Customers often configure this per-path. (2) Vary header discipline: avoid Vary: User-Agent; if the origin needs to vary on device, normalize at the edge into a few classes (mobile/desktop/tablet) so the cache key has at most 3-4 variants. (3) Strip cookies from cache key for static assets (the cookie doesn't affect /logo.png). (4) Honor Cache-Control: public to enable caching even when Set-Cookie is present (with care). (5) Image optimization at edge negotiates format per Accept header but caches a small set of variants per image. (6) Compression: cache both gzip and brotli variants if Vary: Accept-Encoding (only 2-3 variants, not user-specific). Result: hit rate moves from ~50% (naive) to 95%+ for static workloads.
5 Tbps spread across 300 POPs is ~17 Gbps per POP, well within a POP's typical 100 Gbps NIC capacity. The attack hits the edge tier directly. Each POP applies multiple defenses: per-IP rate limiting (drop traffic from any IP exceeding 1000 req/sec); WAF rules (drop known attack signatures, e.g., requests with no User-Agent or with malicious headers); challenge suspicious traffic (CAPTCHA or JS challenge for sources behaving like bots). Cached content continues to serve legitimate users from local SSD without consulting origin. Origin shielding ensures the attack never reaches the customer's origin server: the shield only forwards cache misses, and most attack traffic targets either non-existent paths (drop at edge) or popular cached paths (serve from cache). The customer sees normal traffic patterns at origin during the attack; the attacker burns bandwidth at the edge with no impact. Modern CDNs (Cloudflare, Akamai) have absorbed attacks > 25 Tbps without origin impact.
Without a shield, every edge POP independently fetches from origin on its first miss. For a popular object newly published: 300 POPs all see misses around the same time and each fetches once -> 300 origin requests for the same object. With a regional shield (say, 1-3 per continent): 300 POP misses fan in to 3 shield misses, which fan in to 3 origin requests. Reduction: 100x. The cost is one extra hop on misses (~5-30 ms shield latency); for cached content the shield is bypassed, so 95%+ of requests don't see it. Shield itself maintains a larger cache than any individual edge (it sees more of the long tail), so its hit ratio for warm objects is high. Some CDNs (Akamai's tiered distribution, Fastly's Origin Shield) make this configurable; modern defaults always include a shield tier.
Interview Tips
How to discuss this topic effectively
Lead with hit ratio economics. Saying 'every 1% of cache hit rate at 100 PB scale saves $80K/month' frames why the architecture is worth all this engineering.
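Explain Anycast routing explicitly. Many candidates default to GeoDNS; Anycast avoids per-resolver steering, is simpler to operate, and is what Cloudflare and many newer CDNs use, though some large CDNs still rely on DNS-based mapping or combine the two.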
Bring up the regional shield. Without it, a viral object fetches origin once per POP; with it, once per region. The math (300 origin fetches down to 3) is the senior insight.
Default to soft purge. Hard purge has its place but most workflows benefit from stale-while-revalidate; mentioning the difference shows operational maturity.
Talk about cache key Vary explicitly. Mismanaged Vary headers (`Vary: User-Agent`) destroy hit rate; mentioning normalization is a senior signal.
Common Mistakes
Pitfalls to avoid in interviews
Using `Vary: User-Agent` without normalization
Every distinct User-Agent string becomes a separate cache entry; hit rate collapses to near zero. Either normalize User-Agent at the edge into a few classes (mobile/desktop/tablet) or omit Vary entirely and serve one canonical version.
Designing without a regional shield
Without a shield, a viral object missing at every edge POP simultaneously sends one origin request per POP; the origin gets crushed. A shield consolidates these into one fetch per region, cutting origin load by 100x for popular content.
Using hard purge for routine deploys
Hard purge instantly drops cached objects; the next user request waits for the origin fetch (slow first-byte). Soft purge marks stale and serves stale-while-revalidate, so users see no latency hit and origin load stays smooth. Reserve hard purge for security/takedown scenarios.
Ignoring the cache stampede problem on cold popular objects
When a viral object first appears, every edge sees a miss simultaneously. Without request coalescing, all those misses go upstream. The fix is singleflight-style request coalescing: the first miss kicks off the upstream fetch; subsequent concurrent misses wait for that fetch to complete and share its response.
Serving HTTPS without connection pooling to origin
Each origin fetch over a fresh TCP+TLS connection costs ~50-100ms in handshakes. At a POP doing 10K origin fetches/second, that's hundreds of millions of handshakes/day (10K x 86,400 s is about 864M). Pool persistent HTTP/2 connections from edge to origin so handshakes amortize to near-zero per request.
