Case Study
case-study
System Design
Design a URL Shortener (TinyURL)
Design a URL shortening service like TinyURL or bit.ly that maps a long URL to a 7 character code, redirects clicks in under 50 ms, and survives a 100:1 read-to-write ratio. This lesson walks through capacity estimation, the choice between counter based and hash based key generation, the database split between a key store and an analytics store, and the caching strategy that lets a single mid-tier service handle 10K redirects per second on commodity hardware.
Design Pastebin
Design a service like Pastebin or GitHub Gist where users dump up to 10 MB of text and share a link. The interview twist over a URL shortener: pastes are big, so you store them in object storage (S3) and only keep metadata in your database. This lesson covers the metadata vs blob split, expiration via S3 lifecycle policies, presigned URLs for direct uploads, syntax highlighting strategy, and how to handle the read pattern when most pastes are read once and never again.
Design Instagram (Photo Sharing)
Design a photo sharing service like Instagram with 500M daily active users uploading 100M photos a day, served as personalized feeds at sub-200 ms p99. The interview centerpiece is the news feed: fan-out on write versus fan-out on read, the celebrity problem, and the hybrid pull-on-read model that real Instagram uses. We also cover photo upload pipelines (presigned URLs, multi-resolution generation, CDN), the metadata data model, and how to scale follow graphs that go from a few friends to hundreds of millions of followers.
Design Twitter / X (Social Feed)
Design a microblogging service like Twitter or X with 250M daily active users posting 500M tweets a day, served as a personalized timeline at sub-200 ms p99. The interview centerpiece is the home timeline: hybrid fan-out at the celebrity boundary, write amplification math, and how Twitter built Manhattan and the Timeline Service to make 250M people see fresh tweets within seconds. We also cover trending topics, the search index, retweet semantics, and how Twitter handles 50,000 tweets per second when a major event happens.
Design Reddit (Forum / Voting)
Design a community-driven forum like Reddit with 50M daily active users, 500K subreddits, and the famous hot/top/best ranking algorithms that decide which posts you see. The interview centerpiece is the ranking system: how to score posts in real time as votes pour in, how to make the front page personalized without per-user fan-out, and how to render nested comment trees at sub-200 ms when a popular thread has 10,000 nested replies. We also cover voting fraud detection, the difference between hot and Wilson score, and the tiered cache that makes 50K reads per second on the front page survive a viral post.
Design YouTube (Video Platform)
Design a video platform like YouTube with 2 billion users, 500 hours of video uploaded every minute, and 1 billion hours watched per day. The interview centerpiece is the video pipeline: chunked uploads, parallel transcoding to 8 resolutions and 3 codecs, HLS/DASH adaptive streaming over a global CDN, and the metadata service that ties it all together. We also cover recommendations (the secondary feed problem), comment scaling, view-counter accuracy, and how YouTube serves 200 Tbps of egress without melting the internet.
Design TikTok (Short-Form Video)
Design TikTok with 1.5B monthly active users, 100M short videos uploaded daily, and the For You Page that decides which video plays next for every viewer in under 100 ms. Unlike Instagram and Twitter, TikTok has no follower-driven feed - the For You Page is pure ML recommendation from a global pool. The interview centerpiece is the recommendation system architecture: candidate retrieval, two-tower models, online ranking with engagement signals, and how to keep video pre-loaded so the next swipe is instant. We also cover content moderation at scale, edge caching for the long-form-of-short-form access pattern, and why TikTok's product choice eliminated the celebrity fan-out problem entirely.
Design Facebook News Feed
Design Facebook's News Feed for 2 billion daily active users where every feed open reads from a personalized, ML-ranked timeline assembled from thousands of candidate posts in real time. Unlike Instagram's chronological precomputed feed or TikTok's pure recommendation, Facebook blends a friend graph, group memberships, page follows, and ads into one ranked stream via the legendary EdgeRank-and-successor algorithms. The interview centerpiece is the aggregator pattern: parallel candidate retrieval from many sources, real-time feature lookup, ML scoring, and online filtering, all under a 200 ms p99 budget. We also cover real-time updates (push notifications when a friend posts), edge ranking signals, and how Meta keeps the feed fresh with no precomputed timeline.
Design a Chat System (WhatsApp)
Design a real-time chat system like WhatsApp serving 2B users sending 100B messages per day with sub-second delivery, presence indicators, and read receipts. The interview centerpiece is the persistent WebSocket connection layer: how many connections per server, how to route a message to a recipient who may be on a different server, and how to guarantee delivery when the recipient is offline. We cover the message delivery state machine (sent, delivered, read), the connection routing layer that maps user_id to a chat server, the message store for offline delivery, and presence/typing indicators that operate at a higher write rate than messages themselves.
Design a Notification Service
Design a multi-channel notification service that delivers 10B push, email, and SMS notifications per day across three independent provider networks (APNs, FCM, SendGrid, Twilio) with priority queues, per-user rate limits, and idempotent retries. The interview centerpiece is the fan-out from a single application event to multiple channels and providers, each with its own rate limits, failure modes, and delivery semantics. We cover priority queues for transactional vs marketing traffic, retry policies with exponential backoff, deduplication of duplicate triggers, user preference enforcement, and the device token lifecycle that quietly invalidates tens of millions of tokens per day.
Design an Email Service (Gmail)
Design an email service like Gmail handling 1.8B users storing 500EB of email, accepting ~300B inbound messages per day from the public SMTP network while filtering 90%+ as spam, and serving full-text search over a user's entire inbox in sub-200ms. The interview centerpiece is the asymmetric architecture: SMTP is an untrusted public protocol with hostile traffic patterns (spam, phishing, sender forgery) that needs heavy gateway-side filtering, while the user-facing IMAP/web layer needs cheap reads, pagination of huge mailboxes, and per-user inverted indexes for search. We cover the SMTP MX gateway, the spam pipeline (SPF/DKIM/DMARC + ML), the per-user inverted index for search, and how mailboxes scale when one user holds 50GB of email.
Design Video Conferencing (Zoom)
Design a real-time video conferencing system like Zoom that supports 1-on-1 calls and meetings of up to 1000 participants with sub-200ms glass-to-glass latency, adapts to user bandwidth, and runs reliably across mobile networks. The interview centerpiece is the choice of media topology: peer-to-peer mesh (small calls), MCU mixing (centralized, expensive), or SFU forwarding (the modern standard). We cover the WebRTC stack (signaling vs media planes, ICE/STUN/TURN), simulcast and SVC for adaptive quality, recording pipelines, and how to keep latency low when participants span multiple continents.
Design Discord (Real-time Communities)
Design Discord, a real-time community platform with 200M monthly active users, organized into 'guilds' (servers) of up to 500K members each, with persistent text channels storing trillions of messages and live voice channels with sub-100ms latency. The interview centerpiece is the dual architecture: a sharded text-message store (Cassandra/ScyllaDB) with billions of messages per guild and per-channel ordering, plus a real-time voice infrastructure with regional voice servers and custom UDP transport. We cover guild sharding by Snowflake ID, the Elixir/Erlang gateway that holds millions of WebSocket connections, presence at the guild scale, and how Discord migrated from MongoDB to Cassandra to ScyllaDB as message volume crossed trillions.
Design Typeahead / Autocomplete
Design a typeahead/autocomplete service like Google Search's suggestion bar that returns the top 10 ranked completions for a query prefix in under 100ms p99, scaling to 5B searches per day with a multi-billion-entry suggestion index. The interview centerpiece is the data structure choice (trie vs sorted strings vs ngram index) and the offline pipeline that ranks suggestions by frequency, recency, personalization, and click-through rate. We cover the trie with precomputed top-K per node, edge n-gram indexes for typo tolerance, the MapReduce/Spark batch pipeline that rebuilds suggestions nightly, and the per-region edge cache that absorbs 99% of traffic.
Design a Web Crawler
Design a distributed web crawler that fetches 5 billion pages per month from the public web while respecting robots.txt, applying per-host politeness limits, deduplicating URLs and content across a 50PB corpus, and feeding the indexer pipeline downstream. The interview centerpiece is the URL frontier: a priority-aware queue of pending URLs sharded by host so politeness rules can be enforced per domain, plus content deduplication via hashing and shingling. We cover the fetcher worker pool, DNS caching, content extraction, the bloom-filter URL seen set, and how to handle hostile sites (large pages, redirect loops, slow responses, deliberate spam).
Design a Search Engine
Design a web-scale search engine that indexes 50B documents and serves 100K queries per second with sub-200ms p99 latency, ranking results by relevance (BM25), authority (PageRank), and personalization. The interview centerpiece is the inverted index sharded across thousands of nodes with scatter-gather query execution, plus the multi-stage ranking pipeline (cheap candidate generation, expensive learned-to-rank rerank). We cover document parsing and tokenization, the offline indexing pipeline (Spark MapReduce), term-partitioned vs document-partitioned sharding, query understanding and expansion, snippet generation, and how to keep the index fresh as the web changes.
Design Nearby / Location Service (Yelp)
Design a 'nearby' service like Yelp that returns the top businesses within a search radius of the user's location, ranking by distance, rating, and category, scaling to 200M monthly users querying 100M businesses. The interview centerpiece is the geospatial index: how to find 'all businesses within 5 km of (lat, lng)' efficiently. We compare bounding-box scans, geohashes, quadtrees, R-trees, and PostGIS GIST indexes; we recommend geohash + secondary index for write-heavy systems and quadtree/R-tree for read-heavy. We cover business storage and search, review ranking, the infrequent-update vs frequent-query asymmetry, and how to handle the long tail of remote regions.
Design a Rate Limiter
Design a distributed rate limiter that protects an API platform from abuse and uneven load while staying fast and accurate at 1B requests per day. The interview centerpiece is choosing among the five canonical algorithms (fixed window, sliding window log, sliding window counter, token bucket, leaky bucket) and explaining how to make the chosen one atomic across a Redis cluster. We cover where to place the limiter (edge, gateway, in-process), per-IP vs per-user vs per-API-key keys, returning 429 with Retry-After, the hot key problem, and fail-open vs fail-closed under cache outages.
Design an E-Commerce Platform (Amazon)
Design an Amazon-scale e-commerce platform that lets 200M monthly users browse 100M SKUs, add items to a cart, check out, and have orders fulfilled from regional warehouses. The interview centerpiece is the order lifecycle: how to reserve inventory atomically while a customer is on the checkout page, how to chain cart-to-payment-to-fulfillment as a saga with compensating actions, and how to make checkout idempotent so a flaky network never charges a customer twice. We also cover catalog browse at scale, multi-warehouse fulfillment routing, and the asymmetric read/write workload that makes aggressive catalog caching the right call.
Design a Ticketing System (Ticketmaster)
Design a Ticketmaster-style ticketing platform that sells reserved seats for concerts and sports events, with the central challenge being a flash onsale where 1M users compete for 50K seats in five minutes. The interview centerpiece is the seat reservation lock: each unique seat (Section A, Row 12, Seat 7) cannot be split or sub-bucketed like fungible inventory, so contention is unavoidable. We cover seat-level pessimistic holds with TTL, the virtual waiting room that randomizes queue position to absorb flash demand fairly, anti-bot defenses, dynamic pricing tiers, and the read-replica explosion that interactive seat maps cause.
Design a Payment System (Stripe)
Design a Stripe-style payment platform that processes 100M payments per day across 50 currencies and dozens of payment methods, where the central requirement is financial correctness: never charge a customer twice, never lose a payment, always reconcile to the cent. The interview centerpiece is the trio of idempotency keys, the payment intent state machine, and the immutable double-entry ledger - together they make the system safe in the face of network failures, partial outages, and adversarial retries. We also cover webhook delivery with signing and exponential backoff, PCI scope minimization through tokenization, multi-region availability, and the reconciliation jobs that compare our ledger to the bank's settlement files every night.
Design a Key-Value Store (DynamoDB)
Design a Dynamo-style distributed key-value store that scales linearly to thousands of nodes, stays available during partitions, and offers tunable consistency through a quorum (N, W, R). The interview centerpiece is the trio that makes this work at scale: consistent hashing with virtual nodes for partitioning, N/W/R quorums for replication and consistency, and vector clocks for resolving concurrent writes. We cover the gossip protocol for membership, Merkle trees for anti-entropy, hinted handoff for transient failures, sloppy quorum for write availability during partitions, and the LSM-tree storage engine that powers each node.
Design a Distributed Cache (Redis)
Design a Redis-style in-memory distributed cache that serves billions of GET/SET operations per day at sub-millisecond latency, with sharding across hundreds of nodes and explicit eviction when memory fills. The interview centerpiece is the eviction-and-partitioning combination: how LRU and LFU choose what to drop, and how a cluster picks which node owns each key without a central coordinator. We compare client-side hashing, proxy-based partitioning (twemproxy), and Redis Cluster's hash-slot model; we cover cache-aside as the dominant access pattern, replica failover, optional persistence, and the sub-ms latency budget that makes this design fundamentally different from the durable KV store covered in the previous case study.
Design Object Storage (S3)
Design an S3-style object storage service that stores trillions of immutable blobs ranging from 1 KB to 5 TB at eleven nines of durability and a fraction of the cost of triple replication. The interview centerpiece is the trio that makes this economical: erasure coding (typically 12 data shards plus 4 parity shards) instead of full replicas; a separate metadata service that maps object keys to chunk locations; and multi-part upload that lets a 5 TB object stream from many sources in parallel. We also cover the bucket/object namespace, lifecycle policies that move cold objects to colder tiers, immutability with versioning, pre-signed URLs for direct client transfer, and the move from eventual to strong read-after-write consistency that AWS shipped in 2020.
Design a Distributed File System (GFS/HDFS)
Design a Google-File-System or HDFS-style distributed file system that stores petabytes across commodity hardware, optimized for batch analytics workloads where files are large (gigabytes), reads are sequential, and writes are append-mostly. The interview centerpiece is the leader-based architecture: one strongly-consistent master node holds the entire file namespace and chunk locations in memory, while many chunkservers store the actual data in 64-128 MB chunks replicated three times across racks. We cover the lease-based primary-replica protocol that lets the master stay out of the data path, the heartbeat-and-chunk-report mechanism that keeps cluster state fresh, and the federation strategy for scaling beyond a single master's memory.
Design a Content Delivery Network
Design a Cloudflare/Akamai/Fastly-style content delivery network that offloads 95%+ of static traffic from origin servers, brings latency from hundreds of milliseconds down to single digits, and absorbs DDoS attacks at the edge. The interview centerpiece is the cache hierarchy and routing: hundreds of edge POPs anycast-routed to the user's nearest location, a regional shield layer that consolidates fetches, and the origin only seeing the long tail of misses. We cover cache key design with Vary headers, the TTL lifecycle and purge model, stale-while-revalidate for resilience under origin outages, and the moves CDNs make to keep dynamic content fast (programmable edge functions, smart routing).
Design Uber / Lyft (Ride-Sharing)
Design a ride-sharing service like Uber that matches a rider's request to a nearby driver in under 5 seconds, streams driver locations every 4 seconds, computes ETAs, and applies surge pricing in real time at 1M concurrent active drivers and 100K rides/min globally. The interview centerpiece is the dispatch path: how to find the nearest available driver, hold them briefly, and confirm the match without race conditions. We compare geohash, S2, and H3 for the driver index and recommend H3 hex grid for ride-sharing because hex neighbors are equidistant. We cover the trip state machine, surge multipliers per cell, and how location updates fan out without melting the network.
Design Google Maps
Design Google Maps: a global mapping service that renders the Earth from 256x256 tiles, computes the shortest driving route in under 200 ms, and folds live traffic into routing for 1B users issuing 5B route requests per day. The interview centerpiece is the routing engine: how Dijkstra is too slow on a continent-scale graph and how Contraction Hierarchies (CH) precompute shortcuts so the live query is logarithmic. We cover the tile pyramid (zoom 0-20, ~1 trillion possible tiles at zoom 20), how live traffic from 100M Android phones updates edge weights every minute, and how to keep navigation latency under 1 second when re-routing.
Design Food Delivery (DoorDash)
Design a food delivery service like DoorDash that links three actors (customer, restaurant, courier) with an end-to-end SLA of <40 minutes per order at 10M orders per day across 500K restaurants. The interview centerpiece is the courier dispatch problem, which is fundamentally different from ride-sharing: it is a 3-leg trip (courier-to-restaurant, wait for food, restaurant-to-customer) and the platform routinely batches multiple orders onto one courier to cut cost. We compare Uber's 1:1 matching to DoorDash's many-to-1 batching, design the ETA composition (prep time + assignment time + drive time + handoff), and walk through the order state machine that coordinates three independent humans.
Design a Unique ID Generator
Design a service that generates globally unique, roughly time-sortable 64-bit IDs at 1M IDs per second across hundreds of application servers, without coordination on the hot path. The interview centerpiece is the trade-off between uniqueness, ordering, size, and coordination cost. We compare UUIDv4 (random, no coordination, 128 bits, no ordering), database AUTOINCREMENT (single point of contention), Twitter Snowflake (64 bits, time-ordered, requires worker_id assignment and clock discipline), Instagram's per-shard hybrid, and ULID/KSUID. We deep-dive into Snowflake: bit layout, clock skew handling, leader election for worker IDs, and the dreaded clock-rollback bug.
Design Google Docs (Collaborative Editing)
Design a real-time collaborative document editor like Google Docs where 1B+ users can co-edit the same document with sub-200 ms latency, never lose a keystroke, and converge to the same state across all clients regardless of network conditions. The interview centerpiece is concurrency control: how to merge two users' simultaneous edits without conflicts. We compare Operational Transformation (OT, used by Google Docs) and Conflict-free Replicated Data Types (CRDT, used by Figma, Notion, Linear), explain the convergence problem (TP1, TP2 properties), walk through cursor presence, and design the document storage as an append-only operation log compacted into snapshots.
Design a Stock Exchange
Design a stock exchange like NASDAQ that matches buy and sell orders for thousands of symbols at sub-100-microsecond latency, handles 200K orders per second per symbol at peak, and produces a deterministic, replayable trade history with regulatory audit guarantees. The interview centerpiece is the matching engine: a deliberately single-threaded, in-memory order book that processes orders sequentially in price-time priority. We design the order book data structures (price-indexed levels with FIFO queues), the gateway path (ultra-low-latency parsing and rate-limit), the event-sourced persistence (every order and trade as an append-only event), and how to scale by sharding per symbol.
