System Design
SQL vs NoSQL - Choosing the Right Database
SQL vs NoSQL is the most common storage decision in system design interviews. SQL databases give you ACID guarantees, joins, and a fixed relational schema; NoSQL databases give you flexible schemas, horizontal scaling, and specialized data models. This lesson teaches you the four NoSQL families, the real engineering trade-offs, and a clear decision framework so you can defend your database choice in any interview.
Database Indexing & Query Optimization
Indexes turn O(N) full-table scans into O(log N) lookups, but every index costs storage and slows writes. This lesson teaches how B-tree and hash indexes work, when to use composite or covering indexes, how to read an EXPLAIN plan, and the common indexing mistakes that cause production outages. By the end you can defend any indexing decision in an interview and diagnose a slow query in production.
Database Replication (Leader-Follower, Multi-Leader)
Replication keeps copies of your data on multiple servers so you can survive failures, scale reads, and serve users from the nearest region. This lesson covers the three replication topologies (leader-follower, multi-leader, leaderless), the mechanics of synchronous and asynchronous replication, the consistency surprises that come with replication lag, and how to design failover and conflict resolution. By the end you can pick a topology and defend it in an interview, and recognize the bug class behind 'I just wrote it but the read says it does not exist'.
Database Sharding & Partitioning Strategies
Sharding splits a database into many smaller pieces (shards) so writes and storage can scale across servers. The hard part is not the splitting; it is choosing a shard key that avoids hot shards, supporting cross-shard queries, and rebalancing as the data grows. This lesson covers the four sharding strategies, how to pick a shard key, the operational realities of resharding, and when sharding is the wrong answer.
Blob Storage, Object Stores & CDNs
Databases are the wrong place to store large unstructured files - photos, videos, backups, logs. Object stores like S3 give you cheap, durable, virtually unlimited storage for blobs, while CDNs cache that content at edge locations close to users. This lesson covers the object-storage data model, multi-part upload, storage classes, presigned URLs, and how a CDN turns a single distant origin into fast delivery worldwide. By the end you can design the media layer for any social, video, or e-commerce system.
Data Warehousing, Data Lakes & OLAP vs OLTP
OLTP databases are built for fast single-row reads and writes; analytical queries against them choke. This lesson covers why analytics needs its own storage stack: column-oriented warehouses, lake formats, and lakehouse engines that scan billions of rows in seconds. You'll learn the OLTP versus OLAP trade-off, dimensional modeling (star schema), ETL versus ELT, change data capture, and how a modern data platform separates compute from storage so you can query petabytes for the cost of a coffee.
Caching Fundamentals (Write-Through, Write-Back, Write-Around)
A cache is a small, fast store that holds copies of data so the next request does not pay the cost of fetching it from the source of truth. This lesson covers what a cache is, where it lives in a stack, the five read and write patterns you will be asked about (cache-aside, read-through, write-through, write-back, write-around), eviction policies, and the failure modes (stampedes, hot keys, stale data) that bite real systems. By the end you can pick a caching strategy and defend it in an interview.
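As a minimal single-process sketch of two ideas from this lesson, the class below evicts with LRU and the helper reads with cache-aside. `load_from_db` is a hypothetical stand-in for the source of truth, and for simplicity the sketch treats a cached `None` as a miss.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key: Hashable):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def cache_aside_get(cache: LRUCache, key: Hashable, load_from_db: Callable):
    """Cache-aside read: try the cache, fall back to the source of truth, then populate."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.put(key, value)
    return value
```

In cache-aside the application, not the cache, owns the miss path; read-through moves that same fallback logic inside the cache layer.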
Distributed Caching (Redis, Memcached)
A single-node cache eventually runs out of RAM, CPU, or network. Distributed caching spreads keys across many nodes so total capacity and throughput scale horizontally. This lesson covers how Redis and Memcached partition data, replicate it for availability, fail over when nodes die, and how to choose between them. By the end you can design a multi-node cache layer for a real workload, defend the topology in an interview, and recognize the bug class behind 'why is one cache node maxed at 100% CPU while the others are idle?'.
Cache Invalidation Strategies & Consistency
There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors. This lesson tackles the first one. We cover TTL-based, write-driven, and event-driven invalidation; the canonical race conditions (lost-update, double-write inconsistency, stale-after-failover); the consistency models a cache can offer; and the patterns that real systems (Facebook, Stripe, AWS) use to keep cached data trustworthy. By the end you can pick an invalidation strategy, defend it under interviewer pressure, and explain exactly why your cache will not silently serve yesterday's data.
Horizontal vs Vertical Scaling
When traffic grows, you have two choices: make the box bigger (vertical) or add more boxes (horizontal). This lesson lays out the cost, complexity, and ceiling of each approach, why stateless services scale horizontally with almost no thought, why stateful services require sharding or replication, and how real teams pick a default. By the end you can answer 'how would you scale this?' with a defensible answer instead of an instinct.
Load Balancing Algorithms & Patterns
A load balancer is the traffic cop in front of every horizontally scaled service. This lesson covers the four scheduling algorithms you need to know (round-robin, least-connections, weighted, hash), the difference between Layer 4 and Layer 7 load balancing, how health checks pull dead nodes out of rotation, the role of sticky sessions and connection draining, and the tools (NGINX, HAProxy, ELB/ALB, Envoy) that implement all of this. By the end you can pick the right algorithm for a workload and explain to an interviewer exactly how a request finds its way from the load balancer to a healthy backend.
Reverse Proxy & API Gateway
A reverse proxy sits at the edge of your infrastructure and terminates client connections so backends never see them directly. An API gateway is a reverse proxy with opinions: authentication, rate limiting, request transformation, and per-route policies. This lesson covers what each does, when one is enough and when you need the other, the canonical features (TLS termination, response caching, request shaping, JWT validation, circuit breaking), and the tools that implement them (NGINX, Envoy, Kong, AWS API Gateway, Apigee). By the end you can place either in a real architecture and articulate the boundary between them in an interview.
Auto-Scaling, Elasticity & Capacity Planning
Auto-scaling lets your fleet grow when traffic surges and shrink when it ebbs, so you pay for the load you actually have. This lesson covers reactive metric-based scaling, predictive (schedule-based) scaling, and the gotchas that turn auto-scaling into auto-outage: warm-up time, scale-down storms, downstream throttling, and cost runaway. We also walk through capacity planning: how to estimate the fleet size you need from QPS, latency targets, and headroom, before relying on the scaler to fix mistakes at 3 a.m. By the end you can configure an auto-scaling policy with confidence and explain to an interviewer why simply 'putting it on auto-scale' is not the actual answer.
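A back-of-the-envelope version of the capacity-planning step, as a sketch: Little's law gives in-flight concurrency, and fleet size is peak QPS divided by what one server sustains at a target utilization, plus spare capacity. The per-server throughput, the 60% utilization target, and the single spare server are illustrative assumptions, not measured values.

```python
import math


def in_flight_requests(qps: float, latency_s: float) -> float:
    """Little's law: average concurrency L = arrival rate (lambda) x time in system (W)."""
    return qps * latency_s


def fleet_size(peak_qps: float, per_server_qps: float,
               target_utilization: float = 0.6, spare_servers: int = 1) -> int:
    """Servers needed so each stays at or below the target utilization at peak,
    plus spare servers to absorb a failure or a slow scale-out."""
    servers_at_peak = math.ceil(peak_qps / (per_server_qps * target_utilization))
    return servers_at_peak + spare_servers


# Hypothetical workload: 30K QPS at peak, 50 ms mean latency, 1K QPS per server.
print(in_flight_requests(30_000, 0.050))  # about 1,500 concurrent requests
print(fleet_size(30_000, 1_000))          # 51
```

Planning for 60% rather than 100% utilization leaves headroom for the minutes an auto-scaler needs to warm up new instances.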
CAP Theorem & Trade-offs
The CAP theorem is usually summarized as: a distributed data store can provide at most two of Consistency, Availability, and Partition tolerance. This lesson cuts through that textbook version with the practical engineer's reading: partitions are non-negotiable, so the real choice is between consistency and availability when the network breaks. We cover what each property actually means, why CAP is misleading without PACELC, and how real systems (MongoDB, DynamoDB, Cassandra, Spanner) place themselves on the spectrum. By the end you can defend a system's CAP choice in an interview without falling into the common 'I picked CA' trap.
Consistency Models (Strong, Eventual, Causal)
Consistency models are the contract between a distributed data store and its clients about what they can and cannot observe. This lesson walks the spectrum from strict serializability at the strong end to eventual consistency at the relaxed end, with stops at linearizability, sequential, causal, read-your-writes, monotonic reads, and monotonic writes. We focus on what each model promises, what bugs it prevents, what it costs in latency and availability, and which production systems implement it. By the end you can name the model your system needs and explain why - the senior-level move that interviewers reward.
Consistent Hashing & Data Distribution
Consistent hashing is the trick that lets distributed caches and databases add or remove nodes without remapping every key in the cluster. This lesson explains why naive `hash(key) % N` is broken, how the hash ring works, why you need virtual nodes to keep load balanced, and how real systems (DynamoDB, Cassandra, Memcached, Discord) implement it. We finish with the modern alternatives (rendezvous hashing, jump consistent hash, Maglev) and the trade-offs that make consistent hashing the default answer in most interviews.
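A minimal hash ring with virtual nodes, as a sketch: node and key names are hypothetical, and MD5 is used only as a stable, well-distributed hash, not for security. The usage lines measure how many keys move when a fifth node joins, compared with naive `hash(key) % N`.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """64-bit hash that is stable across processes (Python's built-in hash() is salted)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many points on the ring to even out load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's position.
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-e")
ring_moved = sum(before[k] != ring.get_node(k) for k in keys) / len(keys)
naive_moved = sum(stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys) / len(keys)
print(f"ring: {ring_moved:.0%} of keys moved, naive modulo: {naive_moved:.0%}")
```

Expect roughly one fifth of the keys to move with the ring (all of them onto the new node) and roughly four fifths with modulo; the exact figures depend on the hash.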
Leader Election & Consensus (Raft, Paxos)
Leader election is how a distributed cluster picks one node to be in charge so the others can stop arguing. This lesson covers the consensus problem (FLP impossibility), Paxos in concept, Raft in detail (leader election + log replication + safety), the role of quorum, and the operational pitfalls of split brain and network partitions. We also tour the systems that ship Raft or Paxos in production: etcd, ZooKeeper, Consul, CockroachDB, MongoDB, Spanner. By the end you can explain why every modern distributed database has a consensus protocol at its core, and you can sketch Raft on a whiteboard.
Distributed Transactions (2PC, Saga Pattern)
When a single business operation spans multiple services or databases, you cannot rely on a single ACID transaction. This lesson covers the two dominant patterns for keeping consistency across services: Two-Phase Commit (2PC) for synchronous, atomic, blocking transactions, and the Saga pattern (orchestration vs choreography) for long-running asynchronous workflows with compensating actions. We also cover Three-Phase Commit, idempotency keys, the outbox pattern, and the trade-offs that explain why 2PC is rare in microservices and Sagas are everywhere. By the end you can pick the right pattern for an order checkout, a money transfer, or a multi-step booking flow.
Message Queues (Kafka, RabbitMQ, SQS)
Message queues let one service hand work to another without waiting, smoothing traffic spikes, decoupling services, and surviving downstream outages. This lesson covers the two queue families (broker-based like RabbitMQ and SQS vs log-based like Kafka), the delivery semantics (at-most-once, at-least-once, exactly-once), the operational essentials (DLQs, consumer groups, backpressure, ordering), and the trade-offs that decide between Kafka, RabbitMQ, and SQS for any given workload. By the end you can pick a queue and defend the choice with the per-property reasoning interviewers reward.
Event-Driven Architecture & Pub/Sub
Event-driven architecture (EDA) is a style where services communicate by emitting and reacting to immutable events instead of calling each other directly. This lesson covers the publish/subscribe pattern, the difference between event notification and event-carried state transfer, the role of an event bus, and how EDA reshapes coupling, scalability, and consistency. We compare it with request/response, walk through real implementations on Kafka, Kinesis, EventBridge, and SNS, and end with the operational pitfalls (event versioning, ordering, schema drift, observability) that bite teams who adopt EDA without preparation.
Stream Processing (Kafka Streams, Flink)
Stream processing is the discipline of computing on continuous, unbounded data as it arrives, instead of in periodic batches. This lesson covers the core stream-processing primitives: stateful operators, event time vs processing time, watermarks, windowing (tumbling, sliding, session), exactly-once semantics, and stateful checkpointing. We compare the leading engines (Kafka Streams, Apache Flink, Spark Structured Streaming) and walk through real production patterns: real-time analytics, fraud detection, ML feature pipelines, and CDC-driven materialized views. By the end you can sketch a Flink pipeline on a whiteboard and defend the windowing and checkpointing choices.
Fault Tolerance, Redundancy & Failover
Fault tolerance is the property that lets a system keep working when components fail - and at any reasonable scale, components are always failing. This lesson covers the building blocks: redundancy (active-active, active-passive), failure detection (health checks, heartbeats), failover (automatic, manual), and the patterns that make systems gracefully degrade instead of catastrophically crash (circuit breakers, retries with backoff, bulkheads, timeouts). We finish with the operational disciplines that turn architecture into reality: chaos engineering, runbooks, blast-radius analysis, and disaster recovery (RTO/RPO). By the end you can design a system that survives the failure modes interviewers love to throw at you.
Monitoring, Logging, Alerting & SLAs
Observability is what lets you know whether your system is working before customers do. This lesson covers the three pillars (metrics, logs, traces), the SRE-grade definitions of SLI / SLO / SLA, and the operational practices that turn raw telemetry into actionable alerts (RED method, USE method, error budgets, alert fatigue control). We tour the standard production stack (Prometheus, Grafana, OpenTelemetry, ELK, Datadog) and the pitfalls that cause teams to either drown in alerts or miss real incidents. By the end you can design an observability strategy and defend it in an interview against the question 'how would you know if this system was broken?'.
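Error-budget arithmetic is worth being able to do on a whiteboard. A sketch, assuming a 30-day window and either a time-based or a request-based SLO:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of total downtime a time-based SLO allows over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative once the SLO is breached)."""
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures


print(round(error_budget_minutes(0.999), 1))             # 43.2
print(round(budget_remaining(0.999, 1_000_000, 250), 2))  # 0.75
```

A 99.9% SLO over 30 days therefore allows about 43 minutes of downtime; alerting on how fast that budget is burning is what keeps alert volume proportional to real user impact.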
Design a URL Shortener (TinyURL)
Design a URL shortening service like TinyURL or bit.ly that maps a long URL to a 7-character code, redirects clicks in under 50 ms, and serves a 100:1 read-to-write workload. This lesson walks through capacity estimation, the choice between counter-based and hash-based key generation, the database split between a key store and an analytics store, and the caching strategy that lets a single mid-tier service handle 10K redirects per second on commodity hardware.
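With counter-based key generation, each new URL takes the next integer from a counter and encodes it in base 62, so 7 characters cover 62^7 (about 3.5 trillion) codes. A minimal sketch; the alphabet ordering is an arbitrary choice, and a real system would typically hand each server its own counter range rather than share one counter.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Encode a non-negative counter value as a base-62 short code."""
    if n < 0:
        raise ValueError("counter values must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Invert encode_base62, e.g. to find the row behind a short code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(BASE ** 7)                     # 3521614606208
print(encode_base62(BASE ** 7 - 1))  # ZZZZZZZ
```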
Design Pastebin
Design a service like Pastebin or GitHub Gist where users dump up to 10 MB of text and share a link. The interview twist over a URL shortener: pastes are big, so you store them in object storage (S3) and only keep metadata in your database. This lesson covers the metadata vs blob split, expiration via S3 lifecycle policies, presigned URLs for direct uploads, syntax highlighting strategy, and how to handle the read pattern when most pastes are read once and never again.
Design Instagram (Photo Sharing)
Design a photo-sharing service like Instagram with 500M daily active users uploading 100M photos a day, served as personalized feeds at sub-200 ms p99. The interview centerpiece is the news feed: fan-out on write versus fan-out on read, the celebrity problem, and the hybrid model (fan-out on write for most accounts, pull on read for celebrities) that real Instagram uses. We also cover photo upload pipelines (presigned URLs, multi-resolution generation, CDN), the metadata data model, and how to scale follow graphs that go from a few friends to hundreds of millions of followers.
Design Twitter / X (Social Feed)
Design a microblogging service like Twitter or X with 250M daily active users posting 500M tweets a day, served as a personalized timeline at sub-200 ms p99. The interview centerpiece is the home timeline: hybrid fan-out at the celebrity boundary, write amplification math, and how Twitter built Manhattan and the Timeline Service to make 250M people see fresh tweets within seconds. We also cover trending topics, the search index, retweet semantics, and how Twitter handles 50,000 tweets per second when a major event happens.
Design Reddit (Forum / Voting)
Design a community-driven forum like Reddit with 50M daily active users, 500K subreddits, and the famous hot/top/best ranking algorithms that decide which posts you see. The interview centerpiece is the ranking system: how to score posts in real time as votes pour in, how to make the front page personalized without per-user fan-out, and how to render nested comment trees at sub-200 ms when a popular thread has 10,000 nested replies. We also cover voting fraud detection, the difference between hot and Wilson score, and the tiered cache that makes 50K reads per second on the front page survive a viral post.
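A sketch of both ranking scores. The Wilson lower bound is the standard formula for the low end of a confidence interval on the upvote fraction; z = 1.96 (95% confidence) is an illustrative choice. The `hot` function mirrors the shape of Reddit's formerly open-source ranking code (log-scaled net score plus a time term), but treat its epoch offset and 45,000-second divisor as illustrative constants.

```python
import math


def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the fraction of upvotes."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def hot(ups: int, downs: int, posted_at_epoch_s: float) -> float:
    """Net score on a log10 scale plus a recency term: 10x the votes is worth 45,000 s (12.5 h)."""
    score = ups - downs
    order = math.log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    seconds = posted_at_epoch_s - 1134028003
    return round(sign * order + seconds / 45000, 7)


print(round(wilson_lower_bound(1, 0), 3))    # 0.207
print(round(wilson_lower_bound(100, 0), 3))  # 0.963
```

Wilson suits a 'best' sort (a single upvote is weak evidence, so 100/0 beats 1/0), while `hot` lets a newer post overtake an older one with ten times its votes after 12.5 hours.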
Design YouTube (Video Platform)
Design a video platform like YouTube with 2 billion users, 500 hours of video uploaded every minute, and 1 billion hours watched per day. The interview centerpiece is the video pipeline: chunked uploads, parallel transcoding to 8 resolutions and 3 codecs, HLS/DASH adaptive streaming over a global CDN, and the metadata service that ties it all together. We also cover recommendations (the secondary feed problem), comment scaling, view-counter accuracy, and how YouTube serves 200 Tbps of egress without melting the internet.
Design TikTok (Short-Form Video)
Design TikTok with 1.5B monthly active users, 100M short videos uploaded daily, and the For You Page that decides which video plays next for every viewer in under 100 ms. Unlike Instagram and Twitter, TikTok's primary feed is not follower-driven: the For You Page is pure ML recommendation from a global pool. The interview centerpiece is the recommendation system architecture: candidate retrieval, two-tower models, online ranking with engagement signals, and how to keep video pre-loaded so the next swipe is instant. We also cover content moderation at scale, edge caching for the long-form-of-short-form access pattern, and why TikTok's product choice eliminated the celebrity fan-out problem entirely.
Design Facebook News Feed
Design Facebook's News Feed for 2 billion daily active users where every feed open reads from a personalized, ML-ranked timeline assembled from thousands of candidate posts in real time. Unlike Instagram's precomputed feed or TikTok's pure recommendation, Facebook blends a friend graph, group memberships, page follows, and ads into one ranked stream, originally via EdgeRank and now via its machine-learned successors. The interview centerpiece is the aggregator pattern: parallel candidate retrieval from many sources, real-time feature lookup, ML scoring, and online filtering, all under a 200 ms p99 budget. We also cover real-time updates (push notifications when a friend posts), edge ranking signals, and how Meta keeps the feed fresh with no precomputed timeline.
Design a Chat System (WhatsApp)
Design a real-time chat system like WhatsApp serving 2B users sending 100B messages per day with sub-second delivery, presence indicators, and read receipts. The interview centerpiece is the persistent WebSocket connection layer: how many connections per server, how to route a message to a recipient who may be on a different server, and how to guarantee delivery when the recipient is offline. We cover the message delivery state machine (sent, delivered, read), the connection routing layer that maps user_id to a chat server, the message store for offline delivery, and presence/typing indicators that operate at a higher write rate than messages themselves.
Design a Notification Service
Design a multi-channel notification service that delivers 10B push, email, and SMS notifications per day through independent provider networks (APNs and FCM for push, SendGrid for email, Twilio for SMS) with priority queues, per-user rate limits, and idempotent retries. The interview centerpiece is the fan-out from a single application event to multiple channels and providers, each with its own rate limits, failure modes, and delivery semantics. We cover priority queues for transactional vs marketing traffic, retry policies with exponential backoff, deduplication of duplicate triggers, user preference enforcement, and the device token lifecycle that quietly invalidates tens of millions of tokens per day.
Design an Email Service (Gmail)
Design an email service like Gmail handling 1.8B users storing 500EB of email, accepting ~300B inbound messages per day from the public SMTP network while filtering 90%+ as spam, and serving full-text search over a user's entire inbox in sub-200ms. The interview centerpiece is the asymmetric architecture: SMTP is an untrusted public protocol with hostile traffic patterns (spam, phishing, sender forgery) that needs heavy gateway-side filtering, while the user-facing IMAP/web layer needs cheap reads, pagination of huge mailboxes, and per-user inverted indexes for search. We cover the SMTP MX gateway, the spam pipeline (SPF/DKIM/DMARC + ML), the per-user inverted index for search, and how mailboxes scale when one user holds 50GB of email.
Design Video Conferencing (Zoom)
Design a real-time video conferencing system like Zoom that supports 1-on-1 calls and meetings of up to 1000 participants with sub-200ms glass-to-glass latency, adapts to user bandwidth, and runs reliably across mobile networks. The interview centerpiece is the choice of media topology: peer-to-peer mesh (small calls), MCU mixing (centralized, expensive), or SFU forwarding (the modern standard). We cover the WebRTC stack (signaling vs media planes, ICE/STUN/TURN), simulcast and SVC for adaptive quality, recording pipelines, and how to keep latency low when participants span multiple continents.
Design Discord (Real-time Communities)
Design Discord, a real-time community platform with 200M monthly active users, organized into 'guilds' (servers) of up to 500K members each, with persistent text channels storing trillions of messages and live voice channels with sub-100ms latency. The interview centerpiece is the dual architecture: a sharded text-message store (Cassandra/ScyllaDB) with billions of messages per guild and per-channel ordering, plus a real-time voice infrastructure with regional voice servers and custom UDP transport. We cover guild sharding by Snowflake ID, the Elixir/Erlang gateway that holds millions of WebSocket connections, presence at the guild scale, and how Discord migrated from MongoDB to Cassandra to ScyllaDB as message volume crossed trillions.
Design Typeahead / Autocomplete
Design a typeahead/autocomplete service like Google Search's suggestion bar that returns the top 10 ranked completions for a query prefix in under 100ms p99, scaling to 5B searches per day with a multi-billion-entry suggestion index. The interview centerpiece is the data structure choice (trie vs sorted strings vs n-gram index) and the offline pipeline that ranks suggestions by frequency, recency, personalization, and click-through rate. We cover the trie with precomputed top-K per node, edge n-gram indexes for typo tolerance, the MapReduce/Spark batch pipeline that rebuilds suggestions nightly, and the per-region edge cache that absorbs 99% of traffic.
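A minimal sketch of the trie with precomputed top-K per node: the offline build records every phrase and its score at each node on its path and keeps only the best K, so a live lookup is a walk of len(prefix) steps plus a list read. The phrases and scores in the usage are made up; a real build would derive scores from the ranking signals this lesson covers and would shard the trie.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    top: list = field(default_factory=list)  # (score, phrase) pairs, best first after build


class Typeahead:
    def __init__(self, k: int = 10) -> None:
        self.k = k
        self.root = TrieNode()

    def build(self, phrase_scores: dict[str, float]) -> None:
        """Offline build: record each phrase at every node on its path, then keep the top k."""
        for phrase, score in phrase_scores.items():
            node = self.root
            for ch in phrase:
                node = node.children.setdefault(ch, TrieNode())
                node.top.append((score, phrase))
        self._keep_top_k(self.root)

    def _keep_top_k(self, node: TrieNode) -> None:
        node.top = heapq.nlargest(self.k, node.top)
        for child in node.children.values():
            self._keep_top_k(child)

    def suggest(self, prefix: str) -> list[str]:
        """Online lookup: walk the prefix, then return the precomputed completions."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [phrase for _, phrase in node.top]


index = Typeahead(k=2)
index.build({"apple": 50, "app": 80, "application": 30, "banana": 10})
print(index.suggest("app"))  # ['app', 'apple']
```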
Design a Web Crawler
Design a distributed web crawler that fetches 5 billion pages per month from the public web while respecting robots.txt, applying per-host politeness limits, deduplicating URLs and content across a 50PB corpus, and feeding the indexer pipeline downstream. The interview centerpiece is the URL frontier: a priority-aware queue of pending URLs sharded by host so politeness rules can be enforced per domain, plus content deduplication via hashing and shingling. We cover the fetcher worker pool, DNS caching, content extraction, the Bloom filter that backs the URL-seen set, and how to handle hostile sites (large pages, redirect loops, slow responses, deliberate spam).
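A sketch of a Bloom filter for the URL-seen set, sized with the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. Deriving the k bit positions from one SHA-256 digest by double hashing is an implementation choice for the sketch. A false positive means the crawler skips a URL it never fetched; a false negative cannot happen.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Optimal bit count and hash count for the target false-positive rate.
        self.m = math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # make the step odd, hence non-zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


seen = BloomFilter(expected_items=1_000_000, fp_rate=0.01)
seen.add("https://example.com/")
print("https://example.com/" in seen)  # True
```

At a 1% false-positive rate this costs about 9.6 bits per URL, so a billion-URL seen set fits in roughly 1.2 GB of memory.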
Design a Search Engine
Design a web-scale search engine that indexes 50B documents and serves 100K queries per second with sub-200ms p99 latency, ranking results by relevance (BM25), authority (PageRank), and personalization. The interview centerpiece is the inverted index sharded across thousands of nodes with scatter-gather query execution, plus the multi-stage ranking pipeline (cheap candidate generation, then an expensive learning-to-rank reranker). We cover document parsing and tokenization, the offline indexing pipeline (MapReduce or Spark), term-partitioned vs document-partitioned sharding, query understanding and expansion, snippet generation, and how to keep the index fresh as the web changes.
Design Nearby / Location Service (Yelp)
Design a 'nearby' service like Yelp that returns the top businesses within a search radius of the user's location, ranking by distance, rating, and category, scaling to 200M monthly users querying 100M businesses. The interview centerpiece is the geospatial index: how to find 'all businesses within 5 km of (lat, lng)' efficiently. We compare bounding-box scans, geohashes, quadtrees, R-trees, and PostGIS GIST indexes; we recommend geohash + secondary index for write-heavy systems and quadtree/R-tree for read-heavy. We cover business storage and search, review ranking, the infrequent-update vs frequent-query asymmetry, and how to handle the long tail of remote regions.
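A sketch of geohash encoding: the longitude and latitude ranges are halved alternately, each comparison emits one bit, and every 5 bits become one base-32 character, so nearby points tend to share a prefix that a secondary index can range-scan. Because a point near a cell edge can have neighbors in an adjacent cell, a radius query typically also scans the 8 surrounding cells.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lng: float, precision: int = 7) -> str:
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits interleave, starting with longitude
    while len(chars) < precision:
        rng, value = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
```

Each extra character shrinks the cell by a factor of 32, so the precision used for the index is chosen to match the typical search radius.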
Design a Rate Limiter
Design a distributed rate limiter that protects an API platform from abuse and uneven load while staying fast and accurate at 1B requests per day. The interview centerpiece is choosing among the five canonical algorithms (fixed window, sliding window log, sliding window counter, token bucket, leaky bucket) and explaining how to make the chosen one atomic across a Redis cluster. We cover where to place the limiter (edge, gateway, in-process), per-IP vs per-user vs per-API-key keys, returning 429 with Retry-After, the hot key problem, and fail-open vs fail-closed under cache outages.
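A single-process token-bucket sketch. The injectable `clock` parameter is an assumption added so the behavior can be checked deterministically. A distributed limiter would keep `tokens` and `last` in Redis and update them atomically (for example inside a Lua script), which this sketch does not attempt.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained request rate
        self.clock = clock
        self.tokens = float(capacity)         # start full
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now

    def allow(self, cost: float = 1.0) -> bool:
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller responds with HTTP 429

    def retry_after(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens are available (round up for a Retry-After header)."""
        self._refill()
        return max(0.0, cost - self.tokens) / self.refill_per_sec
```

Capacity sets how large a burst is tolerated, while the refill rate sets the long-run limit; a leaky bucket, by contrast, smooths output to a constant rate.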
Design an E-Commerce Platform (Amazon)
Design an Amazon-scale e-commerce platform that lets 200M monthly users browse 100M SKUs, add items to a cart, check out, and have orders fulfilled from regional warehouses. The interview centerpiece is the order lifecycle: how to reserve inventory atomically while a customer is on the checkout page, how to chain cart-to-payment-to-fulfillment as a saga with compensating actions, and how to make checkout idempotent so a flaky network never charges a customer twice. We also cover catalog browse at scale, multi-warehouse fulfillment routing, and the asymmetric read/write workload that makes aggressive catalog caching the right call.
Design a Ticketing System (Ticketmaster)
Design a Ticketmaster-style ticketing platform that sells reserved seats for concerts and sports events, with the central challenge being a flash onsale where 1M users compete for 50K seats in five minutes. The interview centerpiece is the seat reservation lock: each unique seat (Section A, Row 12, Seat 7) cannot be split or sub-bucketed like fungible inventory, so contention is unavoidable. We cover seat-level pessimistic holds with TTL, the virtual waiting room that randomizes queue position to absorb flash demand fairly, anti-bot defenses, dynamic pricing tiers, and the read-replica explosion that interactive seat maps cause.
Design a Payment System (Stripe)
Design a Stripe-style payment platform that processes 100M payments per day across 50 currencies and dozens of payment methods, where the central requirement is financial correctness: never charge a customer twice, never lose a payment, always reconcile to the cent. The interview centerpiece is the trio of idempotency keys, the payment intent state machine, and the immutable double-entry ledger - together they make the system safe in the face of network failures, partial outages, and adversarial retries. We also cover webhook delivery with signing and exponential backoff, PCI scope minimization through tokenization, multi-region availability, and the reconciliation jobs that compare our ledger to the bank's settlement files every night.
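A minimal in-memory sketch of idempotency keys: the first request with a key performs the charge and stores the result, and every retry with the same key receives the stored result instead of charging again. The class and field names are hypothetical, and a real service would persist keys in a database with a unique constraint and an explicit 'in progress' state rather than hold a lock across the card-network call.

```python
import threading
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargeResult:
    charge_id: str
    amount_cents: int


class PaymentService:
    def __init__(self) -> None:
        self._results: dict[str, ChargeResult] = {}  # idempotency key -> first result
        self._lock = threading.Lock()
        self.charges_executed = 0  # counts real side effects, for demonstration

    def charge(self, idempotency_key: str, amount_cents: int) -> ChargeResult:
        with self._lock:
            previous = self._results.get(idempotency_key)
            if previous is not None:
                # A retry must describe the same operation as the original request.
                if previous.amount_cents != amount_cents:
                    raise ValueError("idempotency key reused with different parameters")
                return previous
            self.charges_executed += 1  # stand-in for the call to the card network
            result = ChargeResult(f"ch_{uuid.uuid4().hex[:12]}", amount_cents)
            self._results[idempotency_key] = result
            return result


svc = PaymentService()
first = svc.charge("checkout-7f3a", 1999)
retry = svc.charge("checkout-7f3a", 1999)  # e.g. the client timed out and retried
print(first == retry, svc.charges_executed)  # True 1
```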
Design a Key-Value Store (DynamoDB)
Design a Dynamo-style distributed key-value store that scales linearly to thousands of nodes, stays available during partitions, and offers tunable consistency through a quorum (N, W, R). The interview centerpiece is the trio that makes this work at scale: consistent hashing with virtual nodes for partitioning, N/W/R quorums for replication and consistency, and vector clocks for resolving concurrent writes. We cover the gossip protocol for membership, Merkle trees for anti-entropy, hinted handoff for transient failures, sloppy quorum for write availability during partitions, and the LSM-tree storage engine that powers each node.
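A sketch of two rules at the heart of this design: a strict quorum (every read set overlaps every write set) requires W + R > N, and vector clocks classify two versions as ordered or concurrent. Representing a clock as a dict from node name to counter is a simplification for illustration.

```python
VectorClock = dict[str, int]


def is_strict_quorum(n: int, w: int, r: int) -> bool:
    """Read and write sets must share at least one replica."""
    return w + r > n


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock recording one more write coordinated by `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b supersedes a, so a can be discarded
    if b_le_a:
        return "after"
    return "concurrent"      # siblings: keep both and let the client reconcile


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock for a value that reconciles both versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


v1 = increment({}, "node-a")
v2 = increment(v1, "node-b")
v3 = increment(v1, "node-c")
print(compare(v1, v2))  # before
print(compare(v2, v3))  # concurrent
```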
Design a Distributed Cache (Redis)
Design a Redis-style in-memory distributed cache that serves billions of GET/SET operations per day at sub-millisecond latency, with sharding across hundreds of nodes and explicit eviction when memory fills. The interview centerpiece is the eviction-and-partitioning combination: how LRU and LFU choose what to drop, and how a cluster picks which node owns each key without a central coordinator. We compare client-side hashing, proxy-based partitioning (twemproxy), and Redis Cluster's hash-slot model; we cover cache-aside as the dominant access pattern, replica failover, optional persistence, and the sub-ms latency budget that makes this design fundamentally different from the durable KV store covered in the previous case study.
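A sketch of Redis Cluster's key-to-slot mapping as described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key modulo 16,384 slots, where a non-empty `{hash tag}` restricts hashing to the tagged substring so related keys land in the same slot. This is a pure-Python re-implementation for illustration, not code taken from Redis.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring a non-empty {hash tag}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # hash only the text between the first '{' and the next '}'
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % HASH_SLOTS


# Keys sharing a hash tag land in the same slot, so multi-key commands can run on one node.
print(key_hash_slot("{user:1000}.following") == key_hash_slot("{user:1000}.followers"))  # True
```

Because slots, not keys, are assigned to nodes, resharding moves whole slots between nodes and clients only need the slot-to-node table.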
Design Object Storage (S3)
Design an S3-style object storage service that stores trillions of immutable blobs ranging from 1 KB to 5 TB at eleven nines of durability and a fraction of the cost of triple replication. The interview centerpiece is the trio that makes this economical: erasure coding (typically 12 data shards plus 4 parity shards) instead of full replicas; a separate metadata service that maps object keys to chunk locations; and multi-part upload that lets a 5 TB object stream from many sources in parallel. We also cover the bucket/object namespace, lifecycle policies that move cold objects to colder tiers, immutability with versioning, pre-signed URLs for direct client transfer, and the move from eventual to strong read-after-write consistency that AWS shipped in 2020.
Design a Distributed File System (GFS/HDFS)
Design a Google-File-System or HDFS-style distributed file system that stores petabytes across commodity hardware, optimized for batch analytics workloads where files are large (gigabytes), reads are sequential, and writes are append-mostly. The interview centerpiece is the leader-based architecture: one strongly-consistent master node holds the entire file namespace and chunk locations in memory, while many chunkservers store the actual data in 64-128 MB chunks replicated three times across racks. We cover the lease-based primary-replica protocol that lets the master stay out of the data path, the heartbeat-and-chunk-report mechanism that keeps cluster state fresh, and the federation strategy for scaling beyond a single master's memory.
Design a Content Delivery Network
Design a Cloudflare/Akamai/Fastly-style content delivery network that offloads 95%+ of static traffic from origin servers, brings latency from hundreds of milliseconds down to single digits, and absorbs DDoS attacks at the edge. The interview centerpiece is the cache hierarchy and routing: hundreds of edge POPs anycast-routed to the user's nearest location, a regional shield layer that consolidates fetches, and the origin only seeing the long tail of misses. We cover cache key design with Vary headers, the TTL lifecycle and purge model, stale-while-revalidate for resilience under origin outages, and the moves CDNs make to keep dynamic content fast (programmable edge functions, smart routing).
Design Uber / Lyft (Ride-Sharing)
Design a ride-sharing service like Uber that matches a rider's request to a nearby driver in under 5 seconds, streams driver locations every 4 seconds, computes ETAs, and applies surge pricing in real time at 1M concurrent active drivers and 100K rides/min globally. The interview centerpiece is the dispatch path: how to find the nearest available driver, hold them briefly, and confirm the match without race conditions. We compare geohash, S2, and H3 for the driver index and recommend H3 hex grid for ride-sharing because hex neighbors are equidistant. We cover the trip state machine, surge multipliers per cell, and how location updates fan out without melting the network.
Design Google Maps
Design Google Maps: a global mapping service that renders the Earth as 256x256-pixel tiles, computes the shortest driving route in under 200 ms, and folds live traffic into routing for 1B users issuing 5B route requests per day. The interview centerpiece is the routing engine: why Dijkstra is too slow on a continent-scale graph and how Contraction Hierarchies (CH) precompute shortcuts so a live query explores only a small fraction of the graph. We cover the tile pyramid (zoom 0-20, ~1 trillion possible tiles at zoom 20), how live traffic from 100M Android phones updates edge weights every minute, and how to keep navigation latency under 1 second when re-routing.
Design Food Delivery (DoorDash)
Design a food delivery service like DoorDash that links three actors (customer, restaurant, courier) with an end-to-end SLA of under 40 minutes per order at 10M orders per day across 500K restaurants. The interview centerpiece is the courier dispatch problem, which is fundamentally different from ride-sharing: it is a 3-leg trip (courier-to-restaurant, wait for food, restaurant-to-customer), and the platform routinely batches multiple orders onto one courier to cut cost. We compare Uber's 1:1 matching to DoorDash's many-to-1 batching, design the ETA composition (prep time + assignment time + drive time + handoff), and walk through the order state machine that coordinates three independent humans.
Design a Unique ID Generator
Design a service that generates globally unique, roughly time-sortable 64-bit IDs at 1M IDs per second across hundreds of application servers, without coordination on the hot path. The interview centerpiece is the trade-off between uniqueness, ordering, size, and coordination cost. We compare UUIDv4 (random, no coordination, 128 bits, no ordering), database AUTOINCREMENT (single point of contention), Twitter Snowflake (64 bits, time-ordered, requires worker_id assignment and clock discipline), Instagram's per-shard hybrid, and ULID/KSUID. We deep-dive into Snowflake: bit layout, clock skew handling, leader election for worker IDs, and the dreaded clock-rollback bug.
Design Google Docs (Collaborative Editing)
Design a real-time collaborative document editor like Google Docs that serves 1B+ users, lets many people co-edit the same document with sub-200 ms latency, never loses a keystroke, and converges to the same state on every client regardless of network conditions. The interview centerpiece is concurrency control: how to merge two users' simultaneous edits without conflicts. We compare Operational Transformation (OT, used by Google Docs) and Conflict-free Replicated Data Types (CRDT, used by Figma, Notion, Linear), explain the convergence problem (TP1, TP2 properties), walk through cursor presence, and design the document storage as an append-only operation log compacted into snapshots.
Design a Stock Exchange
Design a stock exchange like NASDAQ that matches buy and sell orders for thousands of symbols at sub-100-microsecond latency, handles 200K orders per second per symbol at peak, and produces a deterministic, replayable trade history with regulatory audit guarantees. The interview centerpiece is the matching engine: a deliberately single-threaded, in-memory order book that processes orders sequentially in price-time priority. We design the order book data structures (price-indexed levels with FIFO queues), the gateway path (ultra-low-latency parsing and rate limiting), the event-sourced persistence (every order and trade as an append-only event), and how to scale by sharding per symbol.
Authentication & Authorization (OAuth2, JWT, RBAC)
Authentication answers 'who are you?'. Authorization answers 'what are you allowed to do?'. Most systems get both wrong in subtle ways: rolling their own crypto, treating JWTs as a session store, copying RBAC into every service, or never thinking about how to revoke a leaked credential. This lesson covers the standard building blocks: password storage with adaptive hashing, session vs token authentication, OAuth2 and OIDC flows, JWTs and their honest trade-offs, RBAC vs ABAC vs ReBAC, multi-tenant authorization at scale, machine-to-machine auth (API keys, mTLS, workload identity), and the operational concerns (key rotation, revocation, audit). The goal is to leave you able to design and defend the auth architecture for any system, from a single product to a federated multi-tenant platform.
Data Pipelines & ETL/ELT
Data pipelines move data from operational systems (your transactional databases, event logs, third-party APIs) into analytical systems (warehouses, lakes, search indexes, ML feature stores). The 'shape' of the pipeline (ETL vs ELT, batch vs incremental, push vs pull) determines latency, cost, and how painful schema changes will be. This lesson covers the architectural choices: ingestion patterns, transformation engines (dbt, Spark, Beam), orchestration (Airflow, Dagster, Prefect), data quality, lineage, and the standard production layout (raw / staging / mart). It also covers the failure modes you must design for: late-arriving data, idempotency, backfills, schema evolution, and the silent corruption that comes from not testing your pipelines.
Microservices vs Monolith: When to Choose What
Microservices are not a maturity badge. Monoliths are not a code smell. The honest interview answer is that architecture is a continuum (monolith, modular monolith, services, microservices) and the right point on it is set by team size, deployment frequency, and the cost of distribution, not by what the cool kids at Netflix did. This lesson walks through the trade-offs concretely: latency tax, operational overhead, organizational coupling (Conway's Law), data consistency, and the migration paths that work. By the end you can defend either choice for a given product without reaching for buzzwords.
The System Design Interview Framework (RESHADED)
A system design interview is 45-60 minutes to design something the interviewer has been thinking about for years. Without a framework you will spend the first 20 minutes flailing, the next 20 deep in one corner, and the last 20 watching the interviewer try to redirect you. The RESHADED framework (Requirements, Estimation, Schema / API, High-level design, Architecture, Deep dive, Edge cases, Done / wrap-up) gives you a defensible structure that maps to how senior engineers actually think. This lesson walks through every stage with concrete tactics: the questions to ask in Requirements, the back-of-envelope numbers to estimate, the layer to draw first in HLD, the components to deep-dive into, and how to read the interviewer's signals to know what they want next. By the end you can walk into any system design interview with a known opener and a sequence of moves that work for any prompt.
Batch vs Stream Processing (Lambda/Kappa)
Batch processing computes results over a finite, bounded dataset. Stream processing computes results continuously over an unbounded, ever-arriving dataset. The two paradigms have different latency, cost, correctness, and operational profiles, and choosing wrong is one of the most expensive architectural mistakes a senior engineer can make. This lesson covers the mental model (bounded vs unbounded data, event time vs processing time, watermarks, windows), the two classical reference architectures (Lambda and Kappa), the modern unified models (Beam, Flink), and the production realities of exactly-once semantics, late data, replays, and operational complexity. The goal is to leave you able to choose batch, streaming, or a hybrid for any system, and to defend the choice in an interview.
Encryption at Rest/Transit & Data Privacy (GDPR)
Encryption protects data from unauthorized access; privacy regulations (GDPR, CCPA, HIPAA, PCI-DSS) determine what data you may collect, how you must protect it, who can see it, and how you must respond to user requests. The two intersect: regulations mandate encryption in many cases, and encryption is the technical foundation for most privacy controls. This lesson covers the standard primitives (TLS 1.3 for data in transit, AES-GCM and envelope encryption for data at rest), key management (KMS, HSM, key rotation), application-level encryption (per-tenant keys, field-level encryption, deterministic encryption for searchability), the privacy-engineering layer (data classification, minimization, retention, right-to-be-forgotten), and the operational realities (key compromise, crypto-shredding, BYOK, audit logs). The goal is to leave you able to design a system that is encryption-correct, privacy-compliant, and operationally honest about its trade-offs.
Event Sourcing & CQRS
Event Sourcing stores every change to your application state as an immutable event, and the current state is what you get when you replay them. CQRS splits the read and write paths so each can be optimized independently. Together they unlock auditability, time travel, and read/write scaling that traditional CRUD cannot. They also introduce eventual consistency, schema evolution pain, and a steep operational learning curve. This lesson teaches the mechanics, the implementation patterns (event store, snapshots, projections, sagas), and the honest answer to when these patterns are worth the cost (financial ledgers, audit-heavy domains, complex business workflows) and when they are over-engineering (a typical SaaS CRUD app).
Back-of-the-Envelope Estimation & Capacity Planning
Back-of-the-envelope estimation is the math you do in three minutes to ground a system design in numbers. It is what tells you whether your single Postgres instance can handle the load (no), how much storage you need over five years (probably more than you think), and how much CDN bandwidth you are about to commit to (probably more than that). This lesson covers the standard latency / throughput / size / bandwidth numbers every engineer should have memorized, the unit conversions and order-of-magnitude reasoning that keep you fast, the templates for QPS, storage, and bandwidth estimation, capacity planning beyond steady state (peak vs average, headroom, growth, regional, seasonal), and the cost rough-arithmetic that turns 'we need more servers' into a defensible business case. The goal is to leave you able to walk into any interview or design review and produce useful numbers in three minutes flat.
DDoS Protection, WAF & Security Best Practices
DDoS attacks try to exhaust your bandwidth, your TCP stack, your application capacity, or your downstream dependencies. A WAF (web application firewall) tries to block exploit traffic before it reaches your code. Together with rate limiting, bot management, anti-abuse tooling, and a hardened application layer, they form the defensive perimeter that real production systems live behind. This lesson covers the layered defense: edge / CDN scrubbing for L3/L4 floods, rate limiting and bot detection for L7 abuse, WAF rules for OWASP-class exploits, the OWASP Top 10 with concrete mitigations, secure development practices (input validation, output encoding, secrets management, dependency hygiene), incident response, and the operational realities of running this stack (false positives, vendor selection, escalation, post-mortems). The goal is to leave you able to design and defend the security perimeter for any user-facing system.
ML System Design (Feature Store, Model Serving)
An ML system in production is mostly a data system with a model in the middle. The model is the smallest, most-discussed, and least-troublesome part. The hard parts are training data pipelines, feature freshness and parity between training and serving, the feature store that enforces that parity, model deployment and rollback, online and offline evaluation, and the operational concern that the model silently degrades as the world drifts. This lesson covers the canonical reference architecture: training pipeline, feature store with online and offline halves, model registry, serving infrastructure, monitoring, and the feedback loop. It is the senior-level mental model for designing 'add ML to product X' without falling into the standard traps.
Service Mesh, Sidecar & Service Discovery
Once you have more than a handful of services, the cross-cutting concerns (mTLS, retries, circuit breaking, load balancing, traffic shifting, observability) start to dominate. Doing them in every service in every language is a maintenance nightmare. The sidecar pattern moves these concerns into a co-located proxy that runs next to your service, and a service mesh is the control plane that programs every sidecar in your fleet from one place. This lesson covers how a mesh actually works (data plane vs control plane, Envoy as the de facto data plane, Istio and Linkerd as control planes), how service discovery underpins it, and the very real cost (latency tax, complexity, on-call burden) so you know when a mesh helps and when it is over-engineering.
Recommendation Systems Architecture
A recommendation system at scale is a multi-stage funnel: candidate generation narrows millions of items to a few thousand, light ranking trims to a few hundred, heavy ranking scores those, and a re-ranking stage applies business and policy constraints. Each stage has a different latency budget, a different model, and a different operational profile. This lesson covers the canonical architecture (retrieval + ranking + re-ranking), the core algorithmic families (collaborative filtering, content-based, two-tower neural retrieval, sequential models), the embedding store and vector ANN serving stack, the cold-start problem, ranking objectives and the metrics that measure them, and the rollout / monitoring discipline that keeps the system honest. The goal is to leave you able to design the recommendation system for any consumer product and defend every layer's choices.
Serverless Architecture & FaaS
Serverless does not mean 'no servers'. It means the cloud provider runs the servers, scales them to zero when idle, and bills you per request rather than per running hour. Functions-as-a-Service (Lambda, Cloud Functions, Cloud Run, Azure Functions) is the most visible flavor. The pattern is genuinely powerful for spiky workloads, glue code, and small teams who want to skip the infrastructure tax. It is genuinely a bad fit for steady high-throughput services, latency-critical paths, and stateful systems. This lesson covers how serverless actually executes (cold starts, warm pools, concurrency limits), the architectural patterns it enables, the patterns it breaks, and the honest cost model.
Multi-Region, Multi-Tenant Architecture
Going from one region to many is one of the largest architectural commitments a company can make. The motivations are real (latency for global users, regulatory data residency, disaster recovery, regional uptime SLOs) and so are the costs (cross-region replication latency, conflict resolution, deployment complexity, blast-radius management, double or triple infrastructure spend). Multi-tenancy adds another orthogonal axis: how do you share the same infrastructure safely across hundreds or thousands of customers without one of them noisy-neighboring everyone else? This lesson covers active-active vs active-passive deployments, the data layer (replication, conflict handling, GDPR-style data residency), DNS and traffic routing, deployment topology, and the tenancy patterns (silo, pool, bridge) along with when each is the right answer.
Search Indexing at Scale (Elasticsearch)
Search at scale is two systems in one: an indexing pipeline that ingests, transforms, and stores documents into an inverted index (and increasingly a vector index), and a query path that distributes searches across shards, scores results, and merges them under tight latency budgets. Elasticsearch and OpenSearch are the dominant production engines, and almost every large product runs one. This lesson covers the architecture: how Lucene segments and inverted indexes work, how Elasticsearch shards and replicates them, the tokenization and analyzer pipeline that determines what 'matches' mean, the query coordinator -> shard fan-out -> merge flow, hybrid search (lexical + vector), reindexing strategies, and the operational realities (hot shards, mapping explosions, garbage collection pauses, write amplification). The goal is to leave you able to design and operate search for any catalog from a million to billions of documents.
Behavioral Interviews
Navigating Technical Trade-offs
Trade-off questions are the senior-engineering judgement probe. They test whether you can weigh competing technical priorities, articulate the criteria that drove your choice, own the path you took including its costs, and distinguish real trade-offs from false choices that better engineering would dissolve. This lesson defines trade-off literacy across the canonical axes (consistency vs availability, build vs buy, simplicity vs flexibility, speed vs safety, cost vs latency), walks through the explicit-criteria framework strong candidates use to make trade-offs visible, covers the technical-debt framing that scores best in interviews, and provides fully worked model STAR answers for the prompts you will hear most. After this lesson you will be able to take any consequential technical choice from your career and tell the story so the rubric reads judgement, calibration, and ownership simultaneously.
System Design Decision Stories
System design decision questions are the staff-and-above architecture probe. They test whether you can shape a design that compounds correctly over years, demonstrate second-order thinking about how decisions interact, balance forward-looking design with iterative delivery, and tell a story that operates at the right altitude for staff scale. This lesson defines what counts as a scale-shaping decision (architectural choices whose costs and benefits compound), walks through how to present design decisions in narrative form rather than whiteboard form, covers the second-order-thinking moves that distinguish staff stories from senior stories, addresses when to over-engineer versus when to ship-and-iterate, and provides fully worked model STAR answers for the prompts you will hear most. After this lesson you will be able to take any consequential architectural decision from your career and tell the story so the rubric reads design judgement, second-order thinking, and operating at staff altitude.
Community
CAP, PACELC, and the Trade-off People Misquote
CAP is a real theorem about a narrow edge case. PACELC is the framing that captures the trade-off teams actually make in production.
Datadog Onsite: Five Hours of System Design
A Datadog senior backend onsite where four of the five rounds were system design, anchored on real telemetry-shaped problems.
Designing a Feed in 45 Minutes at a Mid-Size SaaS
A senior system design round at a mid-size B2B SaaS where the prompt was a generic activity feed but 45 minutes forced me to commit to a fan-out strategy in the first ten minutes.
Caching Strategies: Write-Through, Write-Behind, and When Each Fits
Write-through is the safe default. Write-behind is the option for write-heavy paths. Cache-aside is what most teams actually use, and that is fine.
RBAC vs ABAC vs ReBAC, Explained
RBAC, ABAC, and ReBAC are different shapes for different rules, not stages of maturity. Pick by the shape of your access policy, and most real systems end up a thoughtful hybrid.
Backend Loop Questions That Actually Test System Design
Five backend coding questions where the surface is a function but the real signal is your system-design instincts. None of them want the cleverest algorithm; all of them want the right data model and the right failure mode.
My Google L4 Interview Experience
A round-by-round account of my Google L4 software engineer loop, from recruiter screen to team match, ending in an offer.
Pagination Strategies: Offset, Cursor, and Keyset
Offset is the default that breaks under load. Keyset is what you want for most lists. Cursor is keyset wearing a public costume. Pick deliberately, not by ORM defaults.
Building a Notification Service From Scratch
Delivery is the easy part. Preferences, dedup, throttling, and timezone-aware digests are where notification services succeed or generate complaints.
API Gateway vs BFF vs Reverse Proxy
Three terms, three distinct concerns, three different owners. Most teams collapse them and end up with one thing pretending to be all three.
The Saga Pattern: When Distributed Transactions Aren't an Option
Why 2PC is rarely available, what a saga actually is, and the compensation design rules that separate working sagas from stuck ones.
The Sysdesign Round Where I Talked Myself Out of an Offer
I drew a clean diagram, then over-explained every tradeoff until the interviewer no longer trusted any of them. A postmortem on a defensible answer that still got rejected.
SSR, CSR, SSG, ISR: Pick the Right One
Four rendering strategies, four cost profiles. Pick by data freshness and personalization needs, not by which acronym sounds most modern.
Event-Driven Architecture and the Three Failure Modes
Lost messages, out-of-order delivery, duplicate processing. EDA buys decoupling and replay; the price is three failure modes you must operate.
Microservices vs Monolith: An Honest Comparison
Modular monolith is the right default for most teams. Microservices earn their cost only past a specific organizational scale, and the bar is higher than the literature suggests.
Rate Limiting: Token Bucket vs Sliding Window
Token bucket is the right default. Sliding window log is correct but expensive. Fixed window is the algorithm I would not ship.
System Design Interview at Stripe
A senior backend system design round at Stripe centered on idempotent webhooks, the failure mode I missed, and how the interviewer pushed me from a clean diagram to a defensible one.
REST vs GraphQL vs RPC: Pick the Fit, Not the Trend
Three protocols, three call shapes. The wrong choice is fixable; indecision is not. Pick by caller, dominant call shape, and how much HTTP caching matters.
Cloudflare System Design: The Edge-Latency Question
A senior backend system design round at Cloudflare anchored on p99 latency at the edge, where the interviewer pushed past the obvious answers until I had to commit to a defensible number budget.
Consistent Hashing Explained with a 200-Line Toy
A working Python toy of the ring, with virtual nodes, the bounded-movement test that proves the algorithm earns its complexity, and the cases where I would not reach for it.
Idempotency Keys: The Pattern Stripe Taught Everyone
The key itself is the trivial part. The lifecycle, the storage, the body fingerprint, and the TTL are where production teams trip.
Senior Engineer Design Questions I Actually Use
Four open-ended design prompts I ask in senior engineer loops. There is no clean LeetCode answer; I am listening for how the candidate frames the tradeoff, when they push back, and whether they can ship a v1 before optimizing.
Coinbase System Design Round: What "Crypto-Native" Meant
A senior backend system design round at Coinbase where the generic exchange-order-book prompt was actually grading deposit confirmations, double-spend windows, and the cold-wallet boundary.
Staff+ Tradeoff Questions With No Right Answer
Four staff-plus prompts where the interviewer is testing whether you can hold two answers in your head and pick the right one for a specific context. The Python is intentionally thin: this is about judgment, not syntax.
Shopify Senior Engineer Loop: Take-Home Plus Architecture
A Shopify senior backend loop centered on a take-home, an architecture deep dive on what I built, and a Life Story round.
CDN 101: Edge Caches, Origin Shields, and Cache Keys
The cache key matters more than the TTL. Origin shield is a cheap config win. Most CDN incidents are key bugs, not capacity bugs.
Atlassian Senior SWE Loop: The Roadmap Round
How a roadmap-and-product round at Atlassian sank an otherwise solid senior backend loop, and what I would prep next time.
Meta E5 Backend, Phone Screen to Offer
A full-loop account of my Meta E5 backend interview, from cold-applying through team match, with the rounds and the calibration I missed.
