Infrastructure & Storage
System Design
Design a Key-Value Store (DynamoDB)
Design a Dynamo-style distributed key-value store that scales linearly to thousands of nodes, stays available during partitions, and offers tunable consistency through a quorum (N, W, R). The interview centerpiece is the trio that makes this work at scale: consistent hashing with virtual nodes for partitioning, N/W/R quorums for replication and consistency, and vector clocks for resolving concurrent writes. We cover the gossip protocol for membership, Merkle trees for anti-entropy, hinted handoff for transient failures, sloppy quorum for write availability during partitions, and the LSM-tree storage engine that powers each node.
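As a rough illustration of the partitioning and quorum ideas above, here is a minimal sketch of a consistent-hash ring with virtual nodes and the W + R > N overlap check. The names `HashRing`, `preference_list`, and `quorum_overlaps`, the choice of MD5 for ring positions, and the default of 64 virtual nodes are assumptions made for this example, not details of Dynamo or DynamoDB. The ring is assumed to be built from at least one node.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit ring position. MD5 is used only for spreading keys, not for security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Place every physical node on the ring `vnodes` times (hypothetical default).
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def preference_list(self, key, n=3):
        # Walk clockwise from the key's position and collect n distinct physical nodes.
        n = min(n, self._node_count)
        i = bisect.bisect_right(self._points, _hash(key))
        owners = []
        while len(owners) < n:
            node = self._ring[i % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


def quorum_overlaps(n, w, r):
    # With W + R > N, every read quorum shares at least one replica with every write quorum.
    return w + r > n
```

For example, `HashRing(["a", "b", "c", "d"]).preference_list("user:42", 3)` returns three distinct nodes, and `quorum_overlaps(3, 2, 2)` holds while `quorum_overlaps(3, 1, 1)` does not. Skipping repeated physical nodes during the walk is what keeps the N replicas on separate machines even though each machine owns many ring positions.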
Design a Distributed Cache (Redis)
Design a Redis-style in-memory distributed cache that serves billions of GET/SET operations per day at sub-millisecond latency, with sharding across hundreds of nodes and explicit eviction when memory fills. The interview centerpiece is the eviction-and-partitioning combination: how LRU and LFU choose what to drop, and how a cluster picks which node owns each key without a central coordinator. We compare client-side hashing, proxy-based partitioning (twemproxy), and Redis Cluster's hash-slot model; we cover cache-aside as the dominant access pattern, replica failover, optional persistence, and the sub-ms latency budget that makes this design fundamentally different from the durable KV store covered in the previous case study.
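To make the eviction-and-partitioning pairing concrete, the sketch below shows an exact LRU cache and a hash-slot function in the style of Redis Cluster (CRC16 of the key modulo 16384 slots). `LRUCache` and `key_slot` are illustrative names. The sketch ignores hash tags, and Redis itself approximates LRU by sampling keys rather than keeping an exact ordering like this one.

```python
import binascii
from collections import OrderedDict

HASH_SLOTS = 16384  # Redis Cluster's fixed number of hash slots


def key_slot(key: bytes) -> int:
    # CRC16 (XMODEM variant: polynomial 0x1021, initial value 0) modulo the slot count.
    # Hash tags ({...}) are deliberately not handled in this sketch.
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
```

Because every client can compute `key_slot` locally, routing needs only a slot-to-node map rather than a central coordinator on the request path.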
Design Object Storage (S3)
Design an S3-style object storage service that stores trillions of immutable blobs ranging from 1 KB to 5 TB at eleven nines of durability, at a fraction of the storage cost of triple replication. The interview centerpiece is the trio that makes this economical: erasure coding (for example, 12 data shards plus 4 parity shards) instead of full replicas; a separate metadata service that maps object keys to chunk locations; and multi-part upload, which lets a 5 TB object be uploaded as many parts in parallel. We also cover the bucket/object namespace, lifecycle policies that move cold objects to colder tiers, immutability with versioning, pre-signed URLs for direct client transfer, and the move from eventual to strong read-after-write consistency that AWS shipped in 2020.
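The cost argument comes down to simple arithmetic, sketched below together with a toy part planner for multi-part upload. The function names and the part size used in the example are assumptions made for illustration. Real services impose their own limits on part size and part count.

```python
def storage_overhead(data_shards: int, parity_shards: int) -> float:
    # Raw bytes stored per byte of user data under a (data + parity) erasure code.
    return (data_shards + parity_shards) / data_shards


def tolerated_losses(parity_shards: int) -> int:
    # An MDS code such as Reed-Solomon can rebuild the object after losing
    # up to `parity_shards` of its shards.
    return parity_shards


def plan_parts(object_size: int, part_size: int):
    # Split an object into (offset, length) ranges that can be uploaded in parallel.
    return [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]
```

With 12 data and 4 parity shards, `storage_overhead(12, 4)` is about 1.33x, against 3.0x for triple replication, while any 4 shards can be lost. `plan_parts(10, 4)` yields the ranges `(0, 4)`, `(4, 4)`, and `(8, 2)`.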
Design a Distributed File System (GFS/HDFS)
Design a Google-File-System or HDFS-style distributed file system that stores petabytes across commodity hardware, optimized for batch analytics workloads where files are large (gigabytes), reads are sequential, and writes are append-mostly. The interview centerpiece is the leader-based architecture: a single strongly consistent master node holds the entire file namespace and chunk locations in memory, while many chunkservers store the actual data in large chunks (64 MB in GFS, 128 MB by default in HDFS), each replicated three times across racks. We cover the lease-based primary-replica protocol that lets the master stay out of the data path, the heartbeat-and-chunk-report mechanism that keeps cluster state fresh, and the federation strategy for scaling beyond a single master's memory.
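A back-of-envelope estimate shows why the master can keep all chunk metadata in memory, and when federation becomes necessary. The sketch below assumes roughly 64 bytes of master metadata per chunk. That figure is an assumption for this example, and the helper names are hypothetical.

```python
def chunk_count(total_bytes: int, chunk_bytes: int) -> int:
    # Number of chunks needed to hold the data (ceiling division).
    return -(-total_bytes // chunk_bytes)


def master_metadata_bytes(total_bytes: int, chunk_bytes: int, bytes_per_chunk: int = 64) -> int:
    # Assumed: about 64 bytes of in-memory metadata per chunk (illustrative figure).
    return chunk_count(total_bytes, chunk_bytes) * bytes_per_chunk


PIB = 2**50
MIB = 2**20

# 1 PiB of file data in 64 MiB chunks is 2**24 (about 16.8 million) chunks,
# which costs about 1 GiB of master memory under the assumption above.
chunks = chunk_count(PIB, 64 * MIB)
memory = master_metadata_bytes(PIB, 64 * MIB)
```

Under these assumptions memory grows linearly with the number of chunks, so many small files, which each occupy at least one chunk, exhaust the master far sooner than the same volume stored as large files.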
Design a Content Delivery Network
Design a Cloudflare/Akamai/Fastly-style content delivery network that offloads 95%+ of static traffic from origin servers, brings latency from hundreds of milliseconds down to single-digit milliseconds, and absorbs DDoS attacks at the edge. The interview centerpiece is the cache hierarchy and routing: hundreds of edge POPs, with anycast routing each user to the nearest one; a regional shield layer that consolidates fetches; and an origin that sees only the long tail of misses. We cover cache key design with Vary headers, the TTL lifecycle and purge model, stale-while-revalidate for resilience under origin outages, and the moves CDNs make to keep dynamic content fast (programmable edge functions, smart routing).
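As one way to picture cache key design with Vary, the sketch below builds a key from the request method, host, path, and only those request headers that the response's Vary header names. The `cache_key` function and its tuple layout are assumptions made for this example. Production CDNs typically also normalize header values (for instance, collapsing Accept-Encoding variants) to avoid fragmenting the cache.

```python
def cache_key(method, host, path, request_headers, vary=""):
    # Header names are case-insensitive, so normalize them before lookup.
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    # Only headers listed in Vary take part in the key; a missing header counts as "".
    varied = sorted(
        (name, headers.get(name, ""))
        for name in {part.strip().lower() for part in vary.split(",")}
        if name
    )
    return (method.upper(), host.lower(), path, tuple(varied))


gzip_a = cache_key("GET", "Example.com", "/app.js",
                   {"Accept-Encoding": "gzip", "User-Agent": "A"}, "Accept-Encoding")
gzip_b = cache_key("GET", "example.com", "/app.js",
                   {"accept-encoding": "gzip", "User-Agent": "B"}, "Accept-Encoding")
brotli = cache_key("GET", "example.com", "/app.js",
                   {"Accept-Encoding": "br", "User-Agent": "A"}, "Accept-Encoding")
```

Here `gzip_a` and `gzip_b` are equal, so both requests share one cached object despite different User-Agent values, while `brotli` gets its own entry. Varying on a high-cardinality header such as User-Agent would multiply cache entries and lower the hit rate.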
