System Design Article
Design a Distributed File System (GFS/HDFS)
Difficulty: Hard
Design a Google-File-System or HDFS-style distributed file system that stores petabytes across commodity hardware, optimized for batch analytics workloads where files are large (gigabytes), reads are sequential, and writes are append-mostly. The interview centerpiece is the leader-based architecture: one strongly-consistent master node holds the entire file namespace and chunk locations in memory, while many chunkservers store the actual data in 64-128 MB chunks replicated three times across racks. We cover the lease-based primary-replica protocol that lets the master stay out of the data path, the heartbeat-and-chunk-report mechanism that keeps cluster state fresh, and the federation strategy for scaling beyond a single master's memory.
