Search & Discovery

0 lessons
4 system designs

search-discovery

System Design

4 articles
System Design

Design Typeahead / Autocomplete

Design a typeahead/autocomplete service like Google Search's suggestion bar that returns the top 10 ranked completions for a query prefix in under 100 ms at p99, scaling to 5B searches per day over a suggestion index with billions of entries. The interview centerpiece is the data structure choice (trie vs. sorted strings vs. n-gram index) and the offline pipeline that ranks suggestions by frequency, recency, personalization, and click-through rate. We cover the trie with precomputed top-K per node, edge n-gram indexes for typo tolerance, the MapReduce/Spark batch pipeline that rebuilds suggestions nightly, and the per-region edge cache that absorbs 99% of traffic.

design-typeahead
case-study
search-discovery
autocomplete
trie
edge-ngrams
ranking
top-k-precomputation
edge-caching
personalization
system-design
intermediate
free

Medium
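The typeahead design above rests on a trie that stores a precomputed top-K list at every node, so serving a prefix is a short walk plus one list read. A minimal in-memory Python sketch, assuming the offline pipeline has already collapsed frequency, recency, personalization, and click-through rate into a single score per phrase; `TypeaheadTrie` and its methods are hypothetical names, not the article's code.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    # Precomputed best completions for the prefix ending here: (score, phrase), best first.
    top_k: list = field(default_factory=list)


class TypeaheadTrie:
    """Hypothetical sketch: a trie that stores a precomputed top-K list at every node."""

    def __init__(self, k: int = 10):
        self.k = k
        self.root = TrieNode()

    def insert(self, phrase: str, score: float) -> None:
        # Offline build step: offer the phrase to every node on its path (root included).
        # Assumes each phrase is inserted once per rebuild; re-inserting would duplicate it.
        node = self.root
        self._offer(node, phrase, score)
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
            self._offer(node, phrase, score)

    def _offer(self, node: TrieNode, phrase: str, score: float) -> None:
        node.top_k.append((score, phrase))
        # Highest score first; ties broken alphabetically so results are deterministic.
        node.top_k.sort(key=lambda item: (-item[0], item[1]))
        del node.top_k[self.k:]

    def complete(self, prefix: str) -> list[str]:
        # Online path: walk len(prefix) nodes, then return the stored list; no subtree scan.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [phrase for _, phrase in node.top_k]


trie = TypeaheadTrie(k=2)
for phrase, score in [("apple", 50), ("app store", 90), ("application", 70), ("banana", 40)]:
    trie.insert(phrase, score)
print(trie.complete("app"))   # ['app store', 'application']
print(trie.complete("appl"))  # ['application', 'apple']
```

Precomputing at build time trades memory (up to K entries per node) for a lookup cost that no longer depends on how many completions sit under the prefix, which suits a nightly batch rebuild.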
System Design

Design a Web Crawler

Design a distributed web crawler that fetches 5 billion pages per month from the public web while respecting robots.txt, enforcing per-host politeness limits, deduplicating URLs and content across a 50 PB corpus, and feeding the downstream indexing pipeline. The interview centerpiece is the URL frontier: a priority-aware queue of pending URLs, sharded by host so politeness rules can be enforced per domain, plus content deduplication via hashing for exact duplicates and shingling with MinHash for near-duplicates. We cover the fetcher worker pool, DNS caching, content extraction, the Bloom-filter URL-seen set, and how to handle hostile sites (large pages, redirect loops, slow responses, deliberate spam).

design-web-crawler
case-study
search-discovery
web-crawler
url-frontier
politeness
robots-txt
bloom-filter
shingling
minhash
content-dedup
distributed-fetching
system-design
intermediate
premium

Medium
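The crawler design above hinges on a frontier sharded by host, so each domain gets its own politeness clock, and on a Bloom filter as the URL-seen set. A single-process Python sketch of both pieces, assuming a fixed per-host delay instead of delays taken from robots.txt, no URL priorities, and SHA-256 double hashing for the filter; `HostFrontier`, `BloomFilter`, and their parameters are hypothetical, and a production frontier would be distributed and persisted.

```python
import hashlib
import heapq
from collections import defaultdict, deque
from urllib.parse import urlsplit


class BloomFilter:
    """URL-seen set: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Double hashing: k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class HostFrontier:
    """Hypothetical frontier: one FIFO per host plus a min-heap keyed by next allowed fetch time."""

    def __init__(self, delay_seconds: float = 1.0):
        self.delay = delay_seconds
        self.queues = defaultdict(deque)  # host -> pending URLs
        self.next_allowed = {}            # host -> earliest time it may be fetched again
        self.ready = []                   # heap of (time, host); a host is present iff its queue is non-empty
        self.seen = BloomFilter()

    def add(self, url: str, now: float = 0.0) -> bool:
        if url in self.seen:
            return False  # already queued or fetched (or a Bloom-filter false positive)
        self.seen.add(url)
        host = urlsplit(url).hostname or ""
        if not self.queues[host]:
            heapq.heappush(self.ready, (max(now, self.next_allowed.get(host, now)), host))
        self.queues[host].append(url)
        return True

    def next_url(self, now: float):
        """Return a URL whose host may be fetched at `now`, or None if every host must wait."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        self.next_allowed[host] = now + self.delay
        if self.queues[host]:
            heapq.heappush(self.ready, (self.next_allowed[host], host))
        return url


frontier = HostFrontier(delay_seconds=1.0)
for u in ["https://a.com/1", "https://a.com/2", "https://b.com/1"]:
    frontier.add(u)
print(frontier.next_url(0.0))  # https://a.com/1
print(frontier.next_url(0.0))  # https://b.com/1
print(frontier.next_url(0.0))  # None: a.com is still cooling down
print(frontier.next_url(1.0))  # https://a.com/2
```

Keying the heap by each host's next allowed time makes "which host may I fetch now?" an O(log H) question for H active hosts, and a host only sits in the heap while it has pending URLs.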
System Design
Premium

Design a Search Engine

Design a web-scale search engine that indexes 50B documents and serves 100K queries per second with sub-200 ms p99 latency, ranking results by relevance (BM25), authority (PageRank), and personalization. The interview centerpiece is the inverted index sharded across thousands of nodes with scatter-gather query execution, plus the multi-stage ranking pipeline (cheap candidate generation followed by an expensive learning-to-rank rerank). We cover document parsing and tokenization, the offline indexing pipeline (Spark/MapReduce), term-partitioned vs. document-partitioned sharding, query understanding and expansion, snippet generation, and how to keep the index fresh as the web changes.

design-search-engine
case-study
search-discovery
search-engine
inverted-index
bm25
pagerank
scatter-gather
learned-to-rank
tf-idf
tokenization
near-real-time-indexing
system-design
advanced
premium

Hard
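The search-engine design above scores candidates with BM25 over an inverted index. A single-node Python sketch of that scoring, assuming a naive regex tokenizer, the commonly used defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant that Lucene uses; `BM25Index` and `tokenize` are hypothetical names, and sharding, PageRank, and the learning-to-rank rerank are left out.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics; real tokenizers also stem, handle Unicode, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Hypothetical in-memory inverted index scored with Okapi BM25."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}
        for doc_id, text in docs.items():
            tokens = tokenize(text)
            self.doc_len[doc_id] = len(tokens)
            for term, tf in Counter(tokens).items():
                self.postings[term][doc_id] = tf
        self.n_docs = len(docs)
        self.avg_len = sum(self.doc_len.values()) / max(self.n_docs, 1)

    def idf(self, term: str) -> float:
        df = len(self.postings.get(term, {}))
        # Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
        return math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            idf = self.idf(term)
            # Only documents on the term's posting list can score (candidate generation).
            for doc_id, tf in self.postings.get(term, {}).items():
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


idx = BM25Index({
    "d1": "the quick brown fox",
    "d2": "the lazy dog",
    "d3": "quick quick fox jumps",
})
# d3 ranks above d1 because it repeats "quick"; d2 matches no query term and is absent.
print([doc_id for doc_id, _ in idx.search("quick fox")])  # ['d3', 'd1']
```

In a document-partitioned deployment, each shard would run something like `search` over its own postings and the scatter-gather layer would merge the per-shard top-k lists; since IDF here is computed per shard, a real system would need global document frequencies to keep scores comparable across shards.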
System Design

Design Nearby / Location Service (Yelp)

Design a 'nearby' service like Yelp that returns the top businesses within a search radius of the user's location, ranked by distance, rating, and category, scaling to 200M monthly users querying 100M businesses. The interview centerpiece is the geospatial index: how to find 'all businesses within 5 km of (lat, lng)' efficiently. We compare bounding-box scans, geohashes, quadtrees, R-trees, and PostGIS GiST indexes, and recommend geohash plus a secondary index for write-heavy systems and quadtrees or R-trees for read-heavy ones. We cover business storage and search, review ranking, the asymmetry between infrequent updates and frequent queries, and how to handle the long tail of remote regions.

design-nearby-service
case-study
search-discovery
nearby-search
geospatial-index
geohash
quadtree
r-tree
postgis
yelp
location-based-services
spatial-indexing
system-design
intermediate
premium

Medium
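The nearby-service design above favors geohashes for write-heavy workloads: each business gets a short string key, and a radius query becomes a lookup over a few cells followed by an exact distance filter. A minimal Python sketch, assuming a fixed precision of 5 characters, a radius no larger than one cell side, and no handling of the poles or the antimeridian; `GeohashIndex` and its helpers are hypothetical names, and the in-memory dict stands in for the secondary index.

```python
import math
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lng: float, precision: int = 5) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per base32 char."""
    lat_lo, lat_hi, lng_lo, lng_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, value, even = [], 0, 0, True  # even bit positions refine longitude
    while len(chars) < precision:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                value, lng_lo = (value << 1) | 1, mid
            else:
                value, lng_hi = value << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = (value << 1) | 1, mid
            else:
                value, lat_hi = value << 1, mid
        even = not even
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def cell_size(precision: int) -> tuple[float, float]:
    # Longitude gets the extra bit when 5 * precision is odd; returns (height, width) in degrees.
    total = 5 * precision
    lng_bits, lat_bits = (total + 1) // 2, total // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lng_bits


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class GeohashIndex:
    """Hypothetical index: geohash cell -> business ids, queried via the 3x3 block of cells."""

    def __init__(self, precision: int = 5):
        self.precision = precision
        self.cells = defaultdict(list)
        self.coords = {}

    def add(self, business_id: str, lat: float, lng: float) -> None:
        self.coords[business_id] = (lat, lng)
        self.cells[geohash_encode(lat, lng, self.precision)].append(business_id)

    def nearby(self, lat: float, lng: float, radius_km: float) -> list[str]:
        dlat, dlng = cell_size(self.precision)
        candidates = set()
        # Probe the user's cell and its 8 neighbours by encoding points offset by one cell.
        for i in (-1, 0, 1):
            for j in (-1, 0, 1):
                cell = geohash_encode(lat + i * dlat, lng + j * dlng, self.precision)
                candidates.update(self.cells.get(cell, []))
        # Exact filter and ordering by great-circle distance.
        hits = sorted((haversine_km(lat, lng, *self.coords[b]), b) for b in candidates)
        return [b for dist, b in hits if dist <= radius_km]


idx = GeohashIndex(precision=5)
idx.add("cafe", 37.7749, -122.4194)
idx.add("bakery", 37.7790, -122.4180)
idx.add("far-away", 40.7128, -74.0060)
print(geohash_encode(57.64911, 10.40744, 5))           # u4pru
print(idx.nearby(37.7750, -122.4190, radius_km=2.0))  # ['cafe', 'bakery']
```

Two points on either side of a cell border can have unrelated geohash prefixes, which is why the sketch probes all nine surrounding cells rather than matching on a shared prefix alone.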