██████╗ ██████╗ ██████╗ ███████╗    ███████╗███╗   ██╗ █████╗ ████████╗ ██████╗██╗  ██╗██╔════╝██╔═══██╗██╔══██╗██╔════╝    ██╔════╝████╗  ██║██╔══██╗╚══██╔══╝██╔════╝██║  ██║██║     ██║   ██║██║  ██║█████╗      ███████╗██╔██╗ ██║███████║   ██║   ██║     ███████║██║     ██║   ██║██║  ██║██╔══╝      ╚════██║██║╚██╗██║██╔══██║   ██║   ██║     ██╔══██║╚██████╗╚██████╔╝██████╔╝███████╗    ███████║██║ ╚████║██║  ██║   ██║   ╚██████╗██║  ██║ ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝    ╚══════╝╚═╝  ╚═══╝╚═╝  ╚═╝   ╚═╝    ╚═════╝╚═╝  ╚═╝

Learn · Earn · Connect

System Design Article

Design a Search Engine

Difficulty: Hard

Design a web-scale search engine that indexes 50B documents and serves 100K queries per second with sub-200ms p99 latency, ranking results by relevance (BM25), authority (PageRank), and personalization. The interview centerpiece is the inverted index sharded across thousands of nodes with scatter-gather query execution, plus the multi-stage ranking pipeline (cheap candidate generation, expensive learned-to-rank rerank). We cover document parsing and tokenization, the offline indexing pipeline (Spark MapReduce), term-partitioned vs document-partitioned sharding, query understanding and expansion, snippet generation, and how to keep the index fresh as the web changes.

Design a Search Engine

Design a web-scale search engine that indexes 50B documents and serves 100K queries per second with sub-200ms p99 latency, ranking results by relevance (BM25), authority (PageRank), and personalization. The interview centerpiece is the inverted index sharded across thousands of nodes with scatter-gather query execution, plus the multi-stage ranking pipeline (cheap candidate generation, expensive learned-to-rank rerank). We cover document parsing and tokenization, the offline indexing pipeline (Spark MapReduce), term-partitioned vs document-partitioned sharding, query understanding and expansion, snippet generation, and how to keep the index fresh as the web changes.

System Design

Hard

design-search-engine

case-study

search-discovery

search-engine

inverted-index

bm25

pagerank

scatter-gather

learned-to-rank

tf-idf

tokenization

near-real-time-indexing

system-design

advanced

premium

516 views

7

This system design article is available for premium members only.

Upgrade to Premium

Back to System Design