Data-Intensive Systems

0 lessons

4 system designs

System Design

4 articles

System Design

Premium

Batch vs Stream Processing (Lambda/Kappa)

Batch processing computes results over a finite, bounded dataset. Stream processing computes results continuously over an unbounded, ever-arriving dataset. The two paradigms have different latency, cost, correctness, and operational profiles, and choosing the wrong one is one of the most expensive architectural mistakes a senior engineer can make. This lesson covers the mental model (bounded vs unbounded data, event time vs processing time, watermarks, windows), the two classical reference architectures (Lambda and Kappa), the modern unified models (Beam, Flink), and the production realities of exactly-once semantics, late data, replays, and operational complexity. The goal is to leave you able to choose batch, streaming, or a hybrid for any system, and to defend the choice in an interview.

stream-processing

batch-processing

lambda-architecture

kappa-architecture

system-design

advanced

premium

data-intensive-systems

Hard
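
To make the vocabulary in the Batch vs Stream Processing entry concrete (bounded vs unbounded data, event time, watermarks, windows, late data), here is a minimal sketch that counts events per tumbling event-time window in two ways: once over the full bounded dataset, and once incrementally in arrival order. The function names, the `allowed_lateness` parameter, and the watermark heuristic (largest event time seen minus a fixed lateness) are assumptions made for illustration; this is not how Flink or Beam implement watermarks.

```python
from collections import defaultdict


def window_start(event_time, window_size):
    """Start of the tumbling window that contains event_time."""
    return (event_time // window_size) * window_size


def batch_window_counts(events, window_size):
    """Batch: the dataset is bounded, so every event is counted."""
    counts = defaultdict(lambda: defaultdict(int))
    for event_time, key in events:
        counts[window_start(event_time, window_size)][key] += 1
    return {start: dict(per_key) for start, per_key in sorted(counts.items())}


def stream_window_counts(events, window_size, allowed_lateness):
    """Stream: events are seen one at a time in arrival (processing) order.

    Assumed watermark heuristic: max event time seen so far minus
    allowed_lateness. A window is emitted once the watermark passes its end;
    events for an already-emitted window are reported as late.
    """
    open_windows = defaultdict(lambda: defaultdict(int))
    emitted, late = [], []
    watermark = float("-inf")
    for event_time, key in events:
        watermark = max(watermark, event_time - allowed_lateness)
        start = window_start(event_time, window_size)
        if start + window_size <= watermark:
            late.append((event_time, key))  # its window was already emitted
            continue
        open_windows[start][key] += 1
        # Emit every open window whose end the watermark has now passed.
        for closed in sorted(w for w in open_windows if w + window_size <= watermark):
            emitted.append((closed, dict(open_windows.pop(closed))))
    # End of this finite demo input: flush whatever is still open.
    for remaining in sorted(open_windows):
        emitted.append((remaining, dict(open_windows.pop(remaining))))
    return emitted, late


# (event_time, key) pairs listed in arrival order; 4 and 8 arrive out of order.
EVENTS = [(1, "a"), (3, "b"), (12, "a"), (4, "a"), (25, "b"), (8, "a")]

batch_result = batch_window_counts(EVENTS, window_size=10)
stream_result, late_events = stream_window_counts(
    EVENTS, window_size=10, allowed_lateness=5
)
```

In this toy input the event at time 4 arrives out of order but before the watermark passes its window, so it is counted. The event at time 8 arrives after its window was emitted, so the streaming count for that window is lower than the batch count. That gap is the kind of correctness trade-off the entry refers to.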

System Design

Premium

ML System Design (Feature Store, Model Serving)

An ML system in production is mostly a data system with a model in the middle. The model is the smallest, most-discussed, and least-troublesome part. The hard parts are training data pipelines, feature freshness and parity between training and serving, the feature store that enforces that parity, model deployment and rollback, online and offline evaluation, and the operational concern that the model silently degrades as the world drifts. This lesson covers the canonical reference architecture: training pipeline, feature store with online and offline halves, model registry, serving infrastructure, monitoring, and the feedback loop. It is the senior-level mental model for designing 'add ML to product X' without falling into the standard traps.

ml-system-design

feature-store

model-serving

mlops

system-design

advanced

premium

data-intensive-systems

Hard

System Design

Premium

Recommendation Systems Architecture

A recommendation system at scale is a multi-stage funnel: candidate generation narrows millions of items to a few thousand, light ranking trims to a few hundred, heavy ranking scores those, and a re-ranking stage applies business and policy constraints. Each stage has a different latency budget, a different model, and a different operational profile. This lesson covers the canonical architecture (retrieval + ranking + re-ranking), the core algorithmic families (collaborative filtering, content-based, two-tower neural retrieval, sequential models), the embedding store and vector ANN serving stack, the cold-start problem, ranking objectives and the metrics that measure them, and the rollout / monitoring discipline that keeps the system honest. The goal is to leave you able to design the recommendation system for any consumer product and defend every layer's choices.

recommendation-systems

ranking

embedding

vector-search

system-design

advanced

premium

data-intensive-systems

Hard
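
The multi-stage funnel in the Recommendation Systems Architecture entry can be sketched as a chain of small functions. Everything here is a toy assumption: the `Item` fields, the brute-force dot product standing in for ANN retrieval, popularity as the light ranker, the 0.7/0.3 score blend as the heavy ranker, and a per-category cap as the business constraint. Real systems would use learned models at each stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    embedding: tuple
    popularity: float


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def generate_candidates(user_vec, catalog, k):
    """Retrieval: brute-force similarity as a stand-in for an ANN index."""
    return sorted(catalog, key=lambda it: dot(user_vec, it.embedding), reverse=True)[:k]


def light_rank(candidates, k):
    """Cheap ranker: one precomputed feature (popularity), no user features."""
    return sorted(candidates, key=lambda it: it.popularity, reverse=True)[:k]


def heavy_rank(user_vec, candidates):
    """Costlier ranker: hypothetical blend of similarity and popularity."""
    def score(it):
        return 0.7 * dot(user_vec, it.embedding) + 0.3 * it.popularity
    return sorted(candidates, key=score, reverse=True)


def rerank(ranked, max_per_category, k):
    """Policy stage: cap how many items from one category may be shown."""
    shown, per_category = [], {}
    for it in ranked:
        if per_category.get(it.category, 0) < max_per_category:
            shown.append(it)
            per_category[it.category] = per_category.get(it.category, 0) + 1
        if len(shown) == k:
            break
    return shown


CATALOG = [
    Item("A", "shoes", (0.9, 0.1), 0.5),
    Item("B", "shoes", (0.8, 0.2), 0.9),
    Item("C", "shoes", (0.7, 0.3), 0.8),
    Item("D", "hats", (0.6, 0.4), 0.4),
    Item("E", "hats", (0.1, 0.9), 1.0),
    Item("F", "bags", (0.5, 0.5), 0.7),
]
USER_VEC = (1.0, 0.0)

candidates = generate_candidates(USER_VEC, CATALOG, k=5)
shortlist = light_rank(candidates, k=4)
ranked = heavy_rank(USER_VEC, shortlist)
final = rerank(ranked, max_per_category=2, k=3)
final_ids = [it.item_id for it in final]
```

Each stage sees fewer items than the one before it, which is why the later stages can afford costlier scoring within the same overall latency budget.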

System Design

Premium

Search Indexing at Scale (Elasticsearch)

Search at scale is two systems in one: an indexing pipeline that ingests, transforms, and stores documents into an inverted index (and increasingly a vector index), and a query path that distributes searches across shards, scores results, and merges them under tight latency budgets. Elasticsearch and OpenSearch are the dominant production engines, and almost every large product runs one. This lesson covers the architecture: how Lucene segments and inverted indexes work, how Elasticsearch shards and replicates them, the tokenization and analyzer pipeline that determines what counts as a 'match', the query coordinator -> shard fan-out -> merge flow, hybrid search (lexical + vector), reindexing strategies, and the operational realities (hot shards, mapping explosions, garbage collection pauses, write amplification). The goal is to leave you able to design and operate search for any catalog from a million to billions of documents.

elasticsearch

lucene

inverted-index

system-design

advanced

premium

data-intensive-systems

Hard
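
The analyzer, inverted index, and coordinator -> shard fan-out -> merge flow named in the Search Indexing at Scale entry can be illustrated with a small in-memory sketch. It does not use the Elasticsearch API. The class names, the stopword list, modulo routing on integer document IDs, and summed term frequency as the score are all assumptions chosen for brevity (Elasticsearch scores with BM25 by default and routes on a hash of the routing value).

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "the", "of", "and"}


def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def top_k(hits, k):
    """Order (doc_id, score) pairs by score descending, then doc_id ascending."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))[:k]


class Shard:
    def __init__(self):
        # Inverted index: term -> {doc_id: term frequency in that document}
        self.postings = defaultdict(dict)

    def index(self, doc_id, text):
        for term, tf in Counter(analyze(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k):
        scores = Counter()
        for term in analyze(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf  # simplistic score: summed term frequency
        return top_k(scores.items(), k)


class ShardedIndex:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def index(self, doc_id, text):
        # Assumed routing: integer doc_id modulo the number of shards.
        self.shards[doc_id % len(self.shards)].index(doc_id, text)

    def search(self, query, k):
        # Coordinator: fan out, collect each shard's local top k, merge.
        partial = [hit for shard in self.shards for hit in shard.search(query, k)]
        return top_k(partial, k)


idx = ShardedIndex(num_shards=2)
idx.index(1, "The Red Shoes")
idx.index(2, "red red wine")
idx.index(3, "Blue suede shoes")
idx.index(4, "A glass of wine")

results = idx.search("red shoes", k=2)
```

Because the analyzer runs on both documents and queries, "The Red Shoes" and the query "red shoes" reduce to the same terms. Each shard returns only its local top k, so the coordinator merges at most k hits per shard instead of every matching document.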