Stream Processing

0 lessons · 2 system designs · 3 community items


System Design

2 articles

System Design

Premium

Stream Processing (Kafka Streams, Flink)

Stream processing is the discipline of computing on continuous, unbounded data as it arrives, rather than in periodic batches. This lesson covers the core stream-processing primitives: stateful operators, event time vs processing time, watermarks, windowing (tumbling, sliding, session), exactly-once semantics, and checkpointing of operator state. We compare the leading engines (Kafka Streams, Apache Flink, Spark Structured Streaming) and walk through real production patterns: real-time analytics, fraud detection, ML feature pipelines, and CDC-driven materialized views. By the end, you should be able to sketch a Flink pipeline on a whiteboard and defend its windowing and checkpointing choices.

Tags: stream-processing, kafka, flink, event-driven, async-processing, distributed-systems, system-design, advanced, premium

949

Hard
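A minimal sketch of two of the primitives named above, event-time tumbling windows and watermarks, written in plain Python rather than against the Flink or Kafka Streams APIs. It assumes a hypothetical event shape of (event_time_ms, key, value) and the simplest watermark policy (largest event time seen so far minus a fixed out-of-orderness slack). The function name tumbling_window_sums is illustrative, and there is no state backend or checkpointing here.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Optional, Tuple

# Hypothetical event shape: (event_time_ms, key, value).
Event = Tuple[int, str, int]
# Emitted result: (window_start_ms, key, sum_of_values).
WindowResult = Tuple[int, str, int]


def tumbling_window_sums(
    events: Iterable[Event],
    window_ms: int,
    max_out_of_order_ms: int,
) -> Tuple[List[WindowResult], List[Event]]:
    """Sum values per key over tumbling event-time windows [start, start + window_ms).

    Watermark = largest event time seen so far - max_out_of_order_ms.
    A window fires once the watermark reaches its end; an event whose window
    end the watermark has already passed is treated as late and set aside.
    """
    open_windows: DefaultDict[Tuple[int, str], int] = defaultdict(int)
    fired: List[WindowResult] = []
    late: List[Event] = []
    max_ts: Optional[int] = None
    watermark = float("-inf")  # nothing can be late before the first event

    for ts, key, value in events:
        start = ts - ts % window_ms  # align event time to its window
        if start + window_ms <= watermark:
            late.append((ts, key, value))
            continue
        open_windows[(start, key)] += value
        max_ts = ts if max_ts is None else max(max_ts, ts)
        watermark = max_ts - max_out_of_order_ms
        # Fire (emit and drop state for) every window the watermark has passed.
        for w_start, w_key in sorted(open_windows):
            if w_start + window_ms <= watermark:
                fired.append((w_start, w_key, open_windows.pop((w_start, w_key))))

    # Bounded test input only: flush whatever is still open at end of input.
    for w_start, w_key in sorted(open_windows):
        fired.append((w_start, w_key, open_windows[(w_start, w_key)]))
    return fired, late
```

With 1000 ms windows and 500 ms of out-of-orderness slack, an event stamped 900 that arrives after the watermark has reached 1600 goes to the late list instead of reopening the [0, 1000) window. Real engines let you decide what happens to such events (drop them, route them to a side output, or update an already-emitted result).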

System Design

Premium

Batch vs Stream Processing (Lambda/Kappa)

Batch processing computes results over a finite, bounded dataset; stream processing computes results continuously over an unbounded dataset that keeps arriving. The two paradigms differ in latency, cost, correctness guarantees, and operational burden, and picking the wrong one is among the most expensive architectural mistakes a senior engineer can make. This lesson covers the mental model (bounded vs unbounded data, event time vs processing time, watermarks, windows), the two classical reference architectures (Lambda and Kappa), the modern unified models (Apache Beam, Flink), and the production realities of exactly-once semantics, late data, replays, and operational complexity. The goal is for you to be able to choose batch, streaming, or a hybrid for any system, and to defend that choice in an interview.

Tags: stream-processing, batch-processing, lambda-architecture, kappa-architecture, system-design, advanced, premium, data-intensive-systems

449

Hard
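To make bounded vs unbounded concrete, here is a hypothetical count-by-key written both ways in Python. The batch version reads a finite list and answers once; the streaming version emits an updated result after every record. The Kappa idea then fits in a few lines: keep only the streaming code path and "reprocess" by replaying the stored log through it, so its final snapshot matches the batch answer. The function names are illustrative only, not part of any framework.

```python
from typing import Dict, Iterable, Iterator, List


def count_by_key_batch(records: List[str]) -> Dict[str, int]:
    """Bounded input: read everything, then produce one final answer."""
    counts: Dict[str, int] = {}
    for key in records:
        counts[key] = counts.get(key, 0) + 1
    return counts


def count_by_key_stream(records: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Unbounded input: emit an updated snapshot after every record."""
    counts: Dict[str, int] = {}
    for key in records:
        counts[key] = counts.get(key, 0) + 1
        yield dict(counts)  # copy so earlier snapshots are not mutated


if __name__ == "__main__":
    log = ["a", "b", "a"]  # stand-in for a replayable event log
    # Kappa-style reprocessing: replay the log through the streaming path.
    assert list(count_by_key_stream(log))[-1] == count_by_key_batch(log)
```

In a Lambda architecture both functions would exist as separate layers and have to be kept in agreement; Kappa drops the batch layer and accepts the cost of replaying the log when logic changes.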

Community

3 items

Code Snippet

Streaming Aggregations With a Single Pass (JS)

Welford's online algorithm for mean and variance, plus a 30-line streaming p99 estimator. This is the version I use when the data does not fit in memory or arrives over a WebSocket.

851

4.2 (12)

May 9, 2026

by @ryancastillo
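The snippet itself is JavaScript and is not reproduced here. As a rough Python sketch of the same idea, Welford's update keeps only a count, a running mean, and a running sum of squared deviations (m2), so each value is seen once and never stored. The class name RunningStats is hypothetical, and the p99 estimator mentioned above is not covered.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: one pass, O(1) memory."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean  # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # old deviation times new deviation

    @property
    def population_variance(self) -> float:
        return self.m2 / self.n if self.n > 0 else math.nan

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN until two values are seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)
```

Compared with the textbook "mean of squares minus square of mean" formula, this update avoids subtracting two large, nearly equal numbers, which is why it stays numerically stable over long streams.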

Code Snippet

Streaming JSONL Parser Without Loading the File

When the file is 8 GB, you cannot json.load it. Here is the generator-based JSONL reader I ship in every data pipeline, plus the malformed-line policy that has saved me twice.

1.1k

Dec 21, 2025

by @clarachoi
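A minimal Python sketch of a generator-based JSONL reader, assuming a simple two-option malformed-line policy ("skip" or "raise"). The name iter_jsonl and the policy options are hypothetical and not necessarily the author's. Because it pulls one line at a time, memory use is bounded by the longest line rather than by the file size.

```python
import json
from typing import Any, Iterable, Iterator, List, Optional


def iter_jsonl(
    lines: Iterable[str],
    on_error: str = "skip",
    bad_lines: Optional[List[int]] = None,
) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line, lazily.

    `lines` can be an open text file, since iterating a file reads it line by line.
    on_error="skip" drops malformed lines (recording their numbers in bad_lines,
    if given); on_error="raise" stops with a ValueError naming the line.
    """
    # Generators run lazily, so this check fires on the first next() call.
    if on_error not in ("skip", "raise"):
        raise ValueError(f"unknown on_error policy: {on_error!r}")
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at EOF
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            if on_error == "raise":
                raise ValueError(f"malformed JSON on line {lineno}: {exc.msg}") from exc
            if bad_lines is not None:
                bad_lines.append(lineno)
            continue
        yield record
```

Typical use is with open("events.jsonl", encoding="utf-8") as f: followed by for record in iter_jsonl(f): (the path is a placeholder). Passing a bad_lines list keeps the skip policy auditable instead of silently losing data.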

Article

Iterators, Generators, and Async Generators

One protocol, three layers: the iterator protocol with its single next method, generators as syntactic sugar over it, and async generators for streaming data with back-pressure. Plus the lazy pipeline pattern I reach for every week.

174

4.4 (11)

Nov 24, 2025

by @owentoure
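A compact Python sketch of the three layers, making no assumptions about the article's own examples or language (in Python the protocol method is spelled __next__ and exhaustion is signaled by raising StopIteration). Countdown implements the protocol by hand, countdown is the same sequence as a generator, squares and evens form a lazy pipeline, and ticker is an async generator that only advances when its consumer awaits the next item. All names are illustrative.

```python
import asyncio
from typing import AsyncIterator, Iterable, Iterator, List


class Countdown:
    """Layer 1: the iterator protocol written by hand."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> "Countdown":
        return self  # an iterator is its own iterable

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration  # tells for-loops, list(), etc. to stop
        value = self.current
        self.current -= 1
        return value


def countdown(start: int) -> Iterator[int]:
    """Layer 2: the same sequence as a generator; Python supplies __next__."""
    while start > 0:
        yield start
        start -= 1


def squares(xs: Iterable[int]) -> Iterator[int]:
    for x in xs:
        yield x * x  # computed only when the next stage asks for it


def evens(xs: Iterable[int]) -> Iterator[int]:
    for x in xs:
        if x % 2 == 0:
            yield x


async def ticker(n: int, delay: float = 0.0) -> AsyncIterator[int]:
    """Layer 3: an async generator. It never runs ahead of its consumer:
    each value is produced only when the consumer awaits the next one."""
    for i in range(n):
        await asyncio.sleep(delay)  # stand-in for awaiting a socket or queue
        yield i


async def collect(n: int) -> List[int]:
    return [x async for x in ticker(n)]
```

Writing evens(squares(range(1, 7))) builds the pipeline without computing anything; values flow through one at a time only when the result is iterated, yielding 4, 16, 36. The same pull-based shape is what gives async generators their back-pressure within a single process.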