Python Snippet

Build a Generator Pipeline

Difficulty: Medium

A generator pipeline chains small `yield`-based stages so data flows through them one item at a time. The result is constant-memory streaming over inputs that would not fit in RAM, with each stage doing one job (read, parse, filter, transform, sink). This entry shows a three-stage pipeline, how to compose stages dynamically, and why generator pipelines beat list-of-lists processing for log-style data.

Python
Medium
3 snippets
py-generators
iterators
py-itertools
py-standard-library

Each stage is a generator function: calling it returns a generator iterator, and each `next()` on that iterator runs the body until the next `yield` hands back one item. Wrapping `lines` in `parse_numbers`, and that in `positive_only`, does not run anything; it just builds a chain of iterators. The first `next()` (or a consumer such as `list(...)`) pulls items back through the chain: the sink asks the filter for one item, the filter asks the parser, the parser asks the source. Because only one item is in flight at a time, memory use stays O(1) in the input size, provided no stage buffers its input and the consumer does not collect every result (as `list(...)` does). The same shape scales to ETL jobs that read 10 GB of CSV without ever holding more than a row.
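The chain described above can be sketched as follows. This is a minimal illustration, not the entry's original snippet: the stage names `lines`, `parse_numbers`, and `positive_only` come from the text, but their signatures and bodies are assumptions, `SAMPLE` is a made-up in-memory input standing in for a large file, and `pipeline` is a hypothetical helper showing one way to compose stages dynamically.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator


def lines(text: str) -> Iterator[str]:
    """Source stage (assumed shape): yield one stripped line at a time."""
    for line in text.splitlines():
        yield line.strip()


def parse_numbers(rows: Iterable[str]) -> Iterator[int]:
    """Parse stage: yield an int per row, skipping rows that are not integers."""
    for row in rows:
        try:
            yield int(row)
        except ValueError:
            continue


def positive_only(numbers: Iterable[int]) -> Iterator[int]:
    """Filter stage: pass through only values greater than zero."""
    for n in numbers:
        if n > 0:
            yield n


def pipeline(source: Iterable, *stages: Callable[[Iterable], Iterable]) -> Iterable:
    """Hypothetical helper: feed `source` through each stage, left to right."""
    return reduce(lambda stream, stage: stage(stream), stages, source)


# Made-up sample input; a real job would stream from an open file instead.
SAMPLE = "3\n-1\noops\n7\n0\n12\n"

# Building the chain runs no stage code yet; it only nests iterators.
chain = positive_only(parse_numbers(lines(SAMPLE)))

# Each next() pulls exactly one item through all three stages.
first = next(chain)

# list() drains what is left; it holds every remaining result in memory.
rest = list(chain)

# The same stages, composed from a sequence chosen at runtime.
composed = list(pipeline(lines(SAMPLE), parse_numbers, positive_only))
```

Because `pipeline` takes its stages as arguments, the list of stages can be built at runtime, for example from configuration. For truly large inputs, iterate over the result with a `for` loop rather than calling `list(...)`, so results are consumed one at a time.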