Python Generators
Code Snippets
Flatten with itertools.chain
`itertools.chain` lazily concatenates several iterables into one without copying their elements, which makes it the right tool for flattening a list of lists by exactly one level. It works on any iterable (lists, tuples, generators, file objects), so it composes cleanly with the rest of the iterator toolbox. This entry covers `chain`, `chain.from_iterable` (which takes a single iterable of iterables, so no `*` unpacking is needed), and how both differ from a recursive deep flatten.
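A minimal sketch of the one-level flatten described above. The sample data and the `deep_flatten` helper are illustrative assumptions, included only to contrast with a recursive flatten.

```python
from itertools import chain

nested = [[1, 2], [3, 4], [5]]

# chain() takes each iterable as a separate argument, so the outer list is unpacked.
flat_unpacked = list(chain(*nested))

# chain.from_iterable() takes one iterable of iterables and reads it lazily.
flat_lazy = list(chain.from_iterable(nested))

# Only one level is removed: the inner [2, 3] survives as a list.
deeper = [[1, [2, 3]], [4]]
one_level = list(chain.from_iterable(deeper))


def deep_flatten(items):
    """Recursively flatten nested lists and tuples (hypothetical helper)."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from deep_flatten(item)
        else:
            yield item


fully_flat = list(deep_flatten(deeper))
```

`chain.from_iterable` is usually the safer choice when the outer iterable is itself a generator, since `chain(*outer)` has to consume the whole outer iterable up front to build the argument list.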
Combinations and Permutations
When the problem reads 'pick K of N' or 'order all N', the right reflex in Python is `itertools.combinations` or `itertools.permutations`. Both are lazy iterators, so they can enumerate huge search spaces without materializing them. This entry walks through combinations, permutations, and `combinations_with_replacement`, plus when each is the right tool.
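As a rough illustration of the three functions named above, using a small made-up list of items:

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    islice,
    permutations,
)

items = ["a", "b", "c"]

# Pick 2 of 3, order does not matter, no repeats.
pairs = list(combinations(items, 2))

# Order all 3 items.
orders = list(permutations(items))

# Pick 2 of 3, order does not matter, repeats allowed.
with_repeats = list(combinations_with_replacement(items, 2))

# Laziness: take only the first three 5-element combinations of range(1000)
# without building the full (very large) result.
first_three = list(islice(combinations(range(1000), 5), 3))
```

Here `pairs` has 3 entries, `orders` has 6, and `with_repeats` has 6, which matches the usual counting formulas for N = 3.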
Group Consecutive Items with groupby
`itertools.groupby` collapses runs of equal-keyed items into `(key, group_iterator)` pairs. The catch is that it only groups *consecutive* equal items, so the input must already be sorted by the key if you want full grouping. This snippet covers run-length encoding, the sort-first idiom for dict-like grouping, and the iterator gotcha that bites every newcomer.
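The three points above can be sketched as follows. The function name `run_length_encode` and the sample words are assumptions made for illustration.

```python
from itertools import groupby


def run_length_encode(text):
    """Collapse consecutive equal characters into (char, count) pairs."""
    return [(ch, sum(1 for _ in run)) for ch, run in groupby(text)]


rle = run_length_encode("aaabccdd")

# Sort-first idiom: groupby only merges adjacent items, so sort by the same key.
words = ["apple", "bob", "avocado", "banana", "cherry"]


def first_letter(word):
    return word[0]


by_letter = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
}

# Without sorting, the same key shows up more than once.
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]

# Gotcha: each group is a view onto the shared underlying iterator.
# Advancing groupby invalidates earlier groups, so consume each group
# (for example with list()) before moving on to the next one.
stale = [(key, group) for key, group in groupby("aabb")]
first_stale_group = list(stale[0][1])
```

In this sketch `first_stale_group` comes back empty because the outer iterator had already moved past the first run by the time the group was read.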
Pairwise Iteration
`itertools.pairwise` (Python 3.10+) yields successive overlapping pairs from any iterable. It replaces the classic `zip(seq, seq[1:])`, which copies the sequence and only works on sliceable inputs, and the `tee` recipe with a single lazy, constant-memory call. This entry covers the basic pattern, the manual fallback for older Python, and a tiny example: detecting monotonic runs.
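A minimal sketch of the pattern with a `tee`-based fallback for interpreters older than 3.10. The `non_decreasing_runs` helper is a hypothetical example of run detection, not a standard-library function.

```python
from itertools import tee

try:
    from itertools import pairwise  # available in Python 3.10+
except ImportError:
    def pairwise(iterable):
        """Fallback: the classic tee recipe."""
        a, b = tee(iterable)
        next(b, None)  # advance the second copy by one item
        return zip(a, b)


def non_decreasing_runs(values):
    """Yield lists of consecutive items that never decrease.

    Note: an input with fewer than two items produces no pairs,
    so this sketch yields nothing for it.
    """
    run = []
    for prev, cur in pairwise(values):
        if not run:
            run.append(prev)
        if cur >= prev:
            run.append(cur)
        else:
            yield run
            run = [cur]
    if run:
        yield run


pairs = list(pairwise([1, 2, 3, 4]))
runs = list(non_decreasing_runs([1, 2, 2, 1, 3, 0]))
is_sorted = all(a <= b for a, b in pairwise([1, 2, 2, 5]))
```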
Build a Generator Pipeline
A generator pipeline chains small `yield`-based stages so data flows through them one item at a time. The result is constant-memory streaming over inputs that would not fit in RAM, with each stage doing one job (read, parse, filter, transform, sink). This entry shows a three-stage pipeline, how to compose stages dynamically, and why generator pipelines beat list-of-lists processing for log-style data.
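One way to sketch such a pipeline, assuming a simple `LEVEL message` log format. The stage names (`read_lines`, `parse`, `only_level`) and the `compose` helper are illustrative, and an in-memory list stands in for a real file handle.

```python
from functools import partial


def read_lines(source):
    """Stage 1: yield lines with the trailing newline removed."""
    for line in source:
        yield line.rstrip("\n")


def parse(lines):
    """Stage 2: split each line into a level and a message."""
    for line in lines:
        level, _, message = line.partition(" ")
        yield {"level": level, "message": message}


def only_level(records, level):
    """Stage 3: keep only records with the given level."""
    for record in records:
        if record["level"] == level:
            yield record


def compose(source, *stages):
    """Wire stages together dynamically; nothing runs until iteration starts."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


log = ["INFO start\n", "ERROR disk full\n", "INFO done\n", "ERROR timeout\n"]
errors = compose(log, read_lines, parse, partial(only_level, level="ERROR"))
messages = [record["message"] for record in errors]
```

Because each stage pulls one item at a time from the stage before it, swapping the list for an open file keeps memory use roughly constant regardless of file size.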
Community
Generators, yield, and Lazy Pipelines
Generators turn a 4GB log-processing job into a 50MB one without changing the consumer code. Here is the mental model, the pipeline pattern I reuse, and the four traps that make hand-rolled generators leak.
A stdout Progress Bar Without a Library (Python)
tqdm is wonderful but adds 30k of dependencies for a 30-line job. Here is the pure-stdlib progress bar I drop into ETL scripts when I just want to know how far through the file I am.
When I Stop Reaching for List Comprehensions
I love comprehensions, but I have learned the three cases where they cost more than they save: nested filtering, side effects, and big intermediate lists. Here is the pattern I switch to in each.
List Comprehensions and When to Stop Using Them
Comprehensions are the fastest way to express a simple transform-and-filter, but they decay into write-only code the moment you nest them or sneak side effects in. Here is the line I draw.
Streaming JSONL Parser Without Loading the File
When the file is 8GB you cannot json.load it. Here is the generator-based JSONL reader I ship in every data pipeline, plus the malformed-line policy that has saved me twice.
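A rough sketch of what a generator-based JSONL reader can look like. The post's actual malformed-line policy is not stated here, so the skip-and-record behavior below, along with the `read_jsonl` name and `bad_lines` parameter, is an assumption for illustration only.

```python
import io
import json


def read_jsonl(lines, bad_lines):
    """Yield one parsed object per non-empty line.

    Assumed policy: malformed lines are skipped and their 1-based
    line numbers are appended to bad_lines for later inspection.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            bad_lines.append(lineno)


# io.StringIO stands in for an open file; a real file object iterates the same way.
sample = io.StringIO('{"a": 1}\n\nnot json\n{"a": 2}\n')
bad = []
records = list(read_jsonl(sample, bad))
```

Iterating the file object line by line means only one line is held in memory at a time, which is what makes this approach workable for multi-gigabyte inputs.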
