Generators
Generator and itertools patterns for memory-efficient data processing in Python
You are an expert in Python generators and itertools for writing memory-efficient, lazy data processing pipelines.
Overview
Generators produce items one at a time via yield, enabling lazy evaluation of sequences that may be large or infinite. Combined with itertools and generator expressions, they form the backbone of memory-efficient data processing in Python. Generator pipelines compose naturally, processing one element at a time through multiple transformation stages without materializing intermediate collections.
Core Philosophy
Generators represent Python's answer to the fundamental tension between expressiveness and efficiency. A list comprehension is clear and Pythonic, but it materializes every element in memory at once. A generator expression is equally clear but produces elements one at a time, on demand. This lazy evaluation model means you can express operations on datasets larger than available memory, infinite sequences, and streaming data using the same familiar iteration patterns.
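The memory difference is easy to see with `sys.getsizeof`: the list comprehension's size grows with the input, while the generator object stays a constant few hundred bytes (exact numbers vary by Python version):

```python
import sys

squares_list = [x * x for x in range(1_000_000)]  # materializes a million ints
squares_gen = (x * x for x in range(1_000_000))   # lazy: nothing computed yet

print(sys.getsizeof(squares_list))  # megabytes -- grows with the input
print(sys.getsizeof(squares_gen))   # a few hundred bytes, regardless of input
print(sum(squares_gen))             # consumes the generator one item at a time
```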
The pipeline composition model is where generators truly shine. Each generator in a pipeline is a single transformation stage: read lines, filter blanks, parse fields, validate records. Data flows through the pipeline one element at a time, and the entire chain uses constant memory regardless of input size. This is the Unix pipe philosophy applied to Python: small, composable stages connected by a universal interface (iteration).
Generators also encode an important discipline about separation of concerns. The producer (the generator function) decides what to yield and when; the consumer (the for loop or downstream generator) decides how much to take and when to stop. This decoupling means producers do not need to know whether the consumer wants ten items or ten million, and consumers do not need to know whether the producer reads from a file, a network socket, or an algorithm.
Core Concepts
- **Generator functions** contain `yield` and return a generator iterator when called.
- **Generator expressions** (`(x for x in items)`) are the lazy counterpart to list comprehensions.
- **`yield from`** delegates to a sub-generator, flattening nested iteration.
- **`send()` and `throw()`** enable two-way communication with a running generator (coroutine style).
- **`itertools`** provides optimized building blocks: `chain`, `islice`, `groupby`, `product`, `combinations`, etc.
- **Generators are single-pass**: once exhausted, they cannot be restarted.
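The single-pass behavior in particular trips people up, and is worth seeing once:

```python
squares = (x * x for x in range(5))
print(list(squares))  # [0, 1, 4, 9, 16]
print(list(squares))  # [] -- exhausted; build a new generator to iterate again
```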
Implementation Patterns
Basic generator function
```python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take first 10 Fibonacci numbers
from itertools import islice
print(list(islice(fibonacci(), 10)))
```
Generator pipeline
```python
def read_lines(path: str):
    with open(path) as f:
        for line in f:
            yield line.strip()

def filter_nonempty(lines):
    for line in lines:
        if line:
            yield line

def parse_csv_row(lines):
    for line in lines:
        yield line.split(",")

# Compose: reads one line at a time through the entire pipeline
rows = parse_csv_row(filter_nonempty(read_lines("data.csv")))
for row in rows:
    process(row)
```
`yield from` for delegation

```python
def flatten(nested):
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

data = [1, [2, 3, [4, 5]], 6, [7]]
print(list(flatten(data)))  # [1, 2, 3, 4, 5, 6, 7]
```
Generator with `send()` for coroutine pattern

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the generator
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```
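`send()` has a sibling, `throw()`, which raises an exception inside the generator at the paused `yield`. A minimal sketch (the `resilient_counter` name is purely illustrative) shows a generator catching the thrown exception and resetting its state:

```python
def resilient_counter():
    count = 0
    while True:
        try:
            yield count
            count += 1
        except ValueError:
            # throw() delivers the exception at the paused yield; reset state
            count = 0

gen = resilient_counter()
print(next(gen))              # 0
print(next(gen))              # 1
print(gen.throw(ValueError))  # 0 -- handled inside, counter reset
```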
Chunking with itertools
```python
from itertools import islice

def chunked(iterable, size):
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            break
        yield chunk

for batch in chunked(range(25), 10):
    process_batch(batch)
# [0..9], [10..19], [20..24]
```
itertools recipes
```python
from itertools import chain, groupby, product, accumulate, takewhile, dropwhile

# Flatten one level
flat = chain.from_iterable([[1, 2], [3, 4], [5]])  # -> 1, 2, 3, 4, 5

# Group consecutive elements
data = [("a", 1), ("a", 2), ("b", 3), ("b", 4)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))

# Cartesian product
for x, y in product([1, 2], ["a", "b"]):
    print(x, y)  # 1 a / 1 b / 2 a / 2 b

# Running totals
print(list(accumulate([1, 2, 3, 4])))  # [1, 3, 6, 10]

# Take/drop while condition
print(list(takewhile(lambda x: x < 5, [1, 3, 5, 2])))  # [1, 3]
print(list(dropwhile(lambda x: x < 5, [1, 3, 5, 2])))  # [5, 2]
```
Batching generator that flushes remaining items on exit
```python
def batch_processor(items, batch_size=100):
    """Yield batches, flush remaining on exit."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```
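Usage is straightforward; the trailing partial batch comes out last (the definition is repeated here so the snippet runs standalone):

```python
def batch_processor(items, batch_size=100):
    """Yield batches, flush remaining on exit."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

print(list(batch_processor(range(7), batch_size=3)))
# [[0, 1, 2], [3, 4, 5], [6]]
```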
Infinite generators with a termination condition
```python
import random

def random_walk():
    position = 0
    while True:
        step = random.choice([-1, 1])
        position += step
        yield position

# Use takewhile or islice to consume finite portions
from itertools import takewhile
steps = list(takewhile(lambda p: abs(p) < 10, random_walk()))
```
Best Practices
- Use generator expressions for simple transformations; use generator functions when logic requires multiple yields or state.
- Build processing as composable generator pipelines — each stage does one transformation.
- Use `itertools.islice` to take finite slices of infinite generators; never call `list()` on an unbounded generator.
- Prefer `yield from` over manual loops when delegating to sub-iterators; it is cleaner and faster.
- Use `more-itertools` (third-party) for additional recipes like `chunked`, `flatten`, `peekable`, `unique_everseen`.
- Wrap file-reading generators with resource management (open inside the generator, close on return/exception).
Common Pitfalls
- Calling `list()` on large or infinite generators exhausts memory; take only what you need with `islice`.
- Generators are single-use; iterating a second time yields nothing. Create a new generator or use `itertools.tee` (with caution).
- `itertools.tee` memory trap: if one copy advances far ahead, `tee` buffers all skipped items in memory.
- `groupby` requires sorted input: it groups consecutive equal elements, not all equal elements globally.
- Forgetting to prime `send()`-based generators: you must call `next(gen)` before the first `send()`.
- Generator cleanup: when a partially consumed generator is closed or garbage-collected (for example, after you `break` out of a `for` loop and drop it), `GeneratorExit` is raised at the paused `yield`; ensure `finally` blocks or `with` statements handle cleanup.
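The `groupby` pitfall is worth demonstrating, since it fails silently: without sorting, equal keys that are not adjacent land in separate groups.

```python
from itertools import groupby

data = ["apple", "avocado", "banana", "apricot"]
first_letter = lambda w: w[0]

# Unsorted: "a" shows up as a key twice, because grouping is consecutive-only.
unsorted_keys = [k for k, _ in groupby(data, key=first_letter)]
print(unsorted_keys)  # ['a', 'b', 'a']

# Sort by the same key first to get exactly one group per key.
sorted_keys = [k for k, _ in groupby(sorted(data, key=first_letter), key=first_letter)]
print(sorted_keys)    # ['a', 'b']
```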
Anti-Patterns
- **Materializing everything with `list()`**: wrapping generators in `list()` out of habit or convenience defeats the entire purpose of lazy evaluation. If you need random access or multiple passes, a list is appropriate; if you are just iterating once, keep it lazy.
- **Generator with side effects on every yield**: writing a generator that sends emails, writes to a database, or mutates global state each time it yields an item. This couples iteration with side effects, making it impossible to partially consume or retry the generator without duplicating those effects.
- **Using `send()` when a simple parameter would do**: reaching for the `send()`/coroutine protocol to feed data into a generator when a regular function argument or a class with state would be simpler and more readable. Reserve `send()` for genuine coroutine patterns where bidirectional communication is essential.
- **Infinite generator without a termination strategy**: creating an infinite generator and expecting consumers to always remember to use `islice` or `takewhile`. Provide a built-in limit parameter or document prominently that the generator is unbounded, so callers do not accidentally loop forever.
- **Chaining too many generators without debugging hooks**: building a ten-stage generator pipeline where an error in stage seven produces a cryptic traceback with no indication of which stage or which input element caused the problem. Add logging or intermediate materialization points during development to maintain debuggability.
Related Skills
- **Async Patterns**: Asyncio patterns for concurrent I/O-bound programming in Python
- **Context Managers**: Context manager patterns using `with` statements for reliable resource management in Python
- **Dataclasses**: Dataclass and Pydantic model patterns for structured data in Python
- **Decorators**: Decorator patterns for wrapping, extending, and composing Python functions and classes
- **Dependency Injection**: Dependency injection patterns for loosely coupled, testable Python applications
- **Metaclasses**: Metaclass and descriptor patterns for advanced class customization in Python