
Generators

Generator and itertools patterns for memory-efficient data processing in Python


Generators — Python Patterns

You are an expert in Python generators and itertools for writing memory-efficient, lazy data processing pipelines.

Overview

Generators produce items one at a time via yield, enabling lazy evaluation of sequences that may be large or infinite. Combined with itertools and generator expressions, they form the backbone of memory-efficient data processing in Python. Generator pipelines compose naturally, processing one element at a time through multiple transformation stages without materializing intermediate collections.

Core Philosophy

Generators represent Python's answer to the fundamental tension between expressiveness and efficiency. A list comprehension is clear and Pythonic, but it materializes every element in memory at once. A generator expression is equally clear but produces elements one at a time, on demand. This lazy evaluation model means you can express operations on datasets larger than available memory, infinite sequences, and streaming data using the same familiar iteration patterns.
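
The memory difference is easy to demonstrate. A minimal sketch (object sizes are CPython-specific and approximate):

```python
import sys

# A list comprehension materializes one million ints up front;
# the generator expression stores only its iteration state.
squares_list = [x * x for x in range(1_000_000)]
squares_gen = (x * x for x in range(1_000_000))

print(sys.getsizeof(squares_list))  # megabytes (CPython, approximate)
print(sys.getsizeof(squares_gen))   # ~200 bytes, independent of range size
```

Both objects support the same iteration interface, so consuming code does not change.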

The pipeline composition model is where generators truly shine. Each generator in a pipeline is a single transformation stage: read lines, filter blanks, parse fields, validate records. Data flows through the pipeline one element at a time, and the entire chain uses constant memory regardless of input size. This is the Unix pipe philosophy applied to Python: small, composable stages connected by a universal interface (iteration).

Generators also encode an important discipline about separation of concerns. The producer (the generator function) decides what to yield and when; the consumer (the for loop or downstream generator) decides how much to take and when to stop. This decoupling means producers do not need to know whether the consumer wants ten items or ten million, and consumers do not need to know whether the producer reads from a file, a network socket, or an algorithm.
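
A small sketch of that decoupling (`take` and `countdown` are illustrative names, not stdlib functions):

```python
from itertools import islice

def take(n, iterable):
    """Consumer: pulls at most n items from any iterable producer."""
    return list(islice(iterable, n))

def countdown(start):
    """Producer backed by an algorithm; knows nothing about its consumer."""
    while start > 0:
        yield start
        start -= 1

# The same consumer works unchanged over any producer:
print(take(3, countdown(10)))  # [10, 9, 8]
print(take(3, iter([1, 2])))   # [1, 2] -- stops early, producer is unaffected
```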

Core Concepts

  • Generator functions contain yield and return a generator iterator when called.
  • Generator expressions ((x for x in items)) are the lazy counterpart to list comprehensions.
  • yield from delegates to a sub-generator, flattening nested iteration.
  • send() and throw() enable two-way communication with a running generator (coroutine style).
  • itertools provides optimized building blocks: chain, islice, groupby, product, combinations, etc.
  • Generators are single-pass — once exhausted, they cannot be restarted.
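
The single-pass property in particular catches people out; a quick illustration:

```python
gen = (x * x for x in range(3))
print(list(gen))  # [0, 1, 4]
print(list(gen))  # [] -- already exhausted; build a new generator to iterate again
```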

Implementation Patterns

Basic generator function

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take first 10 Fibonacci numbers
from itertools import islice
print(list(islice(fibonacci(), 10)))

Generator pipeline

def read_lines(path: str):
    with open(path) as f:
        for line in f:
            yield line.strip()

def filter_nonempty(lines):
    for line in lines:
        if line:
            yield line

def parse_csv_row(lines):
    for line in lines:
        yield line.split(",")

# Compose: reads one line at a time through the entire pipeline
rows = parse_csv_row(filter_nonempty(read_lines("data.csv")))
for row in rows:
    process(row)

yield from for delegation

def flatten(nested):
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

data = [1, [2, 3, [4, 5]], 6, [7]]
print(list(flatten(data)))  # [1, 2, 3, 4, 5, 6, 7]

Generator with send() for coroutine pattern

def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)          # prime the generator
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
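
throw() is listed under Core Concepts but not shown above. A sketch extending the running-average pattern (`resilient_average` is an illustrative name): throw() raises the exception at the suspended yield, and the generator may catch it and continue.

```python
def resilient_average():
    total = 0.0
    count = 0
    average = None
    while True:
        try:
            value = yield average
        except ValueError:
            # Raised here by throw(); recover by resetting the running state.
            total, count, average = 0.0, 0, None
        else:
            total += value
            count += 1
            average = total / count

avg = resilient_average()
next(avg)                     # prime the generator
print(avg.send(10))           # 10.0
print(avg.throw(ValueError))  # None -- state was reset, generator keeps running
print(avg.send(4))            # 4.0
```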

Chunking with itertools

from itertools import islice

def chunked(iterable, size):
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            break
        yield chunk

for batch in chunked(range(25), 10):
    process_batch(batch)
# [0..9], [10..19], [20..24]

itertools recipes

from itertools import chain, groupby, product, accumulate, takewhile, dropwhile

# Flatten one level (chain.from_iterable returns a lazy iterator, not a list)
flat = chain.from_iterable([[1, 2], [3, 4], [5]])
print(list(flat))  # [1, 2, 3, 4, 5]

# Group consecutive elements
data = [("a", 1), ("a", 2), ("b", 3), ("b", 4)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))

# Cartesian product
for x, y in product([1, 2], ["a", "b"]):
    print(x, y)  # 1 a / 1 b / 2 a / 2 b

# Running totals
print(list(accumulate([1, 2, 3, 4])))  # [1, 3, 6, 10]

# Take/drop while condition holds (both stop testing at the first failure)
print(list(takewhile(lambda x: x < 5, [1, 3, 5, 2])))  # [1, 3]
print(list(dropwhile(lambda x: x < 5, [1, 3, 5, 2])))  # [5, 2]

Batching generator with final flush

def batch_processor(items, batch_size=100):
    """Yield batches, flush remaining on exit."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

Infinite generators with bounded consumption

import random

def random_walk():
    position = 0
    while True:
        step = random.choice([-1, 1])
        position += step
        yield position

# Use takewhile or islice to consume finite portions
from itertools import takewhile
steps = list(takewhile(lambda p: abs(p) < 10, random_walk()))

Best Practices

  • Use generator expressions for simple transformations; use generator functions when logic requires multiple yields or state.
  • Build processing as composable generator pipelines — each stage does one transformation.
  • Use itertools.islice to take finite slices of infinite generators; never call list() on an unbounded generator.
  • Prefer yield from over manual loops when delegating to sub-iterators — it is cleaner and faster.
  • Use more-itertools (third-party) for additional recipes like chunked, flatten, peekable, unique_everseen.
  • Wrap file-reading generators with resource management (open inside the generator, close on return/exception).

Common Pitfalls

  • Calling list() on large/infinite generators exhausts memory — take only what you need with islice.
  • Generators are single-use — iterating a second time yields nothing; create a new generator or use itertools.tee (with caution).
  • itertools.tee memory trap — if one copy advances far ahead, tee buffers all skipped items in memory.
  • groupby requires sorted input — it groups consecutive equal elements, not all equal elements globally.
  • Forgetting to prime send()-based generators — you must call next(gen) before the first send().
  • Generator cleanup — breaking out of a loop leaves the generator suspended; GeneratorExit is raised at the paused yield only when the generator is closed (explicitly via close() or during garbage collection), so use try/finally inside the generator to guarantee cleanup runs.
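
The single-use and tee pitfalls can be seen in a small sketch:

```python
from itertools import tee, islice

source = iter(range(1_000_000))
a, b = tee(source)

# Safe: both copies advance roughly together, so tee's internal buffer stays small.
print(list(islice(a, 3)))  # [0, 1, 2]
print(list(islice(b, 3)))  # [0, 1, 2]

# Dangerous: fully draining one copy first would force tee to buffer every
# skipped element for the lagging copy -- effectively a full list in memory.
```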

Anti-Patterns

  • Materializing everything with list() — wrapping generators in list() out of habit or convenience defeats the entire purpose of lazy evaluation. If you need random access or multiple passes, a list is appropriate; if you are just iterating once, keep it lazy.

  • Generator with side effects on every yield — writing a generator that sends emails, writes to a database, or mutates global state each time it yields an item. This couples iteration with side effects, making it impossible to partially consume or retry the generator without duplicating those effects.

  • Using send() when a simple parameter would do — reaching for the send()/coroutine protocol to feed data into a generator when a regular function argument or class with state would be simpler and more readable. Reserve send() for genuine coroutine patterns where bidirectional communication is essential.

  • Infinite generator without a termination strategy — creating an infinite generator and expecting consumers to always remember to use islice or takewhile. Provide a built-in limit parameter or document prominently that the generator is unbounded, so callers do not accidentally loop forever.

  • Chaining too many generators without debugging hooks — building a ten-stage generator pipeline where an error in stage seven produces a cryptic traceback with no indication of which stage or which input element caused the problem. Add logging or intermediate materialization points during development to maintain debuggability.
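
One way to address the unbounded-generator anti-pattern is a built-in limit parameter, sketched here as a variant of the random_walk example above (the `limit` keyword is an illustrative addition, not part of the original):

```python
import random

def random_walk(limit=None):
    """Infinite by default; pass limit to make termination explicit."""
    position = 0
    steps = 0
    while limit is None or steps < limit:
        position += random.choice([-1, 1])
        steps += 1
        yield position

# Callers can no longer loop forever by accident:
print(len(list(random_walk(limit=100))))  # 100
```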
