Skip to main content
Technology & EngineeringSql228 lines

Ctes Recursive

Common Table Expressions and recursive queries for hierarchical data, graph traversal, and complex query composition

Quick Summary35 lines
You are an expert in SQL Common Table Expressions (CTEs) and recursive queries for composing readable queries and traversing hierarchical or graph-structured data in relational databases.

## Key Points

- The anchor member executes once to seed the result.
- The recursive member executes repeatedly, each iteration seeing only the rows produced by the previous iteration.
- Iteration stops when the recursive member returns zero rows.
- `UNION ALL` preserves duplicates; `UNION` deduplicates (and can act as a cycle guard but is slower).
1. **Use CTEs for readability** — break a 200-line query into named steps that read top-to-bottom like a pipeline.
2. **Always set a depth limit on recursive CTEs** — add `WHERE depth < N` in the recursive member to prevent runaway queries on cyclic or unexpectedly deep data.
3. **Prevent cycles explicitly** — track visited nodes in an array or path string and filter them out in the recursive member.
4. **Choose `UNION ALL` over `UNION`** unless you specifically need deduplication — `UNION` adds a sort/hash step every iteration.
5. **Be aware of materialization** — in PostgreSQL <12, CTEs are always materialized (an optimization fence). Use subqueries or upgrade if the planner needs to push filters through.
6. **Prefer CTEs over nested subqueries** when the same derived table is referenced more than once in the query.
- **Assuming CTE execution order** — the optimizer may reorder or inline CTEs. Do not rely on side effects between CTEs unless using data-modifying CTEs (INSERT/UPDATE/DELETE in WITH).
- **UNION vs UNION ALL in recursive CTEs** — `UNION` deduplicates per iteration, which can silently hide expected rows or mask bugs in your join condition.

## Quick Example

```sql
WITH cte_name AS (
  SELECT ...
)
SELECT * FROM cte_name;
```

```sql
WITH
  step1 AS (SELECT ...),
  step2 AS (SELECT ... FROM step1),
  step3 AS (SELECT ... FROM step2 JOIN other_table ...)
SELECT * FROM step3;
```
skilldb get sql-skills/Ctes RecursiveFull skill: 228 lines
Paste into your CLAUDE.md or agent config

CTEs and Recursive Queries — SQL

You are an expert in SQL Common Table Expressions (CTEs) and recursive queries for composing readable queries and traversing hierarchical or graph-structured data in relational databases.

Overview

A Common Table Expression (CTE) is a named, temporary result set defined with the WITH clause that exists only for the duration of a single statement. CTEs improve readability by breaking complex queries into logical steps. Recursive CTEs extend this by allowing a CTE to reference itself, enabling traversal of trees, graphs, and series generation directly in SQL.

Core Concepts

Basic CTE Syntax

WITH cte_name AS (
  SELECT ...
)
SELECT * FROM cte_name;

Multiple CTEs

WITH
  step1 AS (SELECT ...),
  step2 AS (SELECT ... FROM step1),
  step3 AS (SELECT ... FROM step2 JOIN other_table ...)
SELECT * FROM step3;

Recursive CTE Structure

WITH RECURSIVE cte_name AS (
  -- Anchor member: the starting rows
  SELECT ... FROM base_table WHERE ...

  UNION ALL

  -- Recursive member: references cte_name
  SELECT ... FROM base_table JOIN cte_name ON ...
)
SELECT * FROM cte_name;

Key rules:

  • The anchor member executes once to seed the result.
  • The recursive member executes repeatedly, each iteration seeing only the rows produced by the previous iteration.
  • Iteration stops when the recursive member returns zero rows.
  • UNION ALL preserves duplicates; UNION deduplicates (and can act as a cycle guard but is slower).

Implementation Patterns

Organizational Hierarchy (Tree Traversal)

WITH RECURSIVE org_tree AS (
  -- Anchor: top-level managers
  SELECT
    id,
    name,
    manager_id,
    1 AS depth,
    name::text AS path
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recurse into direct reports
  SELECT
    e.id,
    e.name,
    e.manager_id,
    t.depth + 1,
    t.path || ' > ' || e.name
  FROM employees e
  JOIN org_tree t ON e.manager_id = t.id
)
SELECT * FROM org_tree ORDER BY path;

Bill of Materials (Part Explosion)

WITH RECURSIVE bom AS (
  SELECT
    part_id,
    component_id,
    quantity,
    1 AS level
  FROM assemblies
  WHERE part_id = 1001

  UNION ALL

  SELECT
    a.part_id,
    a.component_id,
    a.quantity * b.quantity,
    b.level + 1
  FROM assemblies a
  JOIN bom b ON a.part_id = b.component_id
)
SELECT component_id, SUM(quantity) AS total_needed
FROM bom
GROUP BY component_id;

Generating a Date Series

-- Portable alternative to generate_series
WITH RECURSIVE dates AS (
  SELECT DATE '2024-01-01' AS d
  UNION ALL
  SELECT d + INTERVAL '1 day'
  FROM dates
  WHERE d < DATE '2024-12-31'
)
SELECT d FROM dates;

Graph Shortest Path (BFS)

WITH RECURSIVE paths AS (
  SELECT
    target_node AS node,
    ARRAY[source_node, target_node] AS path,
    1 AS hops
  FROM edges
  WHERE source_node = 'A'

  UNION ALL

  SELECT
    e.target_node,
    p.path || e.target_node,
    p.hops + 1
  FROM edges e
  JOIN paths p ON e.source_node = p.node
  WHERE e.target_node <> ALL(p.path)  -- cycle prevention
    AND p.hops < 10                    -- depth limit
)
SELECT DISTINCT ON (node) node, path, hops
FROM paths
ORDER BY node, hops;

Materialized vs Inline CTEs

PostgreSQL 12+ lets you control materialization:

-- Force materialization (result cached, used multiple times)
WITH cached AS MATERIALIZED (
  SELECT expensive_function(id) AS val FROM big_table
)
SELECT * FROM cached WHERE val > 100;

-- Force inlining (optimizer can push predicates in)
WITH inlined AS NOT MATERIALIZED (
  SELECT * FROM big_table
)
SELECT * FROM inlined WHERE id = 42;

CTE for Readable Upserts

WITH new_data AS (
  SELECT unnest(ARRAY[1,2,3]) AS id,
         unnest(ARRAY['a','b','c']) AS val
),
updated AS (
  UPDATE target t
  SET val = n.val
  FROM new_data n
  WHERE t.id = n.id
  RETURNING t.id
)
INSERT INTO target (id, val)
SELECT id, val FROM new_data
WHERE id NOT IN (SELECT id FROM updated);

Core Philosophy

CTEs transform SQL from a language of nested expressions into a language of named, sequential steps. A 200-line query with five levels of subquery nesting is nearly impossible to debug or modify safely. The same logic expressed as a chain of CTEs reads top-to-bottom like a data pipeline: each step has a name, a clear input, and a clear output. This readability is not cosmetic — it directly reduces the probability of logic errors and makes code review meaningful.

Recursive CTEs unlock an entire category of problems that would otherwise require application-level loops or denormalized data structures. Hierarchical data (org charts, category trees, bill-of-materials), graph traversal (shortest path, dependency resolution), and series generation (date ranges, sequence fills) can all be expressed in a single SQL statement. The key insight is the anchor-plus-recursive structure: the anchor seeds the result, and the recursive member extends it one level at a time until no new rows are produced.

The power of recursive CTEs comes with genuine danger. A missing termination condition or a cycle in the data produces infinite recursion, which will either exhaust memory or run until the engine's hard limit kicks in. Every recursive CTE must have an explicit depth guard (WHERE depth < N) and, for graph data, an explicit cycle guard (tracking visited nodes in an array or path). These are not optional safety margins — they are required for correctness.

Anti-Patterns

  • Recursive CTEs without depth limits — omitting a WHERE depth < N condition in the recursive member means a single unexpected cycle or deeper-than-expected tree will run the query until the database kills it. Always set an explicit maximum.

  • Using UNION instead of UNION ALL without understanding the costUNION deduplicates on every iteration, adding a sort or hash step that can dramatically slow execution. Use UNION ALL by default and only switch to UNION when deduplication is specifically needed for cycle prevention.

  • Treating CTEs as performance optimizations — in PostgreSQL versions before 12, CTEs are always materialized, creating an optimization fence that prevents the planner from pushing predicates through. A CTE that filters a large table may force a full scan. Check EXPLAIN output and consider subqueries or NOT MATERIALIZED when performance matters.

  • Deeply nested CTE chains without intermediate validation — chaining ten CTEs where each depends on the last creates a query that is difficult to debug when results are wrong. Develop and validate each step independently before composing them.

  • Using recursive CTEs for problems that have set-based solutions — generating a sequence of dates or numbers is a valid use case, but recursive CTEs should not replace generate_series (PostgreSQL) or calendar tables when those are available and more efficient.

Best Practices

  1. Use CTEs for readability — break a 200-line query into named steps that read top-to-bottom like a pipeline.
  2. Always set a depth limit on recursive CTEs — add WHERE depth < N in the recursive member to prevent runaway queries on cyclic or unexpectedly deep data.
  3. Prevent cycles explicitly — track visited nodes in an array or path string and filter them out in the recursive member.
  4. Choose UNION ALL over UNION unless you specifically need deduplication — UNION adds a sort/hash step every iteration.
  5. Be aware of materialization — in PostgreSQL <12, CTEs are always materialized (an optimization fence). Use subqueries or upgrade if the planner needs to push filters through.
  6. Prefer CTEs over nested subqueries when the same derived table is referenced more than once in the query.

Common Pitfalls

  • Infinite recursion — forgetting a termination condition or cycle guard. PostgreSQL has max_recursive_iterations (default unlimited); MySQL limits to cte_max_recursion_depth (default 1000). Always add your own guard.
  • Assuming CTE execution order — the optimizer may reorder or inline CTEs. Do not rely on side effects between CTEs unless using data-modifying CTEs (INSERT/UPDATE/DELETE in WITH).
  • Performance surprises from materialization — a CTE that filters a billion-row table without an index-friendly predicate forces a full scan if materialized. Check EXPLAIN and consider NOT MATERIALIZED or a subquery.
  • UNION vs UNION ALL in recursive CTEsUNION deduplicates per iteration, which can silently hide expected rows or mask bugs in your join condition.
  • Column type mismatches — the anchor member defines column types. If the recursive member produces wider types (e.g., longer strings), you may get truncation errors. Cast explicitly in the anchor.

Install this skill directly: skilldb add sql-skills

Get CLI access →