Ctes Recursive
Common Table Expressions and recursive queries for hierarchical data, graph traversal, and complex query composition
You are an expert in SQL Common Table Expressions (CTEs) and recursive queries for composing readable queries and traversing hierarchical or graph-structured data in relational databases. ## Key Points - The anchor member executes once to seed the result. - The recursive member executes repeatedly, each iteration seeing only the rows produced by the previous iteration. - Iteration stops when the recursive member returns zero rows. - `UNION ALL` preserves duplicates; `UNION` deduplicates (and can act as a cycle guard but is slower). 1. **Use CTEs for readability** — break a 200-line query into named steps that read top-to-bottom like a pipeline. 2. **Always set a depth limit on recursive CTEs** — add `WHERE depth < N` in the recursive member to prevent runaway queries on cyclic or unexpectedly deep data. 3. **Prevent cycles explicitly** — track visited nodes in an array or path string and filter them out in the recursive member. 4. **Choose `UNION ALL` over `UNION`** unless you specifically need deduplication — `UNION` adds a sort/hash step every iteration. 5. **Be aware of materialization** — in PostgreSQL <12, CTEs are always materialized (an optimization fence). Use subqueries or upgrade if the planner needs to push filters through. 6. **Prefer CTEs over nested subqueries** when the same derived table is referenced more than once in the query. - **Assuming CTE execution order** — the optimizer may reorder or inline CTEs. Do not rely on side effects between CTEs unless using data-modifying CTEs (INSERT/UPDATE/DELETE in WITH). - **UNION vs UNION ALL in recursive CTEs** — `UNION` deduplicates per iteration, which can silently hide expected rows or mask bugs in your join condition. ## Quick Example ```sql WITH cte_name AS ( SELECT ... ) SELECT * FROM cte_name; ``` ```sql WITH step1 AS (SELECT ...), step2 AS (SELECT ... FROM step1), step3 AS (SELECT ... FROM step2 JOIN other_table ...) SELECT * FROM step3; ```
skilldb get sql-skills/Ctes RecursiveFull skill: 228 linesCTEs and Recursive Queries — SQL
You are an expert in SQL Common Table Expressions (CTEs) and recursive queries for composing readable queries and traversing hierarchical or graph-structured data in relational databases.
Overview
A Common Table Expression (CTE) is a named, temporary result set defined with the WITH clause that exists only for the duration of a single statement. CTEs improve readability by breaking complex queries into logical steps. Recursive CTEs extend this by allowing a CTE to reference itself, enabling traversal of trees, graphs, and series generation directly in SQL.
Core Concepts
Basic CTE Syntax
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;
Multiple CTEs
WITH
step1 AS (SELECT ...),
step2 AS (SELECT ... FROM step1),
step3 AS (SELECT ... FROM step2 JOIN other_table ...)
SELECT * FROM step3;
Recursive CTE Structure
WITH RECURSIVE cte_name AS (
-- Anchor member: the starting rows
SELECT ... FROM base_table WHERE ...
UNION ALL
-- Recursive member: references cte_name
SELECT ... FROM base_table JOIN cte_name ON ...
)
SELECT * FROM cte_name;
Key rules:
- The anchor member executes once to seed the result.
- The recursive member executes repeatedly, each iteration seeing only the rows produced by the previous iteration.
- Iteration stops when the recursive member returns zero rows.
UNION ALLpreserves duplicates;UNIONdeduplicates (and can act as a cycle guard but is slower).
Implementation Patterns
Organizational Hierarchy (Tree Traversal)
WITH RECURSIVE org_tree AS (
-- Anchor: top-level managers
SELECT
id,
name,
manager_id,
1 AS depth,
name::text AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recurse into direct reports
SELECT
e.id,
e.name,
e.manager_id,
t.depth + 1,
t.path || ' > ' || e.name
FROM employees e
JOIN org_tree t ON e.manager_id = t.id
)
SELECT * FROM org_tree ORDER BY path;
Bill of Materials (Part Explosion)
WITH RECURSIVE bom AS (
SELECT
part_id,
component_id,
quantity,
1 AS level
FROM assemblies
WHERE part_id = 1001
UNION ALL
SELECT
a.part_id,
a.component_id,
a.quantity * b.quantity,
b.level + 1
FROM assemblies a
JOIN bom b ON a.part_id = b.component_id
)
SELECT component_id, SUM(quantity) AS total_needed
FROM bom
GROUP BY component_id;
Generating a Date Series
-- Portable alternative to generate_series
WITH RECURSIVE dates AS (
SELECT DATE '2024-01-01' AS d
UNION ALL
SELECT d + INTERVAL '1 day'
FROM dates
WHERE d < DATE '2024-12-31'
)
SELECT d FROM dates;
Graph Shortest Path (BFS)
WITH RECURSIVE paths AS (
SELECT
target_node AS node,
ARRAY[source_node, target_node] AS path,
1 AS hops
FROM edges
WHERE source_node = 'A'
UNION ALL
SELECT
e.target_node,
p.path || e.target_node,
p.hops + 1
FROM edges e
JOIN paths p ON e.source_node = p.node
WHERE e.target_node <> ALL(p.path) -- cycle prevention
AND p.hops < 10 -- depth limit
)
SELECT DISTINCT ON (node) node, path, hops
FROM paths
ORDER BY node, hops;
Materialized vs Inline CTEs
PostgreSQL 12+ lets you control materialization:
-- Force materialization (result cached, used multiple times)
WITH cached AS MATERIALIZED (
SELECT expensive_function(id) AS val FROM big_table
)
SELECT * FROM cached WHERE val > 100;
-- Force inlining (optimizer can push predicates in)
WITH inlined AS NOT MATERIALIZED (
SELECT * FROM big_table
)
SELECT * FROM inlined WHERE id = 42;
CTE for Readable Upserts
WITH new_data AS (
SELECT unnest(ARRAY[1,2,3]) AS id,
unnest(ARRAY['a','b','c']) AS val
),
updated AS (
UPDATE target t
SET val = n.val
FROM new_data n
WHERE t.id = n.id
RETURNING t.id
)
INSERT INTO target (id, val)
SELECT id, val FROM new_data
WHERE id NOT IN (SELECT id FROM updated);
Core Philosophy
CTEs transform SQL from a language of nested expressions into a language of named, sequential steps. A 200-line query with five levels of subquery nesting is nearly impossible to debug or modify safely. The same logic expressed as a chain of CTEs reads top-to-bottom like a data pipeline: each step has a name, a clear input, and a clear output. This readability is not cosmetic — it directly reduces the probability of logic errors and makes code review meaningful.
Recursive CTEs unlock an entire category of problems that would otherwise require application-level loops or denormalized data structures. Hierarchical data (org charts, category trees, bill-of-materials), graph traversal (shortest path, dependency resolution), and series generation (date ranges, sequence fills) can all be expressed in a single SQL statement. The key insight is the anchor-plus-recursive structure: the anchor seeds the result, and the recursive member extends it one level at a time until no new rows are produced.
The power of recursive CTEs comes with genuine danger. A missing termination condition or a cycle in the data produces infinite recursion, which will either exhaust memory or run until the engine's hard limit kicks in. Every recursive CTE must have an explicit depth guard (WHERE depth < N) and, for graph data, an explicit cycle guard (tracking visited nodes in an array or path). These are not optional safety margins — they are required for correctness.
Anti-Patterns
-
Recursive CTEs without depth limits — omitting a
WHERE depth < Ncondition in the recursive member means a single unexpected cycle or deeper-than-expected tree will run the query until the database kills it. Always set an explicit maximum. -
Using UNION instead of UNION ALL without understanding the cost —
UNIONdeduplicates on every iteration, adding a sort or hash step that can dramatically slow execution. UseUNION ALLby default and only switch toUNIONwhen deduplication is specifically needed for cycle prevention. -
Treating CTEs as performance optimizations — in PostgreSQL versions before 12, CTEs are always materialized, creating an optimization fence that prevents the planner from pushing predicates through. A CTE that filters a large table may force a full scan. Check
EXPLAINoutput and consider subqueries orNOT MATERIALIZEDwhen performance matters. -
Deeply nested CTE chains without intermediate validation — chaining ten CTEs where each depends on the last creates a query that is difficult to debug when results are wrong. Develop and validate each step independently before composing them.
-
Using recursive CTEs for problems that have set-based solutions — generating a sequence of dates or numbers is a valid use case, but recursive CTEs should not replace
generate_series(PostgreSQL) or calendar tables when those are available and more efficient.
Best Practices
- Use CTEs for readability — break a 200-line query into named steps that read top-to-bottom like a pipeline.
- Always set a depth limit on recursive CTEs — add
WHERE depth < Nin the recursive member to prevent runaway queries on cyclic or unexpectedly deep data. - Prevent cycles explicitly — track visited nodes in an array or path string and filter them out in the recursive member.
- Choose
UNION ALLoverUNIONunless you specifically need deduplication —UNIONadds a sort/hash step every iteration. - Be aware of materialization — in PostgreSQL <12, CTEs are always materialized (an optimization fence). Use subqueries or upgrade if the planner needs to push filters through.
- Prefer CTEs over nested subqueries when the same derived table is referenced more than once in the query.
Common Pitfalls
- Infinite recursion — forgetting a termination condition or cycle guard. PostgreSQL has
max_recursive_iterations(default unlimited); MySQL limits tocte_max_recursion_depth(default 1000). Always add your own guard. - Assuming CTE execution order — the optimizer may reorder or inline CTEs. Do not rely on side effects between CTEs unless using data-modifying CTEs (INSERT/UPDATE/DELETE in WITH).
- Performance surprises from materialization — a CTE that filters a billion-row table without an index-friendly predicate forces a full scan if materialized. Check
EXPLAINand considerNOT MATERIALIZEDor a subquery. - UNION vs UNION ALL in recursive CTEs —
UNIONdeduplicates per iteration, which can silently hide expected rows or mask bugs in your join condition. - Column type mismatches — the anchor member defines column types. If the recursive member produces wider types (e.g., longer strings), you may get truncation errors. Cast explicitly in the anchor.
Install this skill directly: skilldb add sql-skills
Related Skills
Indexing Strategies
Index types, design strategies, and maintenance for optimal query performance in relational databases
JSON Operations
JSON storage, querying, indexing, and manipulation in PostgreSQL and MySQL for semi-structured data
Migration Patterns
Database schema migration strategies for safe, reversible, and zero-downtime changes to production databases
Query Optimization
Techniques for analyzing and optimizing SQL query performance using execution plans, statistics, and query rewrites
Stored Procedures
Stored procedures, functions, and triggers for encapsulating business logic and automating actions within the database
Transactions Isolation
Transaction management, ACID properties, isolation levels, and concurrency control in relational databases