Prompt Chaining
Multi-step prompt chains that decompose complex tasks into sequential LLM calls
You are an expert in Prompt Chaining for crafting effective AI prompts that decompose complex tasks into sequential, composable LLM calls.
Overview
Prompt chaining breaks a complex task into a series of simpler, focused LLM calls where the output of one step becomes the input (or part of the input) for the next. Each link in the chain has a single, well-defined responsibility. This approach improves reliability, makes debugging easier, allows mixing different models or configurations per step, and enables human-in-the-loop checkpoints between steps.
Core Concepts
Task Decomposition
Breaking a complex objective into discrete sub-tasks, each simple enough for a single prompt to handle reliably. The decomposition determines the chain's structure.
Data Flow
Defining how outputs from one step transform into inputs for the next. This includes extraction, formatting, filtering, and enrichment between steps.
Gate Steps
Intermediate validation or routing steps that check the output of a previous step and decide whether to proceed, retry, or branch to an alternative path.
Parallel Chains
Sub-tasks that are independent of each other can run in parallel, with results merged in a later aggregation step.
Error Propagation
How failures in one step affect downstream steps. Robust chains include error handling, fallbacks, and retry logic at each link.
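A retry-with-fallback wrapper is one way to contain failures at a single link. This is a minimal sketch, not from the original: `step_fn` stands in for any chain step (an LLM call plus parsing), and the backoff constants are illustrative.

```python
import time

def call_with_retry(step_fn, payload, max_retries=3, fallback=None):
    """Run one chain step, retrying transient failures with exponential
    backoff and returning a safe fallback if all attempts fail."""
    for attempt in range(max_retries):
        try:
            return step_fn(payload)
        except Exception:
            time.sleep(0.01 * (2 ** attempt))  # illustrative backoff
    return fallback  # downstream steps see a default instead of a crash

# Usage: a step that fails once, then succeeds on retry
state = {"calls": 0}
def flaky_step(payload):
    state["calls"] += 1
    if state["calls"] < 2:
        raise RuntimeError("transient model error")
    return payload.upper()

result = call_with_retry(flaky_step, "hello")
```

Because the fallback is returned rather than raised, downstream steps can branch on it explicitly instead of crashing the whole chain.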
Implementation Patterns
Sequential Analysis Chain
Step 1 — Extract
Prompt:
Read the following customer email and extract:
- Customer name
- Product mentioned
- Issue category (billing, technical, shipping, other)
- Emotional tone (angry, frustrated, neutral, positive)
Return as JSON.
Email: {raw_email}
Step 2 — Research (input: Step 1 output)
Prompt:
You are a support agent. Given this customer issue summary:
{step_1_json}
Search the knowledge base context below for relevant solutions:
{knowledge_base_results}
Return the top 3 most relevant solutions with confidence scores.
Step 3 — Draft Response (input: Step 1 + Step 2 output)
Prompt:
Draft a customer support email response.
Customer context: {step_1_json}
Available solutions: {step_2_solutions}
Rules:
- Match tone to the customer's emotional state.
- Lead with empathy if the customer is frustrated or angry.
- Present the most relevant solution first.
- Keep the response under 150 words.
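The three steps above can be wired together in a few lines. This is a sketch, assuming `llm` is any callable from prompt string to response text (a real implementation would call your model API); the `json.loads` gate is added to fail fast if Step 1 does not return valid JSON.

```python
import json

def support_chain(raw_email: str, knowledge_base_results: str, llm) -> str:
    """Extract -> research -> draft, with each step's output embedded
    in the next step's prompt."""
    # Step 1 - Extract structured facts from the raw email
    step_1_json = llm(
        "Read the following customer email and extract:\n"
        "- Customer name\n- Product mentioned\n"
        "- Issue category (billing, technical, shipping, other)\n"
        "- Emotional tone (angry, frustrated, neutral, positive)\n"
        "Return as JSON.\n"
        f"Email: {raw_email}"
    )
    json.loads(step_1_json)  # gate: fail fast if Step 1 is not valid JSON

    # Step 2 - Research solutions using the Step 1 summary
    step_2_solutions = llm(
        "You are a support agent. Given this customer issue summary:\n"
        f"{step_1_json}\n"
        "Search the knowledge base context below for relevant solutions:\n"
        f"{knowledge_base_results}\n"
        "Return the top 3 most relevant solutions with confidence scores."
    )

    # Step 3 - Draft the response from both earlier outputs
    return llm(
        "Draft a customer support email response.\n"
        f"Customer context: {step_1_json}\n"
        f"Available solutions: {step_2_solutions}\n"
        "Rules:\n- Match tone to the customer's emotional state.\n"
        "- Lead with empathy if the customer is frustrated or angry.\n"
        "- Present the most relevant solution first.\n"
        "- Keep the response under 150 words."
    )
```

Note that Step 3 receives both the Step 1 JSON and the Step 2 solutions, so no context is lost along the chain.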
Generate-Then-Critique Chain
Step 1 — Generate
Prompt:
Write a Python function that takes a list of timestamps (ISO 8601
strings) and returns a dictionary grouping them by day of the week.
Include type hints and docstring.
Step 2 — Critique (input: Step 1 output)
Prompt:
You are a senior Python code reviewer. Review the following function
for:
1. Correctness (does it handle edge cases like empty lists, invalid
timestamps, timezone-aware vs naive datetimes?)
2. Performance (any unnecessary operations?)
3. Style (PEP 8, naming, docstring quality)
Function:
{step_1_code}
List specific issues and suggested fixes.
Step 3 — Revise (input: Step 1 + Step 2 output)
Prompt:
Revise the following Python function based on the code review feedback.
Original function:
{step_1_code}
Review feedback:
{step_2_critique}
Return only the revised function with all issues addressed.
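The same pattern reduces to a short pipeline in code. A sketch, assuming a hypothetical `llm` callable; the prompts are abbreviated versions of the ones above.

```python
def critique_chain(llm) -> str:
    """Generate -> critique -> revise. The revise prompt embeds both
    earlier outputs so no context is lost between steps."""
    code = llm(
        "Write a Python function that takes a list of timestamps (ISO 8601 "
        "strings) and returns a dictionary grouping them by day of the "
        "week. Include type hints and docstring."
    )
    review = llm(
        "You are a senior Python code reviewer. Review the following "
        "function for correctness, performance, and style.\n"
        f"Function:\n{code}\n"
        "List specific issues and suggested fixes."
    )
    return llm(
        "Revise the following Python function based on the code review "
        f"feedback.\nOriginal function:\n{code}\n"
        f"Review feedback:\n{review}\n"
        "Return only the revised function with all issues addressed."
    )
```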
Routing Chain with Gate
Step 1 — Classify
Prompt:
Classify the following user message into exactly one category:
- CODE_HELP: User needs help writing or debugging code.
- CONCEPT: User wants to understand a concept or theory.
- OPINION: User is asking for a recommendation or comparison.
- OTHER: Does not fit the above categories.
Return only the category label.
Message: {user_message}
Step 2 — Route (application logic, not LLM)
if step_1_output == "CODE_HELP":
    use coding_prompt_template
elif step_1_output == "CONCEPT":
    use explanation_prompt_template
elif step_1_output == "OPINION":
    use comparison_prompt_template
else:
    use general_prompt_template
Step 3 — Execute
Prompt: {selected_template with user_message inserted}
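As runnable code, the classify-route-execute flow looks like the sketch below. The template strings are hypothetical placeholders, and `dict.get` with a default implements the fallback route so an unexpected label never crashes the chain.

```python
# Hypothetical templates; real ones would come from your prompt library.
TEMPLATES = {
    "CODE_HELP": "You are a coding assistant. Help with:\n{msg}",
    "CONCEPT": "Explain this concept clearly, with one example:\n{msg}",
    "OPINION": "Compare the options and give a recommendation:\n{msg}",
}
DEFAULT_TEMPLATE = "Answer this as helpfully as you can:\n{msg}"

def route_and_execute(user_message: str, llm) -> str:
    # Step 1 - Classify (LLM call)
    label = llm(
        "Classify the following user message into exactly one category: "
        "CODE_HELP, CONCEPT, OPINION, or OTHER. "
        "Return only the category label.\n"
        f"Message: {user_message}"
    ).strip()
    # Step 2 - Route (application logic): unexpected labels fall through
    # to the default template, so the chain never crashes.
    template = TEMPLATES.get(label, DEFAULT_TEMPLATE)
    # Step 3 - Execute
    return llm(template.format(msg=user_message))
```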
Parallel Fan-Out / Fan-In
Step 1 — Fan Out (parallel)
Run these three prompts simultaneously:
Prompt A (Technical Analysis):
Analyze this product proposal from a technical feasibility perspective.
Proposal: {proposal}
Prompt B (Market Analysis):
Analyze this product proposal from a market opportunity perspective.
Proposal: {proposal}
Prompt C (Risk Analysis):
Analyze this product proposal from a risk and compliance perspective.
Proposal: {proposal}
Step 2 — Fan In (aggregation)
Prompt:
You are a VP of Product. Synthesize the following three analyses into
a single recommendation with a GO / NO-GO / NEEDS-MORE-INFO verdict.
Technical Analysis: {prompt_a_output}
Market Analysis: {prompt_b_output}
Risk Analysis: {prompt_c_output}
Structure: Executive Summary, Key Factors, Recommendation, Next Steps.
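With asyncio, the fan-out runs as one `gather` call. A sketch, assuming `llm` is an async callable from prompt to response; the perspective strings mirror Prompts A-C above.

```python
import asyncio

async def review_proposal(proposal: str, llm) -> str:
    """Fan out three independent analyses, then fan in with one
    synthesis step."""
    perspectives = ["technical feasibility", "market opportunity",
                    "risk and compliance"]
    # Fan out: the analyses do not depend on each other, so run them
    # concurrently instead of serially.
    analyses = await asyncio.gather(*[
        llm(f"Analyze this product proposal from a {p} perspective.\n"
            f"Proposal: {proposal}")
        for p in perspectives
    ])
    # Fan in: a single aggregation step sees all three results.
    return await llm(
        "You are a VP of Product. Synthesize the following three analyses "
        "into a single recommendation with a GO / NO-GO / NEEDS-MORE-INFO "
        "verdict.\n" + "\n\n".join(analyses)
    )
```

End-to-end latency is roughly one analysis plus one synthesis, rather than three analyses in sequence.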
Iterative Refinement Chain
Step 1 — Initial Draft
Prompt:
Write a blog post introduction (3 paragraphs) about the benefits of
type-safe APIs.
Step 2 — Evaluate (input: Step 1 output)
Prompt:
Rate the following blog introduction on a scale of 1-10 for:
- Clarity
- Engagement
- Technical accuracy
- Conciseness
Provide specific feedback for any dimension scoring below 7.
Introduction: {step_1_output}
Return as JSON: {"scores": {...}, "feedback": [...]}
Step 3 — Conditional (application logic)
if all scores >= 7:
    return step_1_output  # Done
else:
    proceed to Step 4
Step 4 — Refine (input: Step 1 + Step 2 output)
Prompt:
Rewrite this blog introduction incorporating the following feedback:
Original: {step_1_output}
Feedback: {step_2_feedback}
Return only the revised introduction.
(Repeat Steps 2-4 up to max_iterations)
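The full loop, with the iteration cap, can be sketched as below. `llm` is a hypothetical callable, and the evaluation step is assumed to return the JSON shape defined in Step 2.

```python
import json

def refine_until_good(llm, max_iterations: int = 3, threshold: int = 7) -> str:
    """Draft -> evaluate -> (maybe) refine, capped at max_iterations
    so the loop always terminates."""
    draft = llm("Write a blog post introduction (3 paragraphs) about the "
                "benefits of type-safe APIs.")
    for _ in range(max_iterations):
        report = json.loads(llm(
            "Rate the following blog introduction 1-10 for clarity, "
            "engagement, technical accuracy, and conciseness. Return as "
            'JSON: {"scores": {...}, "feedback": [...]}.\n'
            f"Introduction: {draft}"
        ))
        if all(score >= threshold for score in report["scores"].values()):
            return draft  # gate passed: done
        draft = llm(
            "Rewrite this blog introduction incorporating the following "
            f"feedback:\nOriginal: {draft}\n"
            f"Feedback: {report['feedback']}\n"
            "Return only the revised introduction."
        )
    return draft  # cap reached: accept "good enough"
```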
Document Processing Pipeline
# Pseudocode for a document processing chain; assumes llm_call and
# split_into_chunks helpers are defined elsewhere.
import asyncio
import json

async def process_document(document: str) -> dict:
    # Step 1: Chunk the document
    chunks = split_into_chunks(document, max_tokens=500)

    # Step 2: Summarize each chunk in parallel
    summaries = await asyncio.gather(*[
        llm_call(
            f"Summarize this section in 2-3 sentences:\n{chunk}"
        )
        for chunk in chunks
    ])

    # Step 3: Synthesize summaries
    combined = "\n".join(summaries)
    final_summary = await llm_call(
        f"Combine these section summaries into a coherent "
        f"executive summary:\n{combined}"
    )

    # Step 4: Extract structured data
    structured = await llm_call(
        f"From this summary, extract key entities, dates, and "
        f"action items as JSON:\n{final_summary}"
    )

    return {
        "summary": final_summary,
        "structured_data": json.loads(structured),
        "section_summaries": summaries,
    }
Best Practices
- Single responsibility per step. Each prompt in the chain should do one thing well. If a prompt is doing extraction, classification, and generation, split it.
- Define clear interfaces between steps. Use structured output (JSON) between steps so parsing is reliable and data flow is explicit.
- Add gate steps for quality control. Before passing output downstream, validate it. A classification gate or quality-score gate prevents error cascading.
- Use parallel execution where possible. Independent sub-tasks should run concurrently to reduce end-to-end latency.
- Log intermediate outputs. For debugging and iteration, save the output of every step. Chains are much easier to debug than monolithic prompts.
- Set max iteration limits on loops. Iterative refinement chains must have a cap to prevent infinite loops.
- Match model capability to step complexity. Use smaller, faster models for simple classification or extraction steps, and larger models for synthesis or creative steps.
Core Philosophy
Prompt chaining is the practice of decomposing a complex task into a pipeline of simple, focused LLM calls. Each link in the chain has a single responsibility, a well-defined input, and a structured output. This decomposition mirrors a fundamental principle of software engineering: complex systems are built from simple, composable parts. A chain of 3 focused prompts, each doing one thing well, is more reliable, debuggable, and maintainable than a single mega-prompt trying to do everything at once.
The interface between steps is the most critical design decision. When Step 1 outputs free-form text that Step 2 must interpret, parsing failures cascade through the entire chain. When Step 1 outputs structured JSON with a defined schema, Step 2 can consume it reliably and the interface is testable independently. Use structured output (JSON, typed objects) at every step boundary. This is not premature formalization; it is the mechanism that makes chains robust.
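A step-boundary gate can be as small as a parse plus a key check. This sketch is illustrative: the `REQUIRED_KEYS` schema is an assumption matching the extraction example earlier, not a fixed part of the technique.

```python
import json

# Illustrative schema for an extraction step's output; the exact keys
# are an assumption for this example.
REQUIRED_KEYS = {"customer_name", "product", "issue_category",
                 "emotional_tone"}

def validate_step_output(raw: str) -> dict:
    """Gate: parse and schema-check a step's JSON output before it flows
    downstream, so malformed output fails loudly at the boundary."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"step output missing keys: {sorted(missing)}")
    return data
```

Because the check lives at the boundary, it can be unit-tested without any model in the loop.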
Not every task needs chaining. If a single prompt reliably produces the desired output, adding a chain adds latency, cost, and complexity for no benefit. The decision to chain should be driven by evidence: if a single prompt produces inconsistent quality, fails on certain input types, or combines tasks that would benefit from different model configurations, then chaining is justified. Start with one prompt, measure its failure modes, and decompose only where the evidence points.
Anti-Patterns
- Over-engineering simple tasks into chains: Splitting a task that a single prompt handles reliably into 3-4 steps. Each additional step adds latency, cost, and failure points. Chain when you have evidence that a single prompt is insufficient, not as a default architecture.
- Free-form text interfaces between steps: Passing the output of one step to the next as unstructured prose. The receiving step must parse natural language, which is fragile and nondeterministic. Use JSON with a defined schema at every step boundary.
- No validation or gate steps: Passing the output of Step 1 directly to Step 3 without verifying that it is correct, complete, or well-formed. A hallucination or formatting error in an early step propagates unchecked through the entire chain. Add validation gates between critical steps.
- Sequential execution when steps are independent: Running Step A, then Step B, then Step C serially when A and B do not depend on each other. Independent steps should run in parallel to reduce end-to-end latency, with a fan-in step to merge their results.
- No maximum iteration limit on refinement loops: Building an iterative "generate, critique, revise" chain without a cap on iterations. If the critique step always finds something to improve, the loop runs indefinitely. Set max_iterations and accept "good enough" when the cap is reached.
Common Pitfalls
- Over-engineering the chain. Not every task needs chaining. If a single prompt reliably produces the desired output, adding steps only adds latency and cost.
- Fragile data passing. If Step 1 returns slightly different JSON than Step 2 expects, the chain breaks. Validate and normalize between steps.
- Error cascading. A hallucination in Step 1 propagates through every subsequent step. Add validation gates to catch errors early.
- Ignoring latency. Each LLM call adds seconds. A 5-step sequential chain with a large model can take 30+ seconds. Parallelize and use faster models where possible.
- No fallback paths. If the classification step returns an unexpected category, the chain crashes. Always include a default/fallback route.
- Losing context across steps. Later steps may need information from earlier steps that was not passed forward. Design data flow to carry all necessary context.
Install this skill directly: skilldb add prompt-engineering-skills
Related Skills
Chain of Thought
Chain-of-thought prompting to elicit step-by-step reasoning from language models
Evaluation
Prompt evaluation and testing methodologies for measuring and improving prompt quality
Few Shot Learning
Few-shot example prompting to guide model behavior through demonstration
Retrieval Augmented
RAG prompt patterns for grounding model responses in retrieved context documents
Role Prompting
Role and persona prompting to shape model expertise, tone, and perspective
Structured Output
Techniques for reliably extracting structured JSON and typed data from language models