# Building Agent Workflows with LangGraph
LangGraph and similar frameworks model agent workflows as state machines: nodes that do work, edges that decide what comes next, state that persists between nodes. The model is more constrained than free-form Python orchestration but produces more debuggable, more testable, more durable systems.
This skill covers the patterns for using state-machine frameworks effectively. It's framework-agnostic in spirit, though concrete examples use LangGraph syntax.
## The State Machine Model
A LangGraph workflow has:
- **State.** A typed structure that flows through the graph. Each node reads from it, optionally writes to it.
- **Nodes.** Functions or LLM calls. Each node receives the state, does work, returns updates to apply to the state.
- **Edges.** Define which node runs next. Can be unconditional (always go from A to B) or conditional (run a function on state to decide).
- **Entry and exit points.** Where execution starts and where it ends.
The graph is explicit. You can draw it. You can name every node. You can trace every edge. This is the value: agentic workflows that would otherwise be implicit control flow inside Python become first-class artifacts.
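A minimal wiring sketch, assuming LangGraph's `StateGraph` API and the `State` and node functions defined in the sections below:

```python
from langgraph.graph import StateGraph, START, END

# Sketch only: State, research_node, and write_node are defined later in this skill.
builder = StateGraph(State)
builder.add_node("research", research_node)
builder.add_node("write", write_node)
builder.add_edge(START, "research")    # entry point
builder.add_edge("research", "write")  # unconditional edge
builder.add_edge("write", END)         # exit point
app = builder.compile()
```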
## Designing the State
The state is the most important design decision. Get this right and the workflow is easy; get it wrong and every node fights it.
Principles:
- **Explicit fields.** Each piece of state is a named field with a type, not a `dict[str, Any]`.
- **Immutability of inputs.** Nodes return updates; they don't mutate state in place. The framework merges updates.
- **Append for accumulation.** State that accumulates over time (messages, tool calls) uses an `add` reducer; the framework handles the append.
- **Reset for replacements.** State that should be replaced (current task, last result) uses a `replace` reducer.
```python
from typing import TypedDict, Annotated
from operator import add

# Message is whatever message type your app uses (e.g., a LangChain message class).
class State(TypedDict):
    messages: Annotated[list[Message], add]
    current_task: str
    research_results: list[str]
    final_answer: str | None
```
**Avoid:** a single `context` field that accumulates everything as a blob. The named fields decompose the state into queryable, debuggable pieces.
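The `add` reducer is just `operator.add`, so list-valued updates concatenate rather than overwrite; a quick illustration of the merge:

```python
from operator import add

existing = ["searched the docs"]   # the field's current value in state
update = ["summarized findings"]   # what a node returns for that field
merged = add(existing, update)     # how the framework applies the reducer
assert merged == ["searched the docs", "summarized findings"]
```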
## Designing the Nodes
Each node has a single responsibility. The agent that does research is a node; the agent that writes is another node; the agent that reviews is a third.
```python
def research_node(state: State) -> State:
    query = state["current_task"]
    results = search_tool.run(query)
    return {"research_results": results}

def write_node(state: State) -> State:
    research = state["research_results"]
    response = llm.invoke(f"Write a summary based on: {research}")
    return {"final_answer": response.content}
```
Nodes are pure-ish functions of state. They take state in, return updates. They don't have side effects beyond what's necessary.
## Conditional Edges
Edges decide where to go next. Most edges are unconditional ("research → write → review"). The interesting ones are conditional.
```python
def route_after_review(state: State) -> str:
    if state["review_passed"]:
        return "publish"
    elif state["review_attempt"] >= 3:
        return "human_escalation"
    else:
        return "rewrite"

graph.add_conditional_edges("review", route_after_review)
```
The router is a function of state. It decides the next node deterministically given the state.
Conditional edges are where agentic decisions happen. The supervisor agent's "which worker should run next" is implemented as a conditional edge with a routing function powered by an LLM call.
## The LLM-Powered Router
The supervisor pattern in LangGraph:
```python
def supervisor_router(state: State) -> str:
    response = llm.invoke([
        ("system", "You are a supervisor. Decide which agent should run next."),
        ("user", f"Current state: {state['current_task']}. "
                 f"Available agents: research, write, review, publish."),
    ])
    return response.content.strip()  # returns the agent name
```
Hazards:
- The LLM might return an invalid agent name. Validate; route to a fallback if the response doesn't match.
- The LLM might pick the same agent twice in a row, looping. Track the route history; bail or change strategy if looping.
- The LLM's decisions might be inconsistent. Cache when input state is the same.
Use temperature 0 for routing decisions; routing needs consistency, not creativity. A guard for the first two hazards is sketched below.
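A sketch of those guards, wrapping `supervisor_router` from above; the `route_history` field and the `fallback` and `human_escalation` targets are illustrative names, not part of the earlier `State`:

```python
VALID_AGENTS = {"research", "write", "review", "publish"}

def guarded_router(state: State) -> str:
    choice = supervisor_router(state)
    if choice not in VALID_AGENTS:
        return "fallback"          # hazard 1: invalid agent name
    history = state.get("route_history", [])  # illustrative field
    if history[-2:] == [choice, choice]:
        return "human_escalation"  # hazard 2: stuck picking the same agent
    return choice
```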
## Persistence and Resumption
LangGraph supports checkpointing — saving state at each step. Useful for:
- **Long-running workflows** that exceed a single request lifetime.
- **Human-in-the-loop steps** where the workflow waits for user input.
- **Failure recovery:** if a node errors, restart from the last checkpoint.
```python
from langgraph.checkpoint.postgres import PostgresSaver

# In recent langgraph versions, from_conn_string is a context manager.
with PostgresSaver.from_conn_string("postgres://...") as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    app = graph.compile(checkpointer=checkpointer)

    config = {"configurable": {"thread_id": "user-42-task-123"}}
    result = app.invoke(initial_state, config=config)

    # Later, resume from the saved checkpoint:
    state = app.get_state(config)
    result = app.invoke(None, config=config)
```
Checkpointing turns workflows from synchronous functions into durable processes. The thread ID identifies the workflow run; the checkpointer persists state at each node boundary.
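For the human-in-the-loop case, LangGraph can pause before a named node and resume the same thread later. A sketch, assuming a `publish` node that needs approval and reusing the checkpointer and config from above:

```python
# Pause before "publish"; the checkpoint holds state while a human reviews.
app = graph.compile(checkpointer=checkpointer, interrupt_before=["publish"])

app.invoke(initial_state, config=config)  # runs until "publish", then stops
# ... human approves out of band ...
app.invoke(None, config=config)           # resumes from the checkpoint
```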
## Streaming and Observability
LangGraph supports streaming intermediate state and outputs. Useful for UIs that show progress and for debugging.
```python
for chunk in app.stream(initial_state, config=config):
    print(chunk)
```
Each chunk is the state delta from one node. The UI can render progress as it arrives.
Log every node invocation, every edge decision, and every state change. Production systems need this for incident investigation. LangSmith and similar tools handle visualization; without them, you're reading raw logs.
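A minimal logging sketch built on the stream API; with `stream_mode="updates"`, each chunk maps a node name to the delta it produced:

```python
import logging

logger = logging.getLogger("workflow")

for chunk in app.stream(initial_state, config=config, stream_mode="updates"):
    for node_name, delta in chunk.items():
        logger.info("node=%s delta=%s", node_name, delta)
```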
## Subgraphs
For complex workflows, use subgraphs. A "research" subgraph might have its own internal nodes (query generation → search → filter → summarize) but appear as a single node in the parent graph.
Subgraphs:
- Encapsulate complexity.
- Are independently testable.
- Can be reused across workflows.
The parent passes state to the subgraph; the subgraph returns its result; the parent integrates.
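In LangGraph, a compiled graph can itself be added as a node. A sketch, assuming the parent and the subgraph share the relevant `State` keys; the internal nodes are elided:

```python
research_builder = StateGraph(State)
# ... add query_generation, search, filter, summarize nodes and edges ...
research_subgraph = research_builder.compile()

parent = StateGraph(State)
parent.add_node("research", research_subgraph)  # the whole subgraph runs as one node
```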
## Testing
Test the graph and its nodes separately.
**Node tests:** each node is a function. Pass in a fixture state; assert on the returned updates. Mock LLM calls and tool calls.

**Graph tests:** pass in initial state; run the graph; assert on the final state. Mock LLM responses to deterministic values.

**Integration tests:** run the full graph end-to-end against a real LLM with realistic inputs. These are slow and expensive; run in CI but not on every commit.
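A node-test sketch in pytest style, patching the `search_tool` that `research_node` calls; the module name `workflow` is illustrative:

```python
from workflow import research_node, search_tool  # illustrative module

def test_research_node(monkeypatch):
    monkeypatch.setattr(search_tool, "run", lambda query: ["stub result"])
    updates = research_node({"current_task": "checkpointer comparison"})
    assert updates == {"research_results": ["stub result"]}
```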
## Common Patterns
### The Re-Plan Pattern
Plan the work; execute step 1; if step 1's result invalidates the plan, re-plan; otherwise execute step 2.
```
plan → execute_step → check_progress → (re-plan | execute_next | done)
```
Useful for tasks where progress reveals new constraints.
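The branching lives in the `check_progress` router. A sketch with illustrative state fields (`plan_invalidated`, `steps_remaining`):

```python
def route_after_check(state: State) -> str:
    if state["plan_invalidated"]:     # a step result broke a plan assumption
        return "re_plan"
    if state["steps_remaining"] == 0:
        return "done"
    return "execute_next"
```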
### The Self-Correction Pattern
Generate; check; if failed, generate again with feedback; otherwise return.
```
generate → check → (return | generate_with_feedback)
```
The check node is often an LLM call evaluating the generation against criteria.
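A sketch of such a check node; the PASS/FAIL protocol and the `check_passed` field are illustrative, not fixed API:

```python
def check_node(state: State) -> dict:
    verdict = llm.invoke(
        "Answer PASS or FAIL. Does this draft meet the criteria?\n"
        f"{state['final_answer']}"
    )
    return {"check_passed": verdict.content.strip().upper().startswith("PASS")}
```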
### The Hierarchical Pattern
Top-level supervisor invokes mid-level supervisors; each mid-level invokes specialized workers.
```
top_supervisor → (mid_supervisor_a | mid_supervisor_b)
mid_supervisor_a → (worker_1 | worker_2 | worker_3)
```
Useful for complex tasks that decompose into subtasks.
## Anti-Patterns
- **State as a single dict.** Untyped, unstructured, accumulating blob. Decompose into named, typed fields.
- **Mutable state.** Nodes mutating state in place causes ordering bugs. Return updates instead.
- **Looping without budgets.** Conditional edges that can loop forever. Add a step counter; bail at a budget (sketched below).
- **Untested nodes.** Nodes only tested as part of the full graph are slow and hard to debug. Test nodes individually with fixtures.
- **No checkpointing for long workflows.** A failure midway loses all progress. Checkpoint at every node boundary.
- **LLM router with high temperature.** Routing decisions become inconsistent. Set temperature 0.
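A budget sketch for the looping anti-pattern; `step_count` is an illustrative counter that each node increments in its update:

```python
MAX_STEPS = 20  # illustrative budget

def route_with_budget(state: State) -> str:
    if state["step_count"] >= MAX_STEPS:
        return "human_escalation"  # bail instead of looping forever
    return route_after_review(state)
```

LangGraph also enforces a per-run `recursion_limit` in the invoke config as a backstop.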