agent-with-claude
Building agents specifically with the Claude API: extended thinking for complex reasoning, tool use patterns, computer use for browser/desktop automation, multi-turn conversation management, crafting system prompts for agents, and streaming agent responses. Covers Claude-specific features and best practices for building reliable autonomous agents.
Building Agents with Claude
Leverage Claude-specific API features to build powerful agents: extended thinking, tool use, computer use, and streaming.
Basic Claude Agent Setup
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def claude_agent(task: str, tools: list[dict], system: str,
                 model: str = "claude-sonnet-4-20250514",
                 max_steps: int = 20) -> str:
    """Standard Claude agent loop."""
    messages = [{"role": "user", "content": task}]
    for step in range(max_steps):
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages,
        )
        # Done when the model stops for any reason other than a tool call.
        # Checking only for "end_turn" would send an empty tool_result list
        # after a "max_tokens" cutoff, which the API rejects.
        if response.stop_reason != "tool_use":
            return next(
                (b.text for b in response.content if b.type == "text"), ""
            )
        # Execute tools and continue
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # execute_tool is your own dispatcher, not part of the SDK
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})
    return "Reached max steps."
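The loop above calls `execute_tool`, which this section leaves undefined. One possible shape for it is sketched here, assuming a hypothetical `TOOL_REGISTRY` dict, a hypothetical `register_tool` decorator, and a toy `add` tool. None of these names come from the Anthropic SDK; only the `name` / `description` / `input_schema` layout of the tool definition follows the Messages API.

```python
from typing import Any, Callable

# Hypothetical registry mapping tool names to plain Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def register_tool(name: str):
    """Decorator that stores a function in TOOL_REGISTRY under `name`."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("add")
def add(a: float, b: float) -> float:
    """Toy tool used only for illustration."""
    return a + b


def execute_tool(name: str, tool_input: dict) -> str:
    """Run a registered tool and return errors as text so the model can react."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return f"Error: unknown tool '{name}'"
    try:
        return str(fn(**tool_input))
    except Exception as exc:  # report the failure instead of crashing the loop
        return f"Error: {type(exc).__name__}: {exc}"


# Tool definition passed in the `tools` list of the request.
ADD_TOOL = {
    "name": "add",
    "description": "Add two numbers.",
    "input_schema": {
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    },
}
```

Returning the error as a string, rather than raising, keeps the conversation alive: the model sees the failure in the tool result and can try a different call.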
Extended Thinking for Complex Reasoning
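Extended thinking lets Claude reason step by step before it responds, which matters most for agents tackling complex, multi-step problems. The thinking budget must be smaller than max_tokens, and when thinking is combined with tool use the assistant's content blocks (thinking blocks included) must be sent back unmodified on the next turn.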
def agent_with_thinking(task: str, tools: list[dict],
                        budget_tokens: int = 10000) -> str:
    """Agent that uses extended thinking for better reasoning."""
    messages = [{"role": "user", "content": task}]
    for _ in range(15):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=16000,  # must be larger than budget_tokens
            thinking={
                "type": "enabled",
                "budget_tokens": budget_tokens,  # API minimum is 1024
            },
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # Thinking blocks are also present, but we only return the final text
            return next(
                (b.text for b in response.content if b.type == "text"), ""
            )
        # Append response.content unchanged so the thinking blocks travel
        # back to the API together with the tool_use blocks
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})
    return "Max iterations reached."
When to use extended thinking in agents:
- Complex tool selection: When the agent has many tools and must choose carefully.
- Multi-step reasoning: Math, logic, code analysis, planning.
- Error recovery: After a tool failure, extra thinking helps the agent reason about alternatives.
- Final synthesis: The last turn where the agent produces its final answer.
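The criteria in this list can be turned into a small budget picker. The sketch below is illustrative only: `pick_thinking_budget`, its parameters, and every threshold are assumptions, not values from the API or from this skill. The only API fact it relies on is that `budget_tokens` has a 1,024-token minimum. Final synthesis is left out because the loop cannot know in advance which turn will be the last.

```python
def pick_thinking_budget(num_tools: int, last_tool_failed: bool,
                         is_planning_step: bool) -> int:
    """Return a budget_tokens value for the next turn (hypothetical thresholds)."""
    budget = 2_000  # light default for routine tool calls, above the 1,024 minimum
    if num_tools > 10:
        # Complex tool selection: many tools to choose between
        budget = max(budget, 6_000)
    if last_tool_failed:
        # Error recovery: give the model room to reason about alternatives
        budget = max(budget, 10_000)
    if is_planning_step:
        # Multi-step reasoning: planning turns get the largest budget
        budget = max(budget, 12_000)
    return budget
```

The returned value would be passed as `budget_tokens` in the `thinking` parameter, with `max_tokens` set higher than the largest budget the function can return.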
Dynamic Thinking Budget
def adaptive_thinking_agent(task: str, tools: list[dict]) -> str:
    """Adjust thinking budget based on task phase."""
    messages = [{"role": "user", "content": task}]
    for step in range(1, 21):
        # More thinking budget for the first step (planning)
        # and every 5th step (checkpoint)
        if step == 1 or step % 5 == 0:
            budget = 15000
        else:
            budget = 5000
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=16000,  # must stay above the largest budget
            thinking={"type": "enabled", "budget_tokens": budget},
            tools=tools,
            messages=messages,
        )
        # extract_text and execute_all_tools are helpers you define yourself
        if response.stop_reason != "tool_use":
            return extract_text(response)
        messages.append({"role": "assistant", "content": response.content})
        tool_results = execute_all_tools(response)
        messages.append({"role": "user", "content": tool_results})
    return "Max steps reached."
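`extract_text` and `execute_all_tools` are used in this and later examples but never defined. A minimal sketch of both follows, under two assumptions: `execute_tool` is a placeholder stub standing in for your real dispatcher, and `fake_response` is a stand-in object that only mimics the `.content` block attributes of an SDK response.

```python
from types import SimpleNamespace


def execute_tool(name: str, tool_input: dict) -> str:
    """Placeholder dispatcher (assumption): swap in your real tool routing."""
    return f"{name} called with {tool_input}"


def extract_text(response) -> str:
    """Join all text blocks, skipping thinking and tool_use blocks."""
    return "".join(b.text for b in response.content if b.type == "text")


def execute_all_tools(response) -> list[dict]:
    """Run every tool_use block and build the matching tool_result blocks."""
    results = []
    for block in response.content:
        if block.type == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,  # ties the result to its request
                "content": str(execute_tool(block.name, block.input)),
            })
    return results


# Stand-in for an SDK response, used only to exercise the helpers offline.
fake_response = SimpleNamespace(content=[
    SimpleNamespace(type="thinking", thinking="(reasoning)"),
    SimpleNamespace(type="text", text="Reading the file first."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="read_file",
                    input={"path": "a.py"}),
])

print(extract_text(fake_response))
print(execute_all_tools(fake_response))
```

Joining every text block is a design choice; the earlier loops return only the first text block, which is equivalent when the response contains a single one.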
System Prompts for Agents
System prompts define agent behavior. Be specific about capabilities, constraints, and expected patterns.
CODING_AGENT_SYSTEM = """You are an expert coding agent. You solve programming tasks by reading, writing, and running code.
## Capabilities
You have these tools: read_file, write_file, run_command, search_files.
## Workflow
1. Understand the task by reading relevant files first.
2. Plan your changes before writing any code.
3. Make changes incrementally — small edits, then test.
4. After writing code, ALWAYS run it to verify it works.
5. If tests fail, read the error, fix the issue, and re-run.
## Rules
- Never modify files outside the project directory.
- Always run existing tests after making changes.
- If you are unsure about something, search the codebase first.
- When you are done, provide a summary of what you changed and why.
## Error Handling
- If a command fails, read the error message carefully before retrying.
- If you are stuck after 3 attempts, explain what is going wrong and ask for help.
- Never run destructive commands (rm -rf, drop database, etc.) without confirmation."""
RESEARCH_AGENT_SYSTEM = """You are a research agent. You find, verify, and synthesize information.
## Workflow
1. Break the research question into specific sub-questions.
2. Search for information using available tools.
3. Cross-reference findings across multiple sources.
4. Note any contradictions or uncertainty.
5. Synthesize findings into a clear, cited answer.
## Rules
- Always cite your sources with URLs.
- Distinguish between facts, estimates, and opinions.
- If information is outdated or uncertain, say so explicitly.
- Never fabricate sources or statistics."""
Multi-Turn Conversation Management
For agents that run many turns, manage the message history to stay within context limits.
import json


class ClaudeConversation:
    """Manage multi-turn conversations with context window awareness."""

    def __init__(self, model: str = "claude-sonnet-4-20250514",
                 max_context_tokens: int = 180000):
        self.model = model
        self.max_context = max_context_tokens
        self.messages: list[dict] = []
        self.system: str = ""

    def _estimate_tokens(self) -> int:
        # Rough heuristic: about 4 characters per token. default=str lets
        # json.dumps cope with SDK content-block objects stored in messages.
        return len(json.dumps(self.messages, default=str)) // 4

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})

    def trim_if_needed(self):
        """Remove old messages if approaching context limit."""
        while self._estimate_tokens() > self.max_context * 0.8:
            if len(self.messages) <= 3:
                break
            # Keep messages[0] (the original task) and drop the oldest
            # assistant/user pair together. Removing one message at a time
            # can leave a tool_result without its tool_use, which the API
            # rejects. Assumes roles strictly alternate, starting with user.
            del self.messages[1:3]

    def send(self, tools: list[dict] | None = None, **kwargs) -> anthropic.types.Message:
        self.trim_if_needed()
        params = {
            "model": self.model,
            "max_tokens": kwargs.get("max_tokens", 4096),
            "messages": self.messages,
        }
        if self.system:
            params["system"] = self.system
        if tools:
            params["tools"] = tools
        return client.messages.create(**params)
Computer Use
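Claude can interact with computers through screenshots and mouse/keyboard actions. Computer use is a beta feature, so requests must carry the matching beta flag, and the actions should run in a sandboxed VM or container rather than on your own desktop.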
def computer_use_agent(task: str) -> str:
    """Agent that uses Claude's computer use capability."""
    computer_tool = {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 1,
    }
    messages = [{"role": "user", "content": task}]
    for _ in range(30):
        # Computer use is in beta: call the beta endpoint and pass the
        # beta flag that matches the tool version
        response = client.beta.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=[computer_tool],
            messages=messages,
            betas=["computer-use-2025-01-24"],
        )
        if response.stop_reason != "tool_use":
            return extract_text(response)
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Execute the computer action and take a screenshot
                screenshot_b64 = execute_computer_action(block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/png",
                                "data": screenshot_b64,
                            },
                        }
                    ],
                })
        messages.append({"role": "user", "content": tool_results})
    return "Computer use agent reached step limit."
def execute_computer_action(action: dict) -> str:
    """Execute a computer action and return a screenshot as base64.

    Assumes an X11 session with xdotool and scrot installed. Only a subset
    of the computer_20250124 actions is handled here.
    """
    import subprocess
    import base64

    action_type = action.get("action")
    if action_type == "screenshot":
        pass  # Just take screenshot
    elif action_type == "left_click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "1"])
    elif action_type == "type":
        text = action["text"]
        subprocess.run(["xdotool", "type", "--clearmodifiers", text])
    elif action_type == "key":
        # The key name or combination (e.g. "Return", "ctrl+s") arrives in "text"
        key = action["text"]
        subprocess.run(["xdotool", "key", key])
    elif action_type == "scroll":
        x, y = action["coordinate"]
        direction = action["scroll_direction"]
        # X11 maps wheel up/down to buttons 4/5; left/right are not handled
        button = "4" if direction == "up" else "5"
        subprocess.run(["xdotool", "mousemove", str(x), str(y),
                        "click", "--repeat", "3", button])
    # Take screenshot (-o overwrites the previous file)
    subprocess.run(["scrot", "/tmp/screenshot.png", "-o"])
    with open("/tmp/screenshot.png", "rb") as f:
        return base64.b64encode(f.read()).decode()
Streaming Agent Responses
Stream agent responses for real-time feedback during long-running tasks.
def streaming_agent(task: str, tools: list[dict], system: str):
    """Agent that streams responses for real-time output."""
    messages = [{"role": "user", "content": task}]
    for _ in range(20):
        current_tool_use = None
        with client.messages.stream(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages,
        ) as stream:
            for event in stream:
                if event.type == "content_block_start":
                    if event.content_block.type == "text":
                        pass  # Text block starting
                    elif event.content_block.type == "tool_use":
                        current_tool_use = {
                            "id": event.content_block.id,
                            "name": event.content_block.name,
                            "input_json": "",
                        }
                elif event.type == "content_block_delta":
                    if event.delta.type == "text_delta":
                        print(event.delta.text, end="", flush=True)
                    elif event.delta.type == "input_json_delta":
                        # Partial tool input, useful only for live display;
                        # the final message already holds the parsed input
                        if current_tool_use:
                            current_tool_use["input_json"] += event.delta.partial_json
            # Get the full response while the stream is still open
            response = stream.get_final_message()
        if response.stop_reason != "tool_use":
            return extract_text(response)
        messages.append({"role": "assistant", "content": response.content})
        # Execute tools
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"\n> Calling {block.name}...")
                result = str(execute_tool(block.name, block.input))
                print(f"> Result: {result[:100]}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        messages.append({"role": "user", "content": tool_results})
    return "Streaming agent reached max steps."
Claude Agent Best Practices
- Use `claude-sonnet-4-20250514` for most agents: it offers the best balance of speed, cost, and capability. Use `claude-opus-4-20250514` only for the most complex reasoning tasks.
- Keep system prompts under 1500 tokens: longer system prompts eat into context and can dilute instructions.
- Return structured tool results: Claude parses structured text better than raw JSON blobs.
- Use `stop_reason` to control the loop: `"end_turn"` means the agent is done, `"tool_use"` means it wants to use a tool, and `"max_tokens"` means the response was cut off.
- Set a reasonable `max_tokens`: 4096 is enough for most agent turns without extended thinking. A tight cap keeps the agent acting rather than writing long responses. With extended thinking enabled, `max_tokens` must exceed `budget_tokens`.
- Truncate tool outputs: if a tool returns 50KB of text, the agent will struggle. Cap outputs at 5-10KB and summarize if needed.
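The truncation advice above can be applied just before a tool result is appended to the conversation. The sketch below is one way to do it: `truncate_tool_output` is a hypothetical helper, and the 8,000-byte default is an arbitrary pick from the 5-10KB range mentioned in the list. It keeps the head and tail of the output, since error messages often sit at the end.

```python
def truncate_tool_output(text: str, max_bytes: int = 8_000) -> str:
    """Keep the head and tail of an oversized tool result and mark the cut."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    half = max_bytes // 2
    # errors="ignore" drops any multi-byte character split by the cut
    head = data[:half].decode("utf-8", errors="ignore")
    tail = data[-half:].decode("utf-8", errors="ignore")
    omitted = len(data) - 2 * half
    return f"{head}\n... [{omitted} bytes truncated] ...\n{tail}"
```

In the agent loops, this would wrap the tool result, for example `"content": truncate_tool_output(str(result))`. Summarizing the omitted middle with a second model call is another option this sketch does not cover.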