# Anthropic Claude API

"Anthropic Claude API: messages API, tool use, streaming, vision, system prompts, extended thinking, batches, Node SDK"
Claude's API is built around the **Messages API** — a single, clean endpoint for all interactions. Design prompts with clear, direct instructions. Leverage **system prompts** to define persona and constraints separately from user input. Use **tool use** for structured actions and data extraction. Claude excels at following nuanced instructions, long-context reasoning, and careful analysis. Prefer the official TypeScript SDK for streaming helpers, typed responses, and automatic retries.
## Key Points
- **Always set `max_tokens`** — it is required by the API and prevents runaway generation.
- **Use system prompts for persona and rules**, user messages for the actual task. This separation improves instruction following.
- **Check `stop_reason`** — handle `"end_turn"`, `"max_tokens"`, `"tool_use"` appropriately.
- **Use `tool_choice: { type: "tool", name: "..." }`** to force structured extraction without extra text.
- **Enable prompt caching** with `cache_control` on system prompts and long context to reduce latency and cost.
- **Use extended thinking** for complex reasoning tasks — it significantly improves accuracy on math, logic, and multi-step problems.
- **Use batches** for bulk processing — they offer 50% cost savings and higher throughput.
- **Log `usage` fields** including `cache_creation_input_tokens` and `cache_read_input_tokens` for cost tracking.
- **Use `assistant` prefill deliberately, not casually** — prefilling puts words in Claude's mouth and shifts the response distribution.
- **Send images as actual image content blocks**, not text descriptions — Claude's vision is highly capable.
- **Handle `tool_use` stop reasons** — if you provide tools, you must handle the case where the model calls them.
- **Expect a content array** — responses are arrays of content blocks (text, tool_use, thinking), not a single string.
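The `stop_reason` point above can be sketched as a small dispatcher. `classifyStop` is a hypothetical helper, not part of the SDK; the string values are the stop reasons the Messages API documents:

```typescript
// Hypothetical helper: decide what to do next based on stop_reason.
type StopReason = "end_turn" | "stop_sequence" | "max_tokens" | "tool_use";

function classifyStop(stopReason: StopReason): "done" | "truncated" | "needs_tools" {
  switch (stopReason) {
    case "end_turn":
    case "stop_sequence":
      return "done"; // normal completion, safe to use the text as-is
    case "max_tokens":
      return "truncated"; // output was cut off: raise max_tokens or continue the turn
    case "tool_use":
      return "needs_tools"; // run the requested tools and send tool_result back
  }
}

console.log(classifyStop("tool_use")); // prints "needs_tools"
```

In a real handler you would read `message.stop_reason` from the response and branch on the returned action.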
## Quick Example
```bash
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_BASE_URL=https://... # optional, for proxies
```
## Core Philosophy

Claude's API is built around the **Messages API** — a single, clean endpoint for all interactions. Design prompts with clear, direct instructions. Leverage **system prompts** to define persona and constraints separately from user input. Use **tool use** for structured actions and data extraction. Claude excels at following nuanced instructions, long-context reasoning, and careful analysis. Prefer the official TypeScript SDK for streaming helpers, typed responses, and automatic retries.
## Setup

Install the SDK and configure your client:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Basic message
const message = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system: "You are a concise technical writer.",
  messages: [
    { role: "user", content: "Explain the actor model in three sentences." },
  ],
});

console.log(message.content[0].type === "text" ? message.content[0].text : "");
```

Environment variables:

```bash
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_BASE_URL=https://... # optional, for proxies
```
## Key Techniques

### Streaming Responses

```typescript
const stream = anthropic.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a haiku about TypeScript." }],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}

// Or use the helper for the final assembled message
const finalMessage = await stream.finalMessage();
console.log("\n\nTokens used:", finalMessage.usage);
```
### Tool Use (Function Calling)

```typescript
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  tools: [
    {
      name: "search_database",
      description: "Search the product database by query string",
      input_schema: {
        type: "object" as const,
        properties: {
          query: { type: "string", description: "Search query" },
          limit: { type: "number", description: "Max results to return" },
        },
        required: ["query"],
      },
    },
  ],
  messages: [{ role: "user", content: "Find laptops under $1000" }],
});

// Process tool use blocks
for (const block of response.content) {
  if (block.type === "tool_use") {
    const searchInput = block.input as { query: string; limit?: number };
    const results = await searchDatabase(searchInput.query, searchInput.limit);

    // Send tool result back
    const followUp = await anthropic.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      tools: [/* same tools */],
      messages: [
        { role: "user", content: "Find laptops under $1000" },
        { role: "assistant", content: response.content },
        {
          role: "user",
          content: [
            {
              type: "tool_result",
              tool_use_id: block.id,
              content: JSON.stringify(results),
            },
          ],
        },
      ],
    });
    console.log(followUp.content);
  }
}
```
### Tool Use for Structured Extraction

```typescript
// Force a tool call to get structured JSON output
const extraction = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  tool_choice: { type: "tool", name: "extract_contact" },
  tools: [
    {
      name: "extract_contact",
      description: "Extract contact information from text",
      input_schema: {
        type: "object" as const,
        properties: {
          name: { type: "string" },
          email: { type: "string" },
          phone: { type: "string" },
          company: { type: "string" },
        },
        required: ["name"],
      },
    },
  ],
  messages: [
    {
      role: "user",
      content:
        "Contact Jane Smith at jane@acme.co or 555-0123. She works at Acme Corp.",
    },
  ],
});

const toolBlock = extraction.content.find((b) => b.type === "tool_use");
if (toolBlock && toolBlock.type === "tool_use") {
  const contact = toolBlock.input; // { name, email, phone, company }
  console.log(contact);
}
```
### Vision (Image Inputs)

```typescript
import { readFileSync } from "fs";

// From URL
const visionResponse = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: {
            type: "url",
            url: "https://example.com/chart.png",
          },
        },
        { type: "text", text: "Describe the trends shown in this chart." },
      ],
    },
  ],
});

// From base64
const imageData = readFileSync("./screenshot.png").toString("base64");
const base64Response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/png",
            data: imageData,
          },
        },
        { type: "text", text: "What UI issues do you see?" },
      ],
    },
  ],
});
```
### Extended Thinking

```typescript
const thinkingResponse = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  messages: [
    {
      role: "user",
      content:
        "Solve this step by step: If f(x) = x^3 - 6x^2 + 11x - 6, find all roots.",
    },
  ],
});

for (const block of thinkingResponse.content) {
  if (block.type === "thinking") {
    console.log("Reasoning:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
```
### Message Batches

```typescript
const batch = await anthropic.beta.messages.batches.create({
  requests: [
    {
      custom_id: "review-1",
      params: {
        model: "claude-sonnet-4-20250514",
        max_tokens: 512,
        messages: [{ role: "user", content: "Summarize: ..." }],
      },
    },
    {
      custom_id: "review-2",
      params: {
        model: "claude-sonnet-4-20250514",
        max_tokens: 512,
        messages: [{ role: "user", content: "Summarize: ..." }],
      },
    },
  ],
});

// Poll for completion
let status = batch;
while (status.processing_status === "in_progress") {
  await new Promise((r) => setTimeout(r, 30000));
  status = await anthropic.beta.messages.batches.retrieve(batch.id);
}

// Stream results
for await (const result of anthropic.beta.messages.batches.results(batch.id)) {
  if (result.result.type === "succeeded") {
    console.log(result.custom_id, result.result.message.content);
  }
}
```
### System Prompts and Multi-Turn Conversations

```typescript
const conversationHistory: Anthropic.MessageParam[] = [];

async function chat(userMessage: string): Promise<string> {
  conversationHistory.push({ role: "user", content: userMessage });

  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: "You are a senior code reviewer. Be direct and specific.",
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: conversationHistory,
  });

  const assistantText =
    response.content[0].type === "text" ? response.content[0].text : "";
  conversationHistory.push({ role: "assistant", content: response.content });
  return assistantText;
}
```
## Best Practices

- **Always set `max_tokens`** — it is required by the API and prevents runaway generation.
- **Use system prompts for persona and rules**, user messages for the actual task. This separation improves instruction following.
- **Check `stop_reason`** — handle `"end_turn"`, `"max_tokens"`, and `"tool_use"` appropriately.
- **Use `tool_choice: { type: "tool", name: "..." }`** to force structured extraction without extra text.
- **Enable prompt caching** with `cache_control` on system prompts and long context to reduce latency and cost.
- **Use extended thinking** for complex reasoning tasks — it significantly improves accuracy on math, logic, and multi-step problems.
- **Use batches** for bulk processing — they offer 50% cost savings and higher throughput.
- **Log `usage` fields** including `cache_creation_input_tokens` and `cache_read_input_tokens` for cost tracking.
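As a sketch of the caching advice above: place the `cache_control` breakpoint on the large, stable block so repeated requests reuse it. `buildCachedSystem` and the contract string are hypothetical, but the block shape matches the system-prompt array used in the multi-turn example.

```typescript
// Shape of a system prompt block with an ephemeral cache breakpoint.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical helper: short rules first, then the long stable context
// marked for caching. The first request pays cache_creation_input_tokens
// for everything up to the breakpoint; later requests pay the much
// cheaper cache_read_input_tokens instead.
function buildCachedSystem(rules: string, longContext: string): SystemBlock[] {
  return [
    { type: "text", text: rules },
    { type: "text", text: longContext, cache_control: { type: "ephemeral" } },
  ];
}

const system = buildCachedSystem(
  "You are a contract analyst.",
  "<the full contract text, hypothetical placeholder>",
);
// Pass as the `system` field of anthropic.messages.create({ ... })
```

Keep the cached block byte-identical between requests; any change above the breakpoint invalidates the cache entry.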
## Anti-Patterns

- **Using `assistant` prefill** to put words in Claude's mouth without understanding that it affects the response distribution. Use it deliberately, not casually.
- **Sending images as text descriptions** instead of actual image content blocks. Claude's vision is highly capable — use it.
- **Not handling `tool_use` stop reasons** — if you provide tools, you must handle the case where the model calls them.
- **Ignoring the content array structure** — responses are arrays of content blocks (text, tool_use, thinking), not a single string.
- **Recreating conversation history from scratch each turn** instead of appending. This wastes cache hits and increases costs.
- **Setting the thinking budget too low** — if you enable extended thinking, give it enough tokens (at least 5000) to be useful.
- **Mixing tool results with regular user messages** — tool results must use the `tool_result` content block type with the correct `tool_use_id`.
- **Polling batch status in a tight loop** — use 30-60 second intervals. Batches are designed for async workloads.
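The tool-result rules above can be made concrete with a small builder. `makeToolResultMessage` is a hypothetical helper; the `tool_result` block echoing the `tool_use_id` is the shape the Messages API expects, and the surrounding agent loop is sketched in comments.

```typescript
// Shape matching the Messages API tool_use content block.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };

// Hypothetical helper: wrap a tool's output as the user-role message
// that answers one specific tool_use block.
function makeToolResultMessage(block: ToolUseBlock, result: unknown) {
  return {
    role: "user" as const,
    content: [
      {
        type: "tool_result" as const,
        tool_use_id: block.id, // must echo the id from the tool_use block
        content: JSON.stringify(result),
      },
    ],
  };
}

// Agent loop sketch (API calls elided):
//   let response = await anthropic.messages.create({ ..., tools, messages });
//   while (response.stop_reason === "tool_use") {
//     messages.push({ role: "assistant", content: response.content });
//     for each tool_use block in response.content:
//       run the tool, then messages.push(makeToolResultMessage(block, result));
//     response = await anthropic.messages.create({ ..., tools, messages });
//   }
```

Appending the assistant turn before the tool results keeps the history valid and preserves prompt-cache hits.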
## Related Skills

- **Fireworks AI** — "Fireworks AI: fast inference, function calling, grammar mode, JSON output, OpenAI-compatible API, fine-tuning"
- **Google Gemini API** — "Google Gemini API: generateContent, multimodal (images/video/audio), function calling, streaming, embeddings, context caching"
- **Groq** — "Groq: ultra-fast inference, OpenAI-compatible API, Llama/Mixtral models, tool use, JSON mode, streaming"
- **OpenAI API** — "OpenAI API: chat completions, function calling/tools, streaming, embeddings, vision, JSON mode, assistants, Node SDK"
- **Replicate** — "Replicate: run open-source models, image generation (Flux/SDXL), predictions API, webhooks, streaming, Node SDK"
- **Together AI** — "Together AI: inference API, open-source LLMs (Llama/Mistral), chat completions, embeddings, fine-tuning, JSON mode"