OpenAI API
"OpenAI API: chat completions, function calling/tools, streaming, embeddings, vision, JSON mode, assistants, Node SDK"
Core Philosophy
OpenAI's API is the most widely adopted LLM interface. Build around chat completions as the universal primitive. Use structured outputs (JSON mode, function calling) to get reliable, parseable responses. Stream responses for better UX. Treat the API as stateless — manage conversation history yourself. Prefer the official Node SDK over raw HTTP for type safety, automatic retries, and streaming helpers.
Setup
Install the SDK and configure your client:
import OpenAI from "openai";
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
// Basic chat completion
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain monads in one sentence." },
  ],
  temperature: 0.7,
  max_tokens: 256,
});
console.log(response.choices[0].message.content);
Environment variables:
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-... # optional
OPENAI_BASE_URL=https://... # optional, for proxies/Azure
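Chat completions are stateless: the server keeps no transcript between calls, so your code carries the conversation history and must keep it inside the context window. A minimal sketch of one way to do that; the character budget is a rough stand-in for real token counting (e.g. with a tokenizer such as tiktoken), and `trimHistory` is an illustrative helper, not an SDK function:

```typescript
// Minimal message shape; the SDK's ChatCompletionMessageParam is richer.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Keep system messages, then walk newest-to-oldest dropping the oldest
// turns until the transcript fits the budget. Characters approximate
// tokens (roughly 4 chars per token for English text).
function trimHistory(history: ChatMessage[], maxChars: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const turns = history.filter((m) => m.role !== "system");
  const kept: ChatMessage[] = [];
  let used = system.reduce((n, m) => n + m.content.length, 0);
  for (let i = turns.length - 1; i >= 0; i--) {
    used += turns[i].content.length;
    if (used > maxChars) break;
    kept.unshift(turns[i]); // recent context survives first
  }
  return [...system, ...kept];
}
```

Append each user message and each assistant reply to your stored history, then pass `trimHistory(history, budget)` as `messages` on the next call.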
Key Techniques
Streaming Responses
const stream = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Write a short poem." }],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
Function Calling / Tools
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name" },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  },
];
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools,
  tool_choice: "auto",
});
const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments);
  const weatherResult = await fetchWeather(args.location, args.unit);
  // Send the tool result back
  const followUp = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "user", content: "What's the weather in Paris?" },
      response.choices[0].message,
      {
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(weatherResult),
      },
    ],
    tools,
  });
  console.log(followUp.choices[0].message.content);
}
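Before running a `fetchWeather`-style handler, treat the model's `arguments` string as untrusted input: it may be malformed JSON or carry unexpected values. A hand-rolled validation sketch; in practice a schema library such as zod is the usual choice, and `parseWeatherArgs` here is illustrative:

```typescript
type WeatherArgs = { location: string; unit?: "celsius" | "fahrenheit" };

// Narrow a raw arguments string to the expected shape, returning null
// rather than throwing on malformed or out-of-range input.
function parseWeatherArgs(raw: string): WeatherArgs | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const obj = value as Record<string, unknown>;
  if (typeof obj.location !== "string" || obj.location.length === 0) return null;
  if (obj.unit !== undefined && obj.unit !== "celsius" && obj.unit !== "fahrenheit") {
    return null;
  }
  return { location: obj.location, unit: obj.unit as WeatherArgs["unit"] };
}
```

A null result is a good moment to send the model a tool message describing the validation failure so it can retry with corrected arguments.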
JSON Mode and Structured Outputs
// JSON mode: guarantees syntactically valid JSON (the prompt must mention JSON)
const jsonResponse = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "Respond with JSON containing name and age." },
    { role: "user", content: "Tell me about Ada Lovelace." },
  ],
  response_format: { type: "json_object" },
});
const data = JSON.parse(jsonResponse.choices[0].message.content!);
// Structured outputs: schema-constrained generation
const structured = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Extract: John is 30 and lives in NYC" }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
          city: { type: "string" },
        },
        required: ["name", "age", "city"],
        additionalProperties: false,
      },
      strict: true,
    },
  },
});
Vision (Image Inputs)
const visionResponse = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://example.com/photo.jpg",
            detail: "high", // "low" | "high" | "auto"
          },
        },
      ],
    },
  ],
  max_tokens: 500,
});
Embeddings
const embedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "The quick brown fox jumps over the lazy dog",
  dimensions: 512, // optional dimensionality reduction
});
const vector = embedding.data[0].embedding; // number[]
// Batch embeddings
const batchEmbedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: ["First document", "Second document", "Third document"],
});
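With vectors in hand, similarity search ranks candidates by cosine similarity. A small helper assuming plain `number[]` vectors as returned above; OpenAI embeddings are normalized to length 1, so the dot product alone gives the same ordering, but the full cosine form below also handles unnormalized vectors:

```typescript
// Cosine similarity between two vectors of equal length: dot(a, b)
// divided by the product of their magnitudes. Ranges from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For more than a few thousand vectors, move this into a vector database or an approximate nearest-neighbor index rather than scanning in application code.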
Assistants API
const assistant = await openai.beta.assistants.create({
  name: "Data Analyst",
  instructions: "You analyze CSV data and produce insights.",
  model: "gpt-4o",
  tools: [{ type: "code_interpreter" }],
});
const thread = await openai.beta.threads.create();
await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Analyze the trend in this data: 10, 15, 13, 22, 28, 35",
});
const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});
if (run.status === "completed") {
  const messages = await openai.beta.threads.messages.list(thread.id);
  console.log(messages.data[0].content);
}
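`createAndPoll` above handles polling for you; if you ever drive a run manually, back off between status checks instead of looping tightly. A generic sketch, in which `fetchStatus` is a placeholder for whatever call retrieves the run's current status:

```typescript
// Poll until the status leaves the in-progress set, doubling the delay
// each attempt up to a cap, and giving up after maxAttempts checks.
async function pollWithBackoff(
  fetchStatus: () => Promise<string>,
  { initialMs = 500, maxMs = 8000, maxAttempts = 20 } = {},
): Promise<string> {
  let delay = initialMs;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status !== "queued" && status !== "in_progress") return status;
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxMs); // exponential backoff with a ceiling
  }
  throw new Error("run did not finish within maxAttempts polls");
}
```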
Error Handling and Retries
import OpenAI from "openai";
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3, // automatic retries on transient errors
  timeout: 30 * 1000, // 30 second timeout
});
try {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  // RateLimitError extends APIError, so check the more specific class first
  if (error instanceof OpenAI.RateLimitError) {
    console.error("Rate limited, back off:", error.message);
  } else if (error instanceof OpenAI.APIError) {
    console.error(`API error ${error.status}:`, error.message);
  }
}
Best Practices
- Set `max_tokens` to prevent runaway costs. Always define a reasonable ceiling.
- Use `response_format: json_schema` with `strict: true` for structured extraction; it is more reliable than JSON mode alone.
- Stream long responses to reduce the time-to-first-token perceived by users.
- Include a system message to set tone, format, and constraints. Keep it concise.
- Use `tool_choice: "required"` when you always want a function call, `"auto"` when optional.
- Cache embeddings; do not re-embed the same text. Store vectors in a database.
- Log `usage` fields (`prompt_tokens`, `completion_tokens`) from every response for cost tracking.
- Use `text-embedding-3-small` with reduced `dimensions` for cost-effective similarity search.
- Pin model versions in production (e.g., `gpt-4o-2024-08-06`) to avoid behavior changes.
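The `usage` logging advice above can be sketched as a small accumulator. The per-million-token rates here are hypothetical placeholders, not real prices; check the current pricing page for your model:

```typescript
// Shape of the usage object returned on every chat completion response.
type Usage = { prompt_tokens: number; completion_tokens: number; total_tokens: number };

// Running totals plus an estimated cost. Rates are hypothetical USD per
// one million tokens, passed in so real pricing can be supplied.
function makeUsageTracker(inputPerMillion: number, outputPerMillion: number) {
  let prompt = 0;
  let completion = 0;
  return {
    record(u: Usage) {
      prompt += u.prompt_tokens;
      completion += u.completion_tokens;
    },
    totals() {
      return {
        prompt,
        completion,
        estimatedUsd: (prompt * inputPerMillion + completion * outputPerMillion) / 1_000_000,
      };
    },
  };
}
```

Call `tracker.record(response.usage)` after each completion and flush `tracker.totals()` to your metrics system periodically.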
Anti-Patterns
- Stuffing entire documents into a single prompt without chunking. Respect context windows and use RAG for large corpora.
- Ignoring `finish_reason`. Check for `"length"` (truncated), `"content_filter"` (blocked), or `"tool_calls"` and handle each case.
- Hardcoding API keys in source code. Always use environment variables or secret managers.
- Not handling tool call loops. The model may call tools multiple times; build a loop that re-submits tool results until `finish_reason` is `"stop"`.
- Using `temperature: 0` for creative tasks, or `temperature: 1`+ for structured extraction. Match temperature to the task.
- Skipping input validation on function call arguments. The model can produce unexpected values; validate and sanitize before executing.
- Creating a new Assistants thread per message. Threads are designed to hold multi-turn conversations; reuse them.
- Polling runs without backoff. Use `createAndPoll` or implement exponential backoff rather than tight loops.
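The tool-loop point above can be sketched as a driver function. `ModelTurn`, `callModel`, and `executeTool` are simplified stand-ins injected as parameters, so the loop is shown independent of the SDK's actual types:

```typescript
// Simplified stand-ins for the SDK's response and message types.
type ToolCall = { id: string; name: string; arguments: string };
type ModelTurn = {
  finishReason: "stop" | "tool_calls" | "length" | "content_filter";
  content: string | null;
  toolCalls: ToolCall[];
};
type Msg = { role: string; content: string; tool_call_id?: string };

// Re-submit tool results until the model stops requesting tools, with a
// round cap so a misbehaving model cannot loop forever. In the real API,
// the assistant message carrying tool_calls must also be appended to the
// transcript before its tool results.
async function runToolLoop(
  messages: Msg[],
  callModel: (msgs: Msg[]) => Promise<ModelTurn>,
  executeTool: (call: ToolCall) => Promise<string>,
  maxRounds = 5,
): Promise<string | null> {
  for (let round = 0; round < maxRounds; round++) {
    const turn = await callModel(messages);
    if (turn.finishReason !== "tool_calls") return turn.content;
    for (const call of turn.toolCalls) {
      messages.push({
        role: "tool",
        content: await executeTool(call),
        tool_call_id: call.id,
      });
    }
  }
  throw new Error("tool loop exceeded maxRounds");
}
```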
Install this skill directly: skilldb add ai-llm-services-skills
Related Skills
Anthropic Claude API
"Anthropic Claude API: messages API, tool use, streaming, vision, system prompts, extended thinking, batches, Node SDK"
Fireworks AI
"Fireworks AI: fast inference, function calling, grammar mode, JSON output, OpenAI-compatible API, fine-tuning"
Google Gemini API
"Google Gemini API: generateContent, multimodal (images/video/audio), function calling, streaming, embeddings, context caching"
Groq
"Groq: ultra-fast inference, OpenAI-compatible API, Llama/Mixtral models, tool use, JSON mode, streaming"
Replicate
"Replicate: run open-source models, image generation (Flux/SDXL), predictions API, webhooks, streaming, Node SDK"
Together AI
"Together AI: inference API, open-source LLMs (Llama/Mistral), chat completions, embeddings, fine-tuning, JSON mode"