
# OpenAI API

"OpenAI API: chat completions, function calling/tools, streaming, embeddings, vision, JSON mode, assistants, Node SDK"

## Quick Summary
OpenAI's API is the most widely adopted LLM interface. Build around **chat completions** as the universal primitive. Use structured outputs (JSON mode, function calling) to get reliable, parseable responses. Stream responses for better UX. Treat the API as stateless — manage conversation history yourself. Prefer the official Node SDK over raw HTTP for type safety, automatic retries, and streaming helpers.

## Key Points

- **Set `max_tokens`** to prevent runaway costs. Always define a reasonable ceiling.
- **Use `response_format: json_schema` with `strict: true`** for structured extraction — it is more reliable than JSON mode alone.
- **Stream long responses** to reduce time-to-first-token perceived by users.
- **Include a system message** to set tone, format, and constraints. Keep it concise.
- **Use `tool_choice: "required"`** when you always want a function call, `"auto"` when optional.
- **Cache embeddings** — do not re-embed the same text. Store vectors in a database.
- **Log `usage` fields** (prompt_tokens, completion_tokens) from every response for cost tracking.
- **Use `text-embedding-3-small`** with reduced `dimensions` for cost-effective similarity search.
- **Pin model versions** in production (e.g., `gpt-4o-2024-08-06`) to avoid behavior changes.
- **Avoid stuffing entire documents into a single prompt** without chunking; respect context windows and use RAG for large corpora.
- **Check `finish_reason`** for `"length"` (truncated), `"content_filter"` (blocked), or `"tool_calls"`, and handle each case.
- **Never hardcode API keys** in source code; always use environment variables or secret managers.

## Quick Example

```bash
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-...        # optional
OPENAI_BASE_URL=https://...  # optional, for proxies/Azure
```

# OpenAI API Skill

## Core Philosophy

OpenAI's API is the most widely adopted LLM interface. Build around chat completions as the universal primitive. Use structured outputs (JSON mode, function calling) to get reliable, parseable responses. Stream responses for better UX. Treat the API as stateless — manage conversation history yourself. Prefer the official Node SDK over raw HTTP for type safety, automatic retries, and streaming helpers.
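Because the API is stateless, the client owns the transcript. A minimal sketch of carrying history across turns (the `Msg` type and helper names are illustrative, not part of the SDK):

```typescript
// Each request must carry the full conversation so far.
type Role = "system" | "user" | "assistant";
interface Msg { role: Role; content: string }

const history: Msg[] = [
  { role: "system", content: "You are a helpful assistant." },
];

// Record the user's message before the request...
function addUserTurn(history: Msg[], text: string): void {
  history.push({ role: "user", content: text });
}

// ...and the model's reply after it, so the next request sees both.
function addAssistantTurn(history: Msg[], text: string): void {
  history.push({ role: "assistant", content: text });
}
```

Pass the array as `messages` on each `chat.completions.create` call, and trim or summarize old turns as you approach the model's context window.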

## Setup

Install the SDK (`npm install openai`) and configure your client:

```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Basic chat completion
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain monads in one sentence." },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);
```

Environment variables:

```bash
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-...        # optional
OPENAI_BASE_URL=https://...  # optional, for proxies/Azure
```

## Key Techniques

### Streaming Responses

```typescript
const stream = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Write a short poem." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

### Function Calling / Tools

```typescript
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name" },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  },
];

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools,
  tool_choice: "auto",
});

const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments);
  const weatherResult = await fetchWeather(args.location, args.unit);

  // Send the tool result back
  const followUp = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "user", content: "What's the weather in Paris?" },
      response.choices[0].message,
      {
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(weatherResult),
      },
    ],
    tools,
  });
  console.log(followUp.choices[0].message.content);
}
```

### JSON Mode and Structured Outputs

```typescript
// JSON mode — guarantees syntactically valid JSON
const jsonResponse = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "Respond with JSON containing name and age." },
    { role: "user", content: "Tell me about Ada Lovelace." },
  ],
  response_format: { type: "json_object" },
});

const data = JSON.parse(jsonResponse.choices[0].message.content!);

// Structured outputs — schema-constrained generation
const structured = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Extract: John is 30 and lives in NYC" }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
          city: { type: "string" },
        },
        required: ["name", "age", "city"],
        additionalProperties: false,
      },
      strict: true,
    },
  },
});
```
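With `strict: true` the output conforms to the schema, but a cheap runtime guard still protects against integration mistakes elsewhere in the pipeline. A hypothetical parser for the `person` schema above:

```typescript
// Hypothetical runtime guard matching the "person" schema above.
interface Person { name: string; age: number; city: string }

function parsePerson(raw: string): Person {
  const data = JSON.parse(raw);
  if (
    typeof data.name !== "string" ||
    typeof data.age !== "number" ||
    typeof data.city !== "string"
  ) {
    throw new Error("response does not match person schema");
  }
  return data as Person;
}
```

Call it as `parsePerson(structured.choices[0].message.content!)`.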

### Vision (Image Inputs)

```typescript
const visionResponse = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://example.com/photo.jpg",
            detail: "high", // "low" | "high" | "auto"
          },
        },
      ],
    },
  ],
  max_tokens: 500,
});
```

### Embeddings

```typescript
const embedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "The quick brown fox jumps over the lazy dog",
  dimensions: 512, // optional dimensionality reduction
});

const vector = embedding.data[0].embedding; // number[]

// Batch embeddings
const batchEmbedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: ["First document", "Second document", "Third document"],
});
```
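Cached vectors are typically compared with cosine similarity. A self-contained helper (illustrative, not part of the SDK):

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

OpenAI's embedding vectors are normalized to length 1, so for them a plain dot product gives the same ranking.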

### Assistants API

```typescript
const assistant = await openai.beta.assistants.create({
  name: "Data Analyst",
  instructions: "You analyze CSV data and produce insights.",
  model: "gpt-4o",
  tools: [{ type: "code_interpreter" }],
});

const thread = await openai.beta.threads.create();

await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Analyze the trend in this data: 10, 15, 13, 22, 28, 35",
});

const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

if (run.status === "completed") {
  const messages = await openai.beta.threads.messages.list(thread.id);
  console.log(messages.data[0].content);
}
```
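Assistants messages hold an array of content blocks rather than a plain string, so logging `messages.data[0].content` prints objects. A small extractor for the text blocks (the helper is illustrative; the block shape follows the Assistants API):

```typescript
// Joins the text blocks of an Assistants message; non-text blocks are skipped.
function extractText(blocks: any[]): string {
  return blocks
    .filter((b) => b.type === "text")
    .map((b) => b.text.value as string)
    .join("\n");
}
```

For example, `console.log(extractText(messages.data[0].content))`.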

### Error Handling and Retries

```typescript
import OpenAI, { APIError, RateLimitError } from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3,       // automatic retries on transient errors
  timeout: 30 * 1000,  // 30 second timeout
});

try {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof RateLimitError) {
    console.error("Rate limited, back off:", error.message);
  } else if (error instanceof APIError) {
    console.error(`API error ${error.status}:`, error.message);
  }
}
```

## Best Practices

- Set `max_tokens` to prevent runaway costs. Always define a reasonable ceiling.
- Use `response_format: json_schema` with `strict: true` for structured extraction — it is more reliable than JSON mode alone.
- Stream long responses to reduce time-to-first-token perceived by users.
- Include a system message to set tone, format, and constraints. Keep it concise.
- Use `tool_choice: "required"` when you always want a function call, `"auto"` when optional.
- Cache embeddings — do not re-embed the same text. Store vectors in a database.
- Log `usage` fields (`prompt_tokens`, `completion_tokens`) from every response for cost tracking.
- Use `text-embedding-3-small` with reduced `dimensions` for cost-effective similarity search.
- Pin model versions in production (e.g., `gpt-4o-2024-08-06`) to avoid behavior changes.
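The usage-logging advice can be sketched as a small accumulator; the `Usage` shape mirrors the API's `usage` field, while the tracker itself is a hypothetical helper:

```typescript
// Accumulates token counts from each response's `usage` field.
interface Usage { prompt_tokens: number; completion_tokens: number }

const totals = { prompt: 0, completion: 0, requests: 0 };

function trackUsage(usage: Usage | undefined): void {
  if (!usage) return; // streaming responses may omit usage unless requested
  totals.prompt += usage.prompt_tokens;
  totals.completion += usage.completion_tokens;
  totals.requests += 1;
}
```

Call `trackUsage(response.usage)` after each request and export `totals` to your metrics system.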

## Anti-Patterns

- Stuffing entire documents into a single prompt without chunking. Respect context windows and use RAG for large corpora.
- Ignoring `finish_reason` — check for `"length"` (truncated), `"content_filter"` (blocked), or `"tool_calls"` to handle each case.
- Hardcoding API keys in source code. Always use environment variables or secret managers.
- Not handling tool call loops — the model may call tools multiple times; build a loop that re-submits tool results until `finish_reason` is `"stop"`.
- Using `temperature: 0` for creative tasks, or `temperature: 1`+ for structured extraction. Match temperature to the task.
- Skipping input validation on function call arguments — the model can produce unexpected values. Validate and sanitize before executing.
- Creating a new Assistants thread per message — threads are designed to hold multi-turn conversations; reuse them.
- Polling runs without backoff — use `createAndPoll` or implement exponential backoff rather than tight loops.
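The tool-loop and argument-validation points above can be combined into a small driver. This sketch uses a structural `ChatClient` interface and a hypothetical tool `registry` so it stays self-contained; in real code you would pass your `OpenAI` instance and the tool definitions from earlier:

```typescript
// Structural stand-in for the SDK client; in real code, pass the OpenAI instance.
interface ChatClient {
  chat: { completions: { create(req: Record<string, unknown>): Promise<any> } };
}

type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

// Re-submit tool results until the model stops asking for tools.
async function runWithTools(
  client: ChatClient,
  messages: any[],
  tools: unknown[],
  registry: Record<string, ToolFn>,
  maxRounds = 5, // guard against infinite tool loops
): Promise<string | null> {
  for (let round = 0; round < maxRounds; round++) {
    const res = await client.chat.completions.create({ model: "gpt-4o", messages, tools });
    const choice = res.choices[0];
    messages.push(choice.message);
    if (choice.finish_reason !== "tool_calls") return choice.message.content;
    for (const call of choice.message.tool_calls ?? []) {
      const fn = registry[call.function.name];
      const result = fn
        ? await fn(JSON.parse(call.function.arguments))
        : { error: `unknown tool: ${call.function.name}` };
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
  throw new Error("tool loop exceeded maxRounds");
}
```

Unknown tool names produce an error result instead of a crash; stricter argument validation (see the parsing guard pattern above) can be added inside each registered function.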
