
Structured Output

Techniques for reliably extracting structured JSON and typed data from language models


Structured Output — Prompt Engineering

You are an expert in Structured Output prompting, the craft of designing prompts that reliably produce JSON, typed data, and machine-parseable responses.

Overview

Structured output prompting is the practice of designing prompts so that the model returns data in a consistent, machine-readable format — most commonly JSON, but also CSV, XML, YAML, or custom schemas. This is essential for any system where model output feeds into downstream code, APIs, databases, or pipelines.

Core Concepts

Schema Specification

Providing an explicit schema (JSON Schema, TypeScript interface, or example object) in the prompt so the model knows the exact shape of expected output.

Output Anchoring

Starting the model's response with the opening token of the desired format (e.g., { for JSON) to bias generation toward structured output from the first token.
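As a concrete sketch of output anchoring, some APIs (for example, the Anthropic Messages API) let you prefill the start of the assistant turn, so the model continues from the opening brace rather than deciding whether to add prose first. The message contents here are illustrative:

```python
# Output anchoring via assistant prefill: the final assistant message seeds
# the response with "{", biasing generation toward JSON from the first token.
messages = [
    {"role": "user", "content": "Extract the contact details as a JSON object."},
    {"role": "assistant", "content": "{"},  # prefill anchors the format
]
```

The model's completion is then appended to the prefilled `{`, so remember to prepend it back before parsing.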

Constrained Decoding

API-level features (like OpenAI's response_format or Anthropic's tool use) that enforce output conformity at the token-sampling level, not just via prompt instructions.
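A minimal sketch of what a JSON-mode request might look like, following the shape of the OpenAI chat completions API; the model name is an assumption, and JSON mode typically also requires that the word "JSON" appear somewhere in the messages:

```python
def build_json_mode_request(schema_prompt: str, user_text: str) -> dict:
    """Build request parameters for an API that enforces JSON output
    at the token-sampling level (OpenAI-style JSON mode)."""
    return {
        "model": "gpt-4o-mini",  # assumption: any JSON-mode-capable model
        "response_format": {"type": "json_object"},  # token-level enforcement
        "messages": [
            {"role": "system", "content": schema_prompt},
            {"role": "user", "content": user_text},
        ],
    }
```

Constrained decoding removes whole classes of failures (commentary, fences, trailing commas), but it does not guarantee the right keys or types, so application-side validation is still needed.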

Validation and Retry

Wrapping model calls with schema validation (e.g., Zod, Pydantic, JSON Schema validators) and retrying with error feedback when output is malformed.

Separation of Reasoning and Output

Allowing the model to reason freely in one section, then produce the structured output in a clearly delimited final section.

Implementation Patterns

Basic JSON with Schema

Prompt:
Extract the following information from the text and return it as a JSON
object matching this schema:

{
  "name": string,
  "email": string,
  "company": string,
  "role": string,
  "phone": string | null
}

Text: "Hi, I'm Sarah Chen, CTO at Bridgewater Analytics. You can reach
me at sarah.chen@bridgewater.io or call 555-0142."

Return ONLY the JSON object, no additional text.
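On the application side, a well-formed response to the prompt above parses cleanly and contains exactly the schema's keys. A quick check, using the response a compliant model would produce for the Sarah Chen example:

```python
import json

# A compliant response for the example prompt above.
response = (
    '{"name": "Sarah Chen", "email": "sarah.chen@bridgewater.io", '
    '"company": "Bridgewater Analytics", "role": "CTO", "phone": "555-0142"}'
)
data = json.loads(response)

# Verify the key set matches the schema exactly: no extras, no omissions.
assert set(data) == {"name", "email", "company", "role", "phone"}
```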

TypeScript Interface as Schema

Prompt:
Parse the product listing into the following TypeScript type:

interface Product {
  name: string;
  price: number;          // in USD, numeric only
  currency: "USD";
  category: "electronics" | "clothing" | "home" | "other";
  inStock: boolean;
  tags: string[];         // max 5 tags
}

Listing: "Apple AirPods Pro 2nd Gen - $249.99 - Available now.
Noise cancelling, wireless, Bluetooth 5.3, USB-C charging."

Return a single JSON object conforming to the Product type.
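The same TypeScript interface translates directly into a Pydantic model for application-side validation, with `Literal` standing in for the union-of-string-literals enums; this is a sketch of one possible mapping, not the only one:

```python
from typing import Literal
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float            # in USD, numeric only
    currency: Literal["USD"]
    category: Literal["electronics", "clothing", "home", "other"]
    inStock: bool           # camelCase kept to match the schema in the prompt
    tags: list[str]         # max 5 tags (enforce with a validator if needed)
```

Parsing the model's JSON output into `Product` rejects wrong types and out-of-enum categories at the boundary.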

Array Extraction

Prompt:
Extract all action items from the meeting notes below. Return a JSON
array where each element has this structure:

{
  "task": string,
  "assignee": string,
  "due_date": string | null,   // ISO 8601 format if mentioned
  "priority": "high" | "medium" | "low"
}

Meeting Notes:
"John will finalize the Q1 budget by Friday. Sarah needs to review the
vendor contracts ASAP — this is urgent. Mike should update the project
timeline sometime next week."

Return ONLY the JSON array.
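For array outputs, validate both the container and each element before use. A minimal sketch of a checker for the action-item schema above (the function name and error messages are illustrative):

```python
import json

ALLOWED_PRIORITIES = {"high", "medium", "low"}

def parse_action_items(raw: str) -> list[dict]:
    """Parse the model's JSON array and sanity-check each element."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    required = {"task", "assignee", "due_date", "priority"}
    for item in items:
        missing = required - set(item)
        if missing:
            raise ValueError(f"missing keys: {missing}")
        if item["priority"] not in ALLOWED_PRIORITIES:
            raise ValueError(f"invalid priority: {item['priority']}")
    return items
```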

Separating Reasoning from Output

Prompt:
Analyze the following customer feedback and categorize it.

First, in a section labeled ANALYSIS, briefly explain your reasoning.
Then, in a section labeled OUTPUT, provide a JSON object with this schema:

{
  "sentiment": "positive" | "negative" | "mixed" | "neutral",
  "topics": string[],
  "urgency": "high" | "medium" | "low",
  "suggested_action": string
}

Feedback: "Your app crashes every time I try to upload a photo. I love
the UI redesign though. Please fix this soon, I have a deadline."

ANALYSIS:
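Because the response mixes free text and JSON, the application must split on the OUTPUT marker before parsing. A sketch, with a hypothetical model response for the feedback example above:

```python
import json

def extract_output_section(response: str) -> dict:
    """Take everything after the OUTPUT: marker and parse it as JSON."""
    _, sep, tail = response.partition("OUTPUT:")
    if not sep:
        raise ValueError("no OUTPUT section found in model response")
    return json.loads(tail.strip())

# Hypothetical model response following the prompt above:
sample = (
    "ANALYSIS:\n"
    "The feedback mixes a crash bug report with praise for the UI redesign.\n\n"
    "OUTPUT:\n"
    '{"sentiment": "mixed", "topics": ["crashes", "ui"], '
    '"urgency": "high", "suggested_action": "Escalate the upload crash"}'
)
result = extract_output_section(sample)
```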

Nested and Complex Schemas

Prompt:
Parse this resume into structured JSON following this schema:

{
  "personal": {
    "name": string,
    "email": string,
    "location": string | null
  },
  "experience": [
    {
      "company": string,
      "title": string,
      "start_date": string,
      "end_date": string | null,
      "highlights": string[]
    }
  ],
  "education": [
    {
      "institution": string,
      "degree": string,
      "year": number
    }
  ],
  "skills": string[]
}

Rules:
- Use null for missing fields, never omit keys.
- Dates in "YYYY-MM" format.
- "end_date" is null for current positions.
- Return ONLY the JSON object.
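Nested schemas like this map naturally onto nested Pydantic models, one class per object in the schema; this is a sketch of one such mapping:

```python
from typing import Optional
from pydantic import BaseModel

class Personal(BaseModel):
    name: str
    email: str
    location: Optional[str]

class Experience(BaseModel):
    company: str
    title: str
    start_date: str              # "YYYY-MM"
    end_date: Optional[str]      # null for current positions
    highlights: list[str]

class Education(BaseModel):
    institution: str
    degree: str
    year: int

class Resume(BaseModel):
    personal: Personal
    experience: list[Experience]
    education: list[Education]
    skills: list[str]
```

Validating the parsed JSON against `Resume` enforces the nesting, the nullable fields, and the array element shapes in one step.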

Validation Retry Pattern (Application Layer)

import json
from pydantic import BaseModel, ValidationError

class ExtractedEntity(BaseModel):
    name: str
    entity_type: str
    confidence: float  # 0.0 to 1.0

def extract_with_retry(text: str, max_retries: int = 3) -> ExtractedEntity:
    prompt = f"""Extract the primary entity from this text as JSON:
{{"name": string, "entity_type": string, "confidence": number}}

Text: {text}
Return ONLY valid JSON."""

    for attempt in range(max_retries):
        response = call_llm(prompt)  # call_llm: placeholder for your LLM client wrapper
        try:
            data = json.loads(response)
            return ExtractedEntity(**data)
        except (json.JSONDecodeError, ValidationError) as e:
            prompt = f"""Your previous response was not valid JSON or
did not match the schema. Error: {e}

Try again. Extract the primary entity from this text as JSON:
{{"name": string, "entity_type": string, "confidence": number}}

Text: {text}
Return ONLY valid JSON."""

    raise ValueError("Failed to extract valid entity after retries")

Best Practices

  • Always provide an explicit schema. Do not rely on "return as JSON" alone. Show the exact keys, types, and constraints.
  • Use "Return ONLY the JSON" to suppress commentary. Without this, models often wrap JSON in explanatory text.
  • Prefer constrained decoding when available. API-level enforcement is more reliable than prompt-level instructions.
  • Validate on the application side. Never trust model output without parsing and schema validation. Use Pydantic, Zod, or equivalent.
  • Separate reasoning from output. If the task requires analysis, give the model a reasoning section before the structured output section. This improves accuracy without polluting the JSON.
  • Use enums for categorical fields. Spell out allowed values (e.g., "high" | "medium" | "low") rather than leaving them open-ended.
  • Specify null handling. Explicitly state whether missing data should be null, omitted, or set to a default value.

Core Philosophy

Structured output prompting exists because LLM output is, by default, free-form text -- and free-form text breaks downstream systems. The moment model output feeds into an API call, a database insert, a UI component, or a conditional branch, it must conform to a predictable shape. Structured output prompting is the discipline of specifying that shape precisely enough that the model produces it reliably, and validating the output rigorously enough that malformed responses are caught before they cause damage.

The schema is the contract. Providing a JSON Schema, TypeScript interface, or Pydantic model in the prompt is not just a hint -- it is the specification that the model's output will be validated against. The more explicit the schema (required fields, enum constraints, nullable annotations, type annotations), the more reliable the output. Vague instructions like "return as JSON" leave the model to guess the structure; an explicit schema with field names, types, and constraints leaves nothing to guess.

Defense in depth is required. Even with constrained decoding (response_format: json_object, tool use schemas), models occasionally produce output that violates the schema: wrong types, extra fields, missing required fields, or truncated JSON. Application-level validation (Zod, Pydantic, JSON Schema validators) is not redundant with prompt-level specification -- it is the safety net that catches the cases where the model does not comply. The combination of clear schema in the prompt, constrained decoding at the API level, and strict validation in the application produces reliable structured output.

Anti-Patterns

  • "Return as JSON" without a schema: Asking the model to return JSON without specifying the exact keys, types, and constraints. The model picks its own structure, which may change between calls, between models, or between input variations. Always provide an explicit schema.

  • No application-level validation: Trusting that because the prompt asks for a specific format, the output will always comply. Models produce trailing commas, comments, extra fields, wrong types, and truncated output. Parse with JSON.parse in a try-catch and validate against the schema before using the data.

  • Mixing reasoning and structured output in one block: Asking the model to explain its thinking and produce JSON in the same undifferentiated text block. The reasoning text contaminates the JSON, making it unparseable. Use separate sections (e.g., ANALYSIS: and OUTPUT:) or separate the reasoning into a thinking step.

  • Overly complex nested schemas without examples: Providing a deeply nested schema with arrays of objects containing optional fields but no example of a completed output. Complex schemas benefit from at least one filled-in example to disambiguate how nesting, arrays, and nulls should be handled.

  • No retry mechanism for malformed output: Treating a parse failure as a terminal error instead of retrying with the error message included in the prompt. A single retry with feedback ("Your previous response was not valid JSON. Error: Unexpected token at position 42. Try again.") resolves most transient format failures.

Common Pitfalls

  • Trailing commas and comments. Models sometimes produce JSON with trailing commas or // comments, which are invalid JSON. Instruct: "Return valid JSON without comments."
  • Wrapping JSON in markdown code fences. The model may produce ```json ... ``` around the output. Either instruct against it or strip fences in post-processing.
  • Inconsistent types. A field defined as number may come back as "42" (string). Validate types strictly.
  • Hallucinated fields. The model may add fields not in the schema. Validate against the schema and strip extras.
  • Truncated output. Large JSON objects may be cut off by token limits. For large extractions, paginate or chunk the task.
  • Nested quotes breaking JSON. String values containing quotes can break JSON parsing. Instruct the model to escape inner quotes properly.
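The code-fence pitfall is simple to handle in post-processing. A small helper that strips a wrapping markdown fence before parsing (the fence markers are matched as three backticks via `{3}` repetition):

```python
import re

def strip_code_fences(raw: str) -> str:
    """Remove a wrapping markdown code fence if the model added one."""
    cleaned = raw.strip()
    cleaned = re.sub(r"^`{3}(?:json)?\s*", "", cleaned)  # leading fence
    cleaned = re.sub(r"\s*`{3}$", "", cleaned)           # trailing fence
    return cleaned
```

Run this before `json.loads` so fenced and unfenced responses both parse.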
