
Few-Shot Learning

Few-shot example prompting to guide model behavior through demonstration


Few-Shot Learning — Prompt Engineering

You are an expert in few-shot prompting: crafting effective AI prompts that guide model behavior through carefully chosen examples.

Overview

Few-shot learning in the context of prompt engineering means providing a small number of input-output examples directly in the prompt so the model learns the desired pattern, format, tone, or reasoning style by demonstration. This technique requires no fine-tuning; the examples serve as in-context training data.

Core Concepts

Zero-Shot vs. Few-Shot vs. Many-Shot

  • Zero-shot: No examples; relies entirely on instructions.
  • Few-shot: Typically 2-6 examples in the prompt.
  • Many-shot: More than 6 examples, sometimes dozens. Useful when the task pattern is subtle or the model needs to learn edge cases.
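The distinction above is just how many demonstrations precede the query. A minimal sketch of assembling all three variants from one example pool (the instruction, example pairs, and labels here are illustrative placeholders, not from any particular API):

```python
def build_prompt(instruction: str,
                 examples: list[tuple[str, str]],
                 query: str,
                 n_shots: int = 3) -> str:
    """Assemble a prompt with the first n_shots examples, all formatted identically."""
    parts = [instruction, ""]
    for inp, out in examples[:n_shots]:
        parts += [f"Input: {inp}", f"Output: {out}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

POOL = [("2 + 2", "4"), ("10 - 3", "7"), ("6 * 7", "42")]

# n_shots=0 gives a zero-shot prompt; n_shots=3 gives a few-shot prompt;
# a larger pool with a higher n_shots would be many-shot.
zero_shot = build_prompt("Evaluate the expression.", POOL, "9 / 3", n_shots=0)
few_shot = build_prompt("Evaluate the expression.", POOL, "9 / 3", n_shots=3)
```

Because one function emits every variant, the labels and delimiters cannot drift between the zero-shot and few-shot versions you compare.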

Example Selection

The quality and diversity of examples matter more than quantity. Examples should cover representative cases, edge cases, and the full range of expected outputs.

Example Ordering

Models exhibit recency bias. Place the most representative or complex example last, closest to the actual query. Simpler examples can come first to establish the basic pattern.
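One way to apply this mechanically is to score examples by difficulty and sort ascending, so the hardest demonstration lands adjacent to the query. The difficulty scores below are hand-assigned placeholders:

```python
# Order examples simplest-first so the hardest sits closest to the query,
# exploiting recency bias.
examples = [
    {"review": "It's a phone. It makes calls.", "label": "NEUTRAL", "difficulty": 3},
    {"review": "The screen is gorgeous.", "label": "POSITIVE", "difficulty": 1},
    {"review": "It arrived broken.", "label": "NEGATIVE", "difficulty": 2},
]

ordered = sorted(examples, key=lambda e: e["difficulty"])
# ordered[-1] is now the trickiest example, placed right before the query.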

Consistent Formatting

Every example must follow an identical structure. Inconsistencies in formatting between examples confuse the model about which aspects of the pattern matter.
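A cheap way to enforce this is a lint check that flags mixed labeling styles within one prompt. The three style patterns below are assumptions; extend the dict for whatever conventions you actually use:

```python
import re

def labels_consistent(prompt: str) -> bool:
    """Return True when at most one example-label style appears in the prompt."""
    styles = {
        "input_output": r"^Input:",
        "q_a": r"^Q:",
        "numbered": r"^Example \d+:",
    }
    used = [name for name, pattern in styles.items()
            if re.search(pattern, prompt, re.MULTILINE)]
    return len(used) <= 1
```

Running this over a prompt before sending it catches the common mistake of pasting examples from two differently formatted sources.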

Negative Examples

Including examples of what not to do (marked clearly) can sharpen the model's understanding of boundaries.

Implementation Patterns

Basic Classification

Prompt:
Classify the sentiment of each review as POSITIVE, NEGATIVE, or NEUTRAL.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: POSITIVE

Review: "It arrived broken and customer support was unhelpful."
Sentiment: NEGATIVE

Review: "It's a phone. It makes calls."
Sentiment: NEUTRAL

Review: "The camera is decent but the software is frustrating."
Sentiment:
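In practice you would generate a prompt like the one above from labeled pairs rather than hand-write it, so the `Review:`/`Sentiment:` structure stays byte-identical across every example. A sketch, reusing the demonstrations from the prompt:

```python
SHOTS = [
    ("The battery lasts all day and the screen is gorgeous.", "POSITIVE"),
    ("It arrived broken and customer support was unhelpful.", "NEGATIVE"),
    ("It's a phone. It makes calls.", "NEUTRAL"),
]

def sentiment_prompt(review: str) -> str:
    """Build the classification prompt with identical formatting for each shot."""
    lines = ["Classify the sentiment of each review as POSITIVE, NEGATIVE, or NEUTRAL.", ""]
    for text, label in SHOTS:
        lines += [f'Review: "{text}"', f"Sentiment: {label}", ""]
    lines += [f'Review: "{review}"', "Sentiment:"]
    return "\n".join(lines)
```

Ending the prompt at `Sentiment:` cues the model to complete with a bare label rather than a full sentence.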

Data Extraction

Prompt:
Extract structured data from the invoice description.

Input: "Invoice #4021 from Acme Corp, dated 2025-03-15, total $1,250.00 for consulting services"
Output: {"invoice_number": "4021", "vendor": "Acme Corp", "date": "2025-03-15", "amount": 1250.00, "description": "consulting services"}

Input: "INV-887 billed by CloudHost Inc on 01/10/2025 — $430.50 (monthly hosting)"
Output: {"invoice_number": "887", "vendor": "CloudHost Inc", "date": "2025-01-10", "amount": 430.50, "description": "monthly hosting"}

Input: "Invoice 2290, Designworks LLC, 2025-06-01, $3,800 for brand identity package"
Output:
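When the few-shot examples demonstrate a JSON schema, it is worth validating the model's reply against that schema before using it. A minimal sketch (the field names match the example outputs above; the helper itself is an assumption, not part of any library):

```python
import json

REQUIRED_FIELDS = {"invoice_number", "vendor", "date", "amount", "description"}

def parse_invoice(reply: str) -> dict:
    """Parse the model's JSON reply and check it against the demonstrated schema."""
    data = json.loads(reply)  # raises json.JSONDecodeError on malformed output
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"reply missing fields: {sorted(missing)}")
    if isinstance(data["amount"], bool) or not isinstance(data["amount"], (int, float)):
        raise ValueError("amount must be numeric")
    return data
```

Examples teach the format, but the model can still deviate; a validation step turns a silent formatting drift into a loud error.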

Tone and Style Matching

Prompt:
Rewrite the sentence in a friendly, conversational tone.

Formal: "We regret to inform you that your request has been denied."
Friendly: "Unfortunately, we weren't able to approve your request this time — but we'd love to help you figure out next steps!"

Formal: "Please be advised that the office will be closed on Friday."
Friendly: "Heads up — the office will be closed this Friday!"

Formal: "Your subscription has been terminated due to non-payment."
Friendly:

Code Generation Pattern

Prompt:
Write a Python function based on the description.

Description: Return the sum of all even numbers in a list.
def sum_evens(nums: list[int]) -> int:
    return sum(n for n in nums if n % 2 == 0)

Description: Return a dictionary counting the frequency of each character in a string.

def char_freq(s: str) -> dict[str, int]:
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    return freq

Description: Return the longest common prefix among a list of strings.

With Negative Examples

Prompt:
Generate a concise commit message for the given diff.

Good example:
Diff: Added null-check before accessing user.email in the notification service
Message: Fix null pointer in notification service when user email is missing

Bad example (too vague):
Diff: Added null-check before accessing user.email in the notification service
Message: Fixed bug

Diff: Replaced raw SQL queries with parameterized statements in the auth module
Message:

Best Practices

  • Choose diverse, representative examples. Cover typical cases, edge cases, and different categories. Avoid examples that are too similar to each other.
  • Keep formatting identical across all examples. Use the same delimiters, labels, and structure every time.
  • Put the hardest or most representative example last. The model pays more attention to examples near the query.
  • Label examples clearly. Use consistent markers like Input: / Output:, Q: / A:, or numbered pairs.
  • Test with and without examples. Sometimes a clear instruction alone outperforms poorly chosen examples. Measure before committing.
  • Scale the number of examples to task complexity. Simple binary classification may need 2 examples; nuanced style transfer may need 6 or more.
  • Include edge cases. If the task has tricky boundaries (e.g., ambiguous sentiment), show how those should be handled.
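The "test with and without examples" advice can be made concrete with a small scoring harness that compares prompt variants on a labeled set. Here `call_model` is a placeholder for whatever client call you use; this is a sketch, not a specific API:

```python
def accuracy(call_model, build_prompt, dataset) -> float:
    """Fraction of items where the model's trimmed reply equals the expected label."""
    correct = sum(
        call_model(build_prompt(item)).strip() == expected
        for item, expected in dataset
    )
    return correct / len(dataset)

# Usage: run the same dataset through a zero-shot and a few-shot builder,
# then keep whichever variant scores higher.
```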

Core Philosophy

Few-shot examples are the most reliable way to communicate a complex pattern to a language model. Instructions tell the model what you want in the abstract; examples show it concretely. When the task involves a specific output format, a nuanced judgment call, or a style that is difficult to describe in words, a well-chosen example communicates instantly what paragraphs of instructions struggle to convey. The example does not just illustrate the task -- it anchors the model's output distribution around the demonstrated pattern.

Quality and diversity of examples matter far more than quantity. Two examples that cover different categories, edge cases, and output formats are more valuable than five examples that all look the same. Each example should teach the model something new about the task: a different input type, a different output structure, or a different decision boundary. Redundant examples waste context tokens and may inadvertently bias the model toward the overrepresented pattern.

Examples and instructions are complementary, not interchangeable. Instructions define the rules; examples demonstrate how those rules apply in practice. A prompt with examples but no instructions forces the model to infer the rules from patterns alone, which is fragile. A prompt with instructions but no examples forces the model to interpret ambiguous rules without grounding. The strongest prompts combine a clear instruction block with 2-4 diverse examples that demonstrate the instruction in action.

Anti-Patterns

  • All examples from the same category: Providing 4 examples that are all positive sentiment, all short inputs, or all of the same type. The model infers that the task always produces this output, biasing it toward the overrepresented category even when the actual input is different.

  • Inconsistent formatting across examples: Using Input: / Output: labels in one example, Q: / A: in another, and unlabeled text in a third. The model cannot distinguish which formatting aspects are meaningful and which are noise, producing inconsistent output formatting.

  • Examples that contradict the instructions: Instructions say "be concise" but the examples are verbose paragraphs. The model must choose between following the instructions or imitating the examples, and it will often follow the examples. Ensure examples and instructions are aligned.

  • Too many examples consuming the context window: Providing 10 detailed examples that consume 80% of the available context, leaving insufficient room for the actual input and the model's response. Use the minimum number of examples needed to establish the pattern.

  • Using examples as a substitute for clear instructions: Providing only examples with no explanation of the task, expected format, or edge case handling. The model must reverse-engineer the rules from the examples alone, which works for simple patterns but fails for tasks with nuance or exceptions.

Common Pitfalls

  • Examples that contradict each other. If examples show inconsistent logic or formatting, the model will be confused.
  • Too many examples consuming context. Each example uses tokens. With limited context windows, excessive examples crowd out space for the actual input.
  • Biased example selection. If all examples are positive sentiment, the model may default to positive for ambiguous cases.
  • Overfitting to example surface features. The model may copy superficial patterns (word choice, length) rather than the underlying logic. Vary surface details across examples.
  • Forgetting to update examples when the task changes. Stale examples that no longer match the current requirement cause subtle errors.
  • Using examples as a substitute for clear instructions. Examples and instructions work best together. Relying on examples alone forces the model to infer intent.
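The context-budget pitfall above is easy to check for before sending a prompt. A rough sketch, assuming the common ~4-characters-per-token heuristic (exact counts require the target model's tokenizer):

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def examples_fit(examples: list[str], query: str,
                 context_limit: int, output_reserve: int) -> bool:
    """True if the examples plus query leave room for the model's response."""
    used = sum(approx_tokens(e) for e in examples) + approx_tokens(query)
    return used + output_reserve <= context_limit
```

If the check fails, drop the least informative examples first rather than shrinking the reserve for the model's answer.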
