Retrieval-Augmented Generation
RAG prompt patterns for grounding model responses in retrieved context documents
You are an expert in RAG (Retrieval-Augmented Generation) prompt patterns for crafting effective AI prompts that ground model responses in retrieved context documents.
Overview
Retrieval-Augmented Generation combines information retrieval with language model generation. Instead of relying solely on the model's parametric knowledge, RAG systems retrieve relevant documents from an external knowledge base and inject them into the prompt as context. The model then generates answers grounded in this retrieved information. Prompt design is the critical bridge between retrieval and generation — it determines whether the model uses, ignores, or hallucinates beyond the provided context.
Core Concepts
Context Injection
Placing retrieved documents or passages into the prompt in a structured format so the model can reference them when generating a response.
Grounding Instructions
Explicit directives telling the model to base its answer only on the provided context and to acknowledge when the context is insufficient.
Citation and Attribution
Requiring the model to reference which source document(s) support each claim in its response.
Context Window Management
Strategies for fitting retrieved content within token limits: truncation, summarization, relevance ranking, and chunking.
Faithfulness vs. Helpfulness Tension
Balancing the instruction to stay grounded in context with the user's expectation of a complete, helpful answer. Explicit fallback rules in the prompt resolve this tension.
Implementation Patterns
Basic RAG Prompt
Prompt:
Answer the user's question based ONLY on the provided context documents.
If the context does not contain enough information to answer, say
"I don't have enough information to answer this."
Context:
---
[Document 1: Employee Handbook - Leave Policy]
Full-time employees accrue 15 days of PTO per year. PTO accrues monthly
at 1.25 days per month. Unused PTO carries over up to a maximum of 30
days. Employees must give 2 weeks notice for planned leave exceeding
5 consecutive days.
---
[Document 2: Employee Handbook - Sick Leave]
Employees receive 10 sick days per year. Sick leave does not carry over.
A doctor's note is required for absences exceeding 3 consecutive days.
---
User Question: How many vacation days do I get per year, and can I carry
them over?
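The basic pattern above can be sketched as a small assembly function. This is an illustrative sketch, not a fixed API: the grounding wording, the `---` delimiters, and the document titles are all choices you would adapt to your own system.

```python
# Sketch of assembling the basic grounded RAG prompt shown above.
# The grounding text and delimiter style are illustrative choices.

GROUNDING = (
    "Answer the user's question based ONLY on the provided context documents.\n"
    "If the context does not contain enough information to answer, say\n"
    '"I don\'t have enough information to answer this."'
)

def build_rag_prompt(documents, question):
    """documents: list of (title, text) pairs from the retriever."""
    blocks = [f"[Document {i}: {title}]\n{text}"
              for i, (title, text) in enumerate(documents, start=1)]
    context = "---\n" + "\n---\n".join(blocks) + "\n---"
    return f"{GROUNDING}\n\nContext:\n{context}\n\nUser Question: {question}"

prompt = build_rag_prompt(
    [("Employee Handbook - Leave Policy",
      "Full-time employees accrue 15 days of PTO per year.")],
    "How many vacation days do I get per year?",
)
```

Keeping the grounding instruction first and the question last puts both at the positions the model attends to most.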
RAG with Citation Requirements
Prompt:
You are a research assistant. Answer the question using ONLY the
provided sources. After each factual claim, cite the source in brackets
like [Source 1].
If multiple sources support a claim, cite all of them. If no source
supports a claim, do not make it — instead note the gap.
Sources:
[Source 1] "Climate Change Impact Report 2025" - Global average
temperatures rose 1.2°C above pre-industrial levels as of 2024...
[Source 2] "IPCC Sixth Assessment Summary" - Sea levels are projected
to rise 0.3-1.0m by 2100 under moderate emission scenarios...
[Source 3] "Arctic Ice Monitor Q4 2025" - Arctic sea ice extent reached
a record low of 3.74 million sq km in September 2025...
Question: What is the current state of global warming and its effects
on sea ice?
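Because the prompt above fixes a citation format, a lightweight post-check can flag answer sentences that lack one. This is a rough sketch: the naive sentence split and the `[Source N]` pattern are assumptions tied to this prompt, suitable for spot-checking rather than production parsing.

```python
import re

def uncited_sentences(answer: str) -> list[str]:
    """Return sentences that lack a [Source N] citation, for manual review."""
    # Naive split on sentence-ending punctuation; crude but adequate here.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not re.search(r"\[Source \d+\]", s)]

answer = (
    "Global temperatures rose 1.2C above pre-industrial levels [Source 1]. "
    "Arctic sea ice hit a record low in September 2025."
)
missing = uncited_sentences(answer)  # flags the second, uncited sentence
```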
Multi-Document Synthesis
Prompt:
You are an analyst. The following documents contain information about
a single topic from different perspectives. Synthesize them into a
coherent summary.
Rules:
- Identify where sources agree and disagree.
- Do not add information beyond what is in the documents.
- Flag any contradictions explicitly.
- Structure your response with clear headings.
Document A (Internal Engineering Report):
[content]
Document B (Customer Feedback Survey):
[content]
Document C (Competitor Analysis):
[content]
Provide a synthesis that covers: key findings, areas of agreement,
contradictions, and information gaps.
RAG with Confidence Scoring
Prompt:
Answer the question based on the provided context. After your answer,
rate your confidence on this scale:
- HIGH: The context directly and clearly answers the question.
- MEDIUM: The context partially answers the question or requires
reasonable inference.
- LOW: The context is only tangentially related; significant inference
is needed.
- NONE: The context does not address the question.
Format your response as:
ANSWER: [your answer]
CONFIDENCE: [HIGH/MEDIUM/LOW/NONE]
REASONING: [why you assigned this confidence level]
Context:
[retrieved documents]
Question: [user question]
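Since the confidence-scoring prompt requests a fixed labeled format, the response can be parsed mechanically. A minimal sketch, assuming the model followed the `ANSWER / CONFIDENCE / REASONING` layout exactly:

```python
import re

def parse_confidence_response(text: str) -> dict:
    """Parse the ANSWER / CONFIDENCE / REASONING format requested above."""
    pattern = re.compile(
        r"ANSWER:\s*(?P<answer>.*?)\s*"
        r"CONFIDENCE:\s*(?P<confidence>HIGH|MEDIUM|LOW|NONE)\s*"
        r"REASONING:\s*(?P<reasoning>.*)",
        re.DOTALL,
    )
    m = pattern.search(text)
    if m is None:
        raise ValueError("Response did not follow the expected format")
    return m.groupdict()

reply = """ANSWER: 15 days of PTO per year, with carryover up to 30 days.
CONFIDENCE: HIGH
REASONING: The leave policy document states both figures directly."""
parsed = parse_confidence_response(reply)
```

Routing on the parsed label (e.g. escalating LOW and NONE answers to a human) is a common downstream use of this pattern.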
Conversational RAG with Memory
Prompt:
You are a customer support agent for TechCo. Use the retrieved knowledge
base articles to answer the user's question.
Rules:
- Answer based on the knowledge base articles provided.
- If the articles don't cover the question, say: "I don't have that
information in our knowledge base. Let me escalate this to a
specialist."
- Reference article titles when citing information.
- Consider the conversation history for context on follow-up questions.
Knowledge Base Articles:
[Article: "Setting Up Two-Factor Authentication"]
[content]
[Article: "Troubleshooting Login Issues"]
[content]
Conversation History:
User: I can't log into my account.
Agent: I'm sorry to hear that. Are you seeing a specific error message?
User: It says "invalid credentials" but I'm sure my password is right.
Current User Message: Could it be related to 2FA?
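Assembling the conversational variant means formatting both the knowledge-base articles and the turn history into the prompt. A hedged sketch; the speaker labels, article structure, and abbreviated rules text are illustrative assumptions:

```python
# Sketch: combining KB articles and conversation history into the
# support prompt above. Titles and rules wording are illustrative.

RULES = (
    "- Answer based on the knowledge base articles provided.\n"
    "- If the articles don't cover the question, escalate.\n"
    "- Reference article titles when citing information."
)

def build_support_prompt(articles, history, current_message):
    """articles: (title, body) pairs; history: (speaker, text) turns."""
    kb = "\n".join(f'[Article: "{title}"]\n{body}' for title, body in articles)
    turns = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (
        f"Use the retrieved knowledge base articles to answer the user.\n"
        f"Rules:\n{RULES}\n\n"
        f"Knowledge Base Articles:\n{kb}\n\n"
        f"Conversation History:\n{turns}\n\n"
        f"Current User Message: {current_message}"
    )

prompt = build_support_prompt(
    [("Troubleshooting Login Issues", "Check 2FA settings if credentials fail.")],
    [("User", "I can't log into my account."),
     ("Agent", "Are you seeing a specific error message?")],
    "Could it be related to 2FA?",
)
```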
Chunked Context with Metadata
Prompt:
Answer the question using the retrieved passages below. Each passage
includes metadata about its source. Prefer more recent sources when
information conflicts.
Passage 1:
Source: API Documentation v3.2
Updated: 2026-01-15
Content: "The /users endpoint accepts GET and POST methods. GET
returns a paginated list with default page size of 20..."
Passage 2:
Source: API Documentation v2.8
Updated: 2025-06-01
Content: "The /users endpoint accepts GET requests only. Returns
all users in a single response..."
Passage 3:
Source: Migration Guide v3.0
Updated: 2025-11-20
Content: "Breaking change: /users now supports POST for user
creation. Response pagination was added in v3.0..."
Question: Does the /users endpoint support POST requests?
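When passages carry an Updated date, sorting them newest-first before injection reinforces the "prefer more recent sources" instruction positionally. The record fields below mirror the metadata shown above but are an assumed structure, not a standard:

```python
from datetime import date

# Hypothetical passage records mirroring the metadata fields shown above.
passages = [
    {"source": "API Documentation v2.8", "updated": date(2025, 6, 1),
     "content": "The /users endpoint accepts GET requests only."},
    {"source": "API Documentation v3.2", "updated": date(2026, 1, 15),
     "content": "The /users endpoint accepts GET and POST methods."},
    {"source": "Migration Guide v3.0", "updated": date(2025, 11, 20),
     "content": "/users now supports POST for user creation."},
]

# Newest first, so the model encounters current information earliest.
ranked = sorted(passages, key=lambda p: p["updated"], reverse=True)

context = "\n\n".join(
    f"Source: {p['source']}\nUpdated: {p['updated'].isoformat()}\n"
    f"Content: {p['content']}"
    for p in ranked
)
```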
Best Practices
- Always include grounding instructions. Without them, the model freely mixes retrieved context with parametric knowledge, increasing hallucination risk.
- Label documents clearly. Use consistent delimiters and metadata (title, source, date) so the model can cite and prioritize.
- Rank by relevance. Place the most relevant passages at the start of the context or immediately before the question. Models attend most to the beginning and end of the prompt.
- Set explicit fallback behavior. Define what the model should say when context is insufficient. "I don't know based on the provided documents" is better than a hallucinated answer.
- Require citations. Forcing the model to cite sources makes it easier to verify answers and reduces fabrication.
- Manage context window budget. Reserve sufficient tokens for the model's response. If retrieved content is too long, summarize or truncate lower-ranked passages.
- Include metadata for conflict resolution. Dates, version numbers, and source authority help the model resolve contradictions.
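The context-budget practice above can be sketched as a greedy cutoff over relevance-ranked passages. This is a rough illustration: whitespace splitting stands in for the model's real tokenizer, and the budget numbers are arbitrary.

```python
def fit_to_budget(passages, max_context_tokens, reserved_for_answer=1024):
    """Keep highest-ranked passages until the (approximate) budget is spent.

    passages are assumed pre-sorted by relevance, best first. Whitespace
    splitting is a crude stand-in for the model's actual tokenizer.
    """
    budget = max_context_tokens - reserved_for_answer
    kept, used = [], 0
    for text in passages:
        cost = len(text.split())
        if used + cost > budget:
            break  # drop this and all lower-ranked passages
        kept.append(text)
        used += cost
    return kept

passages = ["alpha " * 50, "beta " * 50, "gamma " * 50]  # ~50 "tokens" each
kept = fit_to_budget(passages, max_context_tokens=1134)  # leaves a 110-token budget
```

Reserving response tokens up front, before truncating context, prevents the common failure where a full context window leaves no room for the answer.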
Core Philosophy
The prompt in a RAG system serves one purpose: to instruct the model to answer from the provided context, not from its own parametric knowledge. Without this grounding instruction, the model treats the retrieved documents as supplementary background and freely mixes them with its training data, producing answers that are partially supported by evidence and partially hallucinated. The user cannot tell which parts are which. Explicit grounding -- "Answer based ONLY on the provided context" -- shifts the model's behavior from "helpful generalist" to "faithful reader."
Citation is not a formatting nicety; it is a verifiability mechanism. When the model is required to cite the specific source that supports each claim, it cannot make unsupported assertions without visibly failing to cite. This constraint improves answer quality by forcing the model to connect each claim to evidence, and it gives the user a way to verify the answer by checking the cited source. RAG without citation is barely better than no RAG at all, because the user has no way to distinguish grounded claims from hallucination.
The prompt must explicitly define what happens when the context is insufficient. Without a fallback instruction, the model will almost always produce a plausible-sounding answer, even when the retrieved documents contain nothing relevant. This is the most dangerous failure mode of RAG: the user trusts the answer because the system claims to be grounded in documents, but the answer is a hallucination dressed in the authority of retrieval. Instruct the model to say "I don't have enough information" and treat that response as a success, not a failure.
Anti-Patterns
- No grounding instruction in the prompt: Including retrieved documents in the prompt but not telling the model to restrict its answer to the provided context. The model uses the documents as suggestions and freely supplements with parametric knowledge, defeating the purpose of RAG.
- Stuffing the context with too many low-relevance passages: Including 20 retrieved chunks in the prompt because "more is better." Low-relevance passages dilute the model's attention on the truly relevant ones and increase the chance that the model latches onto an irrelevant detail. Retrieve fewer, higher-quality passages.
- No fallback behavior for insufficient context: Not instructing the model what to do when the context does not contain the answer. The model defaults to generating a plausible answer from training data, which the user incorrectly trusts because it came from a "RAG system."
- Ignoring source metadata: Including document text without titles, dates, version numbers, or source identifiers. When two documents contradict each other, the model has no basis for resolving the conflict. Metadata enables the model to prefer more recent, more authoritative, or more relevant sources.
- Burying critical information in the middle of the context: Placing the most relevant passage in the middle of a long context block and the least relevant passages at the beginning and end. Models attend more to the beginning and end of the context. Place the highest-relevance passages first.
Common Pitfalls
- Stuffing too much context. Overloading the prompt with marginally relevant documents dilutes attention on the truly relevant ones. Retrieve fewer, higher-quality passages.
- No grounding instruction. Without explicit grounding, the model treats retrieved content as supplementary rather than authoritative, leading to hallucination.
- Ignoring retrieval quality. The best prompt engineering cannot fix bad retrieval. If the wrong documents are retrieved, the answer will be wrong or fabricated.
- Assuming the model reads everything equally. Models have attention biases. Critical information buried in the middle of a long context block may be overlooked.
- Not handling "no answer" cases. If the retrieved context does not contain the answer, the model will often make one up unless explicitly instructed to abstain.
- Mixing stale and current sources without metadata. Without dates or version numbers, the model cannot resolve contradictions between old and new information.
Related Skills
Chain of Thought
Chain-of-thought prompting to elicit step-by-step reasoning from language models
Evaluation
Prompt evaluation and testing methodologies for measuring and improving prompt quality
Few Shot Learning
Few-shot example prompting to guide model behavior through demonstration
Prompt Chaining
Multi-step prompt chains that decompose complex tasks into sequential LLM calls
Role Prompting
Role and persona prompting to shape model expertise, tone, and perspective
Structured Output
Techniques for reliably extracting structured JSON and typed data from language models