rag-with-langchain
Building RAG pipelines with LangChain and LangGraph. Covers document loaders, text splitters, vector stores, retrievers, chains, and agents. Includes practical patterns for conversational RAG, multi-source retrieval, streaming, and LangGraph-based agentic RAG workflows.
RAG with LangChain
Build production RAG pipelines using LangChain's composable abstractions.
Installation
pip install langchain langchain-openai langchain-community langchain-chroma
pip install langgraph # For agentic RAG
pip install unstructured # For document parsing
Document Loaders
from langchain_community.document_loaders import (
DirectoryLoader,
TextLoader,
PyPDFLoader,
UnstructuredMarkdownLoader,
CSVLoader,
WebBaseLoader,
GitLoader,
NotionDirectoryLoader,
ConfluenceLoader,
)
# Plain text files from directory
loader = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader)
docs = loader.load()
# PDFs
loader = PyPDFLoader("./report.pdf")
docs = loader.load() # One document per page
# Markdown with metadata
loader = UnstructuredMarkdownLoader("./README.md", mode="elements")
docs = loader.load()
# CSV (each row becomes a document)
loader = CSVLoader("./data.csv", source_column="url")
docs = loader.load()
# Web pages
loader = WebBaseLoader(["https://docs.example.com/page1", "https://docs.example.com/page2"])
docs = loader.load()
# Git repository
loader = GitLoader(
clone_url="https://github.com/org/repo",
repo_path="./cloned_repo",
branch="main",
file_filter=lambda f: f.endswith((".py", ".md"))
)
docs = loader.load()
Text Splitters
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
    MarkdownHeaderTextSplitter,
    Language,
)
# General purpose (recommended default)
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=512,
chunk_overlap=64,
length_function=len,
separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = text_splitter.split_documents(docs)
# Token-based splitting (more accurate for LLM context)
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
encoding_name="cl100k_base",
chunk_size=256, # In tokens
chunk_overlap=32,
)
# Markdown-aware
md_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
md_chunks = md_splitter.split_text(markdown_text)
# Then size-split the markdown chunks
final_chunks = text_splitter.split_documents(md_chunks)
# Code-aware splitting
python_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.PYTHON,
chunk_size=1000,
chunk_overlap=100,
)
code_chunks = python_splitter.split_documents(python_docs)
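The separator-priority idea behind RecursiveCharacterTextSplitter can be sketched in plain Python. This is an illustration of the strategy only, not the library's implementation (the real splitter also keeps separators attached and applies chunk_overlap):

```python
def split_with_separators(text, chunk_size, separators=("\n\n", "\n", ". ", " ")):
    """Split on the highest-priority separator present, then greedily
    merge pieces back together without exceeding chunk_size."""
    if len(text) <= chunk_size:
        return [text]
    # First separator that actually occurs in the text wins
    sep = next((s for s in separators if s in text), "")
    pieces = text.split(sep) if sep else list(text)
    chunks, current = [], ""
    for piece in pieces:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # Piece itself too large: recurse, falling through to
                # lower-priority separators
                chunks.extend(split_with_separators(piece, chunk_size, separators))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks

chunks = split_with_separators("First para.\n\nSecond para. Third sentence here.", 20)
```

Paragraph breaks are tried first; only oversized pieces fall through to sentence, then word, then character splits, which is why the recursive splitter keeps semantically related text together.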
Vector Stores and Retrievers
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create and persist
vectorstore = Chroma.from_documents(
chunks,
embeddings,
persist_directory="./chroma_db",
collection_name="my_docs"
)
# Load existing
vectorstore = Chroma(
persist_directory="./chroma_db",
embedding_function=embeddings,
collection_name="my_docs"
)
# Basic retriever
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 5}
)
# MMR retriever (diversity)
retriever = vectorstore.as_retriever(
search_type="mmr",
search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7}
)
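What `lambda_mult` trades off is easier to see in a toy re-ranking sketch (plain Python with dot-product similarity; the real computation happens inside the vector store):

```python
def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.7):
    """Greedily pick k docs, balancing query similarity against
    redundancy with already-selected docs (Maximal Marginal Relevance)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        best, best_score = None, float("-inf")
        for i in remaining:
            relevance = dot(query_vec, doc_vecs[i])
            redundancy = max((dot(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Two near-duplicate relevant vectors plus one distinct vector:
docs = [[0.95, 0.31], [0.94, 0.33], [0.6, -0.8]]
order = mmr_select([1.0, 0.0], docs, k=2, lambda_mult=0.5)
```

With `lambda_mult=1.0` this degenerates to pure similarity ranking (the two near-duplicates win); lower values penalize redundancy, so the distinct document displaces the second near-duplicate.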
# Similarity score threshold
retriever = vectorstore.as_retriever(
search_type="similarity_score_threshold",
search_kwargs={"score_threshold": 0.7, "k": 5}
)
# Multi-query retriever (generates query variations)
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)
multi_retriever = MultiQueryRetriever.from_llm(
retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
llm=llm,
)
# Generates 3 query variations, retrieves for each, deduplicates
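The dedup step amounts to keeping the first occurrence of each unique document across the per-variation result lists; a plain-Python sketch using string stand-ins for documents (MultiQueryRetriever does the equivalent internally on document content):

```python
def dedupe_results(result_lists):
    """Merge retrieval results from several query variations,
    keeping the first occurrence of each unique document."""
    seen, merged = set(), []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

merged = dedupe_results([["doc_a", "doc_b"], ["doc_b", "doc_c"], ["doc_a", "doc_d"]])
# merged == ["doc_a", "doc_b", "doc_c", "doc_d"]
```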
Basic RAG Chain (LCEL)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Prompt
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context. If the context
doesn't contain enough information, say "I don't have enough information."
Context:
{context}
Question: {question}
Answer with citations in [Source: filename] format where applicable.
""")
# Format retrieved documents
def format_docs(docs):
return "\n\n".join(
f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
for doc in docs
)
# LCEL chain
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
# Invoke
answer = rag_chain.invoke("How does authentication work?")
print(answer)
# Stream
for chunk in rag_chain.stream("How does authentication work?"):
print(chunk, end="", flush=True)
RAG Chain with Source Documents
from langchain_core.runnables import RunnableLambda, RunnableParallel
# Return both the answer and source documents
rag_with_sources = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(
    answer=(
        RunnableLambda(lambda x: {"context": format_docs(x["context"]), "question": x["question"]})
        | prompt
        | llm
        | StrOutputParser()
    )
)
result = rag_with_sources.invoke("How does authentication work?")
print(result["answer"])
for doc in result["context"]:  # The retrieved source documents
    print(f" - {doc.metadata.get('source', 'unknown')}")
# Legacy alternative: RetrievalQA with return_source_documents (deprecated in favor of LCEL; shown for migration reference)
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
retriever=retriever,
return_source_documents=True,
chain_type_kwargs={"prompt": prompt}
)
result = qa_chain.invoke({"query": "How does authentication work?"})
print(result["result"])
for doc in result["source_documents"]:
print(f" - {doc.metadata['source']}")
Conversational RAG (Chat History)
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain.chains.history_aware_retriever import create_history_aware_retriever
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
# Step 1: Contextualize the question using chat history
contextualize_prompt = ChatPromptTemplate.from_messages([
("system", "Given the chat history and latest question, reformulate the question "
"to be standalone. Do NOT answer, just reformulate if needed."),
MessagesPlaceholder("chat_history"),
("human", "{input}"),
])
history_aware_retriever = create_history_aware_retriever(
llm, retriever, contextualize_prompt
)
# Step 2: Answer using retrieved context
answer_prompt = ChatPromptTemplate.from_messages([
("system", "Answer based on the context below. If unsure, say so.\n\n{context}"),
MessagesPlaceholder("chat_history"),
("human", "{input}"),
])
question_answer_chain = create_stuff_documents_chain(llm, answer_prompt)
conversational_rag = create_retrieval_chain(history_aware_retriever, question_answer_chain)
# Usage
chat_history = []
response = conversational_rag.invoke({
"input": "What authentication methods are supported?",
"chat_history": chat_history,
})
print(response["answer"])
# Continue conversation
chat_history.extend([
HumanMessage(content="What authentication methods are supported?"),
AIMessage(content=response["answer"]),
])
response2 = conversational_rag.invoke({
"input": "How do I configure the first one?", # Requires context from history
"chat_history": chat_history,
})
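Chat history grows without bound, so user-facing apps usually trim it to a token budget before each turn. A minimal sketch using a crude 4-characters-per-token heuristic (swap in a real tokenizer for production):

```python
def trim_history(messages, max_tokens=1000):
    """Keep the most recent messages that fit the budget, dropping the oldest first."""
    kept, used = [], 0
    for msg in reversed(messages):          # Walk newest to oldest
        cost = len(msg) // 4 + 1            # Rough token estimate
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # Restore chronological order

history = ["a" * 400, "b" * 400, "c" * 400]
trimmed = trim_history(history, max_tokens=210)
```

Trimming from the old end preserves the recent turns the contextualizing step actually needs.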
Hybrid Retrieval with Ensemble
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
# BM25 (sparse)
bm25 = BM25Retriever.from_documents(chunks, k=10)
# Dense
dense = vectorstore.as_retriever(search_kwargs={"k": 10})
# Ensemble with RRF
ensemble = EnsembleRetriever(
retrievers=[bm25, dense],
weights=[0.4, 0.6]
)
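EnsembleRetriever merges the ranked lists with weighted Reciprocal Rank Fusion; the scheme is short enough to sketch in plain Python (60 is the conventional RRF smoothing constant):

```python
def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each doc scores weight / (c + rank) per list,
    summed across lists, then docs are sorted by total score."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + weight / (c + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d4", "d1"]
fused = weighted_rrf([bm25_hits, dense_hits], weights=[0.4, 0.6])
```

Rank-based fusion sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.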
# Use in RAG chain
rag_chain = (
{"context": ensemble | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
Agentic RAG with LangGraph
from langgraph.graph import StateGraph, END
from typing import TypedDict, List
from langchain_core.documents import Document
class RAGState(TypedDict):
question: str
documents: List[Document]
generation: str
retry_count: int
def retrieve(state: RAGState) -> RAGState:
"""Retrieve documents for the question."""
docs = retriever.invoke(state["question"])
return {"documents": docs}
def grade_documents(state: RAGState) -> str:
"""Grade retrieved documents for relevance. Route to generate or retry."""
docs = state["documents"]
question = state["question"]
relevant_count = 0
for doc in docs:
grade = llm.invoke(
f"Is this document relevant to '{question}'? "
f"Document: {doc.page_content[:200]}. Answer YES or NO only."
).content.strip().upper()
if "YES" in grade:
relevant_count += 1
if relevant_count >= 2:
return "generate"
elif state.get("retry_count", 0) < 2:
return "rewrite_query"
else:
return "generate" # Best effort
def rewrite_query(state: RAGState) -> RAGState:
"""Rewrite the query for better retrieval."""
new_query = llm.invoke(
f"Rewrite this query to be more specific for document search: {state['question']}"
).content
return {"question": new_query, "retry_count": state.get("retry_count", 0) + 1}
def generate(state: RAGState) -> RAGState:
    """Generate answer from the already-retrieved (and graded) documents."""
    # Use the graded documents from state; calling rag_chain here would
    # re-run retrieval and discard the grading step
    context = format_docs(state["documents"])
    answer = (prompt | llm | StrOutputParser()).invoke(
        {"context": context, "question": state["question"]}
    )
    return {"generation": answer}
# Build graph
workflow = StateGraph(RAGState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", lambda s: s) # Routing node
workflow.add_node("rewrite_query", rewrite_query)
workflow.add_node("generate", generate)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
"grade_documents",
grade_documents,
{"generate": "generate", "rewrite_query": "rewrite_query"}
)
workflow.add_edge("rewrite_query", "retrieve")
workflow.add_edge("generate", END)
app = workflow.compile()
result = app.invoke({"question": "How does OAuth2 work?", "retry_count": 0})
print(result["generation"])
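Stripped of the graph machinery, the workflow above is a bounded retry loop. The stubs below stand in for the retriever, grader, rewriter, and generator so the control flow is visible:

```python
def agentic_rag(question, retrieve, grade, rewrite, generate, max_retries=2):
    """Retrieve; if the results fail grading, rewrite the query and retry,
    generating a best-effort answer once retries are exhausted."""
    retries = 0
    while True:
        docs = retrieve(question)
        if grade(question, docs) or retries >= max_retries:
            return generate(question, docs)   # Answer (best effort if retries exhausted)
        question = rewrite(question)          # Try again with a sharper query
        retries += 1

# Stub components to exercise the loop:
answer = agentic_rag(
    "vague question",
    retrieve=lambda q: [q + "-doc"],
    grade=lambda q, d: q.startswith("specific"),
    rewrite=lambda q: "specific " + q,
    generate=lambda q, d: f"answer from {d[0]}",
)
```

LangGraph adds what the bare loop lacks: checkpointing, streaming of intermediate state, and clean insertion points for human-in-the-loop review.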
Multi-Source RAG
# Different retrievers for different source types
api_docs_retriever = api_vectorstore.as_retriever(search_kwargs={"k": 3})
tutorial_retriever = tutorial_vectorstore.as_retriever(search_kwargs={"k": 3})
faq_retriever = faq_vectorstore.as_retriever(search_kwargs={"k": 2})
# Route based on query type
from langchain_core.runnables import RunnableLambda
def route_query(query: str):
"""Route to appropriate retriever based on query type."""
classification = llm.invoke(
f"Classify this query as one of: api_reference, tutorial, faq\nQuery: {query}"
).content.strip().lower()
if "api" in classification:
return api_docs_retriever.invoke(query)
elif "tutorial" in classification:
return tutorial_retriever.invoke(query)
else:
return faq_retriever.invoke(query)
routed_chain = (
{"context": RunnableLambda(route_query) | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
Anti-Patterns
- Using deprecated chains -- ConversationalRetrievalChain and RetrievalQAWithSourcesChain are legacy. Use LCEL or LangGraph.
- Not streaming -- Always use .stream() in user-facing applications. The latency perception difference is dramatic.
- Hardcoded k=4 -- Tune k based on your query types and context window budget. Profile retrieval recall at different k values.
- Ignoring LangSmith -- Enable tracing to debug retrieval quality. Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY.
- Monolithic chains -- Break complex RAG into composable steps. LangGraph gives you branching, retries, and human-in-the-loop.
- No error handling -- Wrap LLM calls and retrieval in try/except. Provide fallback responses when retrieval or generation fails.
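The last point can be as simple as one wrapper that degrades to a canned answer when either stage throws. The names here are illustrative, not a LangChain API (LCEL runnables also offer .with_fallbacks() for the same purpose):

```python
def answer_with_fallback(question, retrieve, generate,
                         fallback="Sorry, I couldn't find an answer right now."):
    """Run retrieval then generation, returning a fallback answer on any failure."""
    try:
        docs = retrieve(question)
    except Exception:
        docs = []  # Retrieval down: treat as "nothing found"
    if not docs:
        return fallback
    try:
        return generate(question, docs)
    except Exception:
        return fallback

def broken_retriever(q):
    raise TimeoutError("vector store unreachable")

result = answer_with_fallback("hi", broken_retriever, lambda q, d: "ok")
```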
Related Skills
advanced-rag
Advanced RAG patterns beyond basic retrieve-and-generate. Covers multi-hop RAG, agentic RAG with tool use, graph RAG (knowledge graphs + vector retrieval), recursive retrieval, self-querying retrievers, query decomposition, citation extraction, and corrective RAG. Includes implementation patterns and guidance on when each advanced technique is warranted.
chunking-strategies
Comprehensive guide to document chunking strategies for RAG pipelines. Covers fixed-size, semantic, recursive character, sentence-based, parent-child, markdown-aware, and code-aware chunking. Includes chunk size optimization, overlap strategies, and practical benchmarks for choosing the right approach based on document type and retrieval quality.
embedding-models
Guide to selecting, using, and optimizing text embedding models for RAG pipelines. Covers commercial models (OpenAI text-embedding-3, Cohere embed-v3, Voyage AI) and open-source options (BGE, E5, Nomic Embed). Includes dimensionality selection, batch processing, embedding caching, fine-tuning for domain-specific retrieval, and cost analysis.
rag-evaluation
Evaluating RAG systems end-to-end. Covers retrieval metrics (context precision, context recall, MRR), generation metrics (faithfulness, answer relevance, hallucination detection), the RAGAS framework, human evaluation protocols, A/B testing retrieval strategies, building evaluation datasets, and continuous monitoring in production.
rag-fundamentals
Teaches the foundational architecture of Retrieval-Augmented Generation (RAG) systems. Covers why RAG outperforms fine-tuning for most knowledge-grounding use cases, the three core stages (indexing, retrieval, generation), component design, latency budgets, and evaluation metrics including faithfulness, relevance, and hallucination rate. Use when building or explaining any RAG system from scratch.
rag-production
Production-grade RAG deployment patterns. Covers caching strategies (semantic and exact), streaming responses, token budget management, fallback strategies for retrieval failures, monitoring retrieval quality, cost optimization, incremental indexing, multi-tenancy, and operational best practices for running RAG systems at scale.