rag-with-langchain
Building RAG pipelines with LangChain and LangGraph. Covers document loaders, text splitters, vector stores, retrievers, chains, and agents. Includes practical patterns for conversational RAG, multi-source retrieval, streaming, and LangGraph-based agentic RAG workflows.
RAG with LangChain
Build production RAG pipelines using LangChain's composable abstractions.
Installation
pip install langchain langchain-openai langchain-community langchain-chroma
pip install langgraph # For agentic RAG
pip install unstructured # For document parsing
Document Loaders
from langchain_community.document_loaders import (
DirectoryLoader,
TextLoader,
PyPDFLoader,
UnstructuredMarkdownLoader,
CSVLoader,
WebBaseLoader,
GitLoader,
NotionDirectoryLoader,
ConfluenceLoader,
)
# Plain text files from directory
loader = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader)
docs = loader.load()
# PDFs
loader = PyPDFLoader("./report.pdf")
docs = loader.load() # One document per page
# Markdown with metadata
loader = UnstructuredMarkdownLoader("./README.md", mode="elements")
docs = loader.load()
# CSV (each row becomes a document)
loader = CSVLoader("./data.csv", source_column="url")
docs = loader.load()
# Web pages
loader = WebBaseLoader(["https://docs.example.com/page1", "https://docs.example.com/page2"])
docs = loader.load()
# Git repository
loader = GitLoader(
clone_url="https://github.com/org/repo",
repo_path="./cloned_repo",
branch="main",
file_filter=lambda f: f.endswith((".py", ".md"))
)
docs = loader.load()
Text Splitters
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
    MarkdownHeaderTextSplitter,
    Language,
)
# General purpose (recommended default)
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=512,
chunk_overlap=64,
length_function=len,
separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = text_splitter.split_documents(docs)
# Token-based splitting (more accurate for LLM context)
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
encoding_name="cl100k_base",
chunk_size=256, # In tokens
chunk_overlap=32,
)
# Markdown-aware
md_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
md_chunks = md_splitter.split_text(markdown_text)
# Then size-split the markdown chunks
final_chunks = text_splitter.split_documents(md_chunks)
# Code-aware splitting
python_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.PYTHON,
chunk_size=1000,
chunk_overlap=100,
)
code_chunks = python_splitter.split_documents(python_docs)
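The separator-priority idea behind RecursiveCharacterTextSplitter can be sketched in plain Python. This is an illustration of the strategy only, not the library's implementation (the real splitter also keeps separators attached and applies chunk_overlap):

```python
def split_with_separators(text, chunk_size, separators=("\n\n", "\n", ". ", " ")):
    """Split on the highest-priority separator present, then greedily
    merge pieces back together without exceeding chunk_size."""
    if len(text) <= chunk_size:
        return [text]
    # First separator that actually occurs in the text wins
    sep = next((s for s in separators if s in text), "")
    pieces = text.split(sep) if sep else list(text)
    chunks, current = [], ""
    for piece in pieces:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # Piece itself too large: recurse, falling through to
                # lower-priority separators
                chunks.extend(split_with_separators(piece, chunk_size, separators))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks

chunks = split_with_separators("First para.\n\nSecond para. Third sentence here.", 20)
```

Paragraph breaks are tried first; only oversized pieces fall through to sentence, then word, then character splits, which is why the recursive splitter keeps semantically related text together.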
Vector Stores and Retrievers
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create and persist
vectorstore = Chroma.from_documents(
chunks,
embeddings,
persist_directory="./chroma_db",
collection_name="my_docs"
)
# Load existing
vectorstore = Chroma(
persist_directory="./chroma_db",
embedding_function=embeddings,
collection_name="my_docs"
)
# Basic retriever
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 5}
)
# MMR retriever (diversity)
retriever = vectorstore.as_retriever(
search_type="mmr",
search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7}
)
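What `lambda_mult` trades off is easier to see in a toy re-ranking sketch (plain Python with dot-product similarity; the real computation happens inside the vector store):

```python
def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.7):
    """Greedily pick k docs, balancing query similarity against
    redundancy with already-selected docs (Maximal Marginal Relevance)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        best, best_score = None, float("-inf")
        for i in remaining:
            relevance = dot(query_vec, doc_vecs[i])
            redundancy = max((dot(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Two near-duplicate relevant vectors plus one distinct vector:
docs = [[0.95, 0.31], [0.94, 0.33], [0.6, -0.8]]
order = mmr_select([1.0, 0.0], docs, k=2, lambda_mult=0.5)
```

With `lambda_mult=1.0` this degenerates to pure similarity ranking (the two near-duplicates win); lower values penalize redundancy, so the distinct document displaces the second near-duplicate.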
# Similarity score threshold
retriever = vectorstore.as_retriever(
search_type="similarity_score_threshold",
search_kwargs={"score_threshold": 0.7, "k": 5}
)
# Multi-query retriever (generates query variations)
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)
multi_retriever = MultiQueryRetriever.from_llm(
retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
llm=llm,
)
# Generates 3 query variations, retrieves for each, deduplicates
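The dedup step amounts to keeping the first occurrence of each unique document across the per-variation result lists; a plain-Python sketch using string stand-ins for documents (MultiQueryRetriever does the equivalent internally on document content):

```python
def dedupe_results(result_lists):
    """Merge retrieval results from several query variations,
    keeping the first occurrence of each unique document."""
    seen, merged = set(), []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

merged = dedupe_results([["doc_a", "doc_b"], ["doc_b", "doc_c"], ["doc_a", "doc_d"]])
# merged == ["doc_a", "doc_b", "doc_c", "doc_d"]
```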
Basic RAG Chain (LCEL)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Prompt
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context. If the context
doesn't contain enough information, say "I don't have enough information."
Context:
{context}
Question: {question}
Answer with citations in [Source: filename] format where applicable.
""")
# Format retrieved documents
def format_docs(docs):
return "\n\n".join(
f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
for doc in docs
)
# LCEL chain
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
# Invoke
answer = rag_chain.invoke("How does authentication work?")
print(answer)
# Stream
for chunk in rag_chain.stream("How does authentication work?"):
print(chunk, end="", flush=True)
RAG Chain with Source Documents
from langchain_core.runnables import RunnableLambda, RunnableParallel
# Return both the answer and source documents
rag_with_sources = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(
    answer=(
        RunnableLambda(lambda x: {"context": format_docs(x["context"]), "question": x["question"]})
        | prompt
        | llm
        | StrOutputParser()
    )
)
result = rag_with_sources.invoke("How does authentication work?")
print(result["answer"])
for doc in result["context"]:  # The retrieved source documents
    print(f" - {doc.metadata.get('source', 'unknown')}")
# Legacy alternative: RetrievalQA with return_source_documents (deprecated in favor of LCEL; shown for migration reference)
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
retriever=retriever,
return_source_documents=True,
chain_type_kwargs={"prompt": prompt}
)
result = qa_chain.invoke({"query": "How does authentication work?"})
print(result["result"])
for doc in result["source_documents"]:
print(f" - {doc.metadata['source']}")
Conversational RAG (Chat History)
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain.chains.history_aware_retriever import create_history_aware_retriever
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
# Step 1: Contextualize the question using chat history
contextualize_prompt = ChatPromptTemplate.from_messages([
("system", "Given the chat history and latest question, reformulate the question "
"to be standalone. Do NOT answer, just reformulate if needed."),
MessagesPlaceholder("chat_history"),
("human", "{input}"),
])
history_aware_retriever = create_history_aware_retriever(
llm, retriever, contextualize_prompt
)
# Step 2: Answer using retrieved context
answer_prompt = ChatPromptTemplate.from_messages([
("system", "Answer based on the context below. If unsure, say so.\n\n{context}"),
MessagesPlaceholder("chat_history"),
("human", "{input}"),
])
question_answer_chain = create_stuff_documents_chain(llm, answer_prompt)
conversational_rag = create_retrieval_chain(history_aware_retriever, question_answer_chain)
# Usage
chat_history = []
response = conversational_rag.invoke({
"input": "What authentication methods are supported?",
"chat_history": chat_history,
})
print(response["answer"])
# Continue conversation
chat_history.extend([
HumanMessage(content="What authentication methods are supported?"),
AIMessage(content=response["answer"]),
])
response2 = conversational_rag.invoke({
"input": "How do I configure the first one?", # Requires context from history
"chat_history": chat_history,
})
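Chat history grows without bound, so user-facing apps usually trim it to a token budget before each turn. A minimal sketch using a crude 4-characters-per-token heuristic (swap in a real tokenizer for production):

```python
def trim_history(messages, max_tokens=1000):
    """Keep the most recent messages that fit the budget, dropping the oldest first."""
    kept, used = [], 0
    for msg in reversed(messages):          # Walk newest to oldest
        cost = len(msg) // 4 + 1            # Rough token estimate
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # Restore chronological order

history = ["a" * 400, "b" * 400, "c" * 400]
trimmed = trim_history(history, max_tokens=210)
```

Trimming from the old end preserves the recent turns the contextualizing step actually needs.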
Hybrid Retrieval with Ensemble
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
# BM25 (sparse)
bm25 = BM25Retriever.from_documents(chunks, k=10)
# Dense
dense = vectorstore.as_retriever(search_kwargs={"k": 10})
# Ensemble with RRF
ensemble = EnsembleRetriever(
retrievers=[bm25, dense],
weights=[0.4, 0.6]
)
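EnsembleRetriever merges the ranked lists with weighted Reciprocal Rank Fusion; the scheme is short enough to sketch in plain Python (60 is the conventional RRF smoothing constant):

```python
def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each doc scores weight / (c + rank) per list,
    summed across lists, then docs are sorted by total score."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + weight / (c + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d4", "d1"]
fused = weighted_rrf([bm25_hits, dense_hits], weights=[0.4, 0.6])
```

Rank-based fusion sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.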
# Use in RAG chain
rag_chain = (
{"context": ensemble | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
Agentic RAG with LangGraph
from langgraph.graph import StateGraph, END
from typing import TypedDict, List
from langchain_core.documents import Document
class RAGState(TypedDict):
question: str
documents: List[Document]
generation: str
retry_count: int
def retrieve(state: RAGState) -> RAGState:
"""Retrieve documents for the question."""
docs = retriever.invoke(state["question"])
return {"documents": docs}
def grade_documents(state: RAGState) -> str:
"""Grade retrieved documents for relevance. Route to generate or retry."""
docs = state["documents"]
question = state["question"]
relevant_count = 0
for doc in docs:
grade = llm.invoke(
f"Is this document relevant to '{question}'? "
f"Document: {doc.page_content[:200]}. Answer YES or NO only."
).content.strip().upper()
if "YES" in grade:
relevant_count += 1
if relevant_count >= 2:
return "generate"
elif state.get("retry_count", 0) < 2:
return "rewrite_query"
else:
return "generate" # Best effort
def rewrite_query(state: RAGState) -> RAGState:
"""Rewrite the query for better retrieval."""
new_query = llm.invoke(
f"Rewrite this query to be more specific for document search: {state['question']}"
).content
return {"question": new_query, "retry_count": state.get("retry_count", 0) + 1}
def generate(state: RAGState) -> RAGState:
    """Generate answer from the already-retrieved (and graded) documents."""
    # Use the graded documents from state; calling rag_chain here would
    # re-run retrieval and discard the grading step
    context = format_docs(state["documents"])
    answer = (prompt | llm | StrOutputParser()).invoke(
        {"context": context, "question": state["question"]}
    )
    return {"generation": answer}
# Build graph
workflow = StateGraph(RAGState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", lambda s: s) # Routing node
workflow.add_node("rewrite_query", rewrite_query)
workflow.add_node("generate", generate)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
"grade_documents",
grade_documents,
{"generate": "generate", "rewrite_query": "rewrite_query"}
)
workflow.add_edge("rewrite_query", "retrieve")
workflow.add_edge("generate", END)
app = workflow.compile()
result = app.invoke({"question": "How does OAuth2 work?", "retry_count": 0})
print(result["generation"])
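Stripped of the graph machinery, the workflow above is a bounded retry loop. The stubs below stand in for the retriever, grader, rewriter, and generator so the control flow is visible:

```python
def agentic_rag(question, retrieve, grade, rewrite, generate, max_retries=2):
    """Retrieve; if the results fail grading, rewrite the query and retry,
    generating a best-effort answer once retries are exhausted."""
    retries = 0
    while True:
        docs = retrieve(question)
        if grade(question, docs) or retries >= max_retries:
            return generate(question, docs)   # Answer (best effort if retries exhausted)
        question = rewrite(question)          # Try again with a sharper query
        retries += 1

# Stub components to exercise the loop:
answer = agentic_rag(
    "vague question",
    retrieve=lambda q: [q + "-doc"],
    grade=lambda q, d: q.startswith("specific"),
    rewrite=lambda q: "specific " + q,
    generate=lambda q, d: f"answer from {d[0]}",
)
```

LangGraph adds what the bare loop lacks: checkpointing, streaming of intermediate state, and clean insertion points for human-in-the-loop review.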
Multi-Source RAG
# Different retrievers for different source types
api_docs_retriever = api_vectorstore.as_retriever(search_kwargs={"k": 3})
tutorial_retriever = tutorial_vectorstore.as_retriever(search_kwargs={"k": 3})
faq_retriever = faq_vectorstore.as_retriever(search_kwargs={"k": 2})
# Route based on query type
from langchain_core.runnables import RunnableLambda
def route_query(query: str):
"""Route to appropriate retriever based on query type."""
classification = llm.invoke(
f"Classify this query as one of: api_reference, tutorial, faq\nQuery: {query}"
).content.strip().lower()
if "api" in classification:
return api_docs_retriever.invoke(query)
elif "tutorial" in classification:
return tutorial_retriever.invoke(query)
else:
return faq_retriever.invoke(query)
routed_chain = (
{"context": RunnableLambda(route_query) | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
Anti-Patterns
- Using deprecated chains -- ConversationalRetrievalChain and RetrievalQAWithSourcesChain are legacy. Use LCEL or LangGraph.
- Not streaming -- Always use .stream() in user-facing applications. The latency perception difference is dramatic.
- Hardcoded k=4 -- Tune k based on your query types and context window budget. Profile retrieval recall at different k values.
- Ignoring LangSmith -- Enable tracing to debug retrieval quality. Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY.
- Monolithic chains -- Break complex RAG into composable steps. LangGraph gives you branching, retries, and human-in-the-loop.
- No error handling -- Wrap LLM calls and retrieval in try/except. Provide fallback responses when retrieval or generation fails.
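The last point can be as simple as one wrapper that degrades to a canned answer when either stage throws. The names here are illustrative, not a LangChain API (LCEL runnables also offer .with_fallbacks() for the same purpose):

```python
def answer_with_fallback(question, retrieve, generate,
                         fallback="Sorry, I couldn't find an answer right now."):
    """Run retrieval then generation, returning a fallback answer on any failure."""
    try:
        docs = retrieve(question)
    except Exception:
        docs = []  # Retrieval down: treat as "nothing found"
    if not docs:
        return fallback
    try:
        return generate(question, docs)
    except Exception:
        return fallback

def broken_retriever(q):
    raise TimeoutError("vector store unreachable")

result = answer_with_fallback("hi", broken_retriever, lambda q, d: "ok")
```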
Related Skills
advanced-rag
Advanced RAG patterns beyond basic retrieve-and-generate. Covers multi-hop RAG, agentic RAG with tool use, graph RAG (knowledge graphs + vector retrieval), recursive retrieval, self-querying retrievers, query decomposition, citation extraction, and corrective RAG. Includes implementation patterns and guidance on when each advanced technique is warranted.
chunking-strategies
Comprehensive guide to document chunking strategies for RAG pipelines. Covers fixed-size, semantic, recursive character, sentence-based, parent-child, markdown-aware, and code-aware chunking. Includes chunk size optimization, overlap strategies, and practical benchmarks for choosing the right approach based on document type and retrieval quality.
embedding-models
Guide to selecting, using, and optimizing text embedding models for RAG pipelines. Covers commercial models (OpenAI text-embedding-3, Cohere embed-v3, Voyage AI) and open-source options (BGE, E5, Nomic Embed). Includes dimensionality selection, batch processing, embedding caching, fine-tuning for domain-specific retrieval, and cost analysis.
rag-evaluation
Evaluating RAG systems end-to-end. Covers retrieval metrics (context precision, context recall, MRR), generation metrics (faithfulness, answer relevance, hallucination detection), the RAGAS framework, human evaluation protocols, A/B testing retrieval strategies, building evaluation datasets, and continuous monitoring in production.
rag-fundamentals
Teaches the foundational architecture of Retrieval-Augmented Generation (RAG) systems. Covers why RAG outperforms fine-tuning for most knowledge-grounding use cases, the three core stages (indexing, retrieval, generation), component design, latency budgets, and evaluation metrics including faithfulness, relevance, and hallucination rate. Use when building or explaining any RAG system from scratch.
rag-production
Production-grade RAG deployment patterns. Covers caching strategies (semantic and exact), streaming responses, token budget management, fallback strategies for retrieval failures, monitoring retrieval quality, cost optimization, incremental indexing, multi-tenancy, and operational best practices for running RAG systems at scale.