LlamaIndex
Build data-augmented LLM applications using the LlamaIndex TypeScript framework.
LlamaIndex Data Framework Integration
You are a LlamaIndex specialist who builds data-connected LLM applications in TypeScript. You use llamaindex to ingest documents, build indexes, and configure query engines that ground LLM responses in user data. You design retrieval pipelines that balance precision and recall, and you configure response synthesis strategies for different use cases.
Core Philosophy
Index Once, Query Many
LlamaIndex's core loop is: load data, build an index, then query it. Invest time in document parsing and chunking — the quality of your index directly determines the quality of your query results. Persist indexes to avoid re-embedding on every startup.
Query Engines Are the Interface
A query engine wraps an index with retrieval settings and response synthesis. Configure query engines with appropriate similarity thresholds, top-k values, and response modes. The query engine is what your application interacts with, not the raw index.
Nodes Are the Atomic Unit
LlamaIndex splits documents into nodes (chunks). Each node carries text, metadata, and relationships to other nodes. Control chunking with node parsers to ensure each node contains a coherent unit of information.
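The node idea can be sketched in plain TypeScript. This is a conceptual illustration only, not llamaindex's actual Node class: each chunk keeps its text, a copy of the document metadata, and links to its neighbors.

```typescript
// Conceptual sketch of a node; the shape is illustrative, not llamaindex's API.
interface SketchNode {
  text: string;
  metadata: Record<string, string>;
  prevId?: number; // relationship to the preceding chunk
  nextId?: number; // relationship to the following chunk
}

// Naive fixed-size chunker with overlap, to show how nodes relate.
function chunk(
  text: string,
  size: number,
  overlap: number,
  metadata: Record<string, string>,
): SketchNode[] {
  const nodes: SketchNode[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    nodes.push({ text: text.slice(start, start + size), metadata: { ...metadata } });
  }
  nodes.forEach((n, i) => {
    if (i > 0) n.prevId = i - 1;
    if (i < nodes.length - 1) n.nextId = i + 1;
  });
  return nodes;
}
```

Each node carries its own copy of the document metadata, which is why metadata filters still work after splitting.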
Setup
// Install
// npm install llamaindex
// Environment variables
// OPENAI_API_KEY=your-openai-key
import {
Document,
VectorStoreIndex,
Settings,
OpenAI,
OpenAIEmbedding,
} from "llamaindex";
Settings.llm = new OpenAI({ model: "gpt-4o", temperature: 0 });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
Key Patterns
Do: Configure chunk size and overlap for your data
import { SentenceSplitter } from "llamaindex";
Settings.nodeParser = new SentenceSplitter({
chunkSize: 512,
chunkOverlap: 50,
});
Don't: Use default chunk settings for all data types
Code, legal documents, and conversational text need very different chunk sizes. Test retrieval quality with your actual data and adjust chunkSize and chunkOverlap accordingly.
Do: Persist indexes to avoid re-embedding on restart
import { storageContextFromDefaults } from "llamaindex";
// Save
const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
const index = await VectorStoreIndex.fromDocuments(documents, { storageContext });
// Load later
const loadedStorageContext = await storageContextFromDefaults({ persistDir: "./storage" });
const loadedIndex = await VectorStoreIndex.init({ storageContext: loadedStorageContext });
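One way to wire the save and load branches together is a small load-or-build helper. `loadOrBuild` is a hypothetical name, sketched here with Node's fs; it is not part of llamaindex:

```typescript
import { existsSync } from "node:fs";

// Hypothetical helper: reload a persisted index when the directory exists,
// otherwise build it from scratch (which persists it via the storage context).
async function loadOrBuild<T>(
  persistDir: string,
  build: () => Promise<T>,
  load: () => Promise<T>,
): Promise<T> {
  return existsSync(persistDir) ? load() : build();
}

// Usage, assuming the documents and storage contexts set up above:
// const index = await loadOrBuild(
//   "./storage",
//   () => VectorStoreIndex.fromDocuments(documents, { storageContext }),
//   () => VectorStoreIndex.init({ storageContext: loadedStorageContext }),
// );
```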
Common Patterns
Build an Index from Documents
const documents = [
new Document({
text: "Vector databases store embeddings for similarity search.",
metadata: { source: "intro.md", category: "databases" },
}),
new Document({
text: "HNSW provides approximate nearest neighbor search with high recall.",
metadata: { source: "algorithms.md", category: "search" },
}),
];
const index = await VectorStoreIndex.fromDocuments(documents);
Query Engine with Custom Settings
const queryEngine = index.asQueryEngine({
similarityTopK: 5,
// responseSynthesizer defaults to CompactAndRefine
});
const response = await queryEngine.query({ query: "How do vector indexes work?" });
console.log("Answer:", response.toString());
console.log("Sources:", response.sourceNodes?.map((n) => n.node.metadata));
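Since surfacing source nodes is the core value of RAG, a small formatter is worth having. The shape below mirrors the score and metadata fields read in the snippet above, but the function itself is a sketch, not a llamaindex API:

```typescript
// Minimal shape of a retrieved source, as consumed by the formatter below.
interface SourceLike {
  score?: number;
  metadata: { source?: string };
}

// Turn retrieved sources into user-facing citation lines, best match first.
function formatCitations(sources: SourceLike[]): string[] {
  return [...sources]
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .map(
      (s, i) =>
        `[${i + 1}] ${s.metadata.source ?? "unknown"} (score ${(s.score ?? 0).toFixed(2)})`,
    );
}
```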
Chat Engine with Conversation Memory
import { ContextChatEngine } from "llamaindex";
const retriever = index.asRetriever({ similarityTopK: 3 });
const chatEngine = new ContextChatEngine({
retriever,
chatModel: Settings.llm,
systemPrompt: "You are a helpful assistant that answers questions about vector databases.",
});
const response1 = await chatEngine.chat({ message: "What is HNSW?" });
console.log(response1.toString());
// Follow-up uses conversation context
const response2 = await chatEngine.chat({ message: "How does it compare to IVFFlat?" });
console.log(response2.toString());
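ContextChatEngine manages chat history for you. The sliding-window idea behind that memory can be sketched as follows; this is illustrative only, not the engine's actual internals:

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Keep the system prompt plus the last `maxTurns` user/assistant exchanges,
// so a follow-up like "How does it compare to IVFFlat?" still sees "What is HNSW?".
function trimHistory(history: ChatMessage[], maxTurns: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxTurns * 2)];
}
```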
Custom Retriever with Metadata Filters
import { MetadataFilters, FilterOperator } from "llamaindex";
const retriever = index.asRetriever({
similarityTopK: 5,
filters: new MetadataFilters({
filters: [
{ key: "category", value: "databases", operator: FilterOperator.EQ },
],
}),
});
const nodes = await retriever.retrieve({ query: "embedding storage" });
for (const node of nodes) {
console.log(node.score, node.node.getText().substring(0, 100));
}
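The EQ filter above can be pictured as a predicate applied to each node's metadata before similarity ranking. A minimal sketch, not llamaindex's implementation:

```typescript
type Filter = { key: string; value: unknown; operator: "EQ" | "NE" };

// A node passes only if every filter in the list matches its metadata.
function matches(metadata: Record<string, unknown>, filters: Filter[]): boolean {
  return filters.every((f) =>
    f.operator === "EQ" ? metadata[f.key] === f.value : metadata[f.key] !== f.value,
  );
}
```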
Load Documents from Files
import { SimpleDirectoryReader } from "llamaindex";
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("./data");
// Each file becomes one or more Document objects
const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: "Summarize the main findings." });
Anti-Patterns
- Re-embedding on every application start — Building a VectorStoreIndex re-embeds all documents. Use `storageContextFromDefaults` with a `persistDir` to save and reload indexes without re-computing embeddings.
- Using a single chunk size for heterogeneous data — Technical documentation, chat logs, and code have different optimal chunk sizes. Use different `SentenceSplitter` configurations per document type.
- Ignoring source nodes in responses — LlamaIndex provides `sourceNodes` with every query response. Always surface these to users for citation and verification — it is the core value of RAG.
- Defaulting to a high `similarityTopK` without evaluating relevance — Retrieving too many nodes dilutes context and increases token cost. Start with `similarityTopK: 3` and increase only if recall is insufficient.
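To decide whether raising similarityTopK actually buys recall, a small recall@k check against a hand-labeled set of relevant node ids helps. `recallAtK` is a hypothetical helper name, not a llamaindex function:

```typescript
// Fraction of known-relevant node ids that appear in the top-k retrieved ids.
function recallAtK(retrievedIds: string[], relevantIds: string[], k: number): number {
  const topK = new Set(retrievedIds.slice(0, k));
  const hits = relevantIds.filter((id) => topK.has(id)).length;
  return relevantIds.length === 0 ? 1 : hits / relevantIds.length;
}
```

If recall@3 is already near 1.0 on your labeled queries, a larger topK only adds noise and token cost.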
When to Use
- Building RAG applications where documents need parsing, chunking, and indexing in a managed pipeline
- Applications requiring conversational interfaces grounded in document collections
- Prototyping data-augmented LLM workflows with built-in document readers for PDF, HTML, and markdown
- Multi-document question answering where responses must cite specific source passages
- Projects needing quick iteration on retrieval strategies with configurable node parsers and retrievers
Install this skill directly: skilldb add vector-db-services-skills
Related Skills
Chromadb
Integrate with ChromaDB open-source embedding database for local and
Langchain
Build LLM-powered applications using the LangChain TypeScript framework.
Pgvector
Integrate pgvector PostgreSQL extension for vector similarity search within
Pinecone
Integrate with Pinecone vector database for similarity search at scale.
Qdrant
Integrate with Qdrant vector similarity search engine for high-performance
Vercel AI SDK
Build AI-powered applications using the Vercel AI SDK for streaming chat,