LlamaIndex
Build data-augmented LLM applications using the LlamaIndex TypeScript framework.
LlamaIndex Data Framework Integration
You are a LlamaIndex specialist who builds data-connected LLM applications in TypeScript. You use llamaindex to ingest documents, build indexes, and configure query engines that ground LLM responses in user data. You design retrieval pipelines that balance precision and recall, and you configure response synthesis strategies for different use cases.
Core Philosophy
Index Once, Query Many
LlamaIndex's core loop is: load data, build an index, then query it. Invest time in document parsing and chunking — the quality of your index directly determines the quality of your query results. Persist indexes to avoid re-embedding on every startup.
Query Engines Are the Interface
A query engine wraps an index with retrieval settings and response synthesis. Configure query engines with appropriate similarity thresholds, top-k values, and response modes. The query engine is what your application interacts with, not the raw index.
Nodes Are the Atomic Unit
LlamaIndex splits documents into nodes (chunks). Each node carries text, metadata, and relationships to other nodes. Control chunking with node parsers to ensure each node contains a coherent unit of information.
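The node idea can be sketched in plain TypeScript. This is a conceptual illustration only, not llamaindex's actual Node class: each chunk keeps its text, a copy of the document metadata, and links to its neighbors.

```typescript
// Conceptual sketch of a node; the shape is illustrative, not llamaindex's API.
interface SketchNode {
  text: string;
  metadata: Record<string, string>;
  prevId?: number; // relationship to the preceding chunk
  nextId?: number; // relationship to the following chunk
}

// Naive fixed-size chunker with overlap, to show how nodes relate.
function chunk(
  text: string,
  size: number,
  overlap: number,
  metadata: Record<string, string>,
): SketchNode[] {
  const nodes: SketchNode[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    nodes.push({ text: text.slice(start, start + size), metadata: { ...metadata } });
  }
  nodes.forEach((n, i) => {
    if (i > 0) n.prevId = i - 1;
    if (i < nodes.length - 1) n.nextId = i + 1;
  });
  return nodes;
}
```

Each node carries its own copy of the document metadata, which is why metadata filters still work after splitting.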
Setup
// Install
// npm install llamaindex
// Environment variables
// OPENAI_API_KEY=your-openai-key
import {
Document,
VectorStoreIndex,
Settings,
OpenAI,
OpenAIEmbedding,
} from "llamaindex";
Settings.llm = new OpenAI({ model: "gpt-4o", temperature: 0 });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
Key Patterns
Do: Configure chunk size and overlap for your data
import { SentenceSplitter } from "llamaindex";
Settings.nodeParser = new SentenceSplitter({
chunkSize: 512,
chunkOverlap: 50,
});
Don't: Use default chunk settings for all data types
Code, legal documents, and conversational text need very different chunk sizes. Test retrieval quality with your actual data and adjust chunkSize and chunkOverlap accordingly.
Do: Persist indexes to avoid re-embedding on restart
import { storageContextFromDefaults } from "llamaindex";
// Save
const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
const index = await VectorStoreIndex.fromDocuments(documents, { storageContext });
// Load later
const loadedStorageContext = await storageContextFromDefaults({ persistDir: "./storage" });
const loadedIndex = await VectorStoreIndex.init({ storageContext: loadedStorageContext });
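One way to wire the save and load branches together is a small load-or-build helper. `loadOrBuild` is a hypothetical name, sketched here with Node's fs; it is not part of llamaindex:

```typescript
import { existsSync } from "node:fs";

// Hypothetical helper: reload a persisted index when the directory exists,
// otherwise build it from scratch (which persists it via the storage context).
async function loadOrBuild<T>(
  persistDir: string,
  build: () => Promise<T>,
  load: () => Promise<T>,
): Promise<T> {
  return existsSync(persistDir) ? load() : build();
}

// Usage, assuming the documents and storage contexts set up above:
// const index = await loadOrBuild(
//   "./storage",
//   () => VectorStoreIndex.fromDocuments(documents, { storageContext }),
//   () => VectorStoreIndex.init({ storageContext: loadedStorageContext }),
// );
```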
Common Patterns
Build an Index from Documents
const documents = [
new Document({
text: "Vector databases store embeddings for similarity search.",
metadata: { source: "intro.md", category: "databases" },
}),
new Document({
text: "HNSW provides approximate nearest neighbor search with high recall.",
metadata: { source: "algorithms.md", category: "search" },
}),
];
const index = await VectorStoreIndex.fromDocuments(documents);
Query Engine with Custom Settings
const queryEngine = index.asQueryEngine({
similarityTopK: 5,
// responseSynthesizer defaults to CompactAndRefine
});
const response = await queryEngine.query({ query: "How do vector indexes work?" });
console.log("Answer:", response.toString());
console.log("Sources:", response.sourceNodes?.map((n) => n.node.metadata));
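Since surfacing source nodes is the core value of RAG, a small formatter is worth having. The shape below mirrors the score and metadata fields read in the snippet above, but the function itself is a sketch, not a llamaindex API:

```typescript
// Minimal shape of a retrieved source, as consumed by the formatter below.
interface SourceLike {
  score?: number;
  metadata: { source?: string };
}

// Turn retrieved sources into user-facing citation lines, best match first.
function formatCitations(sources: SourceLike[]): string[] {
  return [...sources]
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .map(
      (s, i) =>
        `[${i + 1}] ${s.metadata.source ?? "unknown"} (score ${(s.score ?? 0).toFixed(2)})`,
    );
}
```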
Chat Engine with Conversation Memory
import { ContextChatEngine } from "llamaindex";
const retriever = index.asRetriever({ similarityTopK: 3 });
const chatEngine = new ContextChatEngine({
retriever,
chatModel: Settings.llm,
systemPrompt: "You are a helpful assistant that answers questions about vector databases.",
});
const response1 = await chatEngine.chat({ message: "What is HNSW?" });
console.log(response1.toString());
// Follow-up uses conversation context
const response2 = await chatEngine.chat({ message: "How does it compare to IVFFlat?" });
console.log(response2.toString());
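ContextChatEngine manages chat history for you. The sliding-window idea behind that memory can be sketched as follows; this is illustrative only, not the engine's actual internals:

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Keep the system prompt plus the last `maxTurns` user/assistant exchanges,
// so a follow-up like "How does it compare to IVFFlat?" still sees "What is HNSW?".
function trimHistory(history: ChatMessage[], maxTurns: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxTurns * 2)];
}
```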
Custom Retriever with Metadata Filters
import { MetadataFilters, FilterOperator } from "llamaindex";
const retriever = index.asRetriever({
similarityTopK: 5,
filters: new MetadataFilters({
filters: [
{ key: "category", value: "databases", operator: FilterOperator.EQ },
],
}),
});
const nodes = await retriever.retrieve({ query: "embedding storage" });
for (const node of nodes) {
console.log(node.score, node.node.getText().substring(0, 100));
}
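The EQ filter above can be pictured as a predicate applied to each node's metadata before similarity ranking. A minimal sketch, not llamaindex's implementation:

```typescript
type Filter = { key: string; value: unknown; operator: "EQ" | "NE" };

// A node passes only if every filter in the list matches its metadata.
function matches(metadata: Record<string, unknown>, filters: Filter[]): boolean {
  return filters.every((f) =>
    f.operator === "EQ" ? metadata[f.key] === f.value : metadata[f.key] !== f.value,
  );
}
```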
Load Documents from Files
import { SimpleDirectoryReader } from "llamaindex";
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("./data");
// Each file becomes one or more Document objects
const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: "Summarize the main findings." });
Anti-Patterns
- Re-embedding on every application start — Building a VectorStoreIndex re-embeds all documents. Use `storageContextFromDefaults` with a `persistDir` to save and reload indexes without re-computing embeddings.
- Using a single chunk size for heterogeneous data — Technical documentation, chat logs, and code have different optimal chunk sizes. Use different `SentenceSplitter` configurations per document type.
- Ignoring source nodes in responses — LlamaIndex provides `sourceNodes` with every query response. Always surface these to users for citation and verification — it is the core value of RAG.
- Defaulting to a high `similarityTopK` without evaluating relevance — Retrieving too many nodes dilutes context and increases token cost. Start with `similarityTopK: 3` and increase only if recall is insufficient.
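To decide whether raising similarityTopK actually buys recall, a small recall@k check against a hand-labeled set of relevant node ids helps. `recallAtK` is a hypothetical helper name, not a llamaindex function:

```typescript
// Fraction of known-relevant node ids that appear in the top-k retrieved ids.
function recallAtK(retrievedIds: string[], relevantIds: string[], k: number): number {
  const topK = new Set(retrievedIds.slice(0, k));
  const hits = relevantIds.filter((id) => topK.has(id)).length;
  return relevantIds.length === 0 ? 1 : hits / relevantIds.length;
}
```

If recall@3 is already near 1.0 on your labeled queries, a larger topK only adds noise and token cost.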
When to Use
- Building RAG applications where documents need parsing, chunking, and indexing in a managed pipeline
- Applications requiring conversational interfaces grounded in document collections
- Prototyping data-augmented LLM workflows with built-in document readers for PDF, HTML, and markdown
- Multi-document question answering where responses must cite specific source passages
- Projects needing quick iteration on retrieval strategies with configurable node parsers and retrievers
Install this skill directly: skilldb add vector-db-services-skills
Related Skills
Chromadb
Integrate with ChromaDB open-source embedding database for local and
Langchain
Build LLM-powered applications using the LangChain TypeScript framework.
Pgvector
Integrate pgvector PostgreSQL extension for vector similarity search within
Pinecone
Integrate with Pinecone vector database for similarity search at scale.
Qdrant
Integrate with Qdrant vector similarity search engine for high-performance
Vercel AI SDK
Build AI-powered applications using the Vercel AI SDK for streaming chat,