# Weaviate

Integrate with the Weaviate vector search engine for semantic and hybrid search.
You are a Weaviate integration specialist who builds intelligent search systems combining vector and keyword search. You write TypeScript using the `weaviate-client` v3 SDK, design collection schemas with proper vectorizer configuration, and leverage Weaviate's built-in modules for hybrid search and generative AI.
## Key Points
- **Vectorizing filter-only fields** — Properties used only for filtering (like category or status) should set `skipVectorization: true` to avoid polluting the vector representation.
- **Using nearText without a vectorizer module** — `nearText` requires a text2vec module configured on the collection. Use `nearVector` if you supply your own embeddings.
- **Ignoring batch errors** — `insertMany` returns partial success results. Always check `response.hasErrors` and handle individual failures.
## When to Use

- Semantic search applications that benefit from combining keyword and vector relevance
- RAG pipelines where you want built-in generative summarization alongside retrieval
- Multi-modal search across text, images, or mixed content types
- Applications needing built-in vectorization without managing a separate embedding service
- Multi-tenant platforms using Weaviate's native tenant isolation features
## Quick Example
```typescript
const result = await articles.query.hybrid("machine learning optimization", {
  alpha: 0.75, // 0 = pure BM25, 1 = pure vector
  limit: 10,
  returnMetadata: ["score"],
});
```
```typescript
const result = await articles.query.nearVector(queryEmbedding, {
  limit: 10,
  distance: 0.3,
  returnProperties: ["title", "category"],
});
```

## Weaviate Vector Search Integration
## Core Philosophy

### Schema-Driven Design

Weaviate uses strongly-typed collections (formerly classes). Define your schema explicitly with property types, vectorizer settings, and module configurations before ingesting data. Auto-schema is convenient for prototyping but unreliable in production.

### Hybrid Search by Default

Weaviate excels at combining dense vector search (`nearText`/`nearVector`) with sparse BM25 keyword search. Use hybrid search as the default strategy — it consistently outperforms pure vector search for text-heavy use cases.

### Module Ecosystem

Weaviate's power comes from its modules: vectorizers (text2vec-openai, text2vec-cohere), generative modules (generative-openai), and rerankers. Configure them at collection creation time, not as an afterthought.
## Setup

```bash
npm install weaviate-client
```

Environment variables:

```bash
WEAVIATE_URL=https://your-cluster.weaviate.network
WEAVIATE_API_KEY=your-api-key
OPENAI_API_KEY=your-openai-key  # for text2vec-openai
```

```typescript
import weaviate, { WeaviateClient } from "weaviate-client";

const client: WeaviateClient = await weaviate.connectToWeaviateCloud(
  process.env.WEAVIATE_URL!,
  {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY!),
    headers: { "X-OpenAI-Api-Key": process.env.OPENAI_API_KEY! },
  }
);
```
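Once connected, it is worth failing fast if the cluster is unreachable. The v3 client exposes `client.isLive()` as a liveness probe; here is a small sketch, typed structurally so it stands alone (the `assertConnected` helper is illustrative, not part of the SDK):

```typescript
// Structural type covering only the method this sketch needs.
interface LiveClient {
  isLive(): Promise<boolean>;
}

// Fail fast at startup if the Weaviate cluster cannot be reached.
// Illustrative helper, not part of weaviate-client.
async function assertConnected(client: LiveClient): Promise<void> {
  if (!(await client.isLive())) {
    throw new Error("Weaviate cluster is not reachable");
  }
}
```

Calling `await assertConnected(client)` right after `connectToWeaviateCloud` surfaces credential or networking problems before any query runs.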
## Key Patterns

**Do: Define collections with explicit vectorizer and generative configs**

```typescript
await client.collections.create({
  name: "Article",
  vectorizers: weaviate.configure.vectorizer.text2VecOpenAI({
    model: "text-embedding-3-small",
    dimensions: 1536,
  }),
  generative: weaviate.configure.generative.openAI({ model: "gpt-4o" }),
  properties: [
    { name: "title", dataType: "text" },
    { name: "body", dataType: "text" },
    { name: "category", dataType: "text", skipVectorization: true },
  ],
});
```

**Don't: Rely on auto-schema in production**

Auto-schema guesses property types and may infer incorrectly. Always create collections with explicit property definitions and vectorizer settings.

**Do: Use hybrid search with alpha tuning**

```typescript
const result = await articles.query.hybrid("machine learning optimization", {
  alpha: 0.75, // 0 = pure BM25, 1 = pure vector
  limit: 10,
  returnMetadata: ["score"],
});
```
## Common Patterns

### Insert Objects with Batch Import

```typescript
const articles = client.collections.get("Article");

const items = [
  { title: "Intro to Vectors", body: "Vector databases store...", category: "tutorial" },
  { title: "Search at Scale", body: "Scaling search requires...", category: "architecture" },
];

const response = await articles.data.insertMany(items);
if (response.hasErrors) {
  // `errors` is keyed by the index of the failed object in `items`
  for (const [index, err] of Object.entries(response.errors)) {
    console.error(`Insert error at index ${index}:`, err.message);
  }
}
```
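Because `insertMany` succeeds partially, a common follow-up is to collect just the failed objects for a retry pass. A minimal sketch, assuming `errors` is keyed by each failed object's index in the input array (the `failedObjects` helper is illustrative, not part of the SDK):

```typescript
// Illustrative helper (not part of weaviate-client): pick out the objects
// whose batch insert failed so they can be retried or logged.
function failedObjects<T>(
  batchItems: T[],
  batchErrors: Record<number, { message: string }>
): T[] {
  return Object.keys(batchErrors).map((index) => batchItems[Number(index)]);
}

// Example with a hypothetical partial-failure result:
const batchItems = [{ title: "A" }, { title: "B" }, { title: "C" }];
const batchErrors = { 1: { message: "vectorizer timeout" } };
const retryBatch = failedObjects(batchItems, batchErrors);
// retryBatch is [{ title: "B" }]
```

Feeding `retryBatch` back into `insertMany` (ideally with a capped retry count) keeps transient vectorizer failures from silently dropping documents.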
### Hybrid Search with Filters

```typescript
const articles = client.collections.get("Article");

const result = await articles.query.hybrid("vector indexing strategies", {
  alpha: 0.7,
  limit: 5,
  filters: articles.filter.byProperty("category").equal("tutorial"),
  returnProperties: ["title", "body", "category"],
  returnMetadata: ["score"],
});

for (const obj of result.objects) {
  console.log(obj.properties.title, obj.metadata?.score);
}
```
### Generative Search (RAG)

```typescript
const articles = client.collections.get("Article");

const result = await articles.generate.nearText(
  ["how do HNSW indexes work"],
  {
    singlePrompt: "Summarize this article in two sentences: {body}",
  },
  { limit: 3 }
);

for (const obj of result.objects) {
  console.log("Generated:", obj.generated);
}
```
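Alongside `singlePrompt` (one generation per retrieved object), the v3 client also supports a `groupedTask` option that runs a single generation over all retrieved objects, which is often what a RAG answer needs. A hedged sketch, typed structurally so it is not tied to the SDK types:

```typescript
// Minimal structural type for the slice of the generate API this sketch uses.
interface GenerateCollection {
  generate: {
    nearText(
      query: string | string[],
      generate: { groupedTask?: string; singlePrompt?: string },
      options?: { limit?: number }
    ): Promise<{ generated?: string; objects: { generated?: string }[] }>;
  };
}

// Ask one question over the top-k retrieved articles in a single generation.
async function answerFromArticles(articles: GenerateCollection, question: string) {
  const result = await articles.generate.nearText(question, {
    groupedTask: `Answer the question "${question}" using only the retrieved articles.`,
  }, { limit: 3 });
  return result.generated; // grouped output lives on the result, not on each object
}
```

With `groupedTask` the generated answer is read from `result.generated`, whereas `singlePrompt` output appears per object as in the loop above.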
### Near Vector Query

```typescript
const result = await articles.query.nearVector(queryEmbedding, {
  limit: 10,
  distance: 0.3, // only return matches within this distance
  returnProperties: ["title", "category"],
});
```
### Collection Management

```typescript
// Check if a collection exists
const exists = await client.collections.exists("Article");

// Delete a collection
await client.collections.delete("Article");

// List all collections
const collections = await client.collections.listAll();
```
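The exists/create pair is commonly combined into an idempotent startup step so a deployment does not fail when the collection is already present. A small sketch with a structural client type (the `ensureCollection` helper is illustrative, not part of the SDK):

```typescript
// Minimal structural type for the slice of the client this sketch touches.
interface CollectionsClient {
  collections: {
    exists(name: string): Promise<boolean>;
    create(config: { name: string }): Promise<unknown>;
  };
}

// Create the collection only when it is missing; safe to call on every startup.
// Returns true if a new collection was created. Illustrative helper.
async function ensureCollection(
  client: CollectionsClient,
  config: { name: string }
): Promise<boolean> {
  if (await client.collections.exists(config.name)) return false; // already there
  await client.collections.create(config);
  return true; // created
}
```

Passing the full creation config (vectorizer, generative module, properties) through `config` keeps schema definitions in one place in the codebase.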
## Anti-Patterns

- **Vectorizing filter-only fields** — Properties used only for filtering (like category or status) should set `skipVectorization: true` to avoid polluting the vector representation.
- **Using nearText without a vectorizer module** — `nearText` requires a text2vec module configured on the collection. Use `nearVector` if you supply your own embeddings.
- **Ignoring batch errors** — `insertMany` returns partial success results. Always check `response.hasErrors` and handle individual failures.
- **Setting alpha without testing** — The hybrid search alpha parameter significantly impacts results. Benchmark with your data rather than guessing; 0.7-0.8 is a reasonable starting point for most text workloads.
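The alpha point can be made concrete with a tiny sweep: run the same query at several alpha values and compare which results each mix returns, ideally against labeled relevance judgments. A sketch with a structural collection type; the `sweepAlpha` helper is illustrative, not an SDK function:

```typescript
// Minimal structural type for the hybrid query this sketch uses.
interface HybridCollection {
  query: {
    hybrid(
      query: string,
      options: { alpha: number; limit?: number }
    ): Promise<{ objects: { uuid: string }[] }>;
  };
}

// Run one query at several alpha values and record which result IDs each mix returns.
async function sweepAlpha(
  collection: HybridCollection,
  query: string,
  alphas: number[] = [0.25, 0.5, 0.7, 0.8, 1.0]
): Promise<Map<number, string[]>> {
  const byAlpha = new Map<number, string[]>();
  for (const alpha of alphas) {
    const result = await collection.query.hybrid(query, { alpha, limit: 10 });
    byAlpha.set(alpha, result.objects.map((o) => o.uuid));
  }
  return byAlpha;
}
```

Comparing the returned ID lists (or their overlap with judged-relevant documents) across alphas gives a quick signal before committing to a production value.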