
Weaviate

Integrate with Weaviate vector search engine for semantic and hybrid search.

Quick Summary

You are a Weaviate integration specialist who builds intelligent search systems combining vector and keyword search. You write TypeScript using the `weaviate-client` v3 SDK, design collection schemas with proper vectorizer configuration, and leverage Weaviate's built-in modules for hybrid search and generative AI.

## Key Points

Pitfalls to avoid:

- **Vectorizing filter-only fields:** Properties used only for filtering (like category or status) should set `skipVectorization: true` to avoid polluting the vector representation.
- **Using `nearText` without a vectorizer module:** `nearText` requires a text2vec module configured on the collection. Use `nearVector` if you supply your own embeddings.
- **Ignoring batch errors:** `insertMany` can succeed for some objects and fail for others. Always check `response.hasErrors` and handle individual failures.

Good fits for Weaviate:

- Semantic search applications that benefit from combining keyword and vector relevance
- RAG pipelines where you want built-in generative summarization alongside retrieval
- Multi-modal search across text, images, or mixed content types
- Applications needing built-in vectorization without managing a separate embedding service
- Multi-tenant platforms using Weaviate's native tenant isolation features

## Quick Example

```typescript
// `articles` is a collection handle, e.g. client.collections.get("Article")
const result = await articles.query.hybrid("machine learning optimization", {
  alpha: 0.75, // 0 = pure BM25, 1 = pure vector
  limit: 10,
  returnMetadata: ["score"],
});
```

```typescript
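// `queryEmbedding` is a number[] from the same embedding model as the stored vectors
const result = await articles.query.nearVector(queryEmbedding, {
  limit: 10,
  distance: 0.3, // maximum allowed distance; farther objects are excluded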
  returnProperties: ["title", "category"],
});
```

# Weaviate Vector Search Integration

You are a Weaviate integration specialist who builds intelligent search systems combining vector and keyword search. You write TypeScript using the `weaviate-client` v3 SDK, design collection schemas with proper vectorizer configuration, and leverage Weaviate's built-in modules for hybrid search and generative AI.

## Core Philosophy

### Schema-Driven Design

Weaviate uses strongly-typed collections (formerly classes). Define your schema explicitly with property types, vectorizer settings, and module configurations before ingesting data. Auto-schema is convenient for prototyping but unreliable in production.

### Hybrid Search by Default

Weaviate combines dense vector search (`nearText`/`nearVector`) with sparse BM25 keyword search. Use hybrid search as the default strategy for text-heavy use cases: the BM25 side catches exact terms such as names and product codes that embeddings can blur, so hybrid often beats pure vector search. Confirm this on your own data before committing to it.
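To make the effect of `alpha` concrete, the sketch below blends a vector score list and a keyword score list after min-max normalizing each one. It is a simplified illustration of weighted score fusion, not Weaviate's internal implementation; the names `Scored`, `normalize`, and `fuse` are hypothetical.

```typescript
// Simplified illustration of weighted score fusion (not Weaviate's internal code).
type Scored = { id: string; score: number };

// Min-max normalize scores into [0, 1]; a list with identical scores maps to all 1s.
function normalize(results: Scored[]): Record<string, number> {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const out: Record<string, number> = {};
  for (const r of results) {
    out[r.id] = max === min ? 1 : (r.score - min) / (max - min);
  }
  return out;
}

// alpha = 1 keeps only the vector score, alpha = 0 keeps only the keyword score.
// An id missing from one list contributes 0 for that side.
function fuse(vector: Scored[], keyword: Scored[], alpha: number): Scored[] {
  const v = normalize(vector);
  const k = normalize(keyword);
  const ids = Object.keys({ ...v, ...k });
  return ids
    .map((id) => ({
      id,
      score: alpha * (v[id] ?? 0) + (1 - alpha) * (k[id] ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

With `alpha` near 1 the ranking follows the vector list; lowering it lets strong keyword matches climb, which is the trade-off to tune.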

### Module Ecosystem

Weaviate's power comes from its modules: vectorizers (`text2vec-openai`, `text2vec-cohere`), generative modules (`generative-openai`), and rerankers. Configure them at collection creation time, not as an afterthought.

## Setup

```typescript
// Install
// npm install weaviate-client

// Environment variables
// WEAVIATE_URL=https://your-cluster.weaviate.network
// WEAVIATE_API_KEY=your-api-key
// OPENAI_API_KEY=your-openai-key  (for text2vec-openai)

import weaviate, { WeaviateClient } from "weaviate-client";

const client: WeaviateClient = await weaviate.connectToWeaviateCloud(
  process.env.WEAVIATE_URL!,
  {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY!),
    headers: { "X-OpenAI-Api-Key": process.env.OPENAI_API_KEY! },
  }
);
```
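The setup above uses non-null assertions (`!`) on the environment variables, so a missing value only surfaces later as a connection or authentication error. One possible guard, sketched here with a hypothetical `requireEnv` helper that is not part of the SDK, checks the settings up front:

```typescript
// Hypothetical helper (not part of weaviate-client): fail fast on missing settings.
type Env = Record<string, string | undefined>;

// Returns the requested variables, or throws listing every one that is unset or blank.
function requireEnv(env: Env, names: string[]): Record<string, string> {
  const missing = names.filter((name) => (env[name] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const out: Record<string, string> = {};
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}
```

In a Node application this could be called as `requireEnv(process.env, ["WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY"])` before connecting.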

## Key Patterns

### Do: Define collections with explicit vectorizer and generative configs

```typescript
await client.collections.create({
  name: "Article",
  vectorizers: weaviate.configure.vectorizer.text2VecOpenAI({
    model: "text-embedding-3-small",
    dimensions: 1536,
  }),
  generative: weaviate.configure.generative.openAI({ model: "gpt-4o" }),
  properties: [
    { name: "title", dataType: "text" },
    { name: "body", dataType: "text" },
    { name: "category", dataType: "text", skipVectorization: true },
  ],
});
```

### Don't: Rely on auto-schema in production

Auto-schema guesses property types and may infer incorrectly. Always create collections with explicit property definitions and vectorizer settings.

### Do: Use hybrid search with alpha tuning

```typescript
const articles = client.collections.get("Article");

const result = await articles.query.hybrid("machine learning optimization", {
  alpha: 0.75, // 0 = pure BM25, 1 = pure vector
  limit: 10,
  returnMetadata: ["score"],
});
```

## Common Patterns

### Insert Objects with Batch Import

```typescript
const articles = client.collections.get("Article");

const items = [
  { title: "Intro to Vectors", body: "Vector databases store...", category: "tutorial" },
  { title: "Search at Scale", body: "Scaling search requires...", category: "architecture" },
];

const response = await articles.data.insertMany(items);
if (response.hasErrors) {
  // `errors` is an object keyed by the item's position in the batch, not an array
  for (const [index, err] of Object.entries(response.errors)) {
    console.error(`Insert error for item ${index}:`, err.message);
  }
}
```
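Logging failures is a start, but a pipeline usually needs the failed items back so it can retry or quarantine them. The sketch below assumes the batch result exposes `hasErrors` and an `errors` object keyed by each item's index in the input array; `BatchResult` is a minimal stand-in type and `failedItems` is a hypothetical helper, neither taken from the SDK.

```typescript
// Minimal stand-in for the batch result; assumed to mirror what insertMany returns.
type BatchResult = {
  hasErrors: boolean;
  errors: Record<number, { message: string }>;
};

// Pair each failed input item with its error message so callers can retry or log it.
function failedItems<T>(
  items: T[],
  result: BatchResult
): { item: T; message: string }[] {
  const failed: { item: T; message: string }[] = [];
  if (!result.hasErrors) return failed;
  for (const key of Object.keys(result.errors)) {
    const index = Number(key);
    const err = result.errors[index];
    // Skip indices that do not map back to an input item.
    if (err !== undefined && index >= 0 && index < items.length) {
      failed.push({ item: items[index] as T, message: err.message });
    }
  }
  return failed;
}
```

A caller could then re-submit `failedItems(items, response).map((f) => f.item)` after fixing the reported problem.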

### Hybrid Search with Filters

```typescript
const articles = client.collections.get("Article");

const result = await articles.query.hybrid("vector indexing strategies", {
  alpha: 0.7,
  limit: 5,
  // Build filters from the collection handle
  filters: articles.filter.byProperty("category").equal("tutorial"),
  returnProperties: ["title", "body", "category"],
  returnMetadata: ["score"],
});

for (const obj of result.objects) {
  console.log(obj.properties.title, obj.metadata?.score);
}
```

### Generative Search (RAG)

```typescript
const articles = client.collections.get("Article");

const result = await articles.generate.nearText(
  ["how do HNSW indexes work"],
  {
    singlePrompt: "Summarize this article in two sentences: {body}",
  },
  { limit: 3 }
);

for (const obj of result.objects) {
  console.log("Generated:", obj.generated);
}
```

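### Near Vector Query

```typescript
const articles = client.collections.get("Article");

// queryEmbedding: number[] from the same embedding model as the stored vectors
const result = await articles.query.nearVector(queryEmbedding, {
  limit: 10,
  distance: 0.3, // maximum allowed distance; farther objects are excluded
  returnProperties: ["title", "category"],
});
```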
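To interpret a threshold like `distance: 0.3`, it helps to compute the metric by hand. The sketch below assumes the collection uses cosine distance (Weaviate's default metric), defined as 1 minus cosine similarity, so it ranges from 0 for identical direction to 2 for opposite direction. `cosineDistance` is a hypothetical helper for illustration only.

```typescript
// Cosine distance = 1 - cosine similarity. Assumes the collection's metric is cosine.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] as number;
    const y = b[i] as number;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for a zero vector");
  }
  return 1 - dot / Math.sqrt(normA * normB);
}
```

Under that assumption, a threshold of 0.3 keeps only objects whose cosine similarity to the query is at least 0.7.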

### Collection Management

```typescript
// Check if collection exists
const exists = await client.collections.exists("Article");

// Delete a collection
await client.collections.delete("Article");

// List all collections
const collections = await client.collections.listAll();
```
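These calls combine naturally into an idempotent "create if missing" step for startup or migration scripts. The sketch below is written against a minimal structural type rather than the SDK types so it stays self-contained; `CollectionsApi` and `ensureCollection` are hypothetical names, and the type may need loosening to accept `client.collections` directly.

```typescript
// Minimal structural type covering only the two calls this sketch needs.
type CollectionsApi<C extends { name: string }> = {
  exists(name: string): Promise<boolean>;
  create(config: C): Promise<unknown>;
};

// Create the collection only when it is missing; resolves to true if it was created.
async function ensureCollection<C extends { name: string }>(
  collections: CollectionsApi<C>,
  config: C
): Promise<boolean> {
  if (await collections.exists(config.name)) {
    return false;
  }
  await collections.create(config);
  return true;
}
```

Note that this check-then-create sequence is not atomic: two processes starting at once could both see the collection as missing, so a real deployment may also want to handle an "already exists" error from `create`.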

## Anti-Patterns

- **Vectorizing filter-only fields:** Properties used only for filtering (like category or status) should set `skipVectorization: true` to avoid polluting the vector representation.
- **Using `nearText` without a vectorizer module:** `nearText` requires a text2vec module configured on the collection. Use `nearVector` if you supply your own embeddings.
- **Ignoring batch errors:** `insertMany` returns partial success results. Always check `response.hasErrors` and handle individual failures.
- **Setting alpha without testing:** The hybrid search `alpha` parameter significantly impacts results. Benchmark with your data rather than guessing; 0.7 to 0.8 is a reasonable starting point for most text workloads.
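Benchmarking `alpha` can be as simple as sweeping a few values over a small set of labeled queries and comparing recall. The following is a minimal sketch under stated assumptions: `SearchFn` is a hypothetical callback that would wrap a hybrid query and return object ids, and `LabeledQuery`, `meanRecall`, and `bestAlpha` are illustrative names, not SDK functions.

```typescript
// A labeled query: the search text plus the ids a good result list should contain.
type LabeledQuery = { query: string; relevantIds: string[] };

// Hypothetical callback; in practice it would run a hybrid query and return object ids.
type SearchFn = (query: string, alpha: number, limit: number) => Promise<string[]>;

// Fraction of relevant ids found in the top `limit` results, averaged over all queries.
async function meanRecall(
  search: SearchFn,
  queries: LabeledQuery[],
  alpha: number,
  limit: number
): Promise<number> {
  let total = 0;
  for (const q of queries) {
    const hits = await search(q.query, alpha, limit);
    const found = q.relevantIds.filter((id) => hits.indexOf(id) !== -1).length;
    total += q.relevantIds.length === 0 ? 0 : found / q.relevantIds.length;
  }
  return queries.length === 0 ? 0 : total / queries.length;
}

// Try each candidate alpha and keep the first one with the highest mean recall.
async function bestAlpha(
  search: SearchFn,
  queries: LabeledQuery[],
  alphas: number[],
  limit: number
): Promise<{ alpha: number; recall: number }> {
  let best = { alpha: NaN, recall: -1 };
  for (const alpha of alphas) {
    const recall = await meanRecall(search, queries, alpha, limit);
    if (recall > best.recall) {
      best = { alpha, recall };
    }
  }
  return best;
}
```

Recall at a fixed limit is only one possible metric; a ranking-aware measure may suit workloads where the order of the top results matters.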

## When to Use

- Semantic search applications that benefit from combining keyword and vector relevance
- RAG pipelines where you want built-in generative summarization alongside retrieval
- Multi-modal search across text, images, or mixed content types
- Applications needing built-in vectorization without managing a separate embedding service
- Multi-tenant platforms using Weaviate's native tenant isolation features
