
Elasticsearch

"Elasticsearch: full-text search, aggregations, mapping, bulk indexing, Node.js client, relevance tuning"

Core Philosophy

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. Its core tenets are:

  • Schema flexibility — fields can be dynamically mapped or explicitly defined. Explicit mappings give you control over how text is analyzed and how fields are indexed.
  • Query DSL — a powerful JSON-based query language supports full-text search, structured filters, aggregations, geo queries, and more in a single request.
  • Distributed by design — indices are split into shards and replicated across nodes. Scaling is horizontal.
  • Near-real-time — documents are searchable within one second of indexing by default (configurable refresh interval).
  • Aggregation engine — beyond search, Elasticsearch performs analytics (bucketing, metrics, pipelines) directly on indexed data.

Elasticsearch suits workloads ranging from site search to log analytics to security information and event management (SIEM).

Setup

Install the official Node.js client and connect:

// npm install @elastic/elasticsearch

import { Client } from "@elastic/elasticsearch";

const client = new Client({
  node: "http://localhost:9200",
  auth: { username: "elastic", password: "changeme" },
  // For Elastic Cloud:
  // cloud: { id: "deployment:abc123..." },
  // auth: { apiKey: "base64key" },
});

// Verify connection
const info = await client.info();
console.log(`Connected to Elasticsearch ${info.version.number}`);

Create an index with explicit mappings:

async function createProductIndex() {
  await client.indices.create({
    index: "products",
    body: {
      settings: {
        number_of_shards: 1,
        number_of_replicas: 1,
        analysis: {
          analyzer: {
            product_analyzer: {
              type: "custom",
              tokenizer: "standard",
              filter: ["lowercase", "asciifolding", "edge_ngram_filter"],
            },
          },
          filter: {
            edge_ngram_filter: {
              type: "edge_ngram",
              min_gram: 2,
              max_gram: 15,
            },
          },
        },
      },
      mappings: {
        properties: {
          name: {
            type: "text",
            analyzer: "product_analyzer",
            search_analyzer: "standard",
            fields: { keyword: { type: "keyword" } },
          },
          description: { type: "text" },
          price: { type: "float" },
          categories: { type: "keyword" },
          brand: { type: "keyword" },
          rating: { type: "float" },
          created_at: { type: "date" },
          location: { type: "geo_point" },
          in_stock: { type: "boolean" },
        },
      },
    },
  });
}

Key Techniques

Bulk Indexing

async function bulkIndex(products: Product[]) {
  const operations = products.flatMap((doc) => [
    { index: { _index: "products", _id: doc.id } },
    doc,
  ]);

  const { errors, items } = await client.bulk({
    refresh: true, // convenient in examples; avoid forcing a refresh on every bulk call in production
    operations,
  });

  if (errors) {
    const failedItems = items.filter((item) => item.index?.error);
    console.error(`${failedItems.length} documents failed`, failedItems);
  }
}

// Stream large datasets with a helper. Note: helpers.bulk returns stats
// (total, failed, ...), not an error list; collect failures via onDrop.
async function bulkIndexStream(products: AsyncIterable<Product>) {
  const dropped: unknown[] = [];
  const { total, failed } = await client.helpers.bulk({
    datasource: products,
    onDocument(doc: Product) {
      return { index: { _index: "products", _id: doc.id } };
    },
    onDrop(doc) {
      dropped.push(doc); // documents that failed after all retries
    },
    refreshOnCompletion: true,
  });

  console.log(`Indexed ${total} documents, ${failed} failed`);
  if (dropped.length > 0) console.error("Dropped documents:", dropped);
}
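
The 1,000-5,000 batch-size guidance can be applied by splitting a large array before calling `bulkIndex`. A minimal sketch; `chunk` is a hypothetical helper, not part of the client, and the batch size is a starting point to tune:

```typescript
// Sketch: split a large document array into bulk-sized batches.
function chunk<T>(docs: T[], size = 1000): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Usage: for (const batch of chunk(allProducts, 2000)) await bulkIndex(batch);
```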

Full-Text Search with Filters

async function searchProducts(query: string, filters: ProductFilters = {}) {
  const must: any[] = [];
  const filterClauses: any[] = [];

  if (query) {
    must.push({
      multi_match: {
        query,
        fields: ["name^3", "description"],
        type: "best_fields",
        fuzziness: "AUTO",
      },
    });
  }

  if (filters.categories?.length) {
    filterClauses.push({ terms: { categories: filters.categories } });
  }
  if (filters.minPrice !== undefined || filters.maxPrice !== undefined) {
    filterClauses.push({
      range: {
        price: {
          ...(filters.minPrice !== undefined && { gte: filters.minPrice }),
          ...(filters.maxPrice !== undefined && { lte: filters.maxPrice }),
        },
      },
    });
  }
  if (filters.inStock !== undefined) {
    filterClauses.push({ term: { in_stock: filters.inStock } });
  }

  const response = await client.search<Product>({
    index: "products",
    body: {
      query: {
        bool: {
          must: must.length > 0 ? must : [{ match_all: {} }],
          filter: filterClauses,
        },
      },
      highlight: {
        fields: {
          name: { number_of_fragments: 0 },
          description: { fragment_size: 150, number_of_fragments: 3 },
        },
        pre_tags: ["<mark>"],
        post_tags: ["</mark>"],
      },
      from: filters.offset ?? 0,
      size: filters.limit ?? 20,
    },
  });

  return {
    hits: response.hits.hits.map((h) => ({
      ...h._source!,
      _score: h._score,
      _highlight: h.highlight,
    })),
    total:
      typeof response.hits.total === "number"
        ? response.hits.total
        : response.hits.total?.value ?? 0,
  };
}

interface ProductFilters {
  categories?: string[];
  minPrice?: number;
  maxPrice?: number;
  inStock?: boolean;
  offset?: number;
  limit?: number;
}

Aggregations

async function getProductAggregations(query?: string) {
  const response = await client.search({
    index: "products",
    body: {
      size: 0, // no hits, only aggregations
      query: query ? { match: { name: query } } : { match_all: {} },
      aggs: {
        categories: {
          terms: { field: "categories", size: 20 },
        },
        brands: {
          terms: { field: "brand", size: 10 },
        },
        price_ranges: {
          range: {
            field: "price",
            ranges: [
              { key: "budget", to: 25 },
              { key: "mid", from: 25, to: 100 },
              { key: "premium", from: 100 },
            ],
          },
        },
        avg_rating: {
          avg: { field: "rating" },
        },
        price_stats: {
          stats: { field: "price" },
        },
      },
    },
  });

  return response.aggregations;
}
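
Terms aggregations come back as `{ key, doc_count }` buckets; a common next step is flattening them into a facet-count map for a UI. A sketch, assuming the bucket shape above (`toFacetCounts` is a hypothetical helper, not a client API):

```typescript
// Sketch: flatten a `terms` aggregation's buckets into facet counts.
interface TermsBucket {
  key: string | number;
  doc_count: number;
}

function toFacetCounts(buckets: TermsBucket[]): Record<string, number> {
  return Object.fromEntries(buckets.map((b) => [String(b.key), b.doc_count]));
}

// Usage against the response above (a cast is typically needed because the
// client types aggregation results as unions):
// const aggs = await getProductAggregations();
// const categoryCounts = toFacetCounts((aggs?.categories as any).buckets);
```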

Relevance Tuning with Function Score

async function boostedSearch(query: string) {
  return client.search<Product>({
    index: "products",
    body: {
      query: {
        function_score: {
          query: {
            multi_match: {
              query,
              fields: ["name^3", "description"],
              fuzziness: "AUTO",
            },
          },
          functions: [
            {
              // Boost highly rated products
              field_value_factor: {
                field: "rating",
                factor: 1.2,
                modifier: "log1p",
                missing: 1,
              },
            },
            {
              // Boost in-stock items
              filter: { term: { in_stock: true } },
              weight: 2,
            },
            {
              // Decay older products
              gauss: {
                created_at: {
                  origin: "now",
                  scale: "30d",
                  decay: 0.5,
                },
              },
            },
          ],
          score_mode: "multiply",
          boost_mode: "multiply",
        },
      },
    },
  });
}

Index Aliases for Zero-Downtime Reindexing

async function reindex() {
  const newIndex = `products_${Date.now()}`;

  // Create the new index with the same mappings
  const current = await client.indices.get({ index: "products" });
  const { mappings, settings } = Object.values(current)[0];

  await client.indices.create({
    index: newIndex,
    body: {
      mappings,
      settings: {
        index: { number_of_shards: settings?.index?.number_of_shards },
      },
    },
  });

  // Reindex data
  await client.reindex({
    body: {
      source: { index: "products" },
      dest: { index: newIndex },
    },
    wait_for_completion: true,
  });

  // Swap alias atomically
  await client.indices.updateAliases({
    body: {
      actions: [
        { remove: { index: "products_*", alias: "products" } },
        { add: { index: newIndex, alias: "products" } },
      ],
    },
  });
}
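
One caveat the snippet glosses over: an alias cannot share a name with a concrete index, so this pattern only works if "products" has been an alias from the start (e.g. create products_v1, then alias it as products). The swap actions can be factored into a small pure helper (a sketch; `aliasSwapActions` is hypothetical):

```typescript
// Sketch: build the atomic alias-swap actions used in reindex() above.
// Note: the alias name must never collide with a concrete index name.
type AliasAction =
  | { remove: { index: string; alias: string } }
  | { add: { index: string; alias: string } };

function aliasSwapActions(
  alias: string,
  oldIndexPattern: string,
  newIndex: string,
): AliasAction[] {
  return [
    { remove: { index: oldIndexPattern, alias } },
    { add: { index: newIndex, alias } },
  ];
}

// Usage: await client.indices.updateAliases({
//   body: { actions: aliasSwapActions("products", "products_*", newIndex) },
// });
```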

Best Practices

  • Always use explicit mappings. Dynamic mapping is convenient for prototyping but causes type conflicts and wasted storage in production.
  • Use bulk for indexing. Single-document indexing is orders of magnitude slower. Batch sizes of 1,000-5,000 documents work well.
  • Put filters in the filter context. Filter clauses are cacheable and skip scoring, making them faster than must for non-text conditions.
  • Use keyword sub-fields for aggregations and sorting. Text fields are analyzed and cannot be used for exact terms aggregations.
  • Set number_of_replicas: 0 during initial bulk loads, then increase replicas afterward. This speeds up indexing significantly.
  • Use index aliases to decouple application code from physical index names. This enables zero-downtime reindexing and blue-green deployments.
  • Monitor shard sizes. Keep shards between 10 and 50 GB. Over-sharding wastes resources; under-sharding limits parallelism.
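
The replica and refresh advice above can be combined into one settings toggle applied around a load. A sketch (`bulkLoadSettings` is a hypothetical helper; the setting names are real index settings, and the `putSettings` usage in the comments is assumed from the client version in Setup):

```typescript
// Sketch: index settings to apply before and after a large bulk load.
// "start" disables refresh and replicas for throughput; "done" restores them.
function bulkLoadSettings(phase: "start" | "done", replicas = 1) {
  return phase === "start"
    ? { refresh_interval: "-1", number_of_replicas: 0 }
    : { refresh_interval: "1s", number_of_replicas: replicas };
}

// Usage with the client from Setup:
// await client.indices.putSettings({ index: "products", settings: bulkLoadSettings("start") });
// ...run bulk indexing batches...
// await client.indices.putSettings({ index: "products", settings: bulkLoadSettings("done") });
```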

Anti-Patterns

  • Using Elasticsearch as a primary database. It is not ACID-compliant. Always keep a source-of-truth store and treat Elasticsearch as a derived index.
  • Mapping everything as text. Numeric, date, keyword, and boolean fields should use their native types for correct filtering, sorting, and aggregation.
  • Deep pagination with from + size. Beyond 10,000 results this is rejected by default (index.max_result_window). Use search_after, ideally with a point-in-time (PIT), for deep pagination; reserve the scroll API for one-off bulk exports.
  • Creating one index per user or per tenant. This leads to thousands of small indices and shard explosion. Use filtered aliases or a tenant ID field instead.
  • Not handling bulk errors. The bulk API returns a 200 even when individual documents fail. Always inspect the errors flag and items array.
  • Running unscoped match_all queries in production. They can return massive result sets and stress the cluster. Always set a reasonable size.
  • Ignoring the refresh_interval. Calling refresh=true on every index operation kills performance. Use the default 1-second interval or batch refreshes.
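
The search_after alternative mentioned above works as keyset pagination: feed each page's last hit `sort` values into the next request. A sketch under the products mapping from Setup; `PageRequest` and `nextPageRequest` are illustrative helpers, not client APIs:

```typescript
// Sketch: keyset pagination with search_after instead of deep from/size.
// Requires a deterministic sort that includes a unique tiebreaker field.
interface PageRequest {
  size: number;
  sort: Array<Record<string, "asc" | "desc">>;
  search_after?: Array<string | number>;
}

const firstPage: PageRequest = {
  size: 100,
  sort: [{ created_at: "desc" }, { "name.keyword": "asc" }],
};

function nextPageRequest(
  prev: PageRequest,
  lastHitSort: Array<string | number> | undefined,
): PageRequest | null {
  // No sort values means the previous page was empty: stop paginating.
  return lastHitSort ? { ...prev, search_after: lastHitSort } : null;
}

// Usage (client from Setup assumed):
// let page: PageRequest | null = firstPage;
// while (page) {
//   const res = await client.search({ index: "products", body: page });
//   page = nextPageRequest(page, res.hits.hits.at(-1)?.sort as any);
// }
```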
