Elasticsearch
"Elasticsearch: full-text search, aggregations, mapping, bulk indexing, Node.js client, relevance tuning"
Core Philosophy
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. Its core tenets are:
- Schema flexibility — fields can be dynamically mapped or explicitly defined. Explicit mappings give you control over how text is analyzed and how fields are indexed.
- Query DSL — a powerful JSON-based query language supports full-text search, structured filters, aggregations, geo queries, and more in a single request.
- Distributed by design — indices are split into shards and replicated across nodes. Scaling is horizontal.
- Near-real-time — documents are searchable within one second of indexing by default (configurable refresh interval).
- Aggregation engine — beyond search, Elasticsearch performs analytics (bucketing, metrics, pipelines) directly on indexed data.
Elasticsearch suits workloads ranging from site search to log analytics to security information and event management (SIEM).
Setup
Install the official Node.js client and connect:
// npm install @elastic/elasticsearch
import { Client } from "@elastic/elasticsearch";

const client = new Client({
  node: "http://localhost:9200",
  auth: { username: "elastic", password: "changeme" },
  // For Elastic Cloud:
  // cloud: { id: "deployment:abc123..." },
  // auth: { apiKey: "base64key" },
});

// Verify connection
const info = await client.info();
console.log(`Connected to Elasticsearch ${info.version.number}`);
Create an index with explicit mappings:
async function createProductIndex() {
  await client.indices.create({
    index: "products",
    body: {
      settings: {
        number_of_shards: 1,
        number_of_replicas: 1,
        analysis: {
          analyzer: {
            product_analyzer: {
              type: "custom",
              tokenizer: "standard",
              filter: ["lowercase", "asciifolding", "edge_ngram_filter"],
            },
          },
          filter: {
            edge_ngram_filter: {
              type: "edge_ngram",
              min_gram: 2,
              max_gram: 15,
            },
          },
        },
      },
      mappings: {
        properties: {
          name: {
            type: "text",
            analyzer: "product_analyzer",
            search_analyzer: "standard",
            fields: { keyword: { type: "keyword" } },
          },
          description: { type: "text" },
          price: { type: "float" },
          categories: { type: "keyword" },
          brand: { type: "keyword" },
          rating: { type: "float" },
          created_at: { type: "date" },
          location: { type: "geo_point" },
          in_stock: { type: "boolean" },
        },
      },
    },
  });
}
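To see why `name` is indexed with `product_analyzer` but searched with `standard`, it can help to look at the tokens an edge n-gram filter would produce. The sketch below is a simplified simulation in plain TypeScript, not Lucene's implementation: `edgeNgrams` and `analyzeName` are hypothetical helpers, whitespace splitting stands in for the standard tokenizer, and `asciifolding` is left out.

```typescript
// Simplified stand-in for the edge_ngram token filter (min_gram 2, max_gram 15).
// Tokens shorter than minGram produce no grams.
function edgeNgrams(token: string, minGram = 2, maxGram = 15): string[] {
  const grams: string[] = [];
  const longest = Math.min(maxGram, token.length);
  for (let len = minGram; len <= longest; len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Rough approximation of product_analyzer: split on whitespace,
// lowercase each token, then expand it into edge n-grams.
function analyzeName(text: string): string[] {
  const tokens: string[] = [];
  for (const raw of text.split(/\s+/)) {
    if (raw.length === 0) continue;
    for (const gram of edgeNgrams(raw.toLowerCase())) {
      tokens.push(gram);
    }
  }
  return tokens;
}

console.log(analyzeName("Laptop Stand"));
// [ 'la', 'lap', 'lapt', 'lapto', 'laptop', 'st', 'sta', 'stan', 'stand' ]
```

A query for `lap` analyzed with `standard` stays a single token, which matches the indexed prefix directly. Running the n-gram filter at search time as well would split the query into `la` and `lap` and match far more documents than intended.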
Key Techniques
Bulk Indexing
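// Document shape assumed from the mapping above; adjust it to your data.
interface Product {
  id: string;
  name: string;
  description?: string;
  price: number;
  categories: string[];
  brand?: string;
  rating?: number;
  created_at?: string;
  in_stock: boolean;
}

async function bulkIndex(products: Product[]) {
  // Each document contributes two entries: the action line, then the source.
  const operations = products.flatMap((doc) => [
    { index: { _index: "products", _id: doc.id } },
    doc,
  ]);
  const { errors, items } = await client.bulk({
    refresh: true, // handy in tests; leave it out for large production loads
    operations,
  });
  if (errors) {
    // The request itself succeeds even when individual items fail.
    const failedItems = items.filter((item) => item.index?.error);
    console.error(`${failedItems.length} documents failed`, failedItems);
  }
}

// Stream large datasets with a helper
async function bulkIndexStream(products: AsyncIterable<Product>) {
  // The helper resolves with counters only; documents that could not be
  // indexed are reported through the onDrop callback.
  const dropped: unknown[] = [];
  const { total, failed } = await client.helpers.bulk({
    datasource: products,
    onDocument(doc: Product) {
      return { index: { _index: "products", _id: doc.id } };
    },
    onDrop(failure) {
      dropped.push(failure);
    },
    refreshOnCompletion: true,
  });
  console.log(`Indexed ${total} documents, ${failed} failed`);
  if (dropped.length > 0) console.error("Errors:", dropped);
}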
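The bulk payload alternates an action entry with the document it applies to, and large inputs are usually split into batches before sending. A minimal sketch of both steps follows, using no client at all so the shape is easy to inspect; `Doc`, `chunk`, and `toBulkOperations` are hypothetical names introduced for illustration.

```typescript
// Hypothetical document type: anything with a string id.
interface Doc {
  id: string;
  [field: string]: unknown;
}

type BulkLine = { index: { _index: string; _id: string } } | Doc;

// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build the alternating action/source array for one batch.
function toBulkOperations(index: string, docs: Doc[]): BulkLine[] {
  const operations: BulkLine[] = [];
  for (const doc of docs) {
    operations.push({ index: { _index: index, _id: doc.id } });
    operations.push(doc);
  }
  return operations;
}

const sampleDocs: Doc[] = [
  { id: "1", name: "Mouse" },
  { id: "2", name: "Keyboard" },
  { id: "3", name: "Monitor" },
];
const sampleBatches = chunk(sampleDocs, 2).map((batch) =>
  toBulkOperations("products", batch)
);
console.log(sampleBatches.map((b) => b.length)); // [ 4, 2 ]
```

Each element of `sampleBatches` could then be passed as `operations` to one `client.bulk` call, with the batch size raised to the low thousands for real loads.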
Full-Text Search with Filters
async function searchProducts(query: string, filters: ProductFilters = {}) {
  const must: any[] = [];
  const filterClauses: any[] = [];

  if (query) {
    must.push({
      multi_match: {
        query,
        fields: ["name^3", "description"],
        type: "best_fields",
        fuzziness: "AUTO",
      },
    });
  }

  if (filters.categories?.length) {
    filterClauses.push({ terms: { categories: filters.categories } });
  }

  if (filters.minPrice !== undefined || filters.maxPrice !== undefined) {
    filterClauses.push({
      range: {
        price: {
          ...(filters.minPrice !== undefined && { gte: filters.minPrice }),
          ...(filters.maxPrice !== undefined && { lte: filters.maxPrice }),
        },
      },
    });
  }

  if (filters.inStock !== undefined) {
    filterClauses.push({ term: { in_stock: filters.inStock } });
  }

  const response = await client.search<Product>({
    index: "products",
    body: {
      query: {
        bool: {
          must: must.length > 0 ? must : [{ match_all: {} }],
          filter: filterClauses,
        },
      },
      highlight: {
        fields: {
          name: { number_of_fragments: 0 },
          description: { fragment_size: 150, number_of_fragments: 3 },
        },
        pre_tags: ["<mark>"],
        post_tags: ["</mark>"],
      },
      from: filters.offset ?? 0,
      size: filters.limit ?? 20,
    },
  });

  return {
    hits: response.hits.hits.map((h) => ({
      ...h._source!,
      _score: h._score,
      _highlight: h.highlight,
    })),
    total:
      typeof response.hits.total === "number"
        ? response.hits.total
        : response.hits.total?.value ?? 0,
  };
}

interface ProductFilters {
  categories?: string[];
  minPrice?: number;
  maxPrice?: number;
  inStock?: boolean;
  offset?: number;
  limit?: number;
}
Aggregations
async function getProductAggregations(query?: string) {
  const response = await client.search({
    index: "products",
    body: {
      size: 0, // no hits, only aggregations
      query: query ? { match: { name: query } } : { match_all: {} },
      aggs: {
        categories: {
          terms: { field: "categories", size: 20 },
        },
        brands: {
          terms: { field: "brand", size: 10 },
        },
        price_ranges: {
          range: {
            field: "price",
            ranges: [
              { key: "budget", to: 25 },
              { key: "mid", from: 25, to: 100 },
              { key: "premium", from: 100 },
            ],
          },
        },
        avg_rating: {
          avg: { field: "rating" },
        },
        price_stats: {
          stats: { field: "price" },
        },
      },
    },
  });
  return response.aggregations;
}
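The aggregation response is keyed by the names given under `aggs`, and a terms aggregation returns its results as `buckets` of `key` and `doc_count`. As a sketch of how that could be flattened into facet options for a UI, the block below works on a hand-written sample object rather than real cluster output; `toFacets`, `Facet`, and the sample values are hypothetical.

```typescript
// Minimal local types for the part of the response used here.
interface TermsBucket {
  key: string;
  doc_count: number;
}
interface TermsAggregation {
  buckets: TermsBucket[];
}
interface Facet {
  value: string;
  count: number;
}

// Convert terms buckets into a flat facet list; a missing aggregation yields [].
function toFacets(agg: TermsAggregation | undefined): Facet[] {
  if (!agg) return [];
  return agg.buckets.map((b) => ({ value: b.key, count: b.doc_count }));
}

// Hand-written sample in the shape a terms aggregation returns (illustrative only).
const sampleAggregations = {
  categories: {
    buckets: [
      { key: "audio", doc_count: 12 },
      { key: "cables", doc_count: 5 },
    ],
  },
};

console.log(toFacets(sampleAggregations.categories));
```

With the typed client, `response.aggregations` is a broad union type, so the named aggregation would likely need a cast or a type guard before being passed to a helper like this.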
Relevance Tuning with Function Score
async function boostedSearch(query: string) {
  return client.search<Product>({
    index: "products",
    body: {
      query: {
        function_score: {
          query: {
            multi_match: {
              query,
              fields: ["name^3", "description"],
              fuzziness: "AUTO",
            },
          },
          functions: [
            {
              // Boost highly rated products
              field_value_factor: {
                field: "rating",
                factor: 1.2,
                modifier: "log1p",
                missing: 1,
              },
            },
            {
              // Boost in-stock items
              filter: { term: { in_stock: true } },
              weight: 2,
            },
            {
              // Decay older products
              gauss: {
                created_at: {
                  origin: "now",
                  scale: "30d",
                  decay: 0.5,
                },
              },
            },
          ],
          score_mode: "multiply",
          boost_mode: "multiply",
        },
      },
    },
  });
}
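Because both `score_mode` and `boost_mode` are `multiply`, the three functions are multiplied together and then multiplied into the text relevance score. The arithmetic can be approximated outside the cluster as sketched below. This assumes the formulas as described in the function score documentation: `log1p` as log10(1 + factor × value), and the gauss decay as exp(−d² / 2σ²) with σ² = −scale² / (2 ln decay) and no offset. The helper names are hypothetical, and the result is an approximation, not a reproduction of Lucene's scoring.

```typescript
// field_value_factor with modifier "log1p": log10(1 + factor * value).
// `missing` is used when the document has no rating.
function ratingFactor(rating: number | undefined, factor = 1.2, missing = 1): number {
  const value = rating === undefined ? missing : rating;
  return Math.log(1 + factor * value) / Math.LN10;
}

// Gaussian decay on document age: equals `decay` when age equals `scale`.
function recencyFactor(ageDays: number, scaleDays = 30, decay = 0.5): number {
  const sigmaSquared = -(scaleDays * scaleDays) / (2 * Math.log(decay));
  return Math.exp(-(ageDays * ageDays) / (2 * sigmaSquared));
}

// Combine the functions (score_mode: multiply) and apply them to the
// text score (boost_mode: multiply). A function whose filter does not
// match is skipped, which is equivalent to multiplying by 1.
function approximateScore(
  textScore: number,
  rating: number | undefined,
  inStock: boolean,
  ageDays: number
): number {
  const stockWeight = inStock ? 2 : 1;
  return textScore * ratingFactor(rating) * stockWeight * recencyFactor(ageDays);
}

console.log(approximateScore(10, 4.5, true, 0));
console.log(approximateScore(10, 4.5, true, 30));
console.log(approximateScore(10, 0, true, 0));
```

Under these assumptions the second result is half of the first, since a product exactly 30 days old sits at the `decay` value. The third call shows a side effect worth checking against real data: a rating of 0 gives log10(1) = 0, which multiplies the whole score down to zero.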
Index Aliases for Zero-Downtime Reindexing
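// Assumes "products" is an alias over a concrete index named products_<timestamp>.
// An alias cannot share its name with an existing index, so an index that is
// literally called "products" has to be replaced by an alias before this works.
async function reindex() {
  const newIndex = `products_${Date.now()}`;
  // Create the new index with the same mappings
  const { mappings, settings } = await client.indices
    .get({ index: "products" })
    .then((r) => Object.values(r)[0]);
  await client.indices.create({
    index: newIndex,
    body: {
      mappings,
      settings: {
        index: {
          number_of_shards: settings?.index?.number_of_shards,
          // The mappings reference product_analyzer, so copy the analysis settings too
          analysis: settings?.index?.analysis,
        },
      },
    },
  });
  // Reindex data
  await client.reindex({
    body: {
      source: { index: "products" },
      dest: { index: newIndex },
    },
    wait_for_completion: true,
  });
  // Swap alias atomically
  await client.indices.updateAliases({
    body: {
      actions: [
        { remove: { index: "products_*", alias: "products" } },
        { add: { index: newIndex, alias: "products" } },
      ],
    },
  });
}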
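The alias swap is atomic because all actions travel in one request. One possible variation is to name the old indices explicitly instead of relying on a wildcard, so the new index is never matched by the `remove` action. The sketch below only builds the `actions` array; `buildAliasSwap` is a hypothetical helper, and the list of current indices is assumed to come from a prior lookup such as the client's `indices.getAlias` call.

```typescript
type AliasAction =
  | { add: { index: string; alias: string } }
  | { remove: { index: string; alias: string } };

// Build the actions that move `alias` from its current indices to `newIndex`.
function buildAliasSwap(
  alias: string,
  currentIndices: string[],
  newIndex: string
): AliasAction[] {
  const actions: AliasAction[] = [];
  for (const index of currentIndices) {
    // Never remove the alias from the index that is about to receive it.
    if (index !== newIndex) actions.push({ remove: { index, alias } });
  }
  actions.push({ add: { index: newIndex, alias } });
  return actions;
}

// Index names here are made up for the example.
const swapActions = buildAliasSwap(
  "products",
  ["products_1700000000000"],
  "products_1800000000000"
);
console.log(JSON.stringify(swapActions, null, 2));
```

The returned array would be passed as `actions` to `client.indices.updateAliases`. Readers keep hitting the old index until the request is applied and the new one afterwards, with no window in which the alias resolves to nothing.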
Best Practices
- Always use explicit mappings. Dynamic mapping is convenient for prototyping but causes type conflicts and wasted storage in production.
- Use `bulk` for indexing. Single-document indexing is orders of magnitude slower. Batch sizes of 1,000-5,000 documents work well.
- Put filters in the `filter` context. Filter clauses are cacheable and skip scoring, making them faster than `must` for non-text conditions.
- Use `keyword` sub-fields for aggregations and sorting. Text fields are analyzed and cannot be used for exact terms aggregations.
- Set `number_of_replicas: 0` during initial bulk loads, then increase replicas afterward. This speeds up indexing significantly.
- Use index aliases to decouple application code from physical index names. This enables zero-downtime reindexing and blue-green deployments.
- Monitor shard sizes. Keep shards between 10-50 GB. Over-sharding wastes resources; under-sharding limits parallelism.
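The 10-50 GB guideline for shard size can be turned into a rough starting point for `number_of_shards`. The sketch below is back-of-the-envelope arithmetic only: `suggestShardCount` is a hypothetical helper, the 30 GB target is an assumed midpoint of the range, and the expected data size has to be estimated by the reader.

```typescript
interface ShardPlan {
  primaryShards: number;
  approxShardSizeGb: number;
}

// Pick the smallest primary shard count that keeps each shard at or below
// the target size. Always returns at least one shard.
function suggestShardCount(expectedDataGb: number, targetShardGb = 30): ShardPlan {
  if (expectedDataGb < 0 || targetShardGb <= 0) {
    throw new Error("sizes must be positive");
  }
  const primaryShards = Math.max(1, Math.ceil(expectedDataGb / targetShardGb));
  return {
    primaryShards,
    approxShardSizeGb: expectedDataGb / primaryShards,
  };
}

console.log(suggestShardCount(120)); // { primaryShards: 4, approxShardSizeGb: 30 }
console.log(suggestShardCount(5));   // { primaryShards: 1, approxShardSizeGb: 5 }
```

A small index stays on a single shard, which matches the `number_of_shards: 1` used in Setup. Replicas are not counted here; they multiply disk usage but not the primary shard size.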
Anti-Patterns
- Using Elasticsearch as a primary database. It is not ACID-compliant. Always keep a source-of-truth store and treat Elasticsearch as a derived index.
- Mapping everything as `text`. Numeric, date, keyword, and boolean fields should use their native types for correct filtering, sorting, and aggregation.
- Deep pagination with `from` + `size`. Beyond 10,000 results this is rejected by default. Use `search_after` or the Scroll API for deep pagination.
- Creating one index per user or per tenant. This leads to thousands of small indices and shard explosion. Use filtered aliases or a tenant ID field instead.
- Not handling bulk errors. The bulk API returns a 200 even when individual documents fail. Always inspect the `errors` flag and `items` array.
- Running unscoped `match_all` queries in production. They can return massive result sets and stress the cluster. Always set a reasonable `size`.
- Ignoring the `refresh_interval`. Calling `refresh=true` on every index operation kills performance. Use the default 1-second interval or batch refreshes.
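For the deep pagination case, `search_after` replaces a growing `from` offset with the sort values of the last hit on the previous page. A minimal sketch of building such requests follows. `buildPageRequest` is a hypothetical helper, and `product_id` is an assumed keyword field holding a unique value per document; it is not part of the mapping shown earlier, but some unique tie-breaker is needed so that the sort order is total.

```typescript
type SortValue = string | number | null;

interface PageRequest {
  index: string;
  size: number;
  sort: Array<Record<string, "asc" | "desc">>;
  search_after?: SortValue[];
}

// Build the request for the first page (no lastSort) or for the page that
// follows the hit whose sort values are given in lastSort.
function buildPageRequest(
  index: string,
  size: number,
  lastSort?: SortValue[]
): PageRequest {
  const request: PageRequest = {
    index,
    size,
    // product_id is a hypothetical unique keyword field used as a tie-breaker.
    sort: [{ created_at: "desc" }, { product_id: "asc" }],
  };
  if (lastSort !== undefined) {
    request.search_after = lastSort;
  }
  return request;
}

const firstPage = buildPageRequest("products", 100);
// In real use, lastSort comes from the `sort` array of the final hit of the
// previous response; the values below are made up for the example.
const secondPage = buildPageRequest("products", 100, [1700000000000, "sku-0100"]);
console.log(firstPage, secondPage);
```

Every page costs roughly the same regardless of depth, because the cluster only has to find the next `size` documents after the given sort position instead of collecting and discarding everything before an offset.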