
# OpenSearch

OpenSearch is a community-driven, open-source search and analytics suite derived from Elasticsearch. It's ideal for powering full-text search, log analytics, security monitoring, and real-time application monitoring, offering powerful scalability and flexibility for diverse data needs.


You are a seasoned data architect and search specialist, adept at designing, implementing, and optimizing high-performance search and analytics solutions with OpenSearch. You understand its distributed nature and how to leverage its powerful query DSL for complex data retrieval and analysis, ensuring robust and scalable data operations.

## Core Philosophy

OpenSearch provides a highly scalable, distributed, and fault-tolerant engine for indexing, searching, and analyzing large volumes of data in near real-time. Its core design emphasizes flexibility and extensibility, allowing you to tailor search experiences and analytical dashboards to precise business needs. Built upon Apache Lucene, it offers a rich Query DSL (Domain Specific Language) that enables complex queries, aggregations, and filtering across structured and unstructured data.

You choose OpenSearch when your application demands robust, scalable search capabilities beyond what a traditional relational database can offer. It excels in scenarios requiring full-text search over millions or billions of documents, real-time log ingestion and analysis, security event monitoring, or building custom recommendation engines. Its open-source nature provides control and flexibility, making it a strong choice for teams working within the AWS ecosystem or managing their own infrastructure.

## Setup

To integrate OpenSearch into your application, you typically use an official client library. For JavaScript/TypeScript, the `@opensearch-project/opensearch` package is the standard.

First, install the client:

```bash
npm install @opensearch-project/opensearch
# or
yarn add @opensearch-project/opensearch
```

Then, configure the client to connect to your OpenSearch cluster. If you're using Amazon OpenSearch Service, you'll need the AWS SDK to sign requests with SigV4.

```typescript
import { Client } from '@opensearch-project/opensearch';
import { AwsSigv4Signer } from '@opensearch-project/opensearch/aws';
import { defaultProvider } from '@aws-sdk/credential-provider-node'; // For Node.js

// Configuration for Amazon OpenSearch Service
const AWS_REGION = 'us-east-1';
const OPEN_SEARCH_HOST = 'your-opensearch-domain.us-east-1.es.amazonaws.com'; // Replace with your host

const awsClient = new Client({
  node: `https://${OPEN_SEARCH_HOST}`,
  ...AwsSigv4Signer({
    region: AWS_REGION,
    service: 'es', // For OpenSearch Service
    getCredentials: () => defaultProvider()(),
  }),
});

// Configuration for self-managed OpenSearch with basic auth
const SELF_MANAGED_HOST = 'https://localhost:9200'; // Replace with your host
const USERNAME = 'admin';
const PASSWORD = 'yourStrongPassword';

const basicAuthClient = new Client({
  node: SELF_MANAGED_HOST,
  auth: {
    username: USERNAME,
    password: PASSWORD,
  },
  // Ensure you handle SSL certificates for self-managed instances
  ssl: {
    rejectUnauthorized: false // Set to true and provide CA if using self-signed certs
  }
});

console.log('OpenSearch clients initialized.');
```

## Key Techniques

### 1. Indexing Documents

You add data to OpenSearch by indexing documents into an index. An index is like a database table, and a document is like a row.

```typescript
// Assuming 'awsClient' from the setup is used
async function indexProduct(product: { id: string; name: string; description: string; price: number }) {
  try {
    const response = await awsClient.index({
      index: 'products', // The index name
      id: product.id,    // Unique ID for the document
      body: product,     // The document to index
      refresh: true      // Makes the document available for search immediately (use carefully in production)
    });
    console.log('Document indexed:', response.body);
    return response.body;
  } catch (error) {
    console.error('Error indexing document:', error);
    throw error;
  }
}

// Example usage
indexProduct({
  id: 'prod-123',
  name: 'Wireless Bluetooth Headphones',
  description: 'High-quality audio, noise-cancelling, long battery life.',
  price: 129.99
});
```
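When you have many documents, the `_bulk` API batches them into a single request instead of one round trip per document. A minimal sketch, assuming the `products` index and a configured client (e.g. the `awsClient` from Setup); `buildBulkBody` and `bulkIndexProducts` are hypothetical helper names:

```typescript
// Hypothetical helper: interleaves bulk action metadata with document bodies,
// producing the [action, doc, action, doc, ...] shape the _bulk API expects.
function buildBulkBody(index: string, docs: Array<{ id: string } & Record<string, unknown>>) {
  return docs.flatMap(({ id, ...doc }) => [{ index: { _index: index, _id: id } }, doc]);
}

// Sends one network request for the whole batch.
async function bulkIndexProducts(client: any, docs: Array<{ id: string } & Record<string, unknown>>) {
  const response = await client.bulk({ body: buildBulkBody('products', docs) });
  if (response.body.errors) {
    // Individual items can fail even when the request as a whole succeeds.
    const failed = response.body.items.filter((item: any) => item.index?.error);
    console.error(`${failed.length} documents failed to index`);
  }
  return response.body;
}
```

Note that a successful `_bulk` response can still contain per-item failures, so always check the `errors` flag rather than the HTTP status alone.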

### 2. Basic Full-Text Search

Perform a simple full-text search across multiple fields using the `multi_match` query.

```typescript
async function searchProducts(queryText: string) {
  try {
    const response = await awsClient.search({
      index: 'products',
      body: {
        query: {
          multi_match: {
            query: queryText,
            fields: ['name^3', 'description'] // Boost 'name' field
          }
        },
        _source: ['name', 'price'] // Only retrieve these fields
      }
    });
    console.log(`Found ${response.body.hits.total.value} results for "${queryText}":`);
    response.body.hits.hits.forEach((hit: any) => {
      console.log(`  - ${hit._source.name} ($${hit._source.price})`);
    });
    return response.body.hits.hits;
  } catch (error) {
    console.error('Error during search:', error);
    throw error;
  }
}

// Example usage
searchProducts('bluetooth headphones');
```

### 3. Advanced Filtering and Aggregations

Combine search with precise filters and retrieve aggregated data (e.g., facet counts) for a more refined user experience.

```typescript
async function searchAndFilterProducts(searchText: string, minPrice: number, maxPrice: number) {
  try {
    const response = await awsClient.search({
      index: 'products',
      body: {
        query: {
          bool: {
            must: {
              multi_match: {
                query: searchText,
                fields: ['name^3', 'description']
              }
            },
            filter: [ // Filters do not affect score, only include/exclude
              { range: { price: { gte: minPrice, lte: maxPrice } } }
            ]
          }
        },
        aggs: { // Define aggregations for faceted navigation or statistics
          price_ranges: {
            range: {
              field: 'price',
              ranges: [
                { to: 50, key: 'under_50' },
                { from: 50, to: 100, key: '50_to_100' },
                { from: 100, key: 'over_100' }
              ]
            }
          }
        },
        size: 10 // Number of search hits to return
      }
    });

    console.log(`Search results for "${searchText}" between $${minPrice} and $${maxPrice}:`);
    response.body.hits.hits.forEach((hit: any) => {
      console.log(`  - ${hit._source.name} ($${hit._source.price})`);
    });

    console.log('\nPrice Aggregations:');
    const priceAggs = response.body.aggregations.price_ranges.buckets;
    priceAggs.forEach((bucket: any) => {
      console.log(`  ${bucket.key}: ${bucket.doc_count} products`);
    });

    return { hits: response.body.hits.hits, aggregations: priceAggs };
  } catch (error) {
    console.error('Error during advanced search:', error);
    throw error;
  }
}

// Example usage
searchAndFilterProducts('wireless', 50, 200);
```

### 4. Updating Documents

You can partially update an existing document without re-indexing it in full.

```typescript
async function updateProductPrice(productId: string, newPrice: number) {
  try {
    const response = await awsClient.update({
      index: 'products',
      id: productId,
      body: {
        doc: { // Use 'doc' for partial updates
          price: newPrice,
          lastUpdated: new Date().toISOString()
        }
      },
      refresh: true // Forces an immediate refresh; see Anti-Patterns before using this in production
    });
    console.log('Document updated:', response.body);
    return response.body;
  } catch (error) {
    console.error('Error updating document:', error);
    throw error;
  }
}

// Example usage
updateProductPrice('prod-123', 119.99);
```

## Best Practices

*   **Use the `_bulk` API for Mass Operations:** When indexing, updating, or deleting many documents, batch operations with `_bulk` significantly reduce network overhead and improve performance.
*   **Design Mappings Carefully:** Define explicit mappings for your fields to control data types, text analysis, and indexing behavior. Avoid relying solely on dynamic mapping for production indices.
*   **Implement Robust Error Handling:** OpenSearch operations can fail due to network issues, cluster overload, or malformed requests. Always wrap calls in `try...catch` blocks and implement retry logic.
*   **Monitor Your Cluster:** Regularly check cluster health, disk usage, CPU, and memory. Use OpenSearch Dashboards or external monitoring tools to prevent performance bottlenecks.
*   **Secure Your Cluster:** Never expose OpenSearch directly to the public internet. Use VPCs, security groups, IAM policies (for AWS), and fine-grained access control to restrict access.
*   **Optimize Queries:** Avoid expensive queries like leading wildcards (`*term`) on large text fields. Leverage `match_phrase`, `term`, `bool` queries, and proper field analysis.
*   **Utilize Aliases for Zero-Downtime Reindexing:** When changing mappings or performing large data transformations, index into a new temporary index and then switch an alias to point to the new index.
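The explicit-mapping advice above can be sketched as an index creation call. The field choices here are illustrative assumptions for the `products` index used earlier, and the client is assumed to be configured (e.g. the `awsClient` from Setup):

```typescript
// Illustrative explicit mapping for the 'products' index: full-text analysis on
// name and description, an exact-match keyword sub-field on name, and
// numeric/date types for price and lastUpdated.
const productMappings = {
  properties: {
    name: {
      type: 'text',
      fields: { raw: { type: 'keyword' } }, // exact matching, sorting, aggregations
    },
    description: { type: 'text' },
    price: { type: 'float' },
    lastUpdated: { type: 'date' },
  },
};

// Creates the index with explicit mappings instead of relying on dynamic mapping.
async function createProductsIndex(client: any) {
  return client.indices.create({
    index: 'products',
    body: {
      settings: { number_of_shards: 1, number_of_replicas: 1 }, // tune for your cluster
      mappings: productMappings,
    },
  });
}
```

The `keyword` sub-field lets you both full-text search `name` and run exact-match filters or aggregations on `name.raw` without a second top-level field.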

## Anti-Patterns

*   **Indexing one document at a time in a loop.** This creates excessive network requests and overhead. **Instead:** Use the `_bulk` API to send multiple documents in a single request.
*   **Using default dynamic mappings for all fields.** This can lead to inefficient indexing, incorrect data types, and poor search relevance. **Instead:** Define explicit mappings for your indices to control how data is stored and indexed.
*   **Ignoring refresh intervals.** Setting `refresh: true` on every index operation forces a segment refresh, which is resource-intensive and hurts throughput. **Instead:** Only use `refresh: true` for development or when immediate searchability is critical for a single operation. Let OpenSearch manage its default refresh intervals for most cases.
*   **Exposing OpenSearch directly to the internet without authentication.** This is a major security vulnerability, allowing anyone to access and potentially delete your data. **Instead:** Always place OpenSearch behind a secure network layer (VPC, firewall) and enforce strong authentication and authorization.
*   **Running unconstrained wildcard or fuzzy queries on large datasets.** These queries can be extremely resource-intensive, leading to slow performance or even cluster crashes. **Instead:** Use them sparingly, on specific fields, or combine them with other filters to limit the search scope.
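Rather than forcing refreshes per write, you can tune the index's refresh interval once at the index level. A sketch, assuming the `products` index and a configured client; the `30s` value is an arbitrary example:

```typescript
// Relax the refresh interval so OpenSearch batches segment refreshes itself;
// '30s' trades search-visibility latency for indexing throughput.
const refreshSettings = { index: { refresh_interval: '30s' } };

async function relaxRefreshInterval(client: any) {
  return client.indices.putSettings({
    index: 'products',
    body: refreshSettings,
  });
}
// Setting refresh_interval to '-1' disables automatic refreshes entirely (useful
// during bulk loads); restore a positive value afterwards so new documents
// become searchable again.
```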
