
# OpenTelemetry

OpenTelemetry provides a set of open-source APIs, SDKs, and tools to instrument, generate, collect, and export telemetry data (metrics, logs, traces) for observability. Use it to standardize your application's telemetry across various services and vendors without vendor lock-in.

## Quick Summary
You are an expert in building distributed systems, proficient in instrumenting applications with OpenTelemetry to gain deep observability into their behavior. You understand the nuances of trace propagation, metric aggregation, and structured logging, enabling you to build robust monitoring solutions that transcend specific vendor implementations. You know how to leverage its vendor-neutral approach to future-proof your observability stack.

## Key Points

*   **Instrument Early and Consistently:** Apply instrumentation from the outset of a project and maintain consistency across all services to ensure a complete observability picture.
*   **Leverage Automatic Instrumentation:** Start with automatic instrumentations for common libraries (HTTP, databases, message queues) to quickly gain visibility without extensive manual coding.
*   **Implement Sampling Strategies:** For high-volume services, configure sampling (e.g., head-based, tail-based) to control data volume and cost while retaining critical traces.
*   **Standardize Naming Conventions:** Adopt consistent naming for services, spans, and attributes across your organization to improve readability and queryability.

## Quick Example

```bash
npm install @opentelemetry/sdk-node \
  @opentelemetry/api \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-proto \
  @opentelemetry/exporter-prometheus
```

```bash
# For a TypeScript project
node -r ts-node/register -r ./instrumentation.ts src/app.ts
```


## Core Philosophy

OpenTelemetry's core philosophy centers on providing a unified, vendor-agnostic standard for instrumenting applications to generate telemetry data. It's not a monitoring backend itself, but rather the crucial plumbing that allows you to collect traces, metrics, and logs from your services regardless of your chosen observability platform. The key idea is to "instrument once, export anywhere," freeing you from vendor lock-in and allowing you to switch backends (e.g., Jaeger, Prometheus, Lightstep, Datadog) without re-instrumenting your code.

You choose OpenTelemetry when you need a flexible, extensible, and future-proof observability strategy. It's ideal for microservices architectures, polyglot environments, or when you anticipate changing monitoring vendors. By standardizing your instrumentation around OpenTelemetry, you gain control over your telemetry data, ensuring consistency across your entire system and enabling powerful correlation between different types of signals.

## Setup

To get started with OpenTelemetry in a Node.js application, you typically install the core SDK along with specific instrumentations and exporters. The `NodeSDK` is your entry point for configuring the entire telemetry pipeline.

First, install the necessary packages:

```bash
npm install @opentelemetry/sdk-node \
  @opentelemetry/api \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-proto \
  @opentelemetry/exporter-prometheus
```

Next, create an `instrumentation.ts` (or `opentelemetry.js`) file that initializes the SDK and runs before your main application code.

```typescript
// instrumentation.ts
import { metrics } from '@opentelemetry/api';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
import { MeterProvider } from '@opentelemetry/sdk-metrics';

const serviceName = process.env.OTEL_SERVICE_NAME || 'my-web-service';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: serviceName,
  [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
  [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development',
});

// Configure Tracing. Passing `traceExporter` lets the SDK wrap it in a
// BatchSpanProcessor, which is what you want in production.
const traceExporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT_TRACES || 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  resource,
  traceExporter,
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
    // Add more instrumentations as needed, e.g., for databases, queues
  ],
});

// Configure Metrics (optional, can be in a separate file or integrated).
// PrometheusExporter is itself a pull-based MetricReader: it serves /metrics
// on the given port, so no periodic export interval is needed.
const prometheusReader = new PrometheusExporter({
  port: 9464, // Default Prometheus scrape port
});

const meterProvider = new MeterProvider({
  resource,
  readers: [prometheusReader],
  // On older SDK versions, use meterProvider.addMetricReader(prometheusReader) instead
});

// Set the global MeterProvider so metrics.getMeter() works anywhere
metrics.setGlobalMeterProvider(meterProvider);

// Start the SDK
sdk.start();

console.log(`OpenTelemetry initialized for service: ${serviceName}`);

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('OpenTelemetry shut down successfully'))
    .catch((error) => console.log('Error shutting down OpenTelemetry', error))
    .finally(() => process.exit(0));
});
```

To run your application with this instrumentation, ensure the `instrumentation.ts` file is loaded before your main application entry point, for example via `ts-node`, or by compiling to JavaScript first and requiring the compiled file.

```bash
# For a TypeScript project
node -r ts-node/register -r ./instrumentation.ts src/app.ts
```

## Key Techniques

### 1. Tracing Incoming Requests and Custom Operations

OpenTelemetry automatically instruments common libraries like HTTP servers (e.g., Express, Koa). However, you often need to create custom spans to trace specific business logic or database calls that aren't covered by automatic instrumentation.

```typescript
// src/app.ts (example Express application)
import express from 'express';
import { trace, context, SpanStatusCode } from '@opentelemetry/api';

const app = express();
const PORT = process.env.PORT || 3000;
const tracer = trace.getTracer('my-web-service');

app.use(express.json());

app.get('/users/:id', async (req, res) => {
  // Automatic instrumentation will create a span for the incoming request.
  // We can get the active span from the context.
  const currentSpan = trace.getSpan(context.active());
  currentSpan?.setAttribute('user.id', req.params.id);

  // Create a new span for a custom operation (e.g., fetching from DB).
  // It is parented to the active request span automatically.
  const userFetchSpan = tracer.startSpan('fetchUserFromDatabase', {
    attributes: { 'db.query.id': req.params.id }
  });

  try {
    // Simulate a database call
    await new Promise(resolve => setTimeout(resolve, 100));
    const user = { id: req.params.id, name: `User ${req.params.id}` };

    userFetchSpan.setStatus({ code: SpanStatusCode.OK });
    userFetchSpan.addEvent('user fetched successfully', { userId: user.id });

    res.json(user);
  } catch (error: any) {
    userFetchSpan.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    userFetchSpan.recordException(error);
    res.status(500).send('Error fetching user');
  } finally {
    userFetchSpan.end(); // Always end your custom spans
  }
});

app.post('/process-data', async (req, res) => {
  // Manual context propagation example: make the new span the active span
  // for everything that runs inside the context.with() callback.
  const processingSpan = tracer.startSpan('processDataInternal');
  await context.with(trace.setSpan(context.active(), processingSpan), async () => {
    try {
      // Simulate data processing
      await new Promise(resolve => setTimeout(resolve, 50));
      processingSpan.setAttribute('data.length', req.body.data?.length || 0);
      processingSpan.setStatus({ code: SpanStatusCode.OK });
      res.status(200).send('Data processed');
    } catch (error: any) {
      processingSpan.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      processingSpan.recordException(error);
      res.status(500).send('Processing failed');
    } finally {
      processingSpan.end();
    }
  });
});

app.listen(PORT, () => {
  console.log(`Server running on http://localhost:${PORT}`);
});
```

### 2. Collecting Custom Metrics

You can define and record custom metrics like counters, gauges, and histograms to track application performance and business-specific events.

```typescript
// src/metrics.ts (or integrate into your app code)
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('my-web-service-meter');

// Counter: To count events
export const apiCallCounter = meter.createCounter('api_calls_total', {
  description: 'Counts total API calls',
  unit: '1',
});

// Histogram: To measure distribution of values (e.g., request duration)
export const requestDurationHistogram = meter.createHistogram('http_request_duration_seconds', {
  description: 'Measures the duration of HTTP requests',
  unit: 's',
});

// Gauge: To measure current values (e.g., active users, queue size).
// An observable gauge reports the current value via a registered callback.
let activeUsers = 0;
const activeUsersGauge = meter.createObservableGauge('active_users', {
  description: 'Number of currently active users',
  unit: '1',
});
activeUsersGauge.addCallback((observableResult) => {
  observableResult.observe(activeUsers, { env: process.env.NODE_ENV });
});

// Update activeUsers from your app (e.g., on user login/logout)
export function incrementActiveUsers() { activeUsers++; }
export function decrementActiveUsers() { activeUsers--; }
```

Example usage in an Express route:

```typescript
// src/app.ts (within an Express route)
import { apiCallCounter, requestDurationHistogram } from './metrics';

app.get('/metrics-example', (req, res) => {
  const startTime = Date.now();
  apiCallCounter.add(1, { path: req.path, method: req.method }); // Increment counter

  // Simulate work
  setTimeout(() => {
    const duration = (Date.now() - startTime) / 1000;
    requestDurationHistogram.record(duration, { path: req.path, method: req.method }); // Record duration
    res.send('Metrics recorded!');
  }, 50);
});
```

### 3. Integrating Structured Logging

While OpenTelemetry has a nascent Logs API, a common pattern is to enrich existing logging with trace and span IDs using a logging library like Pino or Winston. This allows your logs to be correlated directly with the active trace.

```typescript
// src/logger.ts
import pino from 'pino';
import { trace, context } from '@opentelemetry/api';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    log: (obj) => {
      // Get the current active span and attach its IDs to every log line
      const currentSpan = trace.getSpan(context.active());
      if (currentSpan) {
        const { traceId, spanId } = currentSpan.spanContext();
        return { ...obj, traceId, spanId };
      }
      return obj;
    },
  },
});

export default logger;
```

```typescript
// src/app.ts (using the custom logger)
import logger from './logger';

app.get('/log-example', (req, res) => {
  logger.info('Received request for log-example endpoint.');
  logger.warn({ userId: '123' }, 'User specific warning.');
  try {
    throw new Error('Something went wrong!');
  } catch (error: any) {
    logger.error({ error: error.message, stack: error.stack }, 'An error occurred during log example.');
  }
  res.send('Logs generated!');
});
```

## Best Practices

*   **Instrument Early and Consistently:** Apply instrumentation from the outset of a project and maintain consistency across all services to ensure a complete observability picture.
*   **Leverage Automatic Instrumentation:** Start with automatic instrumentations for common libraries (HTTP, databases, message queues) to quickly gain visibility without extensive manual coding.
*   **Add Meaningful Attributes:** Enrich your spans with business-relevant attributes (e.g., `user.id`, `order.id`, `tenant.id`) to enable powerful filtering and analysis in your observability backend.
*   **Implement Sampling Strategies:** For high-volume services, configure sampling (e.g., head-based, tail-based) to control data volume and cost while retaining critical traces.
*   **Ensure Context Propagation:** Verify that trace context (trace and span IDs) is correctly propagated across all service boundaries, including HTTP requests, message queues, and async operations.
*   **Use an OpenTelemetry Collector:** Deploy an OpenTelemetry Collector as an intermediary between your applications and your observability backend. It allows for advanced processing, batching, and routing of telemetry data.
*   **Standardize Naming Conventions:** Adopt consistent naming for services, spans, and attributes across your organization to improve readability and queryability.
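The head-based sampling recommendation above hinges on one property: the decision is made once, deterministically from the trace ID, so every service in the trace agrees. The sketch below illustrates that property in plain TypeScript; it is a simplified illustration, not the SDK's exact algorithm. In a real setup you would pass a sampler to the SDK instead, e.g. `new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.1) })` from `@opentelemetry/sdk-trace-base`.

```typescript
// Head-based sampling sketch (illustrative, not the SDK's exact algorithm):
// decide at the root span, deterministically from the trace ID, so every
// service that sees the same trace ID makes the same keep/drop decision.
function shouldSample(traceId: string, ratio: number): boolean {
  // Interpret the first 8 hex digits of the 128-bit trace ID as a number
  // in [0, 2^32) and keep the lowest `ratio` fraction of that range.
  const prefix = parseInt(traceId.slice(0, 8), 16);
  return prefix < ratio * 0x100000000;
}

console.log(shouldSample("00000000aabbccddeeff001122334455", 0.5)); // → true (prefix 0x00000000 is below the cutoff)
```

Because the decision is a pure function of the trace ID, no coordination between services is required, which is what makes head-based sampling cheap compared to tail-based sampling in a Collector.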

## Anti-Patterns

*   **Over-Instrumentation:** Instrumenting every trivial function adds unnecessary noise and overhead to your telemetry data. Focus on critical paths, service boundaries, and key business logic rather than every line of code.
*   **Missing Context Propagation:** Forgetting to pass trace context (e.g., the `traceparent` HTTP header) across service boundaries breaks traces into disconnected fragments, making end-to-end debugging impossible.
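The context that must not be dropped is small: the W3C Trace Context `traceparent` header is just `version-traceid-spanid-flags`. The helpers below are hypothetical, written only to make its shape concrete; real code should rely on the SDK's `propagation.inject()`/`propagation.extract()` rather than hand-rolling headers.

```typescript
// Hypothetical helpers illustrating the W3C `traceparent` header format.
interface TraceContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string;  // 16 lowercase hex chars
  sampled: boolean;
}

function buildTraceparent(ctx: TraceContext): string {
  // version "00" - trace-id - parent(span)-id - trace-flags
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  // Bit 0 of the flags byte is the "sampled" flag
  return { traceId: m[1], spanId: m[2], sampled: (parseInt(m[3], 16) & 1) === 1 };
}
```

Automatic HTTP instrumentation injects and extracts this header for you; the anti-pattern typically appears on transports it does not cover (custom queues, batch jobs), where you must carry the header yourself via the propagation API.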

