Jaeger
Deploy and integrate Jaeger for distributed tracing across microservices.
You are an expert in Jaeger distributed tracing, the CNCF graduated project. You help developers deploy Jaeger, configure collectors and sampling strategies, instrument applications to emit spans, and use the Jaeger UI for trace analysis and dependency mapping.
Core Philosophy
End-to-End Trace Visibility
Jaeger captures the full lifecycle of a request across services. Each trace is a DAG of spans showing timing, dependencies, and errors across every hop.
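To make "a DAG of spans" concrete, here is a minimal sketch of the idea (hypothetical field names, not Jaeger's actual data model): spans link to their parent by ID, which lets a trace's call structure be reconstructed.

```typescript
// Illustrative model of a trace as spans linked by parent IDs.
// Field names here are hypothetical, not the Jaeger span format.
interface Span {
  spanId: string;
  parentId: string | null; // null for the root span
  service: string;
  startMs: number;
  endMs: number;
}

// Depth of a span = number of hops from the root span, i.e. how deep in
// the service call graph the work happened.
function depth(spans: Span[], spanId: string): number {
  const byId = new Map(spans.map((s) => [s.spanId, s]));
  let d = 0;
  let cur = byId.get(spanId);
  while (cur && cur.parentId !== null) {
    cur = byId.get(cur.parentId);
    d++;
  }
  return d;
}
```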
Adaptive Sampling
Not every request needs a trace. Jaeger supports head-based and adaptive sampling to control volume while ensuring rare errors and slow requests are captured.
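Head-based sampling makes the keep/drop decision once, at the root of the trace, as a pure function of the trace ID. A simplified sketch of how a trace-ID-ratio sampler decides (illustrative only; the real @opentelemetry/sdk-trace-base implementation differs in detail):

```typescript
// Simplified head-based sampling decision (not the exact OpenTelemetry
// algorithm). Because the decision depends only on the trace ID, every
// service that sees the same trace makes the same keep/drop choice.
function shouldSample(traceId: string, ratio: number): boolean {
  // Interpret the low 48 bits (12 hex chars) of the 128-bit trace ID as a
  // number, and keep the trace if it falls below ratio * 2^48.
  const low = parseInt(traceId.slice(-12), 16);
  return low < ratio * Math.pow(16, 12);
}
```

With a ratio of 0.01, roughly 1% of uniformly random trace IDs fall below the threshold; the rest are dropped at the source before they ever reach the collector.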
Backend-Agnostic Storage
Jaeger supports Elasticsearch, Cassandra, Kafka, and Badger for span storage. Choose based on your scale, query patterns, and existing infrastructure.
Setup
Deploy Jaeger all-in-one for development:
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
-p 14250:14250 \
-p 14268:14268 \
jaegertracing/all-in-one:1.56
Port reference:
16686 - Jaeger UI
4317 - OTLP gRPC receiver
4318 - OTLP HTTP receiver
14250 - gRPC model.proto
14268 - HTTP Thrift direct from clients
Production deployment with Elasticsearch (jaeger-config.yaml):
# Collector configuration
collector:
  replicas: 2
  options:
    es:
      server-urls: http://elasticsearch:9200
      index-prefix: jaeger
      num-shards: 5
      num-replicas: 1
    collector:
      zipkin:
        host-port: ":9411"
      otlp-enabled: true
# Query service
query:
  replicas: 2
  options:
    es:
      server-urls: http://elasticsearch:9200
      index-prefix: jaeger
# Ingester (when using Kafka)
ingester:
  options:
    kafka:
      consumer:
        brokers: kafka:9092
        topic: jaeger-spans
        group-id: jaeger-ingester
    es:
      server-urls: http://elasticsearch:9200
Send traces to Jaeger using OpenTelemetry:
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces', // Jaeger OTLP endpoint
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'order-service',
});
sdk.start();
Key Patterns
Do: Configure sampling strategies for production traffic
{
  "service_strategies": [
    {
      "service": "api-gateway",
      "type": "probabilistic",
      "param": 0.1
    },
    {
      "service": "payment-service",
      "type": "ratelimiting",
      "param": 10
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.01,
    "operation_strategies": [
      {
        "operation": "health-check",
        "type": "probabilistic",
        "param": 0
      }
    ]
  }
}
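The collector serves this file to client SDKs. Conceptually, a client resolves its effective strategy roughly as sketched below (a simplification; Jaeger's real precedence rules have more cases, such as per-service operation strategies):

```typescript
// Simplified resolution of an effective sampling strategy from a
// strategies file like the one above. Illustrative only.
type Strategy = { type: string; param: number };

interface StrategiesFile {
  service_strategies: Array<Strategy & { service: string }>;
  default_strategy: Strategy & {
    operation_strategies?: Array<Strategy & { operation: string }>;
  };
}

function resolve(cfg: StrategiesFile, service: string, operation?: string): Strategy {
  // Operation-level overrides win (e.g. never sample health checks) ...
  const op = cfg.default_strategy.operation_strategies?.find(
    (o) => o.operation === operation,
  );
  if (op) return { type: op.type, param: op.param };
  // ... then per-service strategies ...
  const svc = cfg.service_strategies.find((s) => s.service === service);
  if (svc) return { type: svc.type, param: svc.param };
  // ... then the default.
  return { type: cfg.default_strategy.type, param: cfg.default_strategy.param };
}
```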
Not: Sampling 100% of traffic in production
// WRONG in production - generates massive storage costs
const sdk = new NodeSDK({
  traceExporter: exporter,
  // No sampler configured: the SDK defaults to sampling every trace
});
Configure proper sampling:
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  traceExporter: exporter,
  sampler: new TraceIdRatioBasedSampler(0.01), // keep 1% of traces
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'order-service',
});
Do: Add meaningful span tags and logs
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

async function placeOrder(order: Order) {
  return tracer.startActiveSpan('placeOrder', async (span) => {
    span.setAttribute('order.id', order.id);
    span.setAttribute('order.item_count', order.items.length);
    span.setAttribute('order.total', order.total);
    try {
      await validateInventory(order);
      span.addEvent('inventory.validated');
      const payment = await chargePayment(order);
      span.addEvent('payment.charged', { 'payment.id': payment.id });
      return await confirmOrder(order);
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      span.recordException(err as Error);
      throw err;
    } finally {
      span.end();
    }
  });
}
Common Patterns
Jaeger Query API
# Get traces for a service
curl "http://localhost:16686/api/traces?service=order-service&limit=20&lookback=1h"
# Get traces with specific tags
curl "http://localhost:16686/api/traces?service=order-service&tags=%7B%22http.status_code%22%3A%22500%22%7D"
# Get service dependencies
curl "http://localhost:16686/api/dependencies?endTs=$(date +%s)000&lookback=86400000"
# Get operations for a service
curl "http://localhost:16686/api/services/order-service/operations"
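The tags parameter in the second query above is a JSON object, URL-encoded. A small sketch of building such a URL from code (hypothetical helper, using only Node built-ins):

```typescript
// Build a Jaeger /api/traces query URL. The tags filter is JSON-stringified
// and then percent-encoded, which is where %7B%22...%22%7D comes from.
function tracesUrl(
  base: string,
  service: string,
  tags: Record<string, string>,
): string {
  const params = new URLSearchParams({
    service,
    tags: JSON.stringify(tags),
    limit: '20',
    lookback: '1h',
  });
  return `${base}/api/traces?${params}`;
}
```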
Kubernetes Deployment with Jaeger Operator
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: production-jaeger
spec:
  strategy: production
  collector:
    replicas: 3
    options:
      collector:
        num-workers: 100
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://elasticsearch:9200
        index-prefix: jaeger
        tls:
          ca: /es/certificates/ca.crt
    esIndexCleaner:
      enabled: true
      numberOfDays: 14
      schedule: "55 23 * * *"
  query:
    replicas: 2
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.01
Docker Compose with Kafka Pipeline
version: '3.8'
services:
  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.56
    environment:
      - SPAN_STORAGE_TYPE=kafka
      - KAFKA_PRODUCER_BROKERS=kafka:9092
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "14268:14268"
      - "4317:4317"
      - "4318:4318"
  jaeger-ingester:
    image: jaegertracing/jaeger-ingester:1.56
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
      - KAFKA_CONSUMER_BROKERS=kafka:9092
  jaeger-query:
    image: jaegertracing/jaeger-query:1.56
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "16686:16686"
Anti-Patterns
- No sampling strategy - Default 100% sampling overwhelms storage and network in production; always configure probabilistic or adaptive sampling.
- Missing span context propagation - If services don't propagate W3C TraceContext or B3 headers, traces break at service boundaries.
- In-memory storage in production - The all-in-one image uses in-memory storage by default; always configure Elasticsearch, Cassandra, or another persistent backend.
- Ignoring index cleanup - Jaeger indices grow indefinitely without esIndexCleaner; configure retention to avoid filling disks.
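The propagation anti-pattern is easiest to see with the W3C traceparent header spelled out. A hand-rolled inject/extract sketch (auto-instrumentation normally does this for you; the version-00 format follows the W3C TraceContext spec):

```typescript
// W3C TraceContext propagation by hand: the caller injects a traceparent
// header, the callee extracts it, so both ends of an HTTP hop join the
// same trace. Without this, the trace breaks at the service boundary.
interface SpanContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string;  // 16 lowercase hex chars
  sampled: boolean;
}

// traceparent format: version-traceId-spanId-flags, e.g.
// 00-<32 hex>-<16 hex>-01 (flags 01 = sampled)
function inject(ctx: SpanContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}

function extract(traceparent: string): SpanContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(traceparent);
  if (!m) return null;
  return { traceId: m[1], spanId: m[2], sampled: m[3] === '01' };
}
```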
When to Use
- You need an open-source, self-hosted distributed tracing backend.
- You are running microservices on Kubernetes and want the Jaeger Operator for lifecycle management.
- You need a Kafka-buffered ingestion pipeline to handle burst traffic.
- You want a lightweight tracing UI for development and staging environments.
- You are migrating to OpenTelemetry and need an OTLP-compatible trace backend.
Install this skill directly: skilldb add observability-services-skills
Related Skills
Axiom
Integrate Axiom for log management, analytics, and real-time dashboards.
Elastic Apm
Instrument applications with Elastic APM and the ELK Stack for traces, logs, and metrics.
Grafana
Build Grafana dashboards, configure data sources, and set up alerting rules.
Honeycomb
Integrate Honeycomb for event-driven observability with high-cardinality tracing.
New Relic
Integrate New Relic APM for application performance monitoring and distributed tracing.
Opentelemetry
Instrument applications with OpenTelemetry for distributed traces, metrics, and logs.