Jaeger
Deploy and integrate Jaeger for distributed tracing across microservices.
You are an expert in Jaeger distributed tracing, the CNCF graduated project. You help developers deploy Jaeger, configure collectors and sampling strategies, instrument applications to emit spans, and use the Jaeger UI for trace analysis and dependency mapping.
Core Philosophy
End-to-End Trace Visibility
Jaeger captures the full lifecycle of a request across services. Each trace is a DAG of spans showing timing, dependencies, and errors across every hop.
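To make "a DAG of spans" concrete, here is a minimal sketch of the idea (hypothetical field names, not Jaeger's actual data model): spans link to their parent by ID, which lets a trace's call structure be reconstructed.

```typescript
// Illustrative model of a trace as spans linked by parent IDs.
// Field names here are hypothetical, not the Jaeger span format.
interface Span {
  spanId: string;
  parentId: string | null; // null for the root span
  service: string;
  startMs: number;
  endMs: number;
}

// Depth of a span = number of hops from the root span, i.e. how deep in
// the service call graph the work happened.
function depth(spans: Span[], spanId: string): number {
  const byId = new Map(spans.map((s) => [s.spanId, s]));
  let d = 0;
  let cur = byId.get(spanId);
  while (cur && cur.parentId !== null) {
    cur = byId.get(cur.parentId);
    d++;
  }
  return d;
}
```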
Adaptive Sampling
Not every request needs a trace. Jaeger supports head-based and adaptive sampling to control volume while ensuring rare errors and slow requests are captured.
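Head-based sampling makes the keep/drop decision once, at the root of the trace, as a pure function of the trace ID. A simplified sketch of how a trace-ID-ratio sampler decides (illustrative only; the real @opentelemetry/sdk-trace-base implementation differs in detail):

```typescript
// Simplified head-based sampling decision (not the exact OpenTelemetry
// algorithm). Because the decision depends only on the trace ID, every
// service that sees the same trace makes the same keep/drop choice.
function shouldSample(traceId: string, ratio: number): boolean {
  // Interpret the low 48 bits (12 hex chars) of the 128-bit trace ID as a
  // number, and keep the trace if it falls below ratio * 2^48.
  const low = parseInt(traceId.slice(-12), 16);
  return low < ratio * Math.pow(16, 12);
}
```

With a ratio of 0.01, roughly 1% of uniformly random trace IDs fall below the threshold; the rest are dropped at the source before they ever reach the collector.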
Backend-Agnostic Storage
Jaeger supports Elasticsearch, Cassandra, Kafka, and Badger for span storage. Choose based on your scale, query patterns, and existing infrastructure.
Setup
Deploy Jaeger all-in-one for development:
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
-p 14250:14250 \
-p 14268:14268 \
jaegertracing/all-in-one:1.56
Port reference:
16686 - Jaeger UI
4317 - OTLP gRPC receiver
4318 - OTLP HTTP receiver
14250 - gRPC model.proto
14268 - HTTP Thrift direct from clients
Production deployment with Elasticsearch (jaeger-config.yaml):
# Collector configuration
collector:
  replicas: 2
  options:
    es:
      server-urls: http://elasticsearch:9200
      index-prefix: jaeger
      num-shards: 5
      num-replicas: 1
    collector:
      zipkin:
        host-port: ":9411"
      otlp-enabled: true
# Query service
query:
  replicas: 2
  options:
    es:
      server-urls: http://elasticsearch:9200
      index-prefix: jaeger
# Ingester (when using Kafka)
ingester:
  options:
    kafka:
      consumer:
        brokers: kafka:9092
        topic: jaeger-spans
        group-id: jaeger-ingester
    es:
      server-urls: http://elasticsearch:9200
Send traces to Jaeger using OpenTelemetry:
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces', // Jaeger OTLP endpoint
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'order-service',
});
sdk.start();
Key Patterns
Do: Configure sampling strategies for production traffic
{
  "service_strategies": [
    {
      "service": "api-gateway",
      "type": "probabilistic",
      "param": 0.1
    },
    {
      "service": "payment-service",
      "type": "ratelimiting",
      "param": 10
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.01,
    "operation_strategies": [
      {
        "operation": "health-check",
        "type": "probabilistic",
        "param": 0
      }
    ]
  }
}
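The collector serves this file to client SDKs. Conceptually, a client resolves its effective strategy roughly as sketched below (a simplification; Jaeger's real precedence rules have more cases, such as per-service operation strategies):

```typescript
// Simplified resolution of an effective sampling strategy from a
// strategies file like the one above. Illustrative only.
type Strategy = { type: string; param: number };

interface StrategiesFile {
  service_strategies: Array<Strategy & { service: string }>;
  default_strategy: Strategy & {
    operation_strategies?: Array<Strategy & { operation: string }>;
  };
}

function resolve(cfg: StrategiesFile, service: string, operation?: string): Strategy {
  // Operation-level overrides win (e.g. never sample health checks) ...
  const op = cfg.default_strategy.operation_strategies?.find(
    (o) => o.operation === operation,
  );
  if (op) return { type: op.type, param: op.param };
  // ... then per-service strategies ...
  const svc = cfg.service_strategies.find((s) => s.service === service);
  if (svc) return { type: svc.type, param: svc.param };
  // ... then the default.
  return { type: cfg.default_strategy.type, param: cfg.default_strategy.param };
}
```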
Not: Sampling 100% of traffic in production
// WRONG in production - generates massive storage costs
const sdk = new NodeSDK({
  traceExporter: exporter,
  // No sampler configured: the SDK defaults to sampling every trace
});
Configure proper sampling:
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  traceExporter: exporter,
  sampler: new TraceIdRatioBasedSampler(0.01), // keep 1% of traces
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'order-service',
});
Do: Add meaningful span tags and logs
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

async function placeOrder(order: Order) {
  return tracer.startActiveSpan('placeOrder', async (span) => {
    span.setAttribute('order.id', order.id);
    span.setAttribute('order.item_count', order.items.length);
    span.setAttribute('order.total', order.total);
    try {
      await validateInventory(order);
      span.addEvent('inventory.validated');
      const payment = await chargePayment(order);
      span.addEvent('payment.charged', { 'payment.id': payment.id });
      return await confirmOrder(order);
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      span.recordException(err as Error);
      throw err;
    } finally {
      span.end();
    }
  });
}
Common Patterns
Jaeger Query API
# Get traces for a service
curl "http://localhost:16686/api/traces?service=order-service&limit=20&lookback=1h"
# Get traces with specific tags
curl "http://localhost:16686/api/traces?service=order-service&tags=%7B%22http.status_code%22%3A%22500%22%7D"
# Get service dependencies
curl "http://localhost:16686/api/dependencies?endTs=$(date +%s)000&lookback=86400000"
# Get operations for a service
curl "http://localhost:16686/api/services/order-service/operations"
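The tags parameter in the second query above is a JSON object, URL-encoded. A small sketch of building such a URL from code (hypothetical helper, using only Node built-ins):

```typescript
// Build a Jaeger /api/traces query URL. The tags filter is JSON-stringified
// and then percent-encoded, which is where %7B%22...%22%7D comes from.
function tracesUrl(
  base: string,
  service: string,
  tags: Record<string, string>,
): string {
  const params = new URLSearchParams({
    service,
    tags: JSON.stringify(tags),
    limit: '20',
    lookback: '1h',
  });
  return `${base}/api/traces?${params}`;
}
```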
Kubernetes Deployment with Jaeger Operator
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: production-jaeger
spec:
  strategy: production
  collector:
    replicas: 3
    options:
      collector:
        num-workers: 100
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://elasticsearch:9200
        index-prefix: jaeger
        tls:
          ca: /es/certificates/ca.crt
    esIndexCleaner:
      enabled: true
      numberOfDays: 14
      schedule: "55 23 * * *"
  query:
    replicas: 2
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.01
Docker Compose with Kafka Pipeline
version: '3.8'
services:
  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.56
    environment:
      - SPAN_STORAGE_TYPE=kafka
      - KAFKA_PRODUCER_BROKERS=kafka:9092
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "14268:14268"
      - "4317:4317"
      - "4318:4318"
  jaeger-ingester:
    image: jaegertracing/jaeger-ingester:1.56
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
      - KAFKA_CONSUMER_BROKERS=kafka:9092
  jaeger-query:
    image: jaegertracing/jaeger-query:1.56
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "16686:16686"
Anti-Patterns
- No sampling strategy - Default 100% sampling overwhelms storage and network in production; always configure probabilistic or adaptive sampling.
- Missing span context propagation - If services don't propagate W3C TraceContext or B3 headers, traces break at service boundaries.
- In-memory storage in production - The all-in-one image uses in-memory storage by default; always configure Elasticsearch, Cassandra, or another persistent backend.
- Ignoring index cleanup - Jaeger indices grow indefinitely without esIndexCleaner; configure retention to avoid filling disks.
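The propagation anti-pattern is easiest to see with the W3C traceparent header spelled out. A hand-rolled inject/extract sketch (auto-instrumentation normally does this for you; the version-00 format follows the W3C TraceContext spec):

```typescript
// W3C TraceContext propagation by hand: the caller injects a traceparent
// header, the callee extracts it, so both ends of an HTTP hop join the
// same trace. Without this, the trace breaks at the service boundary.
interface SpanContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string;  // 16 lowercase hex chars
  sampled: boolean;
}

// traceparent format: version-traceId-spanId-flags, e.g.
// 00-<32 hex>-<16 hex>-01 (flags 01 = sampled)
function inject(ctx: SpanContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}

function extract(traceparent: string): SpanContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(traceparent);
  if (!m) return null;
  return { traceId: m[1], spanId: m[2], sampled: m[3] === '01' };
}
```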
When to Use
- You need an open-source, self-hosted distributed tracing backend.
- You are running microservices on Kubernetes and want the Jaeger Operator for lifecycle management.
- You need a Kafka-buffered ingestion pipeline to handle burst traffic.
- You want a lightweight tracing UI for development and staging environments.
- You are migrating to OpenTelemetry and need an OTLP-compatible trace backend.
Install this skill directly: skilldb add observability-services-skills
Related Skills
Axiom
Integrate Axiom for log management, analytics, and real-time dashboards.
Elastic Apm
Instrument applications with Elastic APM and the ELK Stack for traces, logs, and metrics.
Grafana
Build Grafana dashboards, configure data sources, and set up alerting rules.
Honeycomb
Integrate Honeycomb for event-driven observability with high-cardinality tracing.
New Relic
Integrate New Relic APM for application performance monitoring and distributed tracing.
Opentelemetry
Instrument applications with OpenTelemetry for distributed traces, metrics, and logs.