Logstash / ELK Stack
ELK Stack logging — Logstash pipelines, Elasticsearch indexing, Kibana dashboards, and Filebeat shippers
You are an expert in integrating the ELK Stack (Elasticsearch, Logstash, Kibana) and Beats for application logging and observability.
Core Philosophy
The ELK Stack gives you full control over your log infrastructure at the cost of operational responsibility. Unlike SaaS log platforms, you own the data, control retention policies, and are not subject to per-GB pricing surprises. But you also manage Elasticsearch cluster health, Logstash pipeline throughput, and Kibana availability. Choose ELK when data sovereignty, customization, or cost predictability at high volume outweigh the operational burden.
Ship lightweight, transform centrally. Filebeat belongs on application hosts -- it is a single Go binary with a tiny memory footprint that tails log files and forwards them efficiently. Logstash belongs on dedicated infrastructure where it can apply grok parsing, field enrichment, and conditional routing without competing for resources with your application. Running Logstash on every application host is a common anti-pattern that wastes memory (it is a JVM process requiring 1 GB+ heap) and complicates operations.
Index lifecycle management is not optional in production. Without ILM policies, Elasticsearch indices grow unbounded until the cluster runs out of disk space, search performance degrades, and the cluster enters a red health state. Define rollover, shrink, freeze, and delete phases that match your retention requirements. Hot data stays on fast storage for recent queries, warm data compacts for occasional access, and cold data freezes or deletes automatically.
Anti-Patterns
- Running Logstash on every application host -- Logstash is a JVM process that consumes 1 GB+ of heap memory. Use Filebeat on application hosts and centralize Logstash on dedicated aggregation nodes.
- Skipping Index Lifecycle Management -- Without ILM, indices grow until the cluster runs out of disk space. Define rollover, shrink, and delete policies on day one.
- Using dynamic mapping for all fields -- Elasticsearch creates a mapping entry for every new JSON key. Unchecked dynamic mapping leads to mapping explosions (thousands of fields) that destabilize the cluster. Define explicit mappings and set `dynamic: false` for unknown fields.
- Shipping debug-level logs to Elasticsearch in production -- The volume overwhelms the cluster, balloons storage costs, and drowns important logs in noise. Index only warn-level and above; use Filebeat's log level filtering.
- Forgetting `vm.max_map_count` on Elasticsearch hosts -- The kernel parameter must be at least 262144. Without it, Elasticsearch crashes on startup with no obvious error message.
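To make the dynamic-mapping point concrete, here is a minimal sketch of an explicit mapping that stops Elasticsearch from indexing unknown fields (the index name `logs-myapp` is illustrative):

```
PUT logs-myapp
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "service": { "type": "keyword" },
      "level":   { "type": "keyword" },
      "message": { "type": "text" }
    }
  }
}
```

With `"dynamic": false`, unknown fields are kept in `_source` but not indexed or added to the mapping; use `"dynamic": "strict"` instead if you want documents with unexpected fields rejected outright.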
Overview
The ELK Stack is the most widely deployed open-source log management solution. Elasticsearch stores and indexes logs, Logstash transforms and routes them, and Kibana provides search and visualization. Filebeat (part of the Beats family) is a lightweight shipper that tails log files and forwards them to Logstash or Elasticsearch. The stack is self-hosted (or available as Elastic Cloud), giving teams full control over data retention, parsing, and access.
Setup & Configuration
Docker Compose (Development)
```yaml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - es-data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.12.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "5044:5044"  # Beats input
      - "5000:5000"  # TCP input
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

volumes:
  es-data:
```
Filebeat Configuration
```yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/myapp/*.log
    fields:
      service: myapp
      env: production
    fields_under_root: true
    json.keys_under_root: true
    json.add_error_key: true

output.logstash:
  hosts: ["logstash:5044"]
```
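A plain-JavaScript sketch of what `json.keys_under_root` and `fields_under_root` do to each event (field names mirror the config above; the merge is illustrative, not Filebeat's actual implementation):

```javascript
// One JSON log line as the app writes it to /var/log/myapp/*.log
const appLogLine = JSON.stringify({ level: 'error', message: 'db timeout', user_id: 'u-123' });

// json.keys_under_root: the parsed keys land at the top level of the event
// instead of being nested under a "json" object.
const parsed = JSON.parse(appLogLine);

// fields_under_root: the static `fields` from filebeat.yml also land top-level.
const event = { ...parsed, service: 'myapp', env: 'production' };

console.log(event);
// { level: 'error', message: 'db timeout', user_id: 'u-123', service: 'myapp', env: 'production' }
```

The result is a flat document in Elasticsearch, so `service:myapp AND level:error` works directly in Kibana without `json.`-prefixed field names.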
Logstash Pipeline
```conf
# logstash/pipeline/main.conf
input {
  beats {
    port => 5044
  }
  tcp {
    port => 5000
    codec => json_lines
  }
}

filter {
  if [service] == "nginx" {
    grok {
      match => {
        "message" => '%{IPORHOST:client_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status:int} %{NUMBER:bytes:int}'
      }
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
  }

  if [level] == "error" or [level] == "fatal" {
    mutate {
      add_tag => ["alert-worthy"]
    }
  }

  # Remove sensitive fields
  mutate {
    remove_field => ["password", "credit_card", "ssn"]
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{[service]}-%{+YYYY.MM.dd}"
  }
}
```
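For reference, a plain-JavaScript sketch of what the nginx grok pattern above extracts — the sample log line is made up, and the regex is only a rough equivalent of the grok pattern, with capture names mirroring its fields:

```javascript
// A made-up nginx access-log line matching the grok pattern's shape
const line = '203.0.113.9 - alice [10/Feb/2024:13:55:36 +0000] "GET /api/users?page=2 HTTP/1.1" 200 512';

// Rough regex equivalent of the grok expression in the filter block
const re = /^(\S+) - (\S+) \[([^\]]+)\] "(\S+) (\S+) HTTP\/(\S+)" (\d+) (\d+)/;
const m = line.match(re);

const event = {
  client_ip: m[1],
  user: m[2],
  timestamp: m[3],
  method: m[4],
  request: m[5],
  http_version: m[6],
  status: Number(m[7]), // the :int suffix in grok casts to integer
  bytes: Number(m[8]),
};

console.log(event.status, event.bytes); // 200 512
```

Grok's named patterns (`%{IPORHOST}`, `%{HTTPDATE}`, …) are stricter than the `\S+` groups here, but the field names and types that reach Elasticsearch are the same.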
Shipping Logs Directly from Node.js
```bash
npm install winston winston-elasticsearch
```

```javascript
const winston = require('winston');
const { ElasticsearchTransport } = require('winston-elasticsearch');

const esTransport = new ElasticsearchTransport({
  level: 'info',
  clientOpts: { node: 'http://localhost:9200' },
  indexPrefix: 'logs-myapp',
});

const logger = winston.createLogger({
  transports: [
    new winston.transports.Console(),
    esTransport,
  ],
});

logger.info('User signed in', { userId: 'u-123', method: 'oauth' });
```
Core Patterns
Index Lifecycle Management (ILM)
Prevent unbounded index growth by defining lifecycle policies:
```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": { "actions": { "rollover": { "max_size": "50gb", "max_age": "1d" } } },
      "warm": { "min_age": "7d", "actions": { "shrink": { "number_of_shards": 1 } } },
      "cold": { "min_age": "30d", "actions": { "freeze": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
```
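If you rely on ILM rollover rather than the daily date-suffixed indices shown in the Logstash output earlier, writes must go through an alias with a bootstrapped initial write index. A sketch, with the `logs-myapp` alias and index name purely illustrative:

```
PUT logs-myapp-000001
{
  "aliases": {
    "logs-myapp": { "is_write_index": true }
  }
}
```

Clients then index into the `logs-myapp` alias, and the hot-phase rollover action creates `logs-myapp-000002` and moves the write alias when `max_size` or `max_age` is hit.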
Index Templates
Apply consistent mappings and settings to all log indices:
```
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy"
    },
    "mappings": {
      "properties": {
        "service": { "type": "keyword" },
        "level": { "type": "keyword" },
        "message": { "type": "text" },
        "@timestamp": { "type": "date" }
      }
    }
  }
}
```
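Because the template maps `service` and `level` as `keyword`, they can drive aggregations. A sketch of a top-errors-by-service query against the template's indices (query shape only; field values are illustrative):

```
GET logs-*/_search
{
  "size": 0,
  "query": { "term": { "level": "error" } },
  "aggs": {
    "errors_by_service": {
      "terms": { "field": "service", "size": 10 }
    }
  }
}
```

The same `terms` aggregation on a `text` field would fail (or require enabling fielddata), which is why filterable fields belong in `keyword`.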
Kibana Saved Searches and Dashboards
1. Create a Data View in Kibana matching `logs-*`.
2. Build saved searches for common queries (e.g., `level:error AND service:myapp`).
3. Create dashboards combining log count histograms, top error tables, and latency line charts.
Multi-Pipeline Logstash
For complex setups, separate concerns into multiple pipeline files:
```yaml
# logstash/pipelines.yml
- pipeline.id: nginx
  path.config: "/usr/share/logstash/pipeline/nginx.conf"
- pipeline.id: application
  path.config: "/usr/share/logstash/pipeline/app.conf"
```
Best Practices
- Use Filebeat (not Logstash) as the shipper on application hosts — it is far lighter on resources and handles backpressure gracefully.
- Always define ILM policies; without them, disk usage grows unbounded and cluster health degrades.
- Use `keyword` type for fields you filter or aggregate on (service, level, user_id) and `text` type only for fields you full-text search.
- Set `json.keys_under_root: true` in Filebeat when your app outputs structured JSON so fields land at the top level in Elasticsearch.
- Pin Elastic Stack component versions together — mixing Filebeat 8.x with Elasticsearch 7.x causes subtle compatibility issues.
Common Pitfalls
- Running Logstash on application hosts instead of Filebeat, consuming excessive memory (Logstash is a JVM process requiring 1 GB+ heap).
- Not setting `max_size` or `max_age` rollover conditions, leading to individual indices growing to hundreds of gigabytes and slowing searches.
- Using dynamic mapping for everything — Elasticsearch creates a field for every new JSON key, leading to mapping explosions and cluster instability.
- Forgetting to increase the `vm.max_map_count` kernel parameter on Elasticsearch hosts (must be at least 262144), causing the process to crash on startup.
- Shipping debug-level logs to Elasticsearch in production, overwhelming the cluster with volume it was not sized for.
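For the `vm.max_map_count` pitfall, the usual fix on Linux hosts is a one-line sysctl change (requires root; the `/etc/sysctl.d/` drop-in path is the common convention and may vary by distro):

```bash
# Apply immediately (lost on reboot)
sudo sysctl -w vm.max_map_count=262144

# Persist across reboots
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-elasticsearch.conf
```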
Related Skills
Better Stack / Logtail
Better Stack (Logtail) logging — structured log ingestion, live tail, SQL-based querying, alerting, and uptime monitoring
Datadog Logging
Datadog log management — agent setup, library integration, log pipelines, facets, monitors, and APM correlation
Fluentd
Fluentd unified logging — input/output plugins, routing with tags, buffering, Kubernetes DaemonSet, and Fluent Bit
Papertrail
Papertrail cloud logging — syslog forwarding, live tail, search, alerts, and integration with app frameworks
Pino Logger
Pino: fast JSON logger for Node.js — child loggers, serializers, transports (pino-pretty, pino-http), redaction, Next.js integration, and log levels
Structured Logging Patterns
Structured logging patterns for TypeScript — correlation IDs, request context, log levels, error serialization, sensitive data redaction, and observability best practices