
Logstash / ELK Stack

ELK Stack logging — Logstash pipelines, Elasticsearch indexing, Kibana dashboards, and Filebeat shippers

Quick Summary
You are an expert in integrating the ELK Stack (Elasticsearch, Logstash, Kibana) and Beats for application logging and observability.

## Key Points

- **Skipping Index Lifecycle Management** -- Without ILM, indices grow until the cluster runs out of disk space. Define rollover, shrink, and delete policies on day one.
- **Forgetting `vm.max_map_count` on Elasticsearch hosts** -- The kernel parameter must be at least 262144. Without it, Elasticsearch crashes on startup with no obvious error message.
- Use Filebeat (not Logstash) as the shipper on application hosts — it is far lighter on resources and handles backpressure gracefully.
- Always define ILM policies; without them, disk usage grows unbounded and cluster health degrades.
- Use `keyword` type for fields you filter or aggregate on (service, level, user_id) and `text` type only for fields you full-text search.
- Set `json.keys_under_root: true` in Filebeat when your app outputs structured JSON so fields land at the top level in Elasticsearch.

## Quick Example

```bash
npm install winston winston-elasticsearch
```

```yaml
# logstash/pipelines.yml
- pipeline.id: nginx
  path.config: "/usr/share/logstash/pipeline/nginx.conf"
- pipeline.id: application
  path.config: "/usr/share/logstash/pipeline/app.conf"
```

Logstash / ELK Stack — Logging Integration

You are an expert in integrating the ELK Stack (Elasticsearch, Logstash, Kibana) and Beats for application logging and observability.

Core Philosophy

The ELK Stack gives you full control over your log infrastructure at the cost of operational responsibility. Unlike SaaS log platforms, you own the data, control retention policies, and are not subject to per-GB pricing surprises. But you also manage Elasticsearch cluster health, Logstash pipeline throughput, and Kibana availability. Choose ELK when data sovereignty, customization, or cost predictability at high volume outweigh the operational burden.

Ship lightweight, transform centrally. Filebeat belongs on application hosts -- it is a single Go binary with a tiny memory footprint that tails log files and forwards them efficiently. Logstash belongs on dedicated infrastructure where it can apply grok parsing, field enrichment, and conditional routing without competing for resources with your application. Running Logstash on every application host is a common anti-pattern that wastes memory (it is a JVM process requiring 1 GB+ heap) and complicates operations.

Index lifecycle management is not optional in production. Without ILM policies, Elasticsearch indices grow unbounded until the cluster runs out of disk space, search performance degrades, and the cluster enters a red health state. Define rollover, shrink, freeze, and delete phases that match your retention requirements. Hot data stays on fast storage for recent queries, warm data compacts for occasional access, and cold data freezes or deletes automatically.

Anti-Patterns

  • Running Logstash on every application host -- Logstash is a JVM process that consumes 1 GB+ of heap memory. Use Filebeat on application hosts and centralize Logstash on dedicated aggregation nodes.
  • Skipping Index Lifecycle Management -- Without ILM, indices grow until the cluster runs out of disk space. Define rollover, shrink, and delete policies on day one.
  • Using dynamic mapping for all fields -- Elasticsearch creates a mapping entry for every new JSON key. Unchecked dynamic mapping leads to mapping explosions (thousands of fields) that destabilize the cluster. Define explicit mappings and set dynamic: false for unknown fields.
  • Shipping debug-level logs to Elasticsearch in production -- The volume overwhelms the cluster, balloons storage costs, and drowns important logs in noise. Index only warn-level and above; use Filebeat's log level filtering.
  • Forgetting vm.max_map_count on Elasticsearch hosts -- The kernel parameter must be at least 262144. Without it, Elasticsearch crashes on startup with no obvious error message.
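
The `vm.max_map_count` fix from the last bullet is a one-line kernel setting. A sketch of applying it on a Linux host (commands require root; paths are the standard sysctl locations):

```bash
# Apply immediately (lost on reboot)
sysctl -w vm.max_map_count=262144

# Persist across reboots
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
sysctl -p
```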

Overview

The ELK Stack is among the most widely deployed open-source log management solutions. Elasticsearch stores and indexes logs, Logstash transforms and routes them, and Kibana provides search and visualization. Filebeat (part of the Beats family) is a lightweight shipper that tails log files and forwards them to Logstash or Elasticsearch. The stack is self-hosted (or available as Elastic Cloud), giving teams full control over data retention, parsing, and access.

Setup & Configuration

Docker Compose (Development)

```yaml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - es-data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.12.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "5044:5044"   # Beats input
      - "5000:5000"   # TCP input
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

volumes:
  es-data:
```

Filebeat Configuration

```yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/myapp/*.log
    fields:
      service: myapp
      env: production
    fields_under_root: true
    json.keys_under_root: true
    json.add_error_key: true

output.logstash:
  hosts: ["logstash:5044"]
```
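
Note that in Filebeat 8.x the `log` input type is deprecated in favor of `filestream`. A hedged sketch of the equivalent configuration (the `id` value is arbitrary but required for `filestream`; JSON decoding moves under `parsers`):

```yaml
filebeat.inputs:
  - type: filestream
    id: myapp-logs            # a unique id is required per filestream input
    paths:
      - /var/log/myapp/*.log
    fields:
      service: myapp
      env: production
    fields_under_root: true
    parsers:
      - ndjson:
          target: ""          # decode JSON keys to the event root
          add_error_key: true
```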

Logstash Pipeline

```
# logstash/pipeline/main.conf
input {
  beats {
    port => 5044
  }
  tcp {
    port => 5000
    codec => json_lines
  }
}

filter {
  if [service] == "nginx" {
    grok {
      match => {
        "message" => '%{IPORHOST:client_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status:int} %{NUMBER:bytes:int}'
      }
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
  }

  if [level] == "error" or [level] == "fatal" {
    mutate {
      add_tag => ["alert-worthy"]
    }
  }

  # Remove sensitive fields
  mutate {
    remove_field => ["password", "credit_card", "ssn"]
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{[service]}-%{+YYYY.MM.dd}"
  }
}
```
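
The `alert-worthy` tag added in the filter stage can drive conditional routing in the output stage. A hypothetical addition (the `alerts-*` index name is illustrative, not part of the original pipeline):

```
# Hypothetical addition to the output block: copy alert-worthy
# events into a dedicated index so alerting queries stay cheap.
output {
  if "alert-worthy" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "alerts-%{+YYYY.MM.dd}"
    }
  }
}
```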

Shipping Logs Directly from Node.js

```bash
npm install winston winston-elasticsearch
```

```javascript
const winston = require('winston');
const { ElasticsearchTransport } = require('winston-elasticsearch');

const esTransport = new ElasticsearchTransport({
  level: 'info',
  clientOpts: { node: 'http://localhost:9200' },
  indexPrefix: 'logs-myapp',
});

const logger = winston.createLogger({
  transports: [
    new winston.transports.Console(),
    esTransport,
  ],
});

logger.info('User signed in', { userId: 'u-123', method: 'oauth' });
```
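
When you prefer not to couple the application to Elasticsearch, an alternative is to emit newline-delimited JSON and let Filebeat ship it. A minimal sketch (`logLine` is a hypothetical helper; field names mirror the index template mappings used elsewhere in this skill):

```javascript
// Emit one self-describing NDJSON line per event, assuming Filebeat
// tails stdout or the log file and decodes JSON keys to the root.
function logLine(level, message, fields = {}) {
  return JSON.stringify({
    '@timestamp': new Date().toISOString(),
    level,
    message,
    ...fields,
  });
}

console.log(logLine('info', 'User signed in', { service: 'myapp', userId: 'u-123' }));
```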

Core Patterns

Index Lifecycle Management (ILM)

Prevent unbounded index growth by defining lifecycle policies:

```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_size": "50gb", "max_age": "1d" } } },
      "warm":   { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 } } },
      "cold":   { "min_age": "30d", "actions": { "freeze": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
```

Index Templates

Apply consistent mappings and settings to all log indices:

```
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy"
    },
    "mappings": {
      "properties": {
        "service":    { "type": "keyword" },
        "level":      { "type": "keyword" },
        "message":    { "type": "text" },
        "@timestamp": { "type": "date" }
      }
    }
  }
}
```
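
One caveat: the ILM rollover action only operates against a write alias. If you rely on rollover rather than date-suffixed index names, also set `"index.lifecycle.rollover_alias"` in the template settings and bootstrap the first index by hand. A sketch (`logs-myapp` is an illustrative alias, not part of the template above):

```
PUT logs-myapp-000001
{
  "aliases": {
    "logs-myapp": { "is_write_index": true }
  }
}
```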

Kibana Saved Searches and Dashboards

  1. Create a Data View in Kibana matching logs-*.
  2. Build saved searches for common queries (e.g., level:error AND service:myapp).
  3. Create dashboards combining log count histograms, top error tables, and latency line charts.

Multi-Pipeline Logstash

For complex setups, separate concerns into multiple pipeline files:

```yaml
# logstash/pipelines.yml
- pipeline.id: nginx
  path.config: "/usr/share/logstash/pipeline/nginx.conf"
- pipeline.id: application
  path.config: "/usr/share/logstash/pipeline/app.conf"
```

Best Practices

  • Use Filebeat (not Logstash) as the shipper on application hosts — it is far lighter on resources and handles backpressure gracefully.
  • Always define ILM policies; without them, disk usage grows unbounded and cluster health degrades.
  • Use keyword type for fields you filter or aggregate on (service, level, user_id) and text type only for fields you full-text search.
  • Set json.keys_under_root: true in Filebeat when your app outputs structured JSON so fields land at the top level in Elasticsearch.
  • Pin Elastic Stack component versions together — mixing Filebeat 8.x with Elasticsearch 7.x causes subtle compatibility issues.

Common Pitfalls

  • Running Logstash on application hosts instead of Filebeat, consuming excessive memory (Logstash is a JVM process requiring 1 GB+ heap).
  • Not setting max_size or max_age rollover conditions, leading to individual indices growing to hundreds of gigabytes and slowing searches.
  • Using dynamic mapping for everything — Elasticsearch creates a field for every new JSON key, leading to mapping explosions and cluster instability.
  • Forgetting to increase the vm.max_map_count kernel parameter on Elasticsearch hosts (must be at least 262144), causing the process to crash on startup.
  • Shipping debug-level logs to Elasticsearch in production, overwhelming the cluster with volume it was not sized for.
