
BetterStack

"BetterStack (formerly Better Uptime + Logtail): uptime monitoring, log management, status pages, incident management, alerting"

Quick Summary
BetterStack combines uptime monitoring (formerly Better Uptime) with structured log management (formerly Logtail) into a unified observability platform. Its principles are:

## Key Points

- **Uptime is the baseline** — if your service is down, nothing else matters. Automated checks from global locations catch outages before users report them.
- **Structured logs are queryable data** — treat logs as structured events, not text blobs. Ship JSON, query with SQL-like syntax, and build dashboards from log data.
- **Status pages build trust** — public status pages with real-time incident updates reduce support ticket volume and demonstrate transparency.
- **Escalation policies save sleep** — route alerts through on-call schedules with escalation chains so the right person is paged at the right time.
- **Correlate uptime with logs** — when a monitor fires, jump directly to the logs from that time window to find root cause without context-switching.
1. **Use structured JSON logs** — BetterStack's query engine works best with structured fields; avoid unstructured text logs.
2. **Set a confirmation period on monitors** — require 2-3 consecutive failures before alerting to avoid false positives from transient network issues.
3. **Monitor from multiple regions** — a single-region check cannot distinguish between your outage and a regional network problem.
4. **Include version in log metadata** — tie log entries to deploys so you can correlate spikes with releases.
5. **Create a health endpoint that checks dependencies** — a 200 from your app means nothing if the database is down; check all critical dependencies.
6. **Set up escalation policies** — primary on-call gets paged immediately, secondary after 5 minutes, manager after 15.
7. **Flush logs on process exit** — the SDK batches logs; call `logtail.flush()` in shutdown handlers to avoid losing final entries.
Full skill: skilldb get monitoring-services-skills/BetterStack
Paste into your CLAUDE.md or agent config.

BetterStack Monitoring Skill

Core Philosophy

BetterStack combines uptime monitoring (formerly Better Uptime) with structured log management (formerly Logtail) into a unified observability platform. Its principles are:

  • Uptime is the baseline — if your service is down, nothing else matters. Automated checks from global locations catch outages before users report them.
  • Structured logs are queryable data — treat logs as structured events, not text blobs. Ship JSON, query with SQL-like syntax, and build dashboards from log data.
  • Status pages build trust — public status pages with real-time incident updates reduce support ticket volume and demonstrate transparency.
  • Escalation policies save sleep — route alerts through on-call schedules with escalation chains so the right person is paged at the right time.
  • Correlate uptime with logs — when a monitor fires, jump directly to the logs from that time window to find root cause without context-switching.
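
The "structured events, not text blobs" principle is easiest to see side by side. A minimal sketch (the event and field names are illustrative, not a BetterStack schema):

```typescript
// The same failed-login event, first as an opaque text line, then as a
// structured entry whose fields stay individually queryable.
const textBlob = "User 42 failed login from 10.0.0.1";

const entry = {
  level: "warn",
  event: "auth.login_failed", // stable event name to filter and group on
  userId: 42,
  ip: "10.0.0.1",
  ts: new Date().toISOString(),
};

// Ship the JSON form; a query like `userId = 42 AND event = 'auth.login_failed'`
// matches on fields instead of fragile substring matching against textBlob.
console.log(JSON.stringify(entry));
```

Extracting `userId` from the text line requires a regex that breaks the moment the wording changes; the structured form keeps it a first-class field.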

Setup

Log Management with the Node.js SDK

// lib/logger.ts
import { Logtail } from "@logtail/node";
import { LogtailTransport } from "@logtail/winston";
import winston from "winston";

const logtail = new Logtail(process.env.BETTERSTACK_SOURCE_TOKEN!, {
  batchInterval: 1000,
  batchSize: 100,
  retryCount: 3,
});

export const logger = winston.createLogger({
  level: process.env.LOG_LEVEL ?? "info",
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: process.env.SERVICE_NAME ?? "api",
    environment: process.env.NODE_ENV,
    version: process.env.APP_VERSION,
  },
  transports: [
    new LogtailTransport(logtail),
    ...(process.env.NODE_ENV !== "production"
      ? [new winston.transports.Console({ format: winston.format.simple() })]
      : []),
  ],
});

// Flush buffered logs before the process exits. Note that `beforeExit` does
// not fire when the process receives a signal, so register SIGTERM/SIGINT
// handlers as well.
process.on("beforeExit", async () => {
  await logtail.flush();
});

for (const signal of ["SIGTERM", "SIGINT"] as const) {
  process.once(signal, async () => {
    await logtail.flush();
    process.exit(0);
  });
}

Pino Integration

// lib/pino-logger.ts
import pino from "pino";

export const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  transport: {
    targets: [
      {
        target: "@logtail/pino",
        options: { sourceToken: process.env.BETTERSTACK_SOURCE_TOKEN },
        level: "info",
      },
      ...(process.env.NODE_ENV !== "production"
        ? [{ target: "pino-pretty", options: { colorize: true }, level: "debug" as const }]
        : []),
    ],
  },
  base: {
    service: process.env.SERVICE_NAME,
    environment: process.env.NODE_ENV,
  },
});

Uptime Monitor Configuration via API

// scripts/setup-monitors.ts
const BETTERSTACK_API_TOKEN = process.env.BETTERSTACK_API_TOKEN!;
const BASE_URL = "https://uptime.betterstack.com/api/v2";

interface MonitorConfig {
  url: string;
  monitor_type: "status" | "keyword" | "ping";
  check_frequency: number;
  regions: string[];
  expected_status_codes: number[];
  request_timeout: number;
  confirmation_period: number;
}

const monitors: MonitorConfig[] = [
  {
    url: "https://app.example.com/api/health",
    monitor_type: "status",
    check_frequency: 30,
    regions: ["us", "eu", "ap"],
    expected_status_codes: [200],
    request_timeout: 15,
    confirmation_period: 120,
  },
  {
    url: "https://app.example.com",
    monitor_type: "keyword",
    check_frequency: 60,
    regions: ["us", "eu"],
    expected_status_codes: [200],
    request_timeout: 20,
    confirmation_period: 180,
  },
];

async function createMonitor(config: MonitorConfig) {
  const response = await fetch(`${BASE_URL}/monitors`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${BETTERSTACK_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(config),
  });

  if (!response.ok) {
    const body = await response.text();
    throw new Error(`Failed to create monitor (${response.status}): ${body}`);
  }

  const data = await response.json();
  console.log(`Created monitor: ${data.data.attributes.url}`);
  return data;
}

async function main() {
  for (const monitor of monitors) {
    await createMonitor(monitor);
  }
}

main().catch(console.error);

Key Techniques

Structured Request Logging Middleware

// middleware/request-logger.ts
import { type NextRequest, NextResponse } from "next/server";
import { logger } from "@/lib/logger";

export function withRequestLogging(
  handler: (req: NextRequest) => Promise<NextResponse>
) {
  return async (req: NextRequest): Promise<NextResponse> => {
    const start = Date.now();
    const requestId = crypto.randomUUID();

    logger.info("request.start", {
      requestId,
      method: req.method,
      path: req.nextUrl.pathname,
      query: Object.fromEntries(req.nextUrl.searchParams),
      userAgent: req.headers.get("user-agent"),
      ip: req.headers.get("x-forwarded-for"),
    });

    try {
      const response = await handler(req);
      const duration = Date.now() - start;

      logger.info("request.complete", {
        requestId,
        method: req.method,
        path: req.nextUrl.pathname,
        status: response.status,
        duration,
      });

      response.headers.set("x-request-id", requestId);
      return response;
    } catch (error) {
      const duration = Date.now() - start;

      logger.error("request.error", {
        requestId,
        method: req.method,
        path: req.nextUrl.pathname,
        error: error instanceof Error ? error.message : String(error),
        stack: error instanceof Error ? error.stack : undefined,
        duration,
      });

      return NextResponse.json(
        { error: "Internal Server Error", requestId },
        { status: 500 }
      );
    }
  };
}

Health Check Endpoint for Uptime Monitors

// app/api/health/route.ts
import { NextResponse } from "next/server";
import { prisma } from "@/lib/prisma";
import { redis } from "@/lib/redis";

interface HealthCheck {
  name: string;
  check: () => Promise<boolean>;
}

const checks: HealthCheck[] = [
  {
    name: "database",
    check: async () => {
      await prisma.$queryRaw`SELECT 1`;
      return true;
    },
  },
  {
    name: "redis",
    check: async () => {
      const pong = await redis.ping();
      return pong === "PONG";
    },
  },
];

export async function GET() {
  const results: Record<string, { status: string; latency: number }> = {};
  let allHealthy = true;

  for (const { name, check } of checks) {
    const start = Date.now();
    try {
      await check();
      results[name] = { status: "healthy", latency: Date.now() - start };
    } catch {
      results[name] = { status: "unhealthy", latency: Date.now() - start };
      allHealthy = false;
    }
  }

  return NextResponse.json(
    {
      status: allHealthy ? "healthy" : "degraded",
      checks: results,
      timestamp: new Date().toISOString(),
      version: process.env.APP_VERSION ?? "unknown",
    },
    { status: allHealthy ? 200 : 503 }
  );
}

Incident Webhook Handler

// app/api/webhooks/betterstack/route.ts
import { NextResponse } from "next/server";
import { logger } from "@/lib/logger";

interface BetterStackWebhook {
  data: {
    id: string;
    attributes: {
      url: string;
      status: "up" | "down" | "validating" | "paused";
      started_at: string;
      resolved_at: string | null;
      cause: string;
    };
  };
}

export async function POST(request: Request) {
  const payload: BetterStackWebhook = await request.json();
  const { attributes } = payload.data;

  logger.warn("incident.webhook", {
    monitorUrl: attributes.url,
    status: attributes.status,
    cause: attributes.cause,
    startedAt: attributes.started_at,
    resolvedAt: attributes.resolved_at,
  });

  if (attributes.status === "down") {
    await notifySlack({
      text: `Monitor DOWN: ${attributes.url}\nCause: ${attributes.cause}`,
      channel: "#incidents",
    });
  }

  if (attributes.resolved_at) {
    await notifySlack({
      text: `Monitor RECOVERED: ${attributes.url}`,
      channel: "#incidents",
    });
  }

  return NextResponse.json({ received: true });
}

async function notifySlack(message: { text: string; channel: string }) {
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(message),
  });
}

Best Practices

  1. Use structured JSON logs — BetterStack's query engine works best with structured fields; avoid unstructured text logs.
  2. Set a confirmation period on monitors — require 2-3 consecutive failures before alerting to avoid false positives from transient network issues.
  3. Monitor from multiple regions — a single-region check cannot distinguish between your outage and a regional network problem.
  4. Include version in log metadata — tie log entries to deploys so you can correlate spikes with releases.
  5. Create a health endpoint that checks dependencies — a 200 from your app means nothing if the database is down; check all critical dependencies.
  6. Set up escalation policies — primary on-call gets paged immediately, secondary after 5 minutes, manager after 15.
  7. Flush logs on process exit — the SDK batches logs; call logtail.flush() in shutdown handlers to avoid losing final entries.
  8. Use log-based alerts — set up BetterStack alerts that trigger when error log frequency exceeds a threshold.

Anti-Patterns

  1. Logging PII in plain text — user emails, IPs, and tokens in logs create compliance risks. Mask or hash sensitive fields.
  2. Check frequency under 30 seconds without need — aggressive polling wastes quota and rarely provides faster detection than 30-second intervals.
  3. No confirmation period — a single failed check triggers an alert; transient network blips cause false pages at 3 AM.
  4. Ignoring log volume — shipping debug-level logs to BetterStack in production burns through quotas; use info or warn as the production floor.
  5. Health checks that always return 200 — if your health endpoint catches all exceptions and returns 200, the uptime monitor will never fire.
  6. Status page with no automation — a manual-only status page goes stale during incidents when the team is busy firefighting.
  7. Alerting without escalation — sending alerts to a shared channel means nobody owns the response; always assign to on-call schedules.
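
The PII anti-pattern above can be mitigated before logs ever leave the process. A hedged sketch of a masking helper (the field list and hash truncation are illustrative choices, not a BetterStack feature):

```typescript
import { createHash } from "node:crypto";

// Fields that must never reach the log pipeline in the clear. Extend this
// list to match your own payloads.
const SENSITIVE_KEYS = new Set(["email", "ip", "token", "password"]);

// Hash rather than drop sensitive values, so the same user can still be
// correlated across log entries without exposing the raw value.
function maskPII(fields: Record<string, unknown>): Record<string, unknown> {
  const masked: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) {
    if (SENSITIVE_KEYS.has(key) && typeof value === "string") {
      masked[key] = createHash("sha256").update(value).digest("hex").slice(0, 12);
    } else {
      masked[key] = value;
    }
  }
  return masked;
}

// Usage (assuming the logger from Setup):
//   logger.info("auth.login", maskPII({ email: "a@example.com", userId: 42 }));
```

For token-like secrets, outright redaction (e.g. replacing with `"[REDACTED]"`) is safer than hashing, since low-entropy values can be brute-forced from a hash.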

Install this skill directly: skilldb add monitoring-services-skills


Related Skills

Baselime

Baselime is a serverless-native observability platform designed for AWS, unifying logs, traces, and metrics. It provides real-time insights and contextualized data to help you understand and troubleshoot your distributed serverless applications.

Monitoring Services · 245 lines

Checkly

"Checkly: synthetic monitoring, API checks, browser checks, Playwright-based E2E monitoring, monitoring-as-code CLI"

Monitoring Services · 202 lines

Cronitor

Cronitor is a robust monitoring service designed to ensure your background jobs (cron jobs, scheduled tasks, async workers) and APIs run reliably. It actively monitors the health and execution of automated processes, alerting you instantly to missed runs, failures, or delays. Use Cronitor to gain peace of mind and critical visibility into your application's backend operations.

Monitoring Services · 218 lines

Datadog

"Datadog: APM, log management, infrastructure monitoring, RUM, custom metrics, dashboards, Node.js tracing"

Monitoring Services · 328 lines

Grafana Cloud

Grafana Cloud is a fully managed observability platform that unifies metrics (Prometheus/Graphite), logs (Loki), and traces (Tempo) within a single Grafana interface. Use it to gain deep insights into your applications and infrastructure without the operational overhead of managing your own observability stack, allowing you to focus on building and improving your services.

Monitoring Services · 202 lines

Highlight.io

"Highlight.io: open-source monitoring, session replay, error tracking, logging, tracing, Next.js SDK, self-hosted option"

Monitoring Services · 354 lines