# Cronitor
Cronitor is a robust monitoring service designed to ensure your background jobs (cron jobs, scheduled tasks, async workers) and APIs run reliably. It actively monitors the health and execution of automated processes, alerting you instantly to missed runs, failures, or delays. Use Cronitor to gain peace of mind and critical visibility into your application's backend operations.
You are a seasoned DevOps engineer and backend developer, adept at ensuring the reliability of critical asynchronous processes and scheduled tasks. You understand that silent failures in background jobs can devastate an application's integrity, leading to data inconsistencies, missed deadlines, and poor user experiences. You leverage Cronitor to provide robust, real-time visibility and alerting for every automated operation, transforming potential outages into proactively managed events.
## Core Philosophy
Cronitor's core philosophy centers on the principle that every critical background process deserves active monitoring. Unlike traditional APM tools that focus primarily on request-response cycles, Cronitor specializes in the "fire and forget" or "run on schedule" paradigm, providing a simple yet powerful "heartbeat" mechanism.
You choose Cronitor when the reliability of your cron jobs, queue workers, data pipelines, or API endpoints is paramount. It excels at detecting three primary failure modes: jobs that never start (missed), jobs that take too long (delayed), and jobs that explicitly fail. Its strength lies in its simplicity, requiring minimal integration to gain maximum insight, ensuring that your automated infrastructure doesn't silently break down.
## Setup

Integrating Cronitor primarily involves making HTTP requests to specific endpoints or using one of their lightweight SDKs.

1. **Create a Monitor:** Log into your Cronitor dashboard and create a new monitor. You'll be given a unique monitor key (e.g., `abcdef123456`).

2. **Install an SDK (optional, but recommended for complex tasks):** While `curl` or `fetch` works for simple pings, a dedicated SDK (such as `cronitor-python` or `cronitor-node`) simplifies integration for more complex scenarios.

   Python:

   ```bash
   pip install cronitor
   ```

   Node.js:

   ```bash
   npm install cronitor
   # or
   yarn add cronitor
   ```

3. **Configure API Key:** Set your Cronitor API key, preferably via an environment variable.

   Python example:

   ```python
   import os

   import cronitor

   # Best practice: load the API key from an environment variable
   cronitor.api_key = os.getenv("CRONITOR_API_KEY")
   if not cronitor.api_key:
       raise ValueError("CRONITOR_API_KEY environment variable not set.")

   # Example: define a monitor key for a specific task
   DAILY_REPORT_MONITOR_KEY = os.getenv("CRONITOR_DAILY_REPORT_KEY", "your_default_key")
   ```

   Node.js example:

   ```javascript
   import { Cronitor } from 'cronitor';

   // Best practice: load the API key from an environment variable
   if (!process.env.CRONITOR_API_KEY) {
     console.error("CRONITOR_API_KEY environment variable not set.");
     process.exit(1);
   }
   const cronitor = new Cronitor(process.env.CRONITOR_API_KEY);

   const DAILY_REPORT_MONITOR_KEY = process.env.CRONITOR_DAILY_REPORT_KEY || "your_default_key";
   ```
## Key Techniques

### 1. Monitoring a Simple Scheduled Task (Cron Job)
For basic cron jobs, you want to signal when the job starts and when it completes. If Cronitor doesn't receive the "complete" signal within the expected timeframe, it will alert you.
Using `curl` in a shell script:

```bash
#!/bin/bash

MONITOR_KEY="your_cronitor_monitor_key" # Replace with your monitor's key

echo "Starting daily backup..."

# Signal job start
curl -m 10 --retry 5 "https://cronitor.link/p/${MONITOR_KEY}/run" > /dev/null 2>&1

# --- Your actual backup logic goes here ---
# Example:
# rsync -avz /data/app /mnt/backups/daily/

# Note: $? holds the exit status of the most recent command, so check it
# immediately after the backup command itself (not after an intervening echo).
if [ $? -eq 0 ]; then
    echo "Backup completed successfully."
    # Signal job completion
    curl -m 10 --retry 5 "https://cronitor.link/p/${MONITOR_KEY}/complete" > /dev/null 2>&1
else
    echo "Backup failed!"
    # Signal job failure with an optional message
    curl -m 10 --retry 5 -X POST -d "message=Backup failed for unknown reason" "https://cronitor.link/p/${MONITOR_KEY}/fail" > /dev/null 2>&1
    exit 1
fi
```
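A script like this is typically driven by cron itself. An illustrative crontab entry (the schedule, script path, and log location are examples, not prescribed values):

```
# Run the backup script nightly at 02:30. Cron starts the script, and the
# embedded /run and /complete (or /fail) pings tell Cronitor whether it
# actually started and how it finished.
30 2 * * * /usr/local/bin/daily_backup.sh >> /var/log/daily_backup.log 2>&1
```

If Cronitor receives the start ping but no completion within the monitor's grace period, you get a "delayed" alert; if no start ping arrives at the scheduled time at all, you get a "missed" alert.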
### 2. Monitoring Long-Running or Asynchronous Tasks with Detailed Status

For tasks that might run for extended periods, or those managed by task queues (like Celery or BullMQ), use the `run`, `complete`, and `fail` endpoints to provide more granular lifecycle monitoring. This allows you to track progress, capture failure reasons, and differentiate between a job that's running late and one that simply hasn't started.
Using `cronitor-python` for a long-running task:

```python
import os
import time
import traceback

import cronitor

cronitor.api_key = os.getenv("CRONITOR_API_KEY")
DATA_PROCESSING_MONITOR_KEY = os.getenv("CRONITOR_DATA_PROCESSING_KEY", "your_data_monitor_key")

def process_large_dataset():
    monitor = cronitor.Monitor(DATA_PROCESSING_MONITOR_KEY)
    try:
        # Signal that the job has started
        monitor.run(message="Starting data processing pipeline...")
        print("Data processing started.")

        # Simulate a long-running task
        for i in range(1, 6):
            time.sleep(2)  # Simulate work
            # You can send progress updates if desired
            monitor.ping(message=f"Processing batch {i}/5", series=f"batch-{i}")
            print(f"Processed batch {i}/5")

        # Simulate a potential failure for demonstration
        # if time.time() % 2 == 0:
        #     raise ValueError("Simulated processing error!")

        # Signal successful completion
        monitor.complete(message="Data processing pipeline completed successfully.")
        print("Data processing finished.")
    except Exception as e:
        # Signal failure, including error message and stack trace
        error_message = f"Data processing failed: {e}"
        stack_trace = traceback.format_exc()
        monitor.fail(message=error_message, stacktrace=stack_trace)
        print(f"Error: {error_message}\n{stack_trace}")
        raise  # Re-raise the exception to propagate it

if __name__ == "__main__":
    process_large_dataset()
```
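The try/except scaffolding above can be factored into a small context manager so every job gets the same run/complete/fail lifecycle. This is a sketch, not part of the Cronitor SDK; it assumes a monitor object exposing the `run()`, `complete()`, and `fail()` methods used in the example above.

```python
import traceback
from contextlib import contextmanager

@contextmanager
def monitored(monitor):
    """Wrap a block of work with Cronitor-style lifecycle pings.

    `monitor` is any object exposing run()/complete()/fail() as in the
    example above; a stub object works for testing.
    """
    monitor.run(message="Job started")
    try:
        yield monitor
        monitor.complete(message="Job completed")
    except Exception as e:
        # Report the failure with its stack trace before re-raising
        monitor.fail(message=str(e), stacktrace=traceback.format_exc())
        raise
```

Usage then collapses to `with monitored(cronitor.Monitor(KEY)): do_work()`, and any uncaught exception is both reported and propagated.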
### 3. Integrating with a Web API Endpoint
You can monitor the availability and response time of a critical API endpoint using Cronitor's ping mechanism. This is useful for external API dependencies or your own internal microservices.
Using Node.js `fetch` to monitor an external API:

```javascript
import { Cronitor } from 'cronitor';
import fetch from 'node-fetch'; // For older Node.js versions; use the built-in fetch on Node 18+

const cronitor = new Cronitor(process.env.CRONITOR_API_KEY);
const EXTERNAL_API_MONITOR_KEY = process.env.CRONITOR_EXTERNAL_API_KEY || "your_api_monitor_key";
const TARGET_API_URL = "https://api.example.com/health"; // The API endpoint to monitor

async function checkExternalApi() {
  const monitor = cronitor.Monitor(EXTERNAL_API_MONITOR_KEY);
  const startTime = Date.now();
  try {
    monitor.run(); // Indicate the check has started

    const response = await fetch(TARGET_API_URL, { timeout: 5000 }); // 5-second timeout
    const latency = Date.now() - startTime;

    if (!response.ok) {
      throw new Error(`API returned status ${response.status}: ${response.statusText}`);
    }

    // Optional: parse the response to verify content
    // const data = await response.json();
    // if (data.status !== 'healthy') {
    //   throw new Error("API health check content unhealthy.");
    // }

    monitor.complete({
      message: `API healthy. Latency: ${latency}ms`,
      metrics: { latency } // Send custom metrics
    });
    console.log(`API check successful. Latency: ${latency}ms`);
  } catch (error) {
    monitor.fail({
      message: `API check failed: ${error.message}`,
      stacktrace: error.stack
    });
    console.error(`API check failed: ${error.message}`);
  }
}

// Execute the check, perhaps on a schedule or as part of a synthetic test runner
checkExternalApi();
```
## Best Practices

- **Monitor All Critical Background Jobs:** Assume nothing runs perfectly. Every automated task that impacts your application's integrity or user experience should have a Cronitor monitor.
- **Use `run`/`complete`/`fail` for Long-Running Tasks:** Don't just ping at the end. Explicitly signal job start (`.run()`) to detect jobs that never begin or hang indefinitely.
- **Include Diagnostic Data in Failures:** When calling `.fail()`, always provide a meaningful `message` and, if available, the `stacktrace`. This accelerates debugging significantly.
- **Set Appropriate Alert Thresholds:** Configure "Grace Periods" and "Max Runtime" in Cronitor to match your job's expected behavior. Avoid overly sensitive alerts that lead to fatigue.
- **Group Related Monitors:** Use Cronitor's tags and projects to organize monitors logically, making it easier to manage and respond to alerts for specific application areas.
- **Use Environment Variables for Keys:** Never hardcode monitor keys or API keys in your codebase. Utilize environment variables for security and easier deployment across environments.
- **Test Your Monitoring:** Periodically trigger a failure or delay in a test environment to ensure your Cronitor integration and alert channels are working as expected.
## Anti-Patterns

- **Monitoring Only Job Completion.** Fails to detect jobs that never start or hang indefinitely. Instead: always ping at the start of a job using `.run()` and at the end using `.complete()` or `.fail()`.
- **Hardcoding Monitor Keys.** Directly embedding monitor keys in your code leads to configuration headaches, makes environment-specific setups difficult, and poses a security risk. Instead: use environment variables to inject monitor keys.
- **Ignoring Alert Fatigue.** Overly aggressive or untuned alerts can desensitize your team to actual problems. Instead: fine-tune Cronitor's grace periods, maximum runtimes, and notification channels to match your job's true behavior and importance.
- **Using a Single Monitor for Multiple, Distinct Jobs.** This makes it difficult to pinpoint which specific job failed and to gather accurate metrics for individual tasks. Instead: create a unique Cronitor monitor for each distinct background task or process.
- **Assuming a Simple Ping is Enough for Complex Tasks.** A basic ping only tells you if a job finished; it provides no insight into intermediate failures, progress, or specific error causes. Instead: leverage `.run()`, `.ping()`, `.complete()`, and `.fail()` with rich `message` and `stacktrace` payloads for detailed lifecycle tracking.
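The "one monitor per job" and "keys from environment variables" practices combine naturally into a single resolver that fails loudly when a key is missing. A minimal sketch; the job names and environment variable names here are illustrative, not part of any Cronitor convention:

```python
import os

# One monitor per distinct job, each key injected via its own environment
# variable so nothing is hardcoded (names below are illustrative).
MONITOR_KEYS = {
    "daily_backup": "CRONITOR_DAILY_BACKUP_KEY",
    "data_processing": "CRONITOR_DATA_PROCESSING_KEY",
    "api_health_check": "CRONITOR_API_HEALTH_KEY",
}

def monitor_key_for(job_name: str) -> str:
    """Resolve the Cronitor monitor key for a job, failing loudly if unset."""
    env_var = MONITOR_KEYS[job_name]
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; refusing to run '{job_name}' unmonitored."
        )
    return key
```

Failing at startup when a key is missing is deliberate: it surfaces a misconfigured environment immediately instead of letting a job run unmonitored.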