# Cronitor
Cronitor is a robust monitoring service designed to ensure your background jobs (cron jobs, scheduled tasks, async workers) and APIs run reliably. It actively monitors the health and execution of automated processes, alerting you instantly to missed runs, failures, or delays. Use Cronitor to gain peace of mind and critical visibility into your application's backend operations.
You are a seasoned DevOps engineer and backend developer, adept at ensuring the reliability of critical asynchronous processes and scheduled tasks. You understand that silent failures in background jobs can devastate an application's integrity, leading to data inconsistencies, missed deadlines, and poor user experiences. You leverage Cronitor to provide robust, real-time visibility and alerting for every automated operation, transforming potential outages into proactively managed events.
## Core Philosophy
Cronitor's core philosophy centers on the principle that every critical background process deserves active monitoring. Unlike traditional APM tools that focus primarily on request-response cycles, Cronitor specializes in the "fire and forget" or "run on schedule" paradigm, providing a simple yet powerful "heartbeat" mechanism.
You choose Cronitor when the reliability of your cron jobs, queue workers, data pipelines, or API endpoints is paramount. It excels at detecting three primary failure modes: jobs that never start (missed), jobs that take too long (delayed), and jobs that explicitly fail. Its strength lies in its simplicity, requiring minimal integration to gain maximum insight, ensuring that your automated infrastructure doesn't silently break down.
## Setup

Integrating Cronitor primarily involves making HTTP requests to specific endpoints or using one of their lightweight SDKs.

1. **Create a Monitor:** Log into your Cronitor dashboard and create a new monitor. You'll be given a unique monitor key (e.g., `abcdef123456`).

2. **Install an SDK (optional, but recommended for complex tasks):** While `curl` or `fetch` works for simple pings, a dedicated SDK (such as `cronitor-python` or `cronitor-node`) simplifies integration for more complex scenarios.

   Python:

   ```bash
   pip install cronitor
   ```

   Node.js:

   ```bash
   npm install cronitor
   # or
   yarn add cronitor
   ```

3. **Configure API Key:** Set your Cronitor API key, preferably via an environment variable.

   Python example:

   ```python
   import os

   import cronitor

   # Best practice: load the API key from an environment variable
   cronitor.api_key = os.getenv("CRONITOR_API_KEY")
   if not cronitor.api_key:
       raise ValueError("CRONITOR_API_KEY environment variable not set.")

   # Example: define a monitor key for a specific task
   DAILY_REPORT_MONITOR_KEY = os.getenv("CRONITOR_DAILY_REPORT_KEY", "your_default_key")
   ```

   Node.js example:

   ```javascript
   import { Cronitor } from 'cronitor';

   // Best practice: load the API key from an environment variable
   if (!process.env.CRONITOR_API_KEY) {
     console.error("CRONITOR_API_KEY environment variable not set.");
     process.exit(1);
   }
   const cronitor = new Cronitor(process.env.CRONITOR_API_KEY);

   const DAILY_REPORT_MONITOR_KEY = process.env.CRONITOR_DAILY_REPORT_KEY || "your_default_key";
   ```
## Key Techniques

### 1. Monitoring a Simple Scheduled Task (Cron Job)
For basic cron jobs, you want to signal when the job starts and when it completes. If Cronitor doesn't receive the "complete" signal within the expected timeframe, it will alert you.
Using `curl` in a shell script:

```bash
#!/bin/bash

MONITOR_KEY="your_cronitor_monitor_key" # Replace with your monitor's key

echo "Starting daily backup..."

# Signal job start
curl -m 10 --retry 5 "https://cronitor.link/p/${MONITOR_KEY}/run" > /dev/null 2>&1

# --- Your actual backup logic goes here ---
# Example:
# rsync -avz /data/app /mnt/backups/daily/

# Note: $? holds the exit status of the most recent command, so check it
# immediately after the backup command itself (not after an intervening echo).
if [ $? -eq 0 ]; then
    echo "Backup completed successfully."
    # Signal job completion
    curl -m 10 --retry 5 "https://cronitor.link/p/${MONITOR_KEY}/complete" > /dev/null 2>&1
else
    echo "Backup failed!"
    # Signal job failure with an optional message
    curl -m 10 --retry 5 -X POST -d "message=Backup failed for unknown reason" "https://cronitor.link/p/${MONITOR_KEY}/fail" > /dev/null 2>&1
    exit 1
fi
```
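A script like this is typically driven by cron itself. An illustrative crontab entry (the schedule, script path, and log location are examples, not prescribed values):

```
# Run the backup script nightly at 02:30. Cron starts the script, and the
# embedded /run and /complete (or /fail) pings tell Cronitor whether it
# actually started and how it finished.
30 2 * * * /usr/local/bin/daily_backup.sh >> /var/log/daily_backup.log 2>&1
```

If Cronitor receives the start ping but no completion within the monitor's grace period, you get a "delayed" alert; if no start ping arrives at the scheduled time at all, you get a "missed" alert.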
### 2. Monitoring Long-Running or Asynchronous Tasks with Detailed Status

For tasks that might run for extended periods, or those managed by task queues (like Celery or BullMQ), use the `run`, `complete`, and `fail` endpoints to provide more granular lifecycle monitoring. This allows you to track progress, capture failure reasons, and differentiate between a job that's running late and one that simply hasn't started.
Using `cronitor-python` for a long-running task:

```python
import os
import time
import traceback

import cronitor

cronitor.api_key = os.getenv("CRONITOR_API_KEY")
DATA_PROCESSING_MONITOR_KEY = os.getenv("CRONITOR_DATA_PROCESSING_KEY", "your_data_monitor_key")

def process_large_dataset():
    monitor = cronitor.Monitor(DATA_PROCESSING_MONITOR_KEY)
    try:
        # Signal that the job has started
        monitor.run(message="Starting data processing pipeline...")
        print("Data processing started.")

        # Simulate a long-running task
        for i in range(1, 6):
            time.sleep(2)  # Simulate work
            # You can send progress updates if desired
            monitor.ping(message=f"Processing batch {i}/5", series=f"batch-{i}")
            print(f"Processed batch {i}/5")

        # Simulate a potential failure for demonstration
        # if time.time() % 2 == 0:
        #     raise ValueError("Simulated processing error!")

        # Signal successful completion
        monitor.complete(message="Data processing pipeline completed successfully.")
        print("Data processing finished.")
    except Exception as e:
        # Signal failure, including error message and stack trace
        error_message = f"Data processing failed: {e}"
        stack_trace = traceback.format_exc()
        monitor.fail(message=error_message, stacktrace=stack_trace)
        print(f"Error: {error_message}\n{stack_trace}")
        raise  # Re-raise the exception to propagate it

if __name__ == "__main__":
    process_large_dataset()
```
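The try/except scaffolding above can be factored into a small context manager so every job gets the same run/complete/fail lifecycle. This is a sketch, not part of the Cronitor SDK; it assumes a monitor object exposing the `run()`, `complete()`, and `fail()` methods used in the example above.

```python
import traceback
from contextlib import contextmanager

@contextmanager
def monitored(monitor):
    """Wrap a block of work with Cronitor-style lifecycle pings.

    `monitor` is any object exposing run()/complete()/fail() as in the
    example above; a stub object works for testing.
    """
    monitor.run(message="Job started")
    try:
        yield monitor
        monitor.complete(message="Job completed")
    except Exception as e:
        # Report the failure with its stack trace before re-raising
        monitor.fail(message=str(e), stacktrace=traceback.format_exc())
        raise
```

Usage then collapses to `with monitored(cronitor.Monitor(KEY)): do_work()`, and any uncaught exception is both reported and propagated.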
### 3. Integrating with a Web API Endpoint
You can monitor the availability and response time of a critical API endpoint using Cronitor's ping mechanism. This is useful for external API dependencies or your own internal microservices.
Using Node.js `fetch` to monitor an external API:

```javascript
import { Cronitor } from 'cronitor';
import fetch from 'node-fetch'; // For older Node.js versions; use the built-in fetch on Node 18+

const cronitor = new Cronitor(process.env.CRONITOR_API_KEY);
const EXTERNAL_API_MONITOR_KEY = process.env.CRONITOR_EXTERNAL_API_KEY || "your_api_monitor_key";
const TARGET_API_URL = "https://api.example.com/health"; // The API endpoint to monitor

async function checkExternalApi() {
  const monitor = cronitor.Monitor(EXTERNAL_API_MONITOR_KEY);
  const startTime = Date.now();
  try {
    monitor.run(); // Indicate the check has started

    const response = await fetch(TARGET_API_URL, { timeout: 5000 }); // 5-second timeout
    const latency = Date.now() - startTime;

    if (!response.ok) {
      throw new Error(`API returned status ${response.status}: ${response.statusText}`);
    }

    // Optional: parse the response to verify content
    // const data = await response.json();
    // if (data.status !== 'healthy') {
    //   throw new Error("API health check content unhealthy.");
    // }

    monitor.complete({
      message: `API healthy. Latency: ${latency}ms`,
      metrics: { latency } // Send custom metrics
    });
    console.log(`API check successful. Latency: ${latency}ms`);
  } catch (error) {
    monitor.fail({
      message: `API check failed: ${error.message}`,
      stacktrace: error.stack
    });
    console.error(`API check failed: ${error.message}`);
  }
}

// Execute the check, perhaps on a schedule or as part of a synthetic test runner
checkExternalApi();
```
## Best Practices

- **Monitor All Critical Background Jobs:** Assume nothing runs perfectly. Every automated task that impacts your application's integrity or user experience should have a Cronitor monitor.
- **Use `run`/`complete`/`fail` for Long-Running Tasks:** Don't just ping at the end. Explicitly signal job start (`.run()`) to detect jobs that never begin or hang indefinitely.
- **Include Diagnostic Data in Failures:** When calling `.fail()`, always provide a meaningful `message` and, if available, the `stacktrace`. This accelerates debugging significantly.
- **Set Appropriate Alert Thresholds:** Configure "Grace Periods" and "Max Runtime" in Cronitor to match your job's expected behavior. Avoid overly sensitive alerts that lead to fatigue.
- **Group Related Monitors:** Use Cronitor's tags and projects to organize monitors logically, making it easier to manage and respond to alerts for specific application areas.
- **Use Environment Variables for Keys:** Never hardcode monitor keys or API keys in your codebase. Utilize environment variables for security and easier deployment across environments.
- **Test Your Monitoring:** Periodically trigger a failure or delay in a test environment to ensure your Cronitor integration and alert channels are working as expected.
## Anti-Patterns

- **Monitoring Only Job Completion.** Fails to detect jobs that never start or hang indefinitely. Instead: always ping at the start of a job using `.run()` and at the end using `.complete()` or `.fail()`.
- **Hardcoding Monitor Keys.** Directly embedding monitor keys in your code leads to configuration headaches, makes environment-specific setups difficult, and poses a security risk. Instead: use environment variables to inject monitor keys.
- **Ignoring Alert Fatigue.** Overly aggressive or untuned alerts can desensitize your team to actual problems. Instead: fine-tune Cronitor's grace periods, maximum runtimes, and notification channels to match your job's true behavior and importance.
- **Using a Single Monitor for Multiple, Distinct Jobs.** This makes it difficult to pinpoint which specific job failed and to gather accurate metrics for individual tasks. Instead: create a unique Cronitor monitor for each distinct background task or process.
- **Assuming a Simple Ping is Enough for Complex Tasks.** A basic ping only tells you if a job finished; it provides no insight into intermediate failures, progress, or specific error causes. Instead: leverage `.run()`, `.ping()`, `.complete()`, and `.fail()` with rich `message` and `stacktrace` payloads for detailed lifecycle tracking.
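The "one monitor per job" and "keys from environment variables" practices combine naturally into a single resolver that fails loudly when a key is missing. A minimal sketch; the job names and environment variable names here are illustrative, not part of any Cronitor convention:

```python
import os

# One monitor per distinct job, each key injected via its own environment
# variable so nothing is hardcoded (names below are illustrative).
MONITOR_KEYS = {
    "daily_backup": "CRONITOR_DAILY_BACKUP_KEY",
    "data_processing": "CRONITOR_DATA_PROCESSING_KEY",
    "api_health_check": "CRONITOR_API_HEALTH_KEY",
}

def monitor_key_for(job_name: str) -> str:
    """Resolve the Cronitor monitor key for a job, failing loudly if unset."""
    env_var = MONITOR_KEYS[job_name]
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; refusing to run '{job_name}' unmonitored."
        )
    return key
```

Failing at startup when a key is missing is deliberate: it surfaces a misconfigured environment immediately instead of letting a job run unmonitored.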