nodejs · 8 min read

How to Monitor Cron Jobs in Node.js (The Right Way)

From bare crontab pings to full start/fail/duration tracking — a practical guide to making sure your Node.js scheduled jobs don't silently die on you.

Marco


There's a universal law of software: cron jobs fail silently. Not with a bang, not with a stack trace that catches your eye — they just quietly stop running and you find out three weeks later when a customer asks why their weekly report stopped arriving.

This post is about breaking that pattern. We'll cover five different Node.js scheduling setups and how to wrap each one with proper monitoring, from dead-simple to production-grade.


Why "Just Check the Logs" Isn't Enough

Before jumping to code, let's be clear on why this matters.

A cron job can fail in ways logs won't catch:

  • The container restarted and the scheduler never came back up
  • The job ran fine but took 4 hours instead of 4 minutes (and blocked the next run)
  • The server clock drifted and the job is running off-schedule
  • A dependency (database, external API) was down so the job skipped itself
  • The deployment removed the cron definition entirely

The only reliable way to know a cron job ran completely and on time is to have it proactively tell you. If it doesn't check in when expected, something went wrong.

That's the concept behind dead-man's switch monitoring — and it's what Cronping is built for.


The Monitoring Pattern

Regardless of which scheduler you use, the pattern is the same:

  1. Before the job starts → send a start ping
  2. After success → send a success ping (usually just GET /ping-url)
  3. On failure → send a fail ping
  4. Timeout → if no ping arrives within the grace period, Cronping alerts you

Your Job ──► /start ──► [runs] ──► / (success)
                               └──► /fail (on error)

Cronping watches for the success ping. If it doesn't arrive, it pages you.

The ping URL format in Cronping:

https://ping.cronping.com/{your-ping-key}          # success
https://ping.cronping.com/{your-ping-key}/start    # start signal
https://ping.cronping.com/{your-ping-key}/fail     # explicit failure
https://ping.cronping.com/{your-ping-key}/{code}   # exit code (0 = ok)
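Stitched together, those four steps make a small wrapper. Here's a minimal sketch, assuming Node 18+ for the built-in fetch and using the placeholder ping key from the examples below:

```javascript
// Minimal sketch of the start/success/fail pattern (placeholder ping key).
const PING_URL = "https://ping.cronping.com/abc123xyz";

async function ping(path = "") {
  try {
    await fetch(`${PING_URL}${path}`, { signal: AbortSignal.timeout(5000) });
  } catch {
    // A failed ping must never break the job itself.
  }
}

async function runMonitored(job) {
  await ping("/start"); // 1. start ping
  try {
    const result = await job();
    await ping(); // 2. success ping
    return result;
  } catch (err) {
    await ping("/fail"); // 3. fail ping
    throw err; // 4. no success ping arrives, so the grace-period alert fires
  }
}
```

Step 4 happens server-side: if the success ping never shows up within the grace period, you get alerted no matter how the job died.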

Setup 1: System Crontab (Shell Script calling Node)

The old-school approach — a crontab entry that runs a Node.js script.

crontab entry:

# Run invoice processor daily at 7am
0 7 * * * /usr/bin/node /app/scripts/process-invoices.js >> /var/log/invoices.log 2>&1

process-invoices.js:

const https = require("https");

const PING_URL = "https://ping.cronping.com/abc123xyz";

async function ping(path = "") {
  const url = `${PING_URL}${path}`;
  return new Promise((resolve) => {
    https
      .get(url, (res) => {
        res.resume();
        resolve();
      })
      .on("error", () => resolve()); // Don't let a failed ping kill the job
  });
}

// POST variant used below to send the error message with a /fail ping.
function pingWithBody(url, body) {
  return new Promise((resolve) => {
    const req = https.request(url, { method: "POST" }, (res) => {
      res.resume();
      resolve();
    });
    req.on("error", () => resolve()); // Monitoring must never kill the job
    req.end(body);
  });
}

async function main() {
  await ping("/start");

  try {
    // Your actual job logic
    const result = await processMonthlyInvoices();
    console.log(`Processed ${result.count} invoices`);

    await ping(); // success
    process.exit(0);
  } catch (err) {
    console.error("Invoice processing failed:", err);

    // Optionally send log data with the failure
    const body = err.message;
    // POST to /fail with error message body
    await pingWithBody(`${PING_URL}/fail`, body);
    process.exit(1);
  }
}

main();

Simple. No dependencies. This covers most cases.


Setup 2: node-cron (In-Process Scheduler)

node-cron is the most popular Node.js cron library. It runs schedules inside your Node.js process — which means if your process dies, so do your jobs. Worth knowing.

npm install node-cron

Without monitoring (the way most people do it):

import cron from "node-cron";

cron.schedule("0 2 * * *", async () => {
  await runBackup();
});

With proper monitoring:

import cron from "node-cron";

const PING_URL = "https://ping.cronping.com/abc123xyz";

async function safePing(path = "") {
  try {
    await fetch(`${PING_URL}${path}`, { signal: AbortSignal.timeout(10000) });
  } catch {
    // Silent — don't let monitoring failure affect the job
  }
}

cron.schedule(
  "0 2 * * *",
  async () => {
    await safePing("/start");

    try {
      await runBackup();
      await safePing(); // success
    } catch (err) {
      console.error("Backup failed:", err);

      // POST the error message for context in Cronping's log
      try {
        await fetch(`${PING_URL}/fail`, {
          method: "POST",
          body: err.message,
          signal: AbortSignal.timeout(10000),
        });
      } catch {
        /* ignore */
      }
    }
  },
  {
    timezone: "America/Sao_Paulo", // Always set this
  },
);

Tip: Wrap the safePing helper in a shared utility. You'll use it in every job.
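Because node-cron schedules live inside your process, a crashed process means every job silently stops. One cheap safeguard is a standalone process-liveness heartbeat. Here's a sketch using a plain setInterval (so it doesn't depend on the scheduler itself) and a hypothetical dedicated ping key — you'd configure that check in Cronping to expect a ping every few minutes:

```javascript
// Process-liveness heartbeat: pings on an interval while the process is up.
// The ping key below is hypothetical -- use a dedicated check for it.
const HEARTBEAT_URL = "https://ping.cronping.com/heartbeat-key";

function startHeartbeat(intervalMs = 5 * 60 * 1000) {
  const timer = setInterval(async () => {
    try {
      await fetch(HEARTBEAT_URL, { signal: AbortSignal.timeout(10000) });
    } catch {
      // The heartbeat must never crash the process it's watching.
    }
  }, intervalMs);
  timer.unref(); // don't keep the process alive just to send heartbeats
  return timer;
}

startHeartbeat();
```

If the process dies, the pings stop and you get alerted — even though no individual job ever "failed."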


Setup 3: BullMQ / Bull (Queue-Based Jobs)

If you're using BullMQ for background jobs with repeatable patterns, monitoring looks a bit different because the job is defined once and the queue handles scheduling.

npm install bullmq

import { Queue, Worker } from "bullmq";
import IORedis from "ioredis";

const connection = new IORedis({ maxRetriesPerRequest: null });

// The Queue definition
const emailQueue = new Queue("emails", { connection });

// Add the repeatable job (once, at startup)
await emailQueue.add(
  "weekly-digest",
  { type: "digest" },
  {
    repeat: {
      pattern: "0 8 * * MON", // Every Monday 8am
      tz: "America/Sao_Paulo",
    },
  },
);

// The Worker — this is where monitoring goes
const PING_URLS = {
  "weekly-digest": "https://ping.cronping.com/abc123xyz",
  "monthly-report": "https://ping.cronping.com/def456uvw",
};

const worker = new Worker(
  "emails",
  async (job) => {
    const pingUrl = PING_URLS[job.name];

    if (pingUrl) {
      await fetch(`${pingUrl}/start`, {
        signal: AbortSignal.timeout(5000),
      }).catch(() => {});
    }

    // Job logic
    if (job.name === "weekly-digest") {
      await sendWeeklyDigest();
    }

    if (pingUrl) {
      await fetch(pingUrl, { signal: AbortSignal.timeout(5000) }).catch(
        () => {},
      );
    }
  },
  { connection },
);

worker.on("failed", async (job, err) => {
  const pingUrl = PING_URLS[job?.name ?? ""];
  if (pingUrl) {
    await fetch(`${pingUrl}/fail`, {
      method: "POST",
      body: err.message,
      signal: AbortSignal.timeout(5000),
    }).catch(() => {});
  }
});

One advantage here: BullMQ's failed event fires only once all retries are exhausted, so you're not spamming failure pings on the first retry.
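That behavior depends on retries actually being configured. A sketch of job options with attempts and backoff — the values here are illustrative, not a recommendation:

```javascript
// Illustrative retry options for a repeatable BullMQ job. With attempts: 3,
// the worker's "failed" event fires only after the third attempt fails.
const jobOpts = {
  attempts: 3,
  backoff: { type: "exponential", delay: 60_000 }, // growing delay between retries
  repeat: {
    pattern: "0 8 * * MON",
    tz: "America/Sao_Paulo",
  },
};
```

These would go in the third argument to `emailQueue.add(...)` in the snippet above.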


Setup 4: Vercel Cron / Next.js Route Handlers

If you're on Next.js/Vercel and using vercel.json for scheduled invocations, your "cron job" is really an HTTP endpoint that Vercel calls on a schedule.

vercel.json:

{
  "crons": [
    {
      "path": "/api/jobs/daily-sync",
      "schedule": "0 3 * * *"
    }
  ]
}

app/api/jobs/daily-sync/route.ts:

import { NextResponse } from "next/server";

const PING_URL = "https://ping.cronping.com/abc123xyz";

export async function GET(request: Request) {
  // Optional: verify this is actually called by Vercel Cron
  const authHeader = request.headers.get("authorization");
  if (authHeader !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  await fetch(`${PING_URL}/start`).catch(() => {});

  try {
    await runDailySync();

    await fetch(PING_URL).catch(() => {});
    return NextResponse.json({ success: true });
  } catch (err) {
    await fetch(`${PING_URL}/fail`, {
      method: "POST",
      body: err instanceof Error ? err.message : "Unknown error",
    }).catch(() => {});

    // Still return 200 or Vercel will retry automatically
    return NextResponse.json({ success: false, error: String(err) });
  }
}

Note: if you throw or return a 5xx, Vercel will retry the job. Usually you want to handle errors internally and always return 200, unless retrying is intentional.


Setup 5: Kubernetes CronJobs

For those running K8s, your cron jobs are CronJob resources that spin up pods on schedule. The monitoring pattern is typically in your container entrypoint or a wrapper script.

Dockerfile (or entrypoint):

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "scripts/run-with-ping.js"]

run-with-ping.js:

import { execSync } from "child_process";

const PING_URL = process.env.CRONPING_URL; // Inject via K8s secret/configmap

if (!PING_URL) {
  console.error("CRONPING_URL not set — running job without monitoring");
}

async function ping(path = "") {
  if (!PING_URL) return;
  const url = `${PING_URL}${path}`;
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(10000) });
    if (!res.ok) console.warn(`Cronping ping returned ${res.status}`);
  } catch (err) {
    console.warn("Cronping ping failed (non-fatal):", err.message);
  }
}

async function main() {
  await ping("/start");

  try {
    // Dynamically load the job
    const jobName = process.env.JOB_NAME || "default";
    const { run } = await import(`./jobs/${jobName}.js`);
    await run();

    await ping();
    process.exit(0);
  } catch (err) {
    console.error("Job failed:", err);
    await ping("/fail");
    process.exit(1);
  }
}

main();

CronJob manifest:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-cleanup
spec:
  schedule: "0 4 * * *"
  timeZone: "UTC"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: worker
              image: myapp/cron-worker:latest
              env:
                - name: JOB_NAME
                  value: "db-cleanup"
                - name: CRONPING_URL
                  valueFrom:
                    secretKeyRef:
                      name: cronping-secrets
                      key: DB_CLEANUP_PING_URL
          restartPolicy: Never

A Reusable Helper (TypeScript)

If you're doing this across multiple jobs, extract it:

// lib/cronping.ts

const BASE_URL = "https://ping.cronping.com";

type PingSignal = "start" | "fail" | "log" | number;

export function createMonitor(pingKey: string) {
  async function ping(signal?: PingSignal, body?: string) {
    if (!pingKey) return;

    const path = signal === undefined ? "" : `/${signal}`;
    const url = `${BASE_URL}/${pingKey}${path}`;

    try {
      await fetch(url, {
        method: body ? "POST" : "GET",
        body,
        signal: AbortSignal.timeout(10000),
      });
    } catch {
      // Never let monitoring break the job
    }
  }

  async function wrap<T>(fn: () => Promise<T>): Promise<T> {
    await ping("start");
    try {
      const result = await fn();
      await ping();
      return result;
    } catch (err) {
      await ping("fail", err instanceof Error ? err.message : String(err));
      throw err;
    }
  }

  return { ping, wrap };
}

Usage:

import { createMonitor } from "@/lib/cronping";

const monitor = createMonitor(process.env.BACKUP_PING_KEY!);

// Option A: wrap style
await monitor.wrap(async () => {
  await runDatabaseBackup();
});

// Option B: explicit signals
await monitor.ping("start");
try {
  await runDatabaseBackup();
  await monitor.ping();
} catch (err) {
  await monitor.ping("fail", err instanceof Error ? err.message : String(err));
}

Common Mistakes to Avoid

1. Letting monitoring failures crash the job

Always .catch(() => {}) on ping calls. If Cronping is down (or your network has a hiccup), the job should still run. Never let monitoring infrastructure become a single point of failure for your actual work.

2. Sending the success ping before the job finishes

// ❌ Wrong — pings success before running
await ping();
await doActualWork();

// ✅ Correct
await doActualWork();
await ping();

3. Not using /start for long jobs

If a job takes 20 minutes and your Cronping grace period is 15 minutes, it'll alert before the job even has time to finish. The /start ping resets the clock — Cronping will wait for the success ping from when the job started, not from when it was expected to start.
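To pick a sensible grace period in the first place, it helps to know how long the job actually runs. A small timing wrapper (the `timed` name is mine) that logs duration for a few runs:

```javascript
// Log how long a job takes so the grace period can be set with headroom
// above the typical duration.
async function timed(name, job) {
  const startedAt = Date.now();
  try {
    return await job();
  } finally {
    const seconds = ((Date.now() - startedAt) / 1000).toFixed(1);
    console.log(`[${name}] finished in ${seconds}s`);
  }
}
```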

4. Hardcoding ping keys

Put them in environment variables or your secrets manager. You don't want these in git.

5. Using the same ping key for different jobs

Each job should have its own heartbeat in Cronping. One key per job = clean alerting and history.
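One pattern that covers both of the last two points — keys out of git, one key per job — is a small env-driven lookup. The environment variable names here are illustrative:

```javascript
// One ping key per job, each sourced from its own environment variable.
// Variable names are illustrative -- adapt them to your jobs.
const PING_KEYS = {
  "db-backup": process.env.DB_BACKUP_PING_KEY,
  "weekly-digest": process.env.WEEKLY_DIGEST_PING_KEY,
};

function pingUrlFor(jobName) {
  const key = PING_KEYS[jobName];
  return key ? `https://ping.cronping.com/${key}` : null;
}
```

Jobs without a configured key simply run unmonitored instead of crashing.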


Wrapping Up

Monitoring cron jobs in Node.js comes down to two things: wrapping every job with start/success/fail pings, and configuring the expected schedule in a monitoring tool like Cronping.

The code overhead is minimal — the reusable helper above is 30 lines. What you get in return is peace of mind that your scheduled jobs are actually running, on time, every time.

Silently broken cron jobs are a solved problem. The only question is whether you've set up the solution yet.