10 Cron Job Mistakes That Will Bite You in Production
Cron jobs are easy to write and easy to break. They run unattended, produce no visible output on success, and fail in ways that don't show up in your error tracker.
Most teams learn this the hard way. Here are the 10 mistakes that cause the most pain, and how to fix them before they ruin your weekend.
1. Using a Relative Path Instead of an Absolute Path
# ❌ Broken
* * * * * python3 process.py
# ✅ Works
* * * * * /usr/bin/python3 /opt/scripts/process.py
Cron runs in a minimal environment. The PATH variable is not your shell's PATH. Commands that work in your terminal fail silently in cron because the binary simply can't be found.
Fix: Always use absolute paths for both the interpreter and the script. Run which python3 to find the full path.
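If you're not sure what cron actually sees, have it dump its own environment once and compare with your login shell. A quick sketch (the script path and the /tmp output path are just illustrations):

```shell
# Dump the environment cron provides; in a crontab this would run as e.g.:
#   * * * * * /opt/scripts/dump-env.sh
env | sort > /tmp/cron-env.txt
echo "cron-side PATH: $PATH"
```

Run `env | sort` in your terminal and diff the two files. The difference is usually a stripped-down PATH like /usr/bin:/bin, which is exactly why your terminal-tested command can't be found.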
2. Ignoring stderr
# ❌ Errors disappear
* * * * * /opt/scripts/process.py > /var/log/process.log
# ✅ Capture both stdout and stderr
* * * * * /opt/scripts/process.py > /var/log/process.log 2>&1
Without 2>&1, error output (stack traces, permission errors, "file not found" messages) goes to cron's mail queue, which either doesn't exist or nobody reads. Errors vanish.
Fix: Always append 2>&1 or redirect stderr to a dedicated file: 2> /var/log/process.err.
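A pattern worth adopting inside the script itself: group the whole job so a single redirection covers everything, and timestamp the boundaries so log lines are traceable. A sketch (the log path is illustrative):

```shell
# Group the whole job so one redirection catches stdout AND stderr.
{
  echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ')] job starting"
  echo "normal output"
  echo "this goes to stderr" >&2   # would vanish without 2>&1
  echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ')] job done"
} >> /tmp/process.log 2>&1
```

With the redirection baked into the script, the crontab line stays short and the logging behavior travels with the script wherever it's scheduled.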
3. No Timeout on Long-Running Jobs
# ❌ Can run forever
* * * * * /opt/scripts/fetch-data.sh
# ✅ Killed if it runs too long
* * * * * timeout 300 /opt/scripts/fetch-data.sh
A job that hangs (waiting on a locked database query, an HTTP call that never times out, a file write to a full disk) will sit there consuming resources. The next scheduled run fires, another process stacks up, and you end up with 47 processes all hanging.
Fix: Wrap the command with timeout <seconds>. Pick a timeout that's 2–3x the normal expected duration.
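One refinement: timeout sends SIGTERM by default, and a truly stuck process may ignore SIGTERM. The -k flag adds a follow-up SIGKILL. A sketch with short durations so you can watch it fire:

```shell
# SIGTERM after 1 second; SIGKILL 2 seconds later if the job ignores SIGTERM.
timeout -k 2 1 sleep 30
status=$?
# GNU timeout exits with status 124 when the time limit was hit,
# so a wrapper can distinguish "timed out" from "failed on its own".
echo "exit status: $status"
```

Checking for exit code 124 in a wrapper lets you log timeouts separately from ordinary failures.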
4. No Locking Against Concurrent Runs
# ❌ Overlapping runs if job takes longer than its interval
*/5 * * * * /opt/scripts/process-queue.sh
# ✅ Skip if already running
*/5 * * * * flock -n /tmp/process-queue.lock /opt/scripts/process-queue.sh
If a job takes 8 minutes and runs every 5 minutes, you'll have two (then three, then four) overlapping instances. Each one reads the same queue, processes the same records, and produces duplicate results.
Fix: Use flock on Linux for file-based locking. For Python jobs, use the filelock library. For Node.js, use a Redis-based lock if the job runs across multiple machines.
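You can also take the lock inside the script itself, so the protection travels with the script instead of depending on every crontab line remembering flock. A sketch (the lock file path and file descriptor number are arbitrary choices):

```shell
# Hold an exclusive lock on fd 9 for the lifetime of this script.
LOCKFILE=/tmp/process-queue.lock
exec 9>"$LOCKFILE"
if ! flock -n 9; then
  echo "another instance holds the lock; exiting" >&2
  exit 0   # exit cleanly: skipping a run is expected, not an error
fi
echo "lock acquired, doing work"
# ... the actual queue processing would go here ...
```

The lock is released automatically when the script exits, even on a crash, because the kernel closes fd 9 with the process.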
5. Setting a MAILTO Nobody Reads
# ❌ Goes to a dead inbox
MAILTO=[email protected]
# ✅ Either actively monitor it, or disable it
MAILTO=""
MAILTO sends cron output to an email address. This sounds useful but in practice the devops alias hasn't been checked since 2019 and has 40,000 unread messages. Failures disappear into the inbox nobody opens.
Fix: Either delete MAILTO entirely and use proper monitoring, or route to a channel someone actively monitors (a ticket system, a dedicated Slack-to-email bridge, etc.).
6. Not Testing the Cron Expression Before Production
# What you think "every weekday at 9am" means
0 9 * * 1-5
# What you actually wrote (you forgot the server is UTC)
0 9 * * 1-5 # = 4am in New York, 1am in LA (standard time), wrong for everyone
Cron expression mistakes are common and subtle. The classic traps:
- Mixing up field order (minute vs hour vs day)
- Server is in UTC, schedule assumes local time
- Forgetting Daylight Saving Time shifts the effective local time
- */15 in the day-of-month field doesn't mean "every 15 days" (it fires on the 1st, 16th, and 31st of every month)
Fix: Use crontab.guru to validate expressions. Set TZ (or CRON_TZ, depending on your cron implementation) explicitly in the crontab for clarity. Test with a tight interval first (every minute), watch it fire, then switch to the real schedule.
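Before trusting a schedule, it helps to print "now" in each zone involved side by side. A quick sketch using nothing but date:

```shell
# Compare the current time across the zones a schedule touches.
echo "server local: $(date '+%H:%M %Z')"
echo "UTC:          $(TZ=UTC date '+%H:%M %Z')"
echo "New York:     $(TZ=America/New_York date '+%H:%M %Z')"
```

If "server local" and the zone you were thinking in don't match, your cron expression needs adjusting before it goes live.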
7. No Monitoring for "Job Never Ran"
# ❌ No way to know this ran
0 3 * * * /scripts/run-backup.sh
# ✅ Dead man's switch: alerts if ping doesn't arrive
0 3 * * * /scripts/run-backup.sh && curl -fsS https://ping.cronping.com/abc123xyz
This is the biggest one. Logs tell you what happened when the job ran. They don't tell you when the job doesn't run at all: when the server rebooted, the crontab was wiped, or the scheduler died.
Fix: Add a ping URL to every critical job. If the ping doesn't arrive within the grace period after the expected run time, you get an alert. This is the only reliable way to detect missing executions.
8. Jobs That Depend on Other Jobs With No Coordination
# ❌ Both jobs assume the other ran first
0 4 * * * /scripts/extract-data.sh
0 5 * * * /scripts/process-data.sh # assumes extract ran and succeeded
Cron has no concept of job dependencies. If extract-data.sh fails or runs long, process-data.sh fires anyway and processes stale, incomplete, or missing data. Silently, successfully, incorrectly.
Fix: Either chain the jobs in a single script (extract && process) or use a proper orchestrator (Airflow, Prefect, Temporal) for complex pipelines. For simple cases, have the second job check for the presence of the expected output from the first.
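For the simple case, a single chained script plus a marker check keeps the second step honest. A sketch where the two functions stand in for the real scripts, and the marker path is an assumption:

```shell
set -e  # any failure stops the chain, so processing never runs on a bad extract

# Stand-ins for extract-data.sh and process-data.sh.
extract_data() {
  echo "extracting..."
  date -u +%s > /tmp/extract.done   # marker: when the extract last succeeded
}

process_data() {
  # Refuse to run if the extract marker is missing.
  if [ ! -f /tmp/extract.done ]; then
    echo "no extract marker found; aborting" >&2
    exit 1
  fi
  echo "processing..."
}

extract_data && process_data
```

Because the marker is written only on success, a failed or interrupted extract leaves no marker and the processing step refuses to run, instead of silently working on stale data. A stricter version would also check the marker's age.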
9. Not Monitoring Job Duration
# ❌ You only know that it ran, not how long it took
curl https://ping.cronping.com/abc123xyz
# ✅ Also capture duration
START=$(date +%s)
/scripts/run-backup.sh
DURATION=$(( $(date +%s) - START ))
curl -X POST https://ping.cronping.com/abc123xyz \
-H 'Content-Type: application/json' \
-d "{\"duration_seconds\": $DURATION}"
A backup that normally takes 3 minutes and suddenly takes 45 minutes is a warning signal, even if it ultimately succeeds. Disk is filling up. A table grew unexpectedly. The network is degraded.
Fix: Send duration (and optionally rows processed, bytes written, etc.) as a JSON payload with your ping. Set alerts in Cronping if duration_seconds > threshold.
10. No Inventory of What Jobs Exist
This one doesn't have a code snippet because the fix isn't a code fix.
Most teams have no single place that lists all their cron jobs. Jobs live across:
- Production server crontabs
- Kubernetes CronJob YAML files
- Lambda/EventBridge rules
- Vercel/Netlify cron settings
- GitHub Actions scheduled workflows
- Cloud scheduler configurations
When something breaks, nobody knows which jobs exist, what they do, or who owns them. When a server is decommissioned, jobs silently disappear and nobody notices for months.
Fix: Maintain a shared inventory: a Notion table, a README, a spreadsheet, anything. For each job: name, what it does, expected schedule, owner, ping key (if monitored). Review it quarterly. When you add a job, add it to the inventory. When you remove one, remove the row.
Simple. Not glamorous. Absolutely worth doing.
The Scorecard
Rate your cron setup from 0–10 (one point per item):
- Absolute paths everywhere
- stderr redirected or captured
- Timeout configured on long-running jobs
- Concurrent run protection (flock or equivalent)
- MAILTO is either monitored or disabled
- Cron expressions tested before going live
- Dead man's switch monitoring on all critical jobs
- Job dependencies handled explicitly
- Duration monitored, not just success/failure
- Written inventory of all jobs
- 0–3: Your cron jobs are a liability. Start with #7.
- 4–6: Decent foundation. Finish the checklist.
- 7–10: Your team will sleep well on-call. Well done.
If you're starting with #7, Cronping takes about 2 minutes to set up. Free plan covers 5 monitors. Add the curl to your most critical jobs today.