
Built Option B (a custom Python utility) rather than Option A (the healthchecks.io SaaS). Rationale: WSL is always running, so a local monitor is viable; it avoids an external dependency; notify_manager is already in the stack; and the diagnostics are richer (exit code plus staleness, versus just a “missed ping”).

/home/ta/utils/system/job_monitor/ — standalone uv Python package, git repo initialized.

File                      Purpose
heartbeat.sh              Called by each cron job: ; heartbeat.sh <name> $?
src/job_monitor/main.py   Checks staleness + exit codes, fires CRITICAL alert, writes status.json
config.yaml               7 jobs with max_age_hours thresholds
README.md                 Setup + “adding a new job” guide

Installed as CLI tool: job-monitor in PATH.

Job                          Max Age
virus_scan                   192h (8 days)
create_system_image          840h (35 days)
my_backup_full_maintenance   840h (35 days)
send_status_report           192h (8 days)
asset_history_update         192h (8 days)
mbr_health_check             26h
mbr_daily_run                26h
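
To make the thresholds concrete, here is a minimal sketch of how a staleness-plus-exit-code check could work. The hour values are copied from the table; the check_job function, its arguments, and the issue strings are hypothetical and not taken from main.py.

```python
from datetime import datetime, timedelta, timezone

# Thresholds in hours, copied from the table above.
MAX_AGE_HOURS = {
    "virus_scan": 192,
    "create_system_image": 840,
    "my_backup_full_maintenance": 840,
    "send_status_report": 192,
    "asset_history_update": 192,
    "mbr_health_check": 26,
    "mbr_daily_run": 26,
}


def check_job(name, last_run, exit_code, now):
    """Return an issue string, or None if the job looks healthy.

    Hypothetical helper: last_run is the time of the last heartbeat
    (None if no heartbeat was ever recorded), exit_code is the value
    the job passed to heartbeat.sh.
    """
    if last_run is None:
        return "never ran"
    age = now - last_run
    if age > timedelta(hours=MAX_AGE_HOURS[name]):
        hours = age.total_seconds() / 3600
        return f"stale: last heartbeat {hours:.1f}h ago (max {MAX_AGE_HOURS[name]}h)"
    if exit_code != 0:
        return f"exit code {exit_code}"
    return None


now = datetime(2025, 1, 10, 6, 0, tzinfo=timezone.utc)
print(check_job("mbr_daily_run", now - timedelta(hours=27), 0, now))
```

With a 26h limit, a job whose last heartbeat is 27 hours old is flagged as stale even though its last exit code was 0, which is the case a plain “missed ping” check cannot distinguish from a failing job.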

Each of the 7 jobs' crontab entries now has ; heartbeat.sh <name> $? appended, so the job's exit code is passed to the heartbeat script. New daily monitor entry:

0 6 * * * /home/ta/.local/bin/job-monitor >> /home/ta/utils/system/job_monitor/logs/job_monitor.log 2>&1
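
On the ; heartbeat.sh <name> $? suffix: the ; separator means the heartbeat runs whether or not the job succeeded, and $? carries the job's exit status. The sketch below demonstrates that shell behaviour from Python. It assumes sh is available, uses false as a stand-in for a failing job and echo as a stand-in for heartbeat.sh, and the run helper is hypothetical.

```python
import subprocess


def run(script):
    """Run a snippet under sh and return what it printed."""
    result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return result.stdout.strip()


# With ';' the second command always runs and $? holds the job's exit status.
print(run("false; echo heartbeat $?"))    # heartbeat 1

# With '&&' the second command is skipped when the job fails,
# so a failing job would leave no heartbeat at all.
print(run("false && echo heartbeat $?"))  # (prints an empty line)
```

This is why a failed job shows up with its exit code rather than simply going stale.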

Every run writes status.json (overall: ok/degraded, per-job status/exit_code/issue). Follow-on task drafted to add Cron Health widget to MBR ops dashboard: mbr-ops-dashboard-cron-health.md.
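
As an illustration of what the status.json payload might look like, the sketch below assembles the fields named above (overall, and per-job status, exit_code, issue). The nesting under a "jobs" key, the per-job status values "ok" and "failed", and the build_status helper are assumptions, not the actual schema.

```python
import json


def build_status(results):
    """Build a status dict from {job_name: (exit_code, issue_or_None)}.

    Hypothetical helper; the real layout of status.json may differ.
    """
    jobs = {}
    for name, (exit_code, issue) in results.items():
        jobs[name] = {
            "status": "ok" if issue is None else "failed",
            "exit_code": exit_code,
            "issue": issue,
        }
    # One unhealthy job is enough to mark the whole report as degraded.
    all_ok = all(job["status"] == "ok" for job in jobs.values())
    return {"overall": "ok" if all_ok else "degraded", "jobs": jobs}


report = build_status({
    "mbr_daily_run": (0, None),
    "virus_scan": (None, "never ran"),
})
print(json.dumps(report, indent=2))
```

A dashboard widget could read only the overall field for a summary light and drill into the per-job entries for detail.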

The first 6 AM run will report all 7 jobs as “never ran”; this is expected, not a failure. Heartbeats populate as the jobs run on their normal schedules.

  • job_monitor repo: 00d5ed2 — initial utility
  • KB Business: 3da44d9 — task files