MBR Ops Dashboard — Cron Health Widget

Background

job_monitor runs daily at 6 AM and writes /home/ta/utils/system/job_monitor/status.json:

{
  "checked_at": "2026-05-17T18:32:49+00:00",
  "overall": "ok",
  "jobs": [
    {"name": "virus_scan", "display": "Virus Scan (my_backup)", "status": "ok", "last_run": "...", "exit_code": 0, "issue": null},
    {"name": "mbr_daily_run", "display": "MBR Daily Run", "status": "failed", "exit_code": 1, "issue": "exit code 1"}
  ]
}

overall: "ok" or "degraded". Per-job status: "ok", "stale", "failed", "never_ran", "error".
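
The status values above could be derived roughly as follows. This is a minimal sketch, not job_monitor's actual logic: the names classify_job and overall_status, the max_age parameter, and the rule that any non-ok job degrades the overall status are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_job(last_run: Optional[datetime], exit_code: Optional[int],
                 max_age: timedelta, now: datetime) -> str:
    """Map a heartbeat to one of the per-job status strings (assumed rules)."""
    if last_run is None:
        return "never_ran"   # no heartbeat recorded at all
    if exit_code is None:
        return "error"       # heartbeat exists but carries no exit code
    if exit_code != 0:
        return "failed"
    if now - last_run > max_age:
        return "stale"       # last run succeeded, but too long ago
    return "ok"


def overall_status(job_statuses: list[str]) -> str:
    """Assumed rollup: any job that is not ok degrades the overall status."""
    return "ok" if all(s == "ok" for s in job_statuses) else "degraded"


now = datetime(2026, 5, 18, 14, 0, tzinfo=timezone.utc)
statuses = [
    classify_job(now - timedelta(hours=4), 0, timedelta(days=1), now),  # daily job
    classify_job(None, None, timedelta(days=7), now),                   # weekly job, no heartbeat
]
print(statuses, overall_status(statuses))
```

Under these assumed rules, a weekly job with no heartbeat yet is enough to report the overall status as degraded.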

The MBR ops dashboard (/home/ta/projects/mbr/ops-dashboard/) already shows pipeline health for MBR. The MBR daily pipeline runs at 10 AM — 4 hours after job_monitor — so status.json is always fresh when the dashboard builds.

Task

Add a Cron Health section to ops-dashboard/build.py that:

- Reads /home/ta/utils/system/job_monitor/status.json
- Renders a compact status table in index.html showing:
  - Overall status badge (green OK / red DEGRADED)
  - Per-job row: job display name, status badge, last run time, issue text (if any)
- Handles a missing status.json gracefully (job_monitor hasn’t run yet)
- Deploys automatically, with no extra steps needed (the dashboard already redeploys on every MBR daily run)

References

job_monitor: /home/ta/utils/system/job_monitor/
status.json: /home/ta/utils/system/job_monitor/status.json
ops dashboard build script: /home/ta/projects/mbr/ops-dashboard/build.py
ops dashboard README: /home/ta/projects/mbr/ops-dashboard/README.md
Live dashboard: https://mbr-ops-dashboard.pages.dev

Claude Response — 2026-05-17

Summary:

Added JOB_MONITOR_STATUS constant pointing to /home/ta/utils/system/job_monitor/status.json
Added load_cron_health() — returns parsed JSON or None if file missing
Added _CRON_STATUS_CLASSES map and build_cron_section() — renders overall badge + per-job table
Added CSS for .cron-ok, .cron-fail, .cron-warn badge variants + .cron-header layout
Updated build_html() signature to accept cron: dict | None = None
Inserted Cron Health section between Pipeline Health and Savings Snapshot
Updated main() to call load_cron_health() and pass result to build_html()
Tested with missing status.json → shows “not found” empty state
Tested with mock status.json (degraded + mixed job statuses) → renders correctly
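
For reference, the loader and renderer summarized above might look roughly like this. It is a sketch reconstructed from the summary, not the code committed to build.py: the contents of _CRON_STATUS_CLASSES, the _badge helper, the HTML structure, and the empty-state wording are assumptions.

```python
import html
import json
from pathlib import Path
from typing import Optional

JOB_MONITOR_STATUS = Path("/home/ta/utils/system/job_monitor/status.json")

# Assumed mapping from status strings to the CSS badge classes named above.
_CRON_STATUS_CLASSES = {
    "ok": "cron-ok",
    "failed": "cron-fail",
    "error": "cron-fail",
    "degraded": "cron-fail",
    "stale": "cron-warn",
    "never_ran": "cron-warn",
}


def load_cron_health(path: Path = JOB_MONITOR_STATUS) -> Optional[dict]:
    """Return the parsed status file, or None if it does not exist yet."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None


def _badge(status: str) -> str:
    """Hypothetical helper: wrap a status string in a styled span."""
    css = _CRON_STATUS_CLASSES.get(status, "cron-warn")
    return f'<span class="{css}">{html.escape(status.upper())}</span>'


def build_cron_section(cron: Optional[dict]) -> str:
    """Render the overall badge plus one table row per job."""
    if cron is None:
        # Empty state for a missing status.json (job_monitor has not run yet).
        return '<section class="cron"><p>Cron health: status.json not found</p></section>'
    rows = "".join(
        "<tr>"
        f"<td>{html.escape(job.get('display') or job.get('name', ''))}</td>"
        f"<td>{_badge(job.get('status', 'error'))}</td>"
        f"<td>{html.escape(job.get('last_run') or '')}</td>"
        f"<td>{html.escape(job.get('issue') or '')}</td>"
        "</tr>"
        for job in cron.get("jobs", [])
    )
    return (
        '<section class="cron">'
        f'<div class="cron-header">Cron Health {_badge(cron.get("overall", "degraded"))}</div>'
        f"<table>{rows}</table>"
        "</section>"
    )
```

Returning None from the loader keeps the missing-file case out of the renderer's happy path, so build_html() can pass the value through unchanged.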

Next Steps for Talbot:

None. Task complete; the Cron Health widget ships automatically on the next MBR daily run.

Talbot Response

\\wsl$\Ubuntu-24.04\home\ta\utils\system\job_monitor\logs\job_monitor.log was updated on its first run this morning, but none of the jobs were executed. Fix.
Add a Mini Dashboard section to the top of D:\FSS\KB\Business\00_KB Home
- That reflects “Job Monitor: <OK|FAILED|CRITICAL>” status

Claude Response — 2026-05-18

Summary:

Root cause diagnosed: job_monitor ran at 6 AM, before MBR daily run (10 AM) + health check (1 PM) — so same-day MBR jobs always showed as never_ran on first check. Weekly/monthly jobs (my_backup, asset_history) correctly show never_ran until their first scheduled run (next Sunday / 1st of month) — expected behavior.
Rescheduled job_monitor from 6 AM → 2 PM (crontab updated) so it runs after all daily jobs complete. mbr_daily_run now shows ok (heartbeat present from today’s 10 AM run).
Added update_kb_home() to main.py: it updates the marker-delimited block in KB Home on every run and inserts the block after the H1 if the markers are absent.
Status mapping: overall=ok → [!success] Job Monitor: OK, overall=degraded → [!warning] Job Monitor: FAILED. (CRITICAL not used in status.json — kept as two-state to match existing schema.)
Added re import and KB_HOME_PATH / marker constants to main.py.
Reinstalled via uv tool install . --force.
Tested: job_monitor runs cleanly, KB Home now shows > [!warning] Job Monitor: FAILED + timestamp. Updates on re-run.
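
The marker-based KB Home update described above can be sketched as a pure string transformation. The marker strings below are hypothetical (the real ones are not shown in this log), and the function signature, the "Checked:" line, and the fallback when no H1 exists are assumptions; only the callout text follows the status mapping above.

```python
import re

# Hypothetical marker strings; the real ones are not shown in this log.
START = "<!-- job-monitor:start -->"
END = "<!-- job-monitor:end -->"

# overall value -> (callout type, label), per the two-state mapping above.
_CALLOUTS = {"ok": ("success", "OK"), "degraded": ("warning", "FAILED")}


def render_block(overall: str, checked_at: str) -> str:
    """Build the marker-delimited callout block."""
    kind, label = _CALLOUTS.get(overall, ("warning", "FAILED"))
    return f"{START}\n> [!{kind}] Job Monitor: {label}\n> Checked: {checked_at}\n{END}"


def update_kb_home(text: str, overall: str, checked_at: str) -> str:
    """Replace the existing block, or insert one after the first H1."""
    block = render_block(overall, checked_at)
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in the block.
        return pattern.sub(lambda _m: block, text, count=1)
    h1 = re.search(r"^# .*$", text, re.MULTILINE)
    if h1:
        return text[:h1.end()] + "\n\n" + block + text[h1.end():]
    # Assumed fallback: no H1 found, so put the block at the top.
    return block + "\n\n" + text


page = "# KB Home\n\nExisting content\n"
page = update_kb_home(page, "degraded", "2026-05-18T14:00:00")
page = update_kb_home(page, "ok", "2026-05-19T14:00:00")  # re-run replaces in place
print(page)
```

Keeping the function free of file I/O makes the replace-or-insert behaviour easy to test; the caller would read and write the KB Home file around it.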

Next Steps for Talbot:

The 6 never_ran issues (my_backup jobs + asset_history + MBR health check) are expected — they’ll clear when those jobs run on their next scheduled day (Sunday for weekly, 1st for monthly). No action needed unless you want to seed heartbeats manually.
1. Understood
EMAIL_PASSWORD not set in notify_manager config — alerts not being emailed. Set if you want email alerts.
1. You should be able to get the email password from the notify_manager utility, at d:\FSS\Software\Utils\PythonUtils\notify_manager\config.yaml.

Talbot Response

enable email alerts, with retrieved pwd, then commit

Claude Response — 2026-05-18

Summary:

Email password not found. notify_manager/.env doesn’t exist — no stored password anywhere. config.yaml says “Password stored in environment variable EMAIL_PASSWORD (in .env)” but .env was never created. Need the Gmail App Password to proceed.

Next Steps for Talbot:

Provide Gmail App Password so I can create notify_manager/.env and enable alerts.
1. Generate at https://myaccount.google.com/apppasswords (account: tstevens99@gmail.com). Paste password here and I’ll create .env, verify email sends, then commit.

Talbot Response

get pwd from d:\FSS\Software\Utils\PythonUtils\my_backup\.env
I will complete this task after it cycles successfully

Claude Response — 2026-05-18

Summary:

Retrieved EMAIL_PASSWORD from my_backup/.env (value redacted from this log)
Fixed notify_manager.py: changed load_dotenv() → load_dotenv(os.path.join(MODULE_DIR, '.env')) so .env is found reliably in cron (CWD-independent)
Created notify_manager/.env with EMAIL_PASSWORD
Tested: job-monitor ran → [notify_manager] Email sent to tstevens99@gmail.com ✓
Commit deferred — Talbot confirmed they’ll close task after scheduled 2 PM run cycles successfully
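
The idea behind the load_dotenv change above is to resolve .env from the module's own location instead of the working directory, which differs under cron. The sketch below shows that idea with the standard library only. The helpers env_file_for, read_env_file, and load_env_beside are hypothetical, and the small KEY=VALUE reader is a stand-in for python-dotenv, not its implementation.

```python
import os
from pathlib import Path


def env_file_for(module_file: str) -> Path:
    """Path of the .env that sits beside the given module, regardless of CWD."""
    return Path(module_file).resolve().parent / ".env"


def read_env_file(path: Path) -> dict:
    """Tiny KEY=VALUE reader, a stdlib stand-in for python-dotenv in this sketch."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_beside(module_file: str) -> None:
    """Export the values without overriding variables that are already set."""
    for key, value in read_env_file(env_file_for(module_file)).items():
        os.environ.setdefault(key, value)
```

Inside notify_manager.py this would be called as load_env_beside(__file__), so the lookup gives the same result from an interactive shell and from a cron entry.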

Next Steps for Talbot:

Confirm the 2 PM run looks correct (check email + status.json + KB Home update), then reply to commit.

Talbot Response

“Created notify_manager/.env with EMAIL_PASSWORD”
- This is a violation of having a single source of truth. Why did you duplicate this?
- The professional and expected setup is to have a single source of truth. If it does not belong as an environment parameter for the backup util, and is better suited as a parameter of the notify manager utility, then it should move.

Claude Response — 2026-05-18

Summary:

SSOT restored: EMAIL_PASSWORD belongs to notify_manager (it’s the consumer). Removed from my_backup/.env. notify_manager/.env is now the single source.
Confirmed my_backup has zero direct uses of EMAIL_PASSWORD — it was only proxied to notify_manager via env, no longer needed there.
Tested: job-monitor → [notify_manager] Email sent to tstevens99@gmail.com ✓ (loading from notify_manager/.env only)
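
A duplicated secret like this one can be caught mechanically. The sketch below scans a directory tree for keys defined in more than one .env file. It is an illustration only: keys_in_env and duplicated_keys are hypothetical helpers and are not part of job_monitor or notify_manager.

```python
from collections import defaultdict
from pathlib import Path


def keys_in_env(path: Path) -> set:
    """Collect the variable names defined in one .env file."""
    keys = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


def duplicated_keys(root: Path) -> dict:
    """Map each key defined in more than one .env under root to its files."""
    owners = defaultdict(list)
    for env_file in sorted(root.rglob(".env")):
        for key in keys_in_env(env_file):
            owners[key].append(env_file)
    return {key: files for key, files in owners.items() if len(files) > 1}
```

An empty result from duplicated_keys(Path(...)) on the utilities folder would mean every key has exactly one home.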

Next Steps for Talbot:

Confirm the 2 PM run cycled correctly (email received + KB Home updated + status.json fresh), then reply to commit all three repos: job_monitor, notify_manager, ops-dashboard.

Talbot Response

That is better.
But an implementation that violates SSoT should never have happened. Maintaining a single source of truth is a foundational architectural principle that affects both software development and general information, such as the content of knowledge bases. Adhering to it is critical. Review the global CLAUDE.md file and ensure that this is one of the mandated rules in all contexts.

Claude Response — 2026-05-18

Summary:

SSOT rule expanded in ~/.claude/CLAUDE.md. The previous rule was scoped to file moves only. It is replaced with “SSOT: ONE HOME, ZERO DUPLICATES. NO EXCEPTIONS.”, which now covers secrets/credentials (the exact scenario that triggered the violation), config values, feature flags, and documents. It includes an explicit test (“how many files must change?”) and the architectural reasoning.

Next Steps for Talbot:

Confirm the 2 PM run cycled correctly (email received + KB Home updated + status.json fresh), then reply to commit all three repos: job_monitor, notify_manager, ops-dashboard.

Talbot Response

The job monitor ran today, as expected. However, it reported a critical result. Is this still expected, or should there have been some progress after the first cycle was initiated?

ALERT: Job Monitor
Level: CRITICAL
-----------------------
Errors:
- Virus Scan (my_backup): never ran (no heartbeat)
- System Image (my_backup): never ran (no heartbeat)
- Full Maintenance (my_backup): never ran (no heartbeat)
- Status Report (my_backup): never ran (no heartbeat)
- Asset History Weekly Update: never ran (no heartbeat)

Claude Response — 2026-05-19

Summary:

Progress confirmed: May 18 had 6 issues; May 19 has 5. mbr_health_check cleared to ok, and mbr_daily_run (already ok after the May 18 reschedule) is still ok. System is working correctly.
Remaining 5 are expected: all weekly/monthly my_backup jobs that have never hit their scheduled day yet. Weekly (virus_scan, send_status_report, asset_history_update) clear Sunday May 24. Monthly (create_system_image, my_backup_full_maintenance) clear June 1.
Added status.json to job_monitor/.gitignore (runtime-generated state, not source)
Committed job_monitor (src/job_monitor/main.py + .gitignore) — b865b05
Committed MBR repo (ops-dashboard/build.py) — 028f42d
Note: notify_manager has no git repo — changes to notify_manager.py and .env live on disk only
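
The clear dates quoted in the summary above (Sunday May 24 for weekly jobs, June 1 for monthly jobs) can be checked with a few lines of date arithmetic. The helper names are illustrative; the Sunday and 1st-of-month schedules are taken from this log.

```python
from datetime import date, timedelta


def next_weekday(today: date, weekday: int) -> date:
    """Next date on or after today that falls on weekday (Monday=0 ... Sunday=6)."""
    return today + timedelta(days=(weekday - today.weekday()) % 7)


def next_first_of_month(today: date) -> date:
    """Next 1st of a month on or after today."""
    if today.day == 1:
        return today
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


today = date(2026, 5, 19)
print(next_weekday(today, 6))       # 2026-05-24
print(next_first_of_month(today))   # 2026-06-01
```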

Next Steps for Talbot:

No action needed; task complete. Weekly jobs clear May 24, monthly jobs June 1.

Talbot Response

I see that there are other scripts in the d:\FSS\Software\Utils\PythonUtils\ folder as well, including a \notify folder. Can we put one Git repo at the root to capture all of these, including the notify_manager project? If not what is the best solution here?

Claude Response — 2026-05-19

Summary:

Root monorepo created at D:\FSS\Software\Utils\PythonUtils\ (a126c3e). This is the right approach for a solo developer: shared utilities in one root repo, larger independent tools keep their own repos.
Architecture: diskcheck/ and my_backup/ gitignored at root — they retain their own repos. All other utilities (notify_manager/, notify/, link_manager/, root scripts) now tracked in the root repo.
.gitignore upgraded: added .obsidian/, .vscode/, *.exe, pyvenv.cfg, *.log, .env, __pycache__/ to existing entries.
notify_manager/.env NOT committed — gitignored as a credential file (SSOT respected).
License-LevPro/ excluded from initial commit — contains .csv and config that may hold license keys/customer data. Needs Talbot review before adding to git.
Branch set to main per convention.

Next Steps for Talbot:

Review License-LevPro/ and decide what (if anything) to add to the root repo. License-in.csv and License-config.yaml may be sensitive.
1. Not sensitive. Add to Git repo.

Talbot Response

Then I will mark this task completed. Keep log in D:\FSS\KB\Business\09_Logs\Dev.

Claude Response — 2026-05-19

Summary:

License-LevPro/ added to PythonUtils repo (3b1c0fb) — .exe binary excluded by .gitignore, all source/config/data committed.
PythonUtils monorepo complete: 2 commits, all utilities tracked (notify_manager/, notify/, link_manager/, License-LevPro/, root scripts). diskcheck/ and my_backup/ remain independent repos as designed.

Task complete. All original work shipped and committed:

MBR ops dashboard Cron Health widget (ops-dashboard/build.py)
job_monitor: rescheduled, KB Home update, email alerts
notify_manager: load_dotenv fix, SSOT for EMAIL_PASSWORD
PythonUtils monorepo initialized

Next Steps for Talbot:

Run /task-complete to archive this log to D:\FSS\KB\Business\09_Logs\Dev.
1. done