
Date: 2026-04-06

This task implements the Tier 2 AI agent layer. The strategic decisions are settled and logged in 09_Logs/Decisions/2026-04-06_AI-Team-Strategy — this task is purely implementation-focused.

The Three-Tier Model (established in AI-Team-Strategy.md):

  • Tier 1 — Architect: CC / Cursor (frontier models) for design, planning, complex reasoning
  • Tier 2 — Builder: Local agents for overnight batch execution of well-specified tasks ← this task
  • Tier 3 — QA Gate: CC / API for final quality/security review before deployment

Obsidian KB as the coordination layer: All tiers read context from and write outputs to the Obsidian KB markdown vaults. No vector database needed. Human-auditable, git-backed, already in use.
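Because every tier just reads and writes markdown, the coordination layer reduces to plain file I/O — no vector database, no message bus. A minimal sketch of the handoff (the `Tasks/` and `Outputs/` directory names are an assumed convention for illustration, not the real vault layout):

```python
import tempfile
from pathlib import Path

# Stand-in for the vault root; the real vaults live under /mnt/d/FSS/KB/.
VAULT = Path(tempfile.mkdtemp(prefix="kb-demo-"))
TASKS = VAULT / "Tasks"       # Tier 1 (architect) writes specs here
OUTPUTS = VAULT / "Outputs"   # Tier 2 (builder) writes results here

def write_spec(name: str, body: str) -> Path:
    """Tier 1 drops a task spec into the vault as a markdown note."""
    TASKS.mkdir(parents=True, exist_ok=True)
    path = TASKS / f"{name}.md"
    path.write_text(body, encoding="utf-8")
    return path

def claim_specs() -> list[Path]:
    """Tier 2 picks up every spec that has no matching output yet."""
    OUTPUTS.mkdir(parents=True, exist_ok=True)
    done = {p.stem for p in OUTPUTS.glob("*.md")}
    return [p for p in sorted(TASKS.glob("*.md")) if p.stem not in done]

def write_output(spec: Path, result: str) -> Path:
    """Tier 2 writes its result under the same name for morning review."""
    out = OUTPUTS / spec.name
    out.write_text(result, encoding="utf-8")
    return out
```

Everything stays human-auditable: a `git diff` of the vault shows exactly what each tier produced overnight.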

| Component | Spec |
| --- | --- |
| CPU | AMD Ryzen 7 1700 (8 core / 16 thread, 3.0 GHz) — AM4 socket |
| RAM | 32 GB |
| GPU | NVIDIA GeForce GTX 1050 — 2 GB GDDR5 VRAM |
| OS | Windows 11 Home with WSL2 |

GPU assessment: GTX 1050 (2 GB VRAM) cannot run GPU-accelerated LLM inference. All inference falls to CPU until upgraded.

CPU-only inference capability:

  • Gemma 4 E2B (2.3B effective): ~15–20 tok/s — fast enough for simple tasks
  • Gemma 4 E4B (4.5B effective): ~8–12 tok/s — adequate for small overnight tasks
  • Qwen3 7B Q4: ~5–8 tok/s — slow but functional
  • 13B+ models: ~2–4 tok/s — too slow for meaningful overnight work

A GPU upgrade to the existing machine potentially offers much better ROI than buying a new mini-PC:

| GPU | VRAM | Models unlocked | Est. used price (CAD) | Speed |
| --- | --- | --- | --- | --- |
| RTX 3060 12 GB | 12 GB GDDR6 | Qwen3 14B Q4, Gemma 4 26B MoE | ~$220–280 | 20–30 tok/s |
| RTX 3070 8 GB | 8 GB GDDR6 | Qwen3 8B, Gemma 4 E4B GPU-accel | ~$180–220 | 25–35 tok/s |

RTX 3060 12 GB is the recommended upgrade — 12 GB VRAM fits 13B models fully in VRAM, AM4/PCIe 3.0 compatible, ~120–170W (safe for existing PSU). Available used from Canada Computers or Kijiji London.
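The "13B models fit fully in VRAM" claim can be sanity-checked with a rough sizing rule: a 4-bit quantized model weighs on the order of 0.5–0.6 bytes per parameter, plus KV cache and runtime overhead. A hedged sketch (the 0.55 bytes/param and 1.5 GB overhead figures are rules of thumb, not measurements; real usage varies with context length):

```python
def q4_vram_gb(params_b: float, bytes_per_param: float = 0.55,
               overhead_gb: float = 1.5) -> float:
    """Rough VRAM (GB) to fully load a Q4 model of params_b billion parameters."""
    return params_b * bytes_per_param + overhead_gb

for name, params in [("Qwen3 14B Q4", 14), ("Qwen3 8B Q4", 8)]:
    need = q4_vram_gb(params)
    print(f"{name}: ~{need:.1f} GB weights+overhead "
          f"({'fits' if need <= 12 else 'exceeds'} 12 GB)")
```

By this estimate a 14B Q4 model lands around 9 GB — comfortably inside 12 GB, but over the 8 GB of the RTX 3070, which is the core of the 3060 recommendation.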

ROI: ~$250 CAD ÷ ~$60/month savings ≈ 4-month payback, vs a ~4-year payback for a Beelink mini-PC.
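The payback arithmetic, spelled out (prices and the $60/month saving are the estimates from the lines above):

```python
def payback_months(upfront_cad: float, monthly_saving_cad: float) -> float:
    """Months of savings needed to cover the upfront hardware cost."""
    return upfront_cad / monthly_saving_cad

print(f"RTX 3060: {payback_months(250, 60):.1f} months")  # → RTX 3060: 4.2 months

# The stated ~4-year mini-PC payback at the same $60/month implies
# roughly 48 * 60 = $2,880 of hardware spend.
```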

Hardware decision sequence:

  1. Test CPU-only first (Phase 0 below) — zero cost
  2. If CPU throughput is genuinely insufficient for target workflows: source RTX 3060 12 GB
  3. If RTX 3060 still insufficient: evaluate Beelink GTR9 Pro (verify actual Amazon.ca direct price)

Orchestration Stack (decided in AI-Team-Strategy.md)

| Layer | Tool | Why |
| --- | --- | --- |
| Scheduler + orchestrator | n8n (self-hosted) | Visual, 400+ integrations, 70+ AI nodes, no Python required |
| Code + implementation agent | OpenHands | Autonomous coding agent; WSL2/Docker native |
| Complex multi-role crews | CrewAI (as needed) | Role-based Python framework; YAML config |
| Local model server | Ollama | Runs local models; OpenAI-compatible API |
| Memory / coordination | Obsidian KB | Markdown, git-backed, shared across all tiers |
| Inter-agent protocol | MCP | Already in CC; standardizing across ecosystem |

Models to evaluate (all free via Ollama):

  • Gemma 4 E4B — start here (CPU-only, zero friction)
  • Gemma 4 26B MoE — test after GPU upgrade
  • Qwen3 14B Q4 — strong coding alternative
  • DeepSeek V3.2 — best for complex reasoning/tool-use (needs 24+ GB VRAM)

Business: MyBetterRates (MBR)

  • KB path: /mnt/d/FSS/KB/MBR/
  • Work type: Astro monorepo site development, KB structure, mini apps

First test task — one small MBR implementation task:

  • Claude (Tier 1) writes a spec to MBR KB
  • Gemma 4 (Tier 2) reads the spec, produces code output, writes result back to KB
  • Talbot reviews the output the next morning
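For this loop to run unattended, Tier 2 needs to find the task inside the markdown it reads. A minimal parser for a spec note with YAML-style frontmatter (the `status`/`model` field names are an assumed convention for illustration, not an existing MBR format):

```python
def parse_spec(text: str) -> dict:
    """Split a KB spec note into frontmatter fields and a markdown body."""
    meta, body = {}, text
    if text.startswith("---\n"):
        head, sep, rest = text[4:].partition("\n---\n")
        if sep:
            body = rest
            for line in head.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
    return {"meta": meta, "body": body.strip()}

spec = """---
status: ready
model: gemma4:e4b
---
Implement a responsive footer for the MBR Astro site.
"""
parsed = parse_spec(spec)
print(parsed["meta"], "|", parsed["body"])
```

Keeping the convention this simple means Claude can emit it reliably and a human can still read the spec as a normal Obsidian note.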

Phase 0 — Test CPU-only (zero cost, existing hardware)

  • Install Ollama in WSL2: `curl -fsSL https://ollama.com/install.sh | sh`
  • Pull Gemma 4 E4B: `ollama pull gemma4:e4b`
  • Test one small MBR task manually via CLI: provide spec from KB, measure output quality and timing
  • Evaluate: Is output quality acceptable? How long per call? Is CPU throughput sufficient for overnight use?
Phase 1 — First automated overnight workflow (if Phase 0 quality is acceptable)

  • Install n8n via Docker in WSL2 (self-hosted AI starter kit)
  • Build first KB-reading workflow: scheduled trigger → read task from MBR KB → pass to Ollama → write output back to KB
  • Run overnight
  • Review morning output: quality, completeness, issues

Phase 2 — GPU upgrade decision (if Phase 0/1 shows CPU is insufficient)

  • If CPU throughput is the bottleneck: source used RTX 3060 12 GB from Canada Computers London or Kijiji London (~$250 CAD)
  • Verify PSU wattage (check existing PSU label — need 450W+ for RTX 3060)
  • Install GPU, update NVIDIA drivers in Windows + WSL2
  • Re-test same Phase 0/1 workflows with GPU acceleration
  • Report: does this resolve the bottleneck?
Phase 3 — First real overnight implementation (OpenHands)

  • Install OpenHands via Docker in WSL2
  • Configure to read implementation specs from MBR KB
  • Configure to write code outputs to Astro monorepo
  • Run first real overnight implementation task (a specific Astro component or utility)
  • Review and integrate output the next morning
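For the morning review, a small helper that lists everything the overnight run touched keeps the check fast and ensures nothing is missed before integrating into the monorepo. A sketch assuming outputs land under one directory (the 12-hour cutoff is an arbitrary default covering a typical overnight window):

```python
import time
from pathlib import Path

def overnight_changes(out_dir: str, since_hours: float = 12.0) -> list:
    """List files modified within the last since_hours — the overnight batch."""
    cutoff = time.time() - since_hours * 3600
    return sorted(p for p in Path(out_dir).rglob("*")
                  if p.is_file() and p.stat().st_mtime >= cutoff)
```

Since the KB and the monorepo are both git-backed, `git status` gives a second, independent view of the same changes.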