# Tier 2 AI Agent Setup
Date: 2026-04-06
## Background

This task implements the Tier 2 AI agent layer. The strategic decisions are settled and logged in `09_Logs/Decisions/2026-04-06_AI-Team-Strategy` — this task is purely implementation-focused.
The Three-Tier Model (established in `AI-Team-Strategy.md`):
- Tier 1 — Architect: CC / Cursor (frontier models) for design, planning, complex reasoning
- Tier 2 — Builder: Local agents for overnight batch execution of well-specified tasks ← this task
- Tier 3 — QA Gate: CC / API for final quality/security review before deployment
Obsidian KB as the coordination layer: All tiers read context from and write outputs to the Obsidian KB markdown vaults. No vector database needed. Human-auditable, git-backed, already in use.
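As a concrete illustration of the handoff pattern, the sketch below sets up a minimal task-exchange layout inside the MBR vault. The `Tasks/Inbox` and `Tasks/Outbox` folder names are hypothetical (nothing in this doc mandates them); the point is that tier handoffs are just markdown files in a git-tracked vault.

```bash
# Hypothetical handoff layout inside the MBR vault (folder names are
# illustrative, not an established convention). Tier 1 writes specs to
# Inbox, Tier 2 writes results to Outbox, and the morning review reads
# Outbox. Everything stays human-auditable via git history.
mkdir -p /mnt/d/FSS/KB/MBR/Tasks/{Inbox,Outbox}
touch /mnt/d/FSS/KB/MBR/Tasks/{Inbox,Outbox}/.gitkeep  # git ignores empty dirs
cd /mnt/d/FSS/KB/MBR && git add Tasks && git commit -m "Add tier handoff folders"
```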
## Hardware Context

| Component | Spec |
|---|---|
| CPU | AMD Ryzen 7 1700 (8 core / 16 thread, 3.0 GHz) — AM4 socket |
| RAM | 32 GB |
| GPU | NVIDIA GeForce GTX 1050 — 2 GB GDDR5 VRAM |
| OS | Windows 11 Home with WSL2 |
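Worth confirming up front: WSL2 does not necessarily expose all of this hardware to Linux. A quick check from the WSL2 shell, using standard tools only:

```bash
nproc        # logical CPUs visible to WSL2 (up to 16 on the Ryzen 7 1700)
free -h      # RAM visible to WSL2; by default WSL2 caps this below the
             # full 32 GB -- raise it in %UserProfile%\.wslconfig if needed
nvidia-smi   # GPU passthrough check (needs the Windows NVIDIA driver)
```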
GPU assessment: GTX 1050 (2 GB VRAM) cannot run GPU-accelerated LLM inference. All inference falls to CPU until upgraded.
CPU-only inference capability (estimated throughput; verify with the benchmark sketch after this list):
- Gemma 4 E2B (2.3B effective): ~15–20 tok/s — fast enough for simple tasks
- Gemma 4 E4B (4.5B effective): ~8–12 tok/s — adequate for small overnight tasks
- Qwen3 7B Q4: ~5–8 tok/s — slow but functional
- 13B+ models: ~2–4 tok/s — too slow for meaningful overnight work
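These numbers are estimates; Ollama can report the real figure per machine. A minimal benchmark sketch (the `gemma4:e4b` tag mirrors the pull command in Phase 0 below; confirm the exact tag against the Ollama library before relying on it):

```bash
# --verbose makes `ollama run` print timing stats (prompt/eval token
# counts and rates) to stderr after the response completes.
ollama run gemma4:e4b --verbose \
  "Write a short TypeScript function that slugifies a page title." \
  > /dev/null
# Compare the reported "eval rate: N tokens/s" against the ~8-12 tok/s
# estimate for E4B above; repeat per model before planning overnight runs.
```

As a sanity check on what these rates buy: at ~10 tok/s, an 8-hour overnight window yields roughly 288,000 generated tokens, which bounds how much work one night can produce.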
## GPU Upgrade Option (High ROI)

A GPU upgrade on the existing machine likely offers far better ROI than buying a new mini-PC:
| GPU | VRAM | Models unlocked | Est. Used Price (CAD) | Speed |
|---|---|---|---|---|
| RTX 3060 12 GB | 12 GB GDDR6 | Qwen3 14B Q4, Gemma 4 26B MoE | ~$220–280 | 20–30 tok/s |
| RTX 3070 8 GB | 8 GB GDDR6 | Qwen3 8B, Gemma 4 E4B GPU-accel | ~$180–220 | 25–35 tok/s |
The RTX 3060 12 GB is the recommended upgrade: it holds 13B-class models entirely in VRAM, is AM4/PCIe 3.0 compatible, and draws ~120–170 W (safe for the existing PSU). Available used from Canada Computers or Kijiji London.
ROI: ~$250 CAD ÷ ~$60/month savings ≈ 4-month payback, versus a ~4-year payback for a Beelink mini-PC at the same savings rate.
Hardware decision sequence:
- Test CPU-only first (Phase 0 below) — zero cost
- If CPU throughput is genuinely insufficient for target workflows: source RTX 3060 12 GB
- If RTX 3060 still insufficient: evaluate Beelink GTR9 Pro (verify actual Amazon.ca direct price)
## Orchestration Stack (decided in AI-Team-Strategy.md)

| Layer | Tool | Why |
|---|---|---|
| Scheduler + orchestrator | n8n (self-hosted) | Visual, 400+ integrations, 70+ AI nodes, no Python required |
| Code + implementation agent | OpenHands | Autonomous coding agent; WSL2/Docker native |
| Complex multi-role crews | CrewAI (as needed) | Role-based Python framework; YAML config |
| Local model server | Ollama | Runs local models; OpenAI-compatible API |
| Memory / coordination | Obsidian KB | Markdown, git-backed, shared across all tiers |
| Inter-agent protocol | MCP | Already in CC; standardizing across ecosystem |
Models to evaluate (all free via Ollama):
- Gemma 4 E4B — start here (CPU-only, zero friction)
- Gemma 4 26B MoE — test after GPU upgrade
- Qwen3 14B Q4 — strong coding alternative
- DeepSeek V3.2 — best for complex reasoning/tool-use (needs 24+ GB VRAM)
## First Target Workflow

Business: MyBetterRates (MBR) — KB path: `/mnt/d/FSS/KB/MBR/`
Work type: Astro monorepo site development, KB structure, mini apps
First test task, one small MBR implementation task (a spec sketch follows this list):
- Claude (Tier 1) writes a spec to MBR KB
- Gemma 4 (Tier 2) reads the spec, produces code output, writes result back to KB
- Talbot reviews the output the next morning
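A sketch of what the Tier 1 → Tier 2 handoff file could look like. The frontmatter fields, filename, and Inbox path are all hypothetical, invented for illustration; the actual spec format is whatever Claude and the workflow agree on:

```bash
cat > /mnt/d/FSS/KB/MBR/Tasks/Inbox/2026-04-07_slugify-util.md <<'EOF'
---
status: queued
tier: 2
model: gemma4:e4b
output: Tasks/Outbox/2026-04-07_slugify-util.md
---
# Task: slugify utility for the MBR Astro monorepo

Write a TypeScript function `slugify(title: string): string`.
Lowercase, trim, collapse whitespace runs to "-", and strip everything
that is not alphanumeric or "-". Include three usage examples.
EOF
```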
## Phase 0 — Test CPU-only (zero cost, existing hardware)

- Install Ollama in WSL2: `curl -fsSL https://ollama.com/install.sh | sh`
- Pull Gemma 4 E4B: `ollama pull gemma4:e4b`
- Test one small MBR task manually via CLI: provide the spec from the KB, measure output quality and timing (see the run sketch after this list)
- Evaluate: Is output quality acceptable? How long per call? Is CPU throughput sufficient for overnight use?
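The manual CLI test can be as simple as piping the spec into the model and capturing the result next to it. This reuses the hypothetical Inbox/Outbox layout and spec file sketched earlier; `time` plus `--verbose` produce the timing data the evaluation step asks for:

```bash
SPEC=/mnt/d/FSS/KB/MBR/Tasks/Inbox/2026-04-07_slugify-util.md
OUT=/mnt/d/FSS/KB/MBR/Tasks/Outbox/2026-04-07_slugify-util.md

# Feed the whole spec as the prompt; timing stats go to stderr, the
# model's answer lands back in the KB for morning review.
time ollama run gemma4:e4b --verbose "$(cat "$SPEC")" > "$OUT"
```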
## Phase 1 — n8n + KB-integrated workflow

- Install n8n via Docker in WSL2 (self-hosted AI starter kit); see the commands after this list
- Build first KB-reading workflow: scheduled trigger → read task from MBR KB → pass to Ollama → write output back to KB
- Run overnight
- Review morning output: quality, completeness, issues
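Two building blocks for this phase, sketched as shell commands. The `docker run` line is the documented single-container way to start n8n (the linked starter kit uses Docker Compose and bundles Ollama instead); the `curl` call is the hand-rolled equivalent of the workflow's HTTP Request node against Ollama's native `/api/generate` endpoint. Paths reuse the hypothetical Inbox/Outbox layout from earlier:

```bash
# Self-hosted n8n in WSL2; UI at http://localhost:5678.
docker run -d --name n8n -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  docker.n8n.io/n8nio/n8n

# The read-spec -> Ollama -> write-output step, as plain curl. jq -Rs
# JSON-encodes the spec file so it can be embedded as the prompt string.
SPEC=/mnt/d/FSS/KB/MBR/Tasks/Inbox/2026-04-07_slugify-util.md
OUT=/mnt/d/FSS/KB/MBR/Tasks/Outbox/2026-04-07_slugify-util.md
curl -s http://localhost:11434/api/generate \
  -d "{\"model\":\"gemma4:e4b\",\"stream\":false,\"prompt\":$(jq -Rs . < "$SPEC")}" \
  | jq -r .response > "$OUT"
```

Note that a containerized n8n reaching the host's Ollama generally needs `http://host.docker.internal:11434` rather than `localhost` in its HTTP node.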
## Phase 2 — GPU upgrade decision (if Phase 0/1 shows CPU is insufficient)

- If CPU throughput is the bottleneck: source a used RTX 3060 12 GB from Canada Computers London or Kijiji London (~$250 CAD)
- Verify PSU wattage (check existing PSU label — need 450W+ for RTX 3060)
- Install GPU, update NVIDIA drivers in Windows + WSL2
- Re-test the same Phase 0/1 workflows with GPU acceleration (verification checks after this list)
- Report: does this resolve the bottleneck?
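Quick post-install checks from the WSL2 shell, before re-running the benchmarks:

```bash
nvidia-smi    # driver passthrough check; should list the RTX 3060 / 12 GB
ollama run gemma4:e4b "ping" > /dev/null   # force a model load
ollama ps     # PROCESSOR column should now read "100% GPU", not "100% CPU"
```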
## Phase 3 — OpenHands code agent

- Install OpenHands via Docker in WSL2 (a run sketch follows this list)
- Configure to read implementation specs from MBR KB
- Configure to write code outputs to Astro monorepo
- Run first real overnight implementation task (a specific Astro component or utility)
- Review and integrate output the next morning
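A sketch of the shape this takes, pointing OpenHands at the local Ollama server. OpenHands ships as a Docker image and addresses models via LiteLLM-style identifiers; the exact image tag, env var names, and mount flags change between releases, so treat everything below as an assumption to verify against the local-setup docs linked under Key References:

```bash
# Assumed env vars (LLM_MODEL / LLM_BASE_URL) and image path follow the
# pattern in the OpenHands docs at time of writing -- verify before use.
docker run -it --rm -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  -e LLM_MODEL="ollama/gemma4:e4b" \
  -e LLM_BASE_URL="http://host.docker.internal:11434" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker.all-hands.dev/all-hands-ai/openhands
```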
## Key References

- Strategic research: `AI-Team-Strategy.md` (this is the source task)
- MBR KB: `/mnt/d/FSS/KB/MBR/`
- Business KB: `/mnt/d/FSS/KB/Business/`
- n8n self-hosted AI starter kit: https://github.com/n8n-io/self-hosted-ai-starter-kit
- OpenHands docs (local setup): https://docs.openhands.dev/openhands/usage/run-openhands/local-setup