# Tier 2 AI Agent Setup
Date: 2026-04-06
## Background

This task implements the Tier 2 AI agent layer. The strategic decisions are settled and logged in `09_Logs/Decisions/2026-04-06_AI-Team-Strategy` — this task is purely implementation-focused.
The Three-Tier Model (established in `AI-Team-Strategy.md`):
- Tier 1 — Architect: CC / Cursor (frontier models) for design, planning, complex reasoning
- Tier 2 — Builder: Local agents for overnight batch execution of well-specified tasks ← this task
- Tier 3 — QA Gate: CC / API for final quality/security review before deployment
Obsidian KB as the coordination layer: All tiers read context from and write outputs to the Obsidian KB markdown vaults. No vector database needed. Human-auditable, git-backed, already in use.
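As a concrete illustration of the handoff pattern, the sketch below sets up a minimal task-exchange layout inside the MBR vault. The `Tasks/Inbox` and `Tasks/Outbox` folder names are hypothetical (nothing in this doc mandates them); the point is that tier handoffs are just markdown files in a git-tracked vault.

```bash
# Hypothetical handoff layout inside the MBR vault (folder names are
# illustrative, not an established convention). Tier 1 writes specs to
# Inbox, Tier 2 writes results to Outbox, and the morning review reads
# Outbox. Everything stays human-auditable via git history.
mkdir -p /mnt/d/FSS/KB/MBR/Tasks/{Inbox,Outbox}
touch /mnt/d/FSS/KB/MBR/Tasks/{Inbox,Outbox}/.gitkeep  # git ignores empty dirs
cd /mnt/d/FSS/KB/MBR && git add Tasks && git commit -m "Add tier handoff folders"
```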
## Hardware Context

| Component | Spec |
|---|---|
| CPU | AMD Ryzen 7 1700 (8 core / 16 thread, 3.0 GHz) — AM4 socket |
| RAM | 32 GB |
| GPU | NVIDIA GeForce GTX 1050 — 2 GB GDDR5 VRAM |
| OS | Windows 11 Home with WSL2 |
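Worth confirming up front: WSL2 does not necessarily expose all of this hardware to Linux. A quick check from the WSL2 shell, using standard tools only:

```bash
nproc        # logical CPUs visible to WSL2 (up to 16 on the Ryzen 7 1700)
free -h      # RAM visible to WSL2; by default WSL2 caps this below the
             # full 32 GB -- raise it in %UserProfile%\.wslconfig if needed
nvidia-smi   # GPU passthrough check (needs the Windows NVIDIA driver)
```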
GPU assessment: GTX 1050 (2 GB VRAM) cannot run GPU-accelerated LLM inference. All inference falls to CPU until upgraded.
CPU-only inference capability (estimated throughput; verify with the benchmark sketch after this list):
- Gemma 4 E2B (2.3B effective): ~15–20 tok/s — fast enough for simple tasks
- Gemma 4 E4B (4.5B effective): ~8–12 tok/s — adequate for small overnight tasks
- Qwen3 7B Q4: ~5–8 tok/s — slow but functional
- 13B+ models: ~2–4 tok/s — too slow for meaningful overnight work
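These numbers are estimates; Ollama can report the real figure per machine. A minimal benchmark sketch (the `gemma4:e4b` tag mirrors the pull command in Phase 0 below; confirm the exact tag against the Ollama library before relying on it):

```bash
# --verbose makes `ollama run` print timing stats (prompt/eval token
# counts and rates) to stderr after the response completes.
ollama run gemma4:e4b --verbose \
  "Write a short TypeScript function that slugifies a page title." \
  > /dev/null
# Compare the reported "eval rate: N tokens/s" against the ~8-12 tok/s
# estimate for E4B above; repeat per model before planning overnight runs.
```

As a sanity check on what these rates buy: at ~10 tok/s, an 8-hour overnight window yields roughly 288,000 generated tokens, which bounds how much work one night can produce.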
## GPU Upgrade Option (High ROI)

A GPU upgrade on the existing machine likely offers far better ROI than buying a new mini-PC:
| GPU | VRAM | Models unlocked | Est. Used Price (CAD) | Speed |
|---|---|---|---|---|
| RTX 3060 12 GB | 12 GB GDDR6 | Qwen3 14B Q4, Gemma 4 26B MoE | ~$220–280 | 20–30 tok/s |
| RTX 3070 8 GB | 8 GB GDDR6 | Qwen3 8B, Gemma 4 E4B GPU-accel | ~$180–220 | 25–35 tok/s |
The RTX 3060 12 GB is the recommended upgrade: it holds 13B-class models entirely in VRAM, is AM4/PCIe 3.0 compatible, and draws ~120–170 W (safe for the existing PSU). Available used from Canada Computers or Kijiji London.
ROI: ~$250 CAD ÷ ~$60/month savings ≈ 4-month payback, versus a ~4-year payback for a Beelink mini-PC at the same savings rate.
Hardware decision sequence:
- Test CPU-only first (Phase 0 below) — zero cost
- If CPU throughput is genuinely insufficient for target workflows: source RTX 3060 12 GB
- If RTX 3060 still insufficient: evaluate Beelink GTR9 Pro (verify actual Amazon.ca direct price)
## Orchestration Stack (decided in AI-Team-Strategy.md)

| Layer | Tool | Why |
|---|---|---|
| Scheduler + orchestrator | n8n (self-hosted) | Visual, 400+ integrations, 70+ AI nodes, no Python required |
| Code + implementation agent | OpenHands | Autonomous coding agent; WSL2/Docker native |
| Complex multi-role crews | CrewAI (as needed) | Role-based Python framework; YAML config |
| Local model server | Ollama | Runs local models; OpenAI-compatible API |
| Memory / coordination | Obsidian KB | Markdown, git-backed, shared across all tiers |
| Inter-agent protocol | MCP | Already in CC; standardizing across ecosystem |
Models to evaluate (all free via Ollama):
- Gemma 4 E4B — start here (CPU-only, zero friction)
- Gemma 4 26B MoE — test after GPU upgrade
- Qwen3 14B Q4 — strong coding alternative
- DeepSeek V3.2 — best for complex reasoning/tool-use (needs 24+ GB VRAM)
## First Target Workflow

Business: MyBetterRates (MBR) — KB path: `/mnt/d/FSS/KB/MBR/`
Work type: Astro monorepo site development, KB structure, mini apps
First test task, one small MBR implementation task (a spec sketch follows this list):
- Claude (Tier 1) writes a spec to MBR KB
- Gemma 4 (Tier 2) reads the spec, produces code output, writes result back to KB
- Talbot reviews the output the next morning
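A sketch of what the Tier 1 → Tier 2 handoff file could look like. The frontmatter fields, filename, and Inbox path are all hypothetical, invented for illustration; the actual spec format is whatever Claude and the workflow agree on:

```bash
cat > /mnt/d/FSS/KB/MBR/Tasks/Inbox/2026-04-07_slugify-util.md <<'EOF'
---
status: queued
tier: 2
model: gemma4:e4b
output: Tasks/Outbox/2026-04-07_slugify-util.md
---
# Task: slugify utility for the MBR Astro monorepo

Write a TypeScript function `slugify(title: string): string`.
Lowercase, trim, collapse whitespace runs to "-", and strip everything
that is not alphanumeric or "-". Include three usage examples.
EOF
```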
## Phase 0 — Test CPU-only (zero cost, existing hardware)

- Install Ollama in WSL2: `curl -fsSL https://ollama.com/install.sh | sh`
- Pull Gemma 4 E4B: `ollama pull gemma4:e4b`
- Test one small MBR task manually via CLI: provide the spec from the KB, measure output quality and timing (see the run sketch after this list)
- Evaluate: Is output quality acceptable? How long per call? Is CPU throughput sufficient for overnight use?
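The manual CLI test can be as simple as piping the spec into the model and capturing the result next to it. This reuses the hypothetical Inbox/Outbox layout and spec file sketched earlier; `time` plus `--verbose` produce the timing data the evaluation step asks for:

```bash
SPEC=/mnt/d/FSS/KB/MBR/Tasks/Inbox/2026-04-07_slugify-util.md
OUT=/mnt/d/FSS/KB/MBR/Tasks/Outbox/2026-04-07_slugify-util.md

# Feed the whole spec as the prompt; timing stats go to stderr, the
# model's answer lands back in the KB for morning review.
time ollama run gemma4:e4b --verbose "$(cat "$SPEC")" > "$OUT"
```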
## Phase 1 — n8n + KB-integrated workflow

- Install n8n via Docker in WSL2 (self-hosted AI starter kit); see the commands after this list
- Build first KB-reading workflow: scheduled trigger → read task from MBR KB → pass to Ollama → write output back to KB
- Run overnight
- Review morning output: quality, completeness, issues
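Two building blocks for this phase, sketched as shell commands. The `docker run` line is the documented single-container way to start n8n (the linked starter kit uses Docker Compose and bundles Ollama instead); the `curl` call is the hand-rolled equivalent of the workflow's HTTP Request node against Ollama's native `/api/generate` endpoint. Paths reuse the hypothetical Inbox/Outbox layout from earlier:

```bash
# Self-hosted n8n in WSL2; UI at http://localhost:5678.
docker run -d --name n8n -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  docker.n8n.io/n8nio/n8n

# The read-spec -> Ollama -> write-output step, as plain curl. jq -Rs
# JSON-encodes the spec file so it can be embedded as the prompt string.
SPEC=/mnt/d/FSS/KB/MBR/Tasks/Inbox/2026-04-07_slugify-util.md
OUT=/mnt/d/FSS/KB/MBR/Tasks/Outbox/2026-04-07_slugify-util.md
curl -s http://localhost:11434/api/generate \
  -d "{\"model\":\"gemma4:e4b\",\"stream\":false,\"prompt\":$(jq -Rs . < "$SPEC")}" \
  | jq -r .response > "$OUT"
```

Note that a containerized n8n reaching the host's Ollama generally needs `http://host.docker.internal:11434` rather than `localhost` in its HTTP node.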
## Phase 2 — GPU upgrade decision (if Phase 0/1 shows CPU is insufficient)

- If CPU throughput is the bottleneck: source a used RTX 3060 12 GB from Canada Computers London or Kijiji London (~$250 CAD)
- Verify PSU wattage (check existing PSU label — need 450W+ for RTX 3060)
- Install GPU, update NVIDIA drivers in Windows + WSL2
- Re-test the same Phase 0/1 workflows with GPU acceleration (verification checks after this list)
- Report: does this resolve the bottleneck?
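Quick post-install checks from the WSL2 shell, before re-running the benchmarks:

```bash
nvidia-smi    # driver passthrough check; should list the RTX 3060 / 12 GB
ollama run gemma4:e4b "ping" > /dev/null   # force a model load
ollama ps     # PROCESSOR column should now read "100% GPU", not "100% CPU"
```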
## Phase 3 — OpenHands code agent

- Install OpenHands via Docker in WSL2 (a run sketch follows this list)
- Configure to read implementation specs from MBR KB
- Configure to write code outputs to Astro monorepo
- Run first real overnight implementation task (a specific Astro component or utility)
- Review and integrate output the next morning
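A sketch of the shape this takes, pointing OpenHands at the local Ollama server. OpenHands ships as a Docker image and addresses models via LiteLLM-style identifiers; the exact image tag, env var names, and mount flags change between releases, so treat everything below as an assumption to verify against the local-setup docs linked under Key References:

```bash
# Assumed env vars (LLM_MODEL / LLM_BASE_URL) and image path follow the
# pattern in the OpenHands docs at time of writing -- verify before use.
docker run -it --rm -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  -e LLM_MODEL="ollama/gemma4:e4b" \
  -e LLM_BASE_URL="http://host.docker.internal:11434" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker.all-hands.dev/all-hands-ai/openhands
```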
## Key References

- Strategic research: `AI-Team-Strategy.md` (this is the source task)
- MBR KB: `/mnt/d/FSS/KB/MBR/`
- Business KB: `/mnt/d/FSS/KB/Business/`
- n8n self-hosted AI starter kit: https://github.com/n8n-io/self-hosted-ai-starter-kit
- OpenHands docs (local setup): https://docs.openhands.dev/openhands/usage/run-openhands/local-setup