# AI Team Strategy — Settled Decisions

## Context

Strategic research into adding a local Tier 2 AI agent team alongside the existing CC/Cursor setup. Implementation task: Tier-2-Agent-Setup
## The Three-Tier Model

| Tier | Tool | Role |
|---|---|---|
| 1 — Architect | CC / Cursor (frontier) | Design, planning, complex reasoning, architecture |
| 2 — Builder | Local agents | Overnight batch execution of well-specified tasks |
| 3 — QA Gate | CC / API (frontier) | Final quality + security review before deployment |
CC and Cursor are NOT being replaced — this is additive. Local agents handle volume/overnight work; frontier models handle judgment.
## Obsidian KB as the Coordination Layer

The existing KB (markdown vaults, git-backed) already solves the hardest problem in agent systems: long-term memory. All tiers read context from and write outputs to the KB. No vector DB needed. Human-auditable, and already in use via the SMTM task system.
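The coordination pattern reduces to plain file I/O against the vault. A minimal sketch — the `Tasks/` and `Outputs/` folder names and filename convention are illustrative placeholders, not the actual SMTM vault layout:

```python
from datetime import date
from pathlib import Path

def read_task_spec(vault: Path, task_name: str) -> str:
    """Read the task spec a Tier 1 architect left in the vault.

    The Tasks/ folder name is an assumption for illustration.
    """
    return (vault / "Tasks" / f"{task_name}.md").read_text(encoding="utf-8")

def write_agent_output(vault: Path, task_name: str, body: str) -> Path:
    """Write a Tier 2 agent's result back into the KB as plain markdown.

    The Outputs/ folder and date-prefixed filename are illustrative.
    """
    out_dir = vault / "Outputs"
    out_dir.mkdir(parents=True, exist_ok=True)
    note = out_dir / f"{date.today().isoformat()} {task_name}.md"
    note.write_text(f"# {task_name}\n\n{body}\n", encoding="utf-8")
    return note
```

Because everything is markdown in a git repo, every agent output is diffable and human-auditable by default — no extra memory infrastructure to operate.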
## Orchestration Stack

| Layer | Tool |
|---|---|
| Scheduler + orchestrator | n8n (self-hosted, Docker, WSL2) |
| Code + implementation agent | OpenHands |
| Complex role-based crews | CrewAI (as needed) |
| Local model server | Ollama |
| Inter-agent protocol | MCP (already in CC; ecosystem converging) |
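The Tier 2 execution step that n8n would schedule overnight is, at its core, one call to Ollama's local HTTP API. A sketch against Ollama's standard non-streaming `/api/generate` endpoint (default port 11434); the model name and prompt are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def run_task(model: str, prompt: str) -> str:
    """Send one well-specified task to the local model and return its text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

In the orchestrated setup, an n8n cron trigger would invoke something equivalent to `run_task` for each queued KB task and write the result back to the vault.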
OpenClaw: Highest capability ceiling but security crisis in early 2026 (135K exposed instances). Do not use with business data until late 2026.
## Hardware Decision

Existing machine specs:
- AMD Ryzen 7 1700 (8C/16T), 32 GB RAM
- NVIDIA GTX 1050 — 2 GB VRAM (not usable for GPU-accelerated LLM inference)
- CPU-only inference: ~8–20 tok/s depending on model size
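Whether CPU-only throughput really lands in the ~8–20 tok/s range is directly measurable: Ollama's non-streaming `/api/generate` reply includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating), so the speed calculation is just:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama response metadata.

    eval_count       – tokens generated (from the /api/generate reply)
    eval_duration_ns – time spent generating, in nanoseconds
    """
    return eval_count / (eval_duration_ns / 1_000_000_000)

# e.g. 300 tokens generated in 25 s of eval time:
print(tokens_per_second(300, 25_000_000_000))  # 12.0 — inside the quoted CPU-only range
```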
Decision sequence:
- Test CPU-only with Gemma 4 E4B first — zero cost
- If CPU insufficient: source used RTX 3060 12 GB (~$250 CAD) — ~4-month payback, AM4-compatible
- If RTX 3060 still insufficient: evaluate Beelink GTR9 Pro (~$2,800–3,160 CAD; verify Amazon.ca direct price — filter to “Sold by Amazon.ca”)
- RTX Pro 6000 Blackwell ($13K), Beelink at $8K+: not financially justified for overnight-only use case
Key insight: The workstation is a Tier 2 overnight executor only — no daytime interactive use. Human review time is the real throughput bottleneck, not inference speed.
Financial trigger for hardware: Measured API costs >$80 CAD/month, or CPU-only quality demonstrably insufficient for target workflows.
## Models (all free via Ollama)

| Model | Effective params | Runs on | Best for |
|---|---|---|---|
| Gemma 4 E4B (Apache 2.0, Apr 2026) | ~4.5B | CPU-only | Start here |
| Gemma 4 26B MoE | ~4B active/token | 12 GB VRAM | Agentic coding after GPU upgrade |
| Qwen3 14B Q4 | 14B | 12 GB VRAM | Coding/SWE alternative |
| DeepSeek V3.2 | MoE | 24+ GB VRAM | Complex reasoning (future) |
## Key Warnings / Gotchas

- Amazon.ca third-party prices are inflated — always filter to “Sold by Amazon.ca” for accurate Beelink pricing
- Existing PSU must handle GPU upgrade — verify wattage on PSU label before buying RTX 3060 (needs 450W+)
- OpenClaw security posture — do not deploy with business data until at least late 2026
- Payback math: RTX 3060 ~4 months; Beelink ~4 years; RTX Pro 6000 never — hardware decision must be triggered by measured cost, not assumed need
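The payback arithmetic behind that last bullet is plain division. A sketch using this section's own figures, with the monthly saving assumed equal to the avoided API spend at the >$80 CAD/month trigger (the section's rounder "~4 months / ~4 years" estimates imply somewhat lower measured savings; real savings must be measured, not assumed):

```python
def payback_months(hardware_cost_cad: float, monthly_saving_cad: float) -> float:
    """Months until avoided API spend covers the hardware purchase price."""
    return hardware_cost_cad / monthly_saving_cad

# Illustrative only — savings figures are assumptions, not measurements:
print(round(payback_months(250, 80), 1))        # RTX 3060 at the $80/mo trigger → 3.1 months
print(round(payback_months(3000, 80) / 12, 1))  # Beelink GTR9 Pro → ~3.1 years
```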
## Lessons Learned

1. **Hardware decisions need a measured trigger, not assumed ROI.** This decision went through multiple revisions (AMD mini-PC → RTX Pro 6000 Blackwell → back to mini-PC → “do nothing” → GPU upgrade) because the use case kept shifting. The root cause: the actual workflow (overnight-only, human review is the bottleneck) wasn’t pinned down until late in the analysis. The correct pattern for any future hardware decision:
- Define the actual workflow and its usage pattern first
- Measure the real cost of the current approach (API spend, time lost)
- Only buy hardware when a specific, measured trigger is hit — never on assumed ROI
2. **The Three-Tier Model is the reusable mental model for AI tooling decisions.** Frontier models (CC/Cursor) are expensive per token but unmatched for judgment. Local models are cheap/free but behind on quality. The resolution isn’t either/or — it’s explicit tier assignment:
- Tier 1 (Frontier): Design, architecture, complex reasoning, QA gates
- Tier 2 (Local): Execution of well-specified work, overnight batch tasks, background agents
- Tier 3 (Frontier QA): Final review before deployment

This model applies to any future AI workflow or tooling decision — always ask which tier a task belongs to before choosing the tool.
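The "which tier does this task belong to" question can even be framed as a tiny routing function — purely illustrative, with made-up task attributes standing in for real triage criteria:

```python
def assign_tier(well_specified: bool, needs_judgment: bool, is_final_review: bool) -> int:
    """Map a task to a tier per the Three-Tier Model (attributes are illustrative).

    Tier 1 – frontier: design, architecture, anything needing judgment
    Tier 2 – local:    execution of well-specified work, overnight batch
    Tier 3 – frontier: final QA review before deployment
    """
    if is_final_review:
        return 3
    if needs_judgment or not well_specified:
        return 1
    return 2

# An overnight batch task with a tight spec routes to the local tier:
print(assign_tier(well_specified=True, needs_judgment=False, is_final_review=False))  # 2
```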