# AI Team Strategy — Settled Decisions

## Context

Strategic research into adding a local Tier 2 AI agent team alongside the existing CC/Cursor setup. Implementation task: Tier-2-Agent-Setup
## The Three-Tier Model

| Tier | Tool | Role |
|---|---|---|
| 1 — Architect | CC / Cursor (frontier) | Design, planning, complex reasoning, architecture |
| 2 — Builder | Local agents | Overnight batch execution of well-specified tasks |
| 3 — QA Gate | CC / API (frontier) | Final quality + security review before deployment |
CC and Cursor are NOT being replaced — this is additive. Local agents handle volume/overnight work; frontier models handle judgment.
## Obsidian KB as the Coordination Layer

The existing KB (markdown vaults, git-backed) already solves the hardest problem in agent systems: long-term memory. All tiers read context from and write outputs to the KB. No vector DB needed. Human-auditable, and already in use via the SMTM task system.
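The coordination pattern reduces to plain file I/O against the vault. A minimal sketch — the `Tasks/` and `Outputs/` folder names and filename convention are illustrative placeholders, not the actual SMTM vault layout:

```python
from datetime import date
from pathlib import Path

def read_task_spec(vault: Path, task_name: str) -> str:
    """Read the task spec a Tier 1 architect left in the vault.

    The Tasks/ folder name is an assumption for illustration.
    """
    return (vault / "Tasks" / f"{task_name}.md").read_text(encoding="utf-8")

def write_agent_output(vault: Path, task_name: str, body: str) -> Path:
    """Write a Tier 2 agent's result back into the KB as plain markdown.

    The Outputs/ folder and date-prefixed filename are illustrative.
    """
    out_dir = vault / "Outputs"
    out_dir.mkdir(parents=True, exist_ok=True)
    note = out_dir / f"{date.today().isoformat()} {task_name}.md"
    note.write_text(f"# {task_name}\n\n{body}\n", encoding="utf-8")
    return note
```

Because everything is markdown in a git repo, every agent output is diffable and human-auditable by default — no extra memory infrastructure to operate.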
## Orchestration Stack

| Layer | Tool |
|---|---|
| Scheduler + orchestrator | n8n (self-hosted, Docker, WSL2) |
| Code + implementation agent | OpenHands |
| Complex role-based crews | CrewAI (as needed) |
| Local model server | Ollama |
| Inter-agent protocol | MCP (already in CC; ecosystem converging) |
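The Tier 2 execution step that n8n would schedule overnight is, at its core, one call to Ollama's local HTTP API. A sketch against Ollama's standard non-streaming `/api/generate` endpoint (default port 11434); the model name and prompt are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def run_task(model: str, prompt: str) -> str:
    """Send one well-specified task to the local model and return its text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

In the orchestrated setup, an n8n cron trigger would invoke something equivalent to `run_task` for each queued KB task and write the result back to the vault.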
OpenClaw: Highest capability ceiling but security crisis in early 2026 (135K exposed instances). Do not use with business data until late 2026.
## Hardware Decision

Existing machine specs:
- AMD Ryzen 7 1700 (8C/16T), 32 GB RAM
- NVIDIA GTX 1050 — 2 GB VRAM (not usable for GPU-accelerated LLM inference)
- CPU-only inference: ~8–20 tok/s depending on model size
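Whether CPU-only throughput really lands in the ~8–20 tok/s range is directly measurable: Ollama's non-streaming `/api/generate` reply includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating), so the speed calculation is just:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama response metadata.

    eval_count       – tokens generated (from the /api/generate reply)
    eval_duration_ns – time spent generating, in nanoseconds
    """
    return eval_count / (eval_duration_ns / 1_000_000_000)

# e.g. 300 tokens generated in 25 s of eval time:
print(tokens_per_second(300, 25_000_000_000))  # 12.0 — inside the quoted CPU-only range
```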
Decision sequence:
- Test CPU-only with Gemma 4 E4B first — zero cost
- If CPU insufficient: source used RTX 3060 12 GB (~$250 CAD) — ~4-month payback, AM4-compatible
- If RTX 3060 still insufficient: evaluate Beelink GTR9 Pro (~$2,800–3,160 CAD; verify Amazon.ca direct price — filter to “Sold by Amazon.ca”)
- RTX Pro 6000 Blackwell ($13K), Beelink at $8K+: not financially justified for overnight-only use case
Key insight: The workstation is a Tier 2 overnight executor only — no daytime interactive use. Human review time is the real throughput bottleneck, not inference speed.
Financial trigger for hardware: Measured API costs >$80 CAD/month, or CPU-only quality demonstrably insufficient for target workflows.
## Models (all free via Ollama)

| Model | Effective params | Runs on | Best for |
|---|---|---|---|
| Gemma 4 E4B (Apache 2.0, Apr 2026) | ~4.5B | CPU-only | Start here |
| Gemma 4 26B MoE | ~4B active/token | 12 GB VRAM | Agentic coding after GPU upgrade |
| Qwen3 14B Q4 | 14B | 12 GB VRAM | Coding/SWE alternative |
| DeepSeek V3.2 | MoE | 24+ GB VRAM | Complex reasoning (future) |
## Key Warnings / Gotchas

- Amazon.ca third-party prices are inflated — always filter to “Sold by Amazon.ca” for accurate Beelink pricing
- Existing PSU must handle GPU upgrade — verify wattage on PSU label before buying RTX 3060 (needs 450W+)
- OpenClaw security posture — do not deploy with business data until at least late 2026
- Payback math: RTX 3060 ~4 months; Beelink ~4 years; RTX Pro 6000 never — hardware decision must be triggered by measured cost, not assumed need
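The payback arithmetic behind that last bullet is plain division. A sketch using this section's own figures, with the monthly saving assumed equal to the avoided API spend at the >$80 CAD/month trigger (the section's rounder "~4 months / ~4 years" estimates imply somewhat lower measured savings; real savings must be measured, not assumed):

```python
def payback_months(hardware_cost_cad: float, monthly_saving_cad: float) -> float:
    """Months until avoided API spend covers the hardware purchase price."""
    return hardware_cost_cad / monthly_saving_cad

# Illustrative only — savings figures are assumptions, not measurements:
print(round(payback_months(250, 80), 1))        # RTX 3060 at the $80/mo trigger → 3.1 months
print(round(payback_months(3000, 80) / 12, 1))  # Beelink GTR9 Pro → ~3.1 years
```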
## Lessons Learned

1. **Hardware decisions need a measured trigger, not assumed ROI.** This decision went through multiple revisions (AMD mini-PC → RTX Pro 6000 Blackwell → back to mini-PC → “do nothing” → GPU upgrade) because the use case kept shifting. The root cause: the actual workflow (overnight-only, human review is the bottleneck) wasn’t pinned down until late in the analysis. The correct pattern for any future hardware decision:
- Define the actual workflow and its usage pattern first
- Measure the real cost of the current approach (API spend, time lost)
- Only buy hardware when a specific, measured trigger is hit — never on assumed ROI
2. **The Three-Tier Model is the reusable mental model for AI tooling decisions.** Frontier models (CC/Cursor) are expensive per token but unmatched for judgment. Local models are cheap/free but behind on quality. The resolution isn’t either/or — it’s explicit tier assignment:
- Tier 1 (Frontier): Design, architecture, complex reasoning, QA gates
- Tier 2 (Local): Execution of well-specified work, overnight batch tasks, background agents
- Tier 3 (Frontier QA): Final review before deployment

This model applies to any future AI workflow or tooling decision — always ask which tier a task belongs to before choosing the tool.
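The "which tier does this task belong to" question can even be framed as a tiny routing function — purely illustrative, with made-up task attributes standing in for real triage criteria:

```python
def assign_tier(well_specified: bool, needs_judgment: bool, is_final_review: bool) -> int:
    """Map a task to a tier per the Three-Tier Model (attributes are illustrative).

    Tier 1 – frontier: design, architecture, anything needing judgment
    Tier 2 – local:    execution of well-specified work, overnight batch
    Tier 3 – frontier: final QA review before deployment
    """
    if is_final_review:
        return 3
    if needs_judgment or not well_specified:
        return 1
    return 2

# An overnight batch task with a tight spec routes to the local tier:
print(assign_tier(well_specified=True, needs_judgment=False, is_final_review=False))  # 2
```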