Strategic research into adding a local Tier 2 AI agent team alongside the existing CC/Cursor setup. Implementation task: Tier-2-Agent-Setup


| Tier | Tool | Role |
| --- | --- | --- |
| 1 — Architect | CC / Cursor (frontier) | Design, planning, complex reasoning, architecture |
| 2 — Builder | Local agents | Overnight batch execution of well-specified tasks |
| 3 — QA Gate | CC / API (frontier) | Final quality + security review before deployment |

CC and Cursor are NOT being replaced — this is additive. Local agents handle volume/overnight work; frontier models handle judgment.
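
A minimal sketch of how this tier assignment could be encoded in an orchestration script; the `Tier`, `Task`, and `route_task` names below are illustrative placeholders, not part of any of the tools listed later:

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ARCHITECT = 1  # CC / Cursor (frontier): design, planning, complex reasoning
    BUILDER = 2    # Local agents: overnight batch execution of well-specified tasks
    QA_GATE = 3    # CC / API (frontier): final quality + security review


@dataclass
class Task:
    name: str
    well_specified: bool   # complete spec already written to the KB?
    needs_judgment: bool   # architecture / design / ambiguous requirements?
    pre_deployment: bool   # final review before shipping?


def route_task(task: Task) -> Tier:
    """Frontier models for judgment and gates; local agents for well-specified volume."""
    if task.pre_deployment:
        return Tier.QA_GATE
    if task.needs_judgment or not task.well_specified:
        return Tier.ARCHITECT
    return Tier.BUILDER


if __name__ == "__main__":
    print(route_task(Task("implement module per spec", True, False, False)))  # Tier.BUILDER
```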


The existing KB (markdown vaults, git-backed) already solves the hardest problem in agent systems: long-term memory. All tiers read context from and write outputs to the KB. No vector DB needed. It is human-auditable and already in use via the SMTM task system.
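
A sketch of that memory loop, assuming a plain git-backed markdown vault; the vault path, directory names, and helper functions are hypothetical placeholders, not the actual SMTM layout:

```python
import subprocess
from pathlib import Path

VAULT = Path("~/kb-vault").expanduser()  # placeholder path for the git-backed markdown vault


def read_task_spec(task_name: str) -> str:
    """Tier 2 agent reads its context from a markdown task file written by Tier 1."""
    return (VAULT / "tasks" / f"{task_name}.md").read_text(encoding="utf-8")


def write_result(task_name: str, output: str) -> Path:
    """Agent writes its output back into the vault as plain, human-auditable markdown."""
    out_path = VAULT / "outputs" / f"{task_name}-result.md"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(output, encoding="utf-8")
    return out_path


def commit_result(out_path: Path, task_name: str) -> None:
    """Commit the output so Tier 1 / Tier 3 review happens over ordinary git history."""
    subprocess.run(["git", "-C", str(VAULT), "add", str(out_path)], check=True)
    subprocess.run(["git", "-C", str(VAULT), "commit", "-m", f"tier2: {task_name} output"], check=True)
```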


| Layer | Tool |
| --- | --- |
| Scheduler + orchestrator | n8n (self-hosted, Docker, WSL2) |
| Code + implementation agent | OpenHands |
| Complex role-based crews | CrewAI (as needed) |
| Local model server | Ollama |
| Inter-agent protocol | MCP (already in CC; ecosystem converging) |
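
As a sketch of the model-server layer, an overnight n8n job can call Ollama's local HTTP API directly (Ollama serves on localhost:11434 by default); the model tag below is a placeholder for whatever model is actually pulled:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def run_local(prompt: str, model: str = "local-model-placeholder") -> str:
    """Send one well-specified prompt to the local model server and return the completion."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,  # overnight batch work: generous timeout, no interactive latency pressure
    )
    resp.raise_for_status()
    return resp.json()["response"]
```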

OpenClaw: Highest capability ceiling but security crisis in early 2026 (135K exposed instances). Do not use with business data until late 2026.


Existing machine specs:

  • AMD Ryzen 7 1700 (8C/16T), 32 GB RAM
  • NVIDIA GTX 1050 — 2 GB VRAM (not usable for GPU-accelerated LLM inference)
  • CPU-only inference: ~8–20 tok/s depending on model size
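
Rough arithmetic on what that CPU-only rate buys in an unattended run (the 8-hour window is an assumption; the tok/s range is the estimate above):

```python
# Back-of-envelope: tokens generated in one overnight window at CPU-only speeds.
hours = 8                    # assumed overnight window
for tok_per_s in (8, 20):    # CPU-only range estimated above
    tokens = tok_per_s * 3600 * hours
    print(f"{tok_per_s} tok/s -> ~{tokens:,} tokens per night")
# 8 tok/s  -> ~230,400 tokens per night
# 20 tok/s -> ~576,000 tokens per night
```

Even the low end produces more draft output per night than a human can review the next morning, which is why review time, not inference speed, ends up as the binding constraint.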

Decision sequence:

  1. Test CPU-only with Gemma 4 E4B first — zero cost
  2. If CPU-only is insufficient: source a used RTX 3060 12 GB (~$250 CAD) — ~4-month payback, AM4-compatible
  3. If the RTX 3060 is still insufficient: evaluate the Beelink GTR9 Pro (~$2,800–3,160 CAD; verify Amazon.ca direct price — filter to “Sold by Amazon.ca”)
  4. RTX Pro 6000 Blackwell ($13K) or Beelink at $8K+: not financially justified for an overnight-only use case

Key insight: The workstation is a Tier 2 overnight executor only — no daytime interactive use. Human review time is the real throughput bottleneck, not inference speed.

Financial trigger for hardware: Measured API costs >$80 CAD/month, or CPU-only quality demonstrably insufficient for target workflows.
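
A sketch of the payback math behind that trigger, treating the measured monthly API spend as the amount a local GPU would offset, which is an assumption: the real offset depends on how much of the overnight workload a local model can actually absorb. The ~4-month and ~4-year figures above imply an assumed offset closer to $60–65 CAD/month.

```python
def payback_months(hardware_cost_cad: float, monthly_offset_cad: float) -> float:
    """Months to recoup a hardware purchase if it offsets that much API spend per month."""
    return hardware_cost_cad / monthly_offset_cad


TRIGGER = 80  # CAD/month: measured spend above this reopens the hardware question

print(payback_months(250, TRIGGER))    # ~3.1 months  (used RTX 3060 12 GB)
print(payback_months(3000, TRIGGER))   # ~37.5 months (Beelink GTR9 Pro, mid of quoted range)
print(payback_months(13000, TRIGGER))  # ~162 months  (RTX Pro 6000: effectively never)
```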


| Model | Effective params | Runs on | Best for |
| --- | --- | --- | --- |
| Gemma 4 E4B (Apache 2.0, Apr 2026) | ~4.5B | CPU-only | Start here |
| Gemma 4 26B MoE | ~4B active/token | 12 GB VRAM | Agentic coding after GPU upgrade |
| Qwen3 14B Q4 | 14B | 12 GB VRAM | Coding/SWE alternative |
| DeepSeek V3.2 | MoE | 24+ GB VRAM | Complex reasoning (future) |
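
A rough way to sanity-check whether a quantized model fits the 12 GB card: Q4 stores roughly 0.5 bytes per weight, and the 25% overhead factor for KV cache and runtime buffers below is a rule-of-thumb assumption, not a measured figure.

```python
def est_vram_gb(params_billion: float, bits_per_weight: float = 4, overhead: float = 1.25) -> float:
    """Very rough VRAM estimate: quantized weights plus a rule-of-thumb runtime overhead."""
    weight_gb = params_billion * (bits_per_weight / 8)  # 1B params at 1 byte/weight ≈ 1 GB
    return weight_gb * overhead


print(round(est_vram_gb(14), 1))   # Qwen3 14B at Q4 -> ~8.8 GB, fits a 12 GB RTX 3060
print(round(est_vram_gb(4.5), 1))  # ~4.5B effective params -> ~2.8 GB, fine in CPU RAM
```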

  • Amazon.ca third-party prices are inflated — always filter to “Sold by Amazon.ca” for accurate Beelink pricing
  • Existing PSU must handle GPU upgrade — verify wattage on PSU label before buying RTX 3060 (needs 450W+)
  • OpenClaw security posture — do not deploy with business data until at least late 2026
  • Payback math: RTX 3060 ~4 months; Beelink ~4 years; RTX Pro 6000 never — hardware decision must be triggered by measured cost, not assumed need

1. Hardware decisions need a measured trigger, not assumed ROI

This decision went through multiple revisions (AMD mini-PC → RTX Pro 6000 Blackwell → back to mini-PC → “do nothing” → GPU upgrade) because the use case kept shifting. The root cause: the actual workflow (overnight-only, human review is the bottleneck) wasn’t pinned down until late in the analysis. The correct pattern for any future hardware decision:

  • Define the actual workflow and its usage pattern first
  • Measure the real cost of the current approach (API spend, time lost)
  • Only buy hardware when a specific, measured trigger is hit — never on assumed ROI

2. The Three-Tier Model is the reusable mental model for AI tooling decisions

Frontier models (CC/Cursor) are expensive per token but unmatched for judgment. Local models are cheap/free but behind on quality. The resolution isn’t either/or — it’s explicit tier assignment:

  • Tier 1 (Frontier): Design, architecture, complex reasoning, QA gates
  • Tier 2 (Local): Execution of well-specified work, overnight batch tasks, background agents
  • Tier 3 (Frontier QA): Final review before deployment

This model applies to any future AI workflow or tooling decision — always ask which tier a task belongs to before choosing the tool.