Spec-Driven AI Workflow Research Report
Source: _Now.md, “Research the most effective spec-driven task-based AI workflow” + “_Now/_Next naming convention upgrade”
Executive Summary
After researching GSD, Superpowers, Conductor, CC’s native task system, the TASKS.md pattern, Ralph Wiggum loops, and several other approaches, my recommendation is a layered approach:
- Install Superpowers as your primary workflow engine (brainstorm, plan, execute with TDD and subagents)
- Add the Ralph Wiggum plugin for overnight autonomous loops
- Keep `_Now.md`/`_Next.md` with minor naming improvements (no rename to `_ToAI`/`_FromAI`)
- Adopt TASKS.md per project as the lightweight spec/plan file that Superpowers and headless mode both consume
This gives you structured spec-driven development for interactive sessions + autonomous overnight execution, without adopting heavy enterprise workflow overhead.
Framework Comparison
1. GSD (Get Shit Done)
What it is: A meta-prompting and context engineering layer on top of Claude Code (also supports OpenCode and Gemini CLI). Created by TACHES (a solo developer who doesn’t write code; Claude does it all). Install: `npx get-shit-done-cc@latest`.
Stats: 12.4K GitHub stars, 1.2K forks, 750+ commits, MIT license, active Discord, v1.9.6. Trusted by engineers at Amazon, Google, Shopify, Webflow.
The Six-Step Cycle (per milestone/phase):
- Initialize (`/gsd:new-project`): Spawns 4 parallel research agents (stack, features, architecture, pitfalls). Produces `PROJECT.md`, `REQUIREMENTS.md`, `ROADMAP.md`.
- Discuss (`/gsd:discuss-phase N`): Captures implementation decisions before planning. Output: `{phase}-CONTEXT.md`.
- Plan (`/gsd:plan-phase N`): Research agents investigate, planner creates 2-3 atomic task plans in XML with `<verify>`/`<done>` blocks. Checker verifies. They loop until plans pass.
- Execute (`/gsd:execute-phase N`): Each task runs in a fresh 200K-token sub-agent context. Atomic git commits per task. Output: `{phase}-N-SUMMARY.md`.
- Verify (`/gsd:verify-work N`): Automated UAT against phase goals. Spawns debugger agents for failures.
- Complete (`/gsd:complete-milestone`): Archives, tags release, moves to next milestone.
Spec lives in the `.planning/` directory:

    .planning/
      PROJECT.md, REQUIREMENTS.md, ROADMAP.md, STATE.md, config.json
      research/, {phase}-CONTEXT.md, {phase}-N-PLAN.md, todos/

Modes: `mode: yolo` (autonomous) vs `mode: interactive`. Also: `/gsd:quick` skips research/plan verification for small tasks. `/gsd:map-codebase` for brownfield analysis. `/gsd:pause-work` / `/gsd:resume-work` for session persistence.
Strengths:
- Context rot solved architecturally — sub-agents get fresh 200K windows, orchestrator stays at 30-40% utilization
- Atomic git commits per task: `git bisect` works, rollbacks are surgical (critical for overnight review)
- Built specifically for solopreneurs, with no enterprise theater
- Users completed 23-plan projects with “Lean Orchestrator” pattern
- Quick mode for ad-hoc small tasks
Weaknesses:
- Token consumption: v1.5.27 saw 4x increase; one bug fix spawned 100+ agents consuming 10K tokens in 60 seconds (Issue #120)
- Commands may break after CC updates (Issue #218) — deep coupling to CC internals
- Does NOT merge its methodology into existing CLAUDE.md files (Issue #50)
- Discuss/verify phases are interactive by design — true overnight requires running only the execute phase unattended
- Smaller community than Superpowers (12.4K vs 47.7K stars)
Fit for you: The most mature spec-driven system (vs Superpowers being the most mature skills framework). Excellent architecture, but the token cost amplification and CC update fragility are real concerns at your 439 sessions/month volume. Best as a daytime planning / overnight execution workflow. Consider alongside Superpowers rather than instead of it. A gsd-autopilot fork exists for continuous execution but is underdocumented.
Sources:
- GitHub
- New Stack: Beating Context Rot
- ccforeveryone.com/gsd
- GSD Test Report (Medium)
- Spec-Driven Comparison (BMad, GSD, Ralph Loop)
- A GSD System for Claude Code (Esteban Torres)
2. Superpowers (by Obra)
What it is: The most mature and widely adopted skills framework for Claude Code. An agentic development methodology with 14 composable skills that activate automatically based on context.
How it works — 3 phases:
Phase 1 — Brainstorm (/superpowers:brainstorm): Socratic questioning, one question at a time, multiple-choice when possible. Explores 2-3 approaches with trade-offs. YAGNI ruthlessly applied. Output: docs/plans/YYYY-MM-DD-<topic>-design.md
Phase 2 — Plan (/superpowers:write-plan): Decomposes design into extremely granular tasks (2-5 minutes each). Each task follows TDD: write failing test -> verify it fails -> implement minimal code -> verify it passes -> commit. Includes exact file paths, code examples, pytest commands, and git commit syntax.
Phase 3 — Execute (/superpowers:execute-plan): Two modes:
- Subagent mode: Fresh agent per task, full context provided upfront. Two-stage review: (1) spec compliance, (2) code quality. Both must pass before next task.
- Parallel mode: Independent tasks run concurrently across multiple agents.
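The two-stage gate in subagent mode can be sketched as a small control loop. This is a minimal illustration only, not Superpowers' implementation: `execute_plan`, `implement`, `spec_review`, and `quality_review` are hypothetical names, and each callable stands in for a subagent dispatch.

```python
from typing import Callable, Iterable, List

Review = Callable[[str], bool]


def execute_plan(
    tasks: Iterable[str],
    implement: Callable[[str], None],
    spec_review: Review,
    quality_review: Review,
) -> List[str]:
    """Run tasks in order; stop at the first task that fails either review."""
    completed: List[str] = []
    for task in tasks:
        implement(task)  # hypothetical stand-in for a fresh subagent per task
        # Stage 1 gates stage 2: quality is only checked once the spec is met.
        if not (spec_review(task) and quality_review(task)):
            break
        completed.append(task)
    return completed


# Stub reviewers: everything meets the spec, but "task-b" fails code quality.
finished = execute_plan(
    ["task-a", "task-b", "task-c"],
    implement=lambda task: None,
    spec_review=lambda task: True,
    quality_review=lambda task: task != "task-b",
)
print(finished)
```

Stopping at the first failed review, rather than skipping ahead, keeps later tasks from building on work that has not passed both checks.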
The 14 Core Skills:
| Skill | Purpose |
|---|---|
| brainstorming | Socratic requirements refinement before coding |
| writing-plans | Granular task decomposition with TDD steps |
| executing-plans | Step-by-step plan execution with checkpoints |
| subagent-driven-development | Fresh subagent per task + two-stage review |
| dispatching-parallel-agents | Concurrent execution for independent tasks |
| test-driven-development | Enforces RED-GREEN-REFACTOR; deletes code written before tests |
| systematic-debugging | 4-phase root cause analysis before any fix |
| verification-before-completion | Evidence-based checks before marking done |
| using-git-worktrees | Isolated branches for clean main |
| requesting-code-review | Reviews implementation against spec |
| receiving-code-review | Handles review feedback |
| finishing-a-development-branch | Merge and cleanup |
| using-superpowers | Meta-skill |
| writing-skills | Author new custom skills |
Stats: ~47.7K GitHub stars, 270+ commits, MIT license. Officially accepted into Anthropic’s Claude plugins marketplace (Jan 15, 2026). Companion repos: superpowers-lab (experimental) and superpowers-marketplace (20+ community skills).
Strengths:
- Most mature and widely adopted CC skills framework — official Anthropic recognition
- Structurally prevents Claude from cutting corners (TDD enforcement, two-stage review)
- 2+ hour autonomous sessions reported with plan adherence
- Replaces the “senior dev + PM + QA” team a solopreneur doesn’t have
- Free, MIT license, active community
Weaknesses:
- Overhead: 10-20 minute brainstorm + planning phase before coding starts (overkill for small fixes)
- Opinionated: enforces TDD — if your project doesn’t use tests, the workflow fights you
- Token cost: subagent dispatch + two-stage review consumes more tokens than vanilla CC
- Quality still depends on your input during requirements refinement
Fit for you: Excellent. This is the primary recommendation for structured work. For quick tasks, you skip it and work directly. For any task > 30 minutes, Superpowers provides genuine structural value.
Installation:

    /plugin marketplace add obra/superpowers-marketplace
    /plugin install superpowers@superpowers-marketplace

Sources:
- GitHub
- Obra’s blog: How I’m using coding agents
- Complete Guide 2026
- Superpowers explained (Dev Genius)
- Superpowers marketplace
3. Conductor (Gemini CLI / CC Ports)
What it is: A context-driven development extension originally for Gemini CLI, with two Claude Code ports. Enforces: Context -> Spec -> Plan -> Implement with phase checkpoints.
How it works:
- Setup: Creates persistent context files (`product.md`, `tech-stack.md`, `workflow.md`, `code_styleguides/`)
- New Track: Creates `spec.md` (requirements) + `plan.md` (phased task list) per feature
- Implement: Works through plan.md sequentially; marks tasks `[~]` in progress, `[x]` done; commits with conventional format
- Phase Checkpoints: Pauses at phase boundaries for human verification (git diff, test coverage, manual verification plan)
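The `[~]`/`[x]` markers are easy to update mechanically. The sketch below assumes plan.md tasks are written as `- [ ] description` lines, which is an assumption about the layout; `mark_task` is a hypothetical helper and not part of Conductor.

```python
import re

# Assumed task-line shape: "- [ ] text", "- [~] text", or "- [x] text".
TASK_LINE = re.compile(r"^(\s*- \[)([ ~x])(\] )(.*)$")


def mark_task(plan_text: str, description: str, marker: str) -> str:
    """Return plan_text with the first task matching `description` set to `marker`."""
    if marker not in (" ", "~", "x"):
        raise ValueError(f"unknown marker: {marker!r}")
    lines = plan_text.splitlines()
    for i, line in enumerate(lines):
        match = TASK_LINE.match(line)
        if match and match.group(4).strip() == description:
            lines[i] = f"{match.group(1)}{marker}{match.group(3)}{match.group(4)}"
            return "\n".join(lines)
    raise KeyError(f"task not found: {description!r}")


plan = "## Phase 1\n- [ ] Add login endpoint\n- [ ] Write tests"
plan = mark_task(plan, "Add login endpoint", "~")  # work starts
plan = mark_task(plan, "Add login endpoint", "x")  # work done
print(plan)
```

Because the state lives in the markdown file itself, a new session can resume by reading the markers instead of relying on chat history.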
CC Ports:
- conductor_cc (pilotparpikhodjaev/conductor_cc): Direct port with `/conductor:setup`, `/conductor:implement`, etc. Closest to original.
- claude-conductor (superbasicstudio/claude-conductor): Documentation framework (CONDUCTOR.md, ARCHITECTURE.md, JOURNAL.md). Good for context, but lacks orchestration. 298 stars.
Strengths:
- Treats context as a managed artifact (versioned markdown files)
- Every task tied to a commit SHA (full auditability)
- Logical revert by track/phase/task (not just commit hash)
- Resumable across sessions (state in markdown, not chat)
Weaknesses:
- Phase checkpoints block autonomous execution — designed for human-in-the-loop, not overnight runs
- Token-heavy (loads product.md, tech-stack.md, spec.md, plan.md at every operation)
- CC ports are immature community projects
- No parallel tracks — sequential execution only
- TDD mandated with 80% coverage targets
Fit for you: The pattern is valuable (persistent context + spec + plan as markdown files), but the specific plugins are immature. Better to adopt the pattern within your existing workflow (CLAUDE.md + TASKS.md) than to install a fragile plugin. Not recommended as a primary tool, but extract the pattern.
4. CC Native Task System (TaskCreate/TaskUpdate/TaskList/TaskGet)
What it is: Claude Code’s built-in task management, upgraded from simple TodoWrite to a full four-tool system.
How it works:
- `TaskCreate`: defines a task with subject, description, `activeForm` (spinner text)
- `TaskUpdate`: claim with `owner`, change status, manage dependencies (`addBlockedBy`/`addBlocks`)
- `TaskList`: shows all tasks with status/owner/blockers
- `TaskGet`: full details for a specific task
- Tasks persist in `~/.claude/tasks/` and broadcast updates across sessions
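The blocked/unblocked behaviour can be pictured with a small in-memory model. This is a sketch only: the `Task` class, its `blocked_by` field, and the status names are assumptions for illustration and do not reflect the actual tool schema or on-disk format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Task:
    subject: str
    status: str = "pending"  # assumed states: pending, in_progress, completed
    blocked_by: Set[str] = field(default_factory=set)


def unblocked(tasks: Dict[str, Task]) -> List[str]:
    """Return ids of pending tasks whose blockers are all completed."""
    return [
        task_id
        for task_id, task in tasks.items()
        if task.status == "pending"
        and all(tasks[blocker].status == "completed" for blocker in task.blocked_by)
    ]


tasks = {
    "1": Task("Set up auth middleware"),
    "2": Task("Implement login endpoint", blocked_by={"1"}),
    "3": Task("Write integration tests", blocked_by={"2"}),
}
print(unblocked(tasks))  # only task 1 is ready at the start
tasks["1"].status = "completed"
print(unblocked(tasks))  # task 2 becomes ready once task 1 completes
```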
Strengths:
- Real-time cross-session coordination (Session A completes -> Session B sees immediately)
- Dependency resolution (blocked/unblocked chains)
- Perfect for multi-agent swarm execution
- Built-in, no plugins needed
Weaknesses:
- Tasks live outside your repo (not version-controlled)
- Optimized for multi-agent coordination, not solo planning
- Not human-editable in Obsidian
- No persistent plan — pure runtime orchestration
Fit for you: Useful as the execution layer when Superpowers dispatches sub-agents. Not a replacement for TASKS.md as a planning tool. Use indirectly (via Superpowers), not directly.
5. TASKS.md Pattern
What it is: A markdown file in the project root as the single source of truth for what needs to happen.
Typical structure:

    ## Feature: User Authentication
    - [x] Set up auth middleware
    - [x] Implement login endpoint
    - [ ] Add password reset flow
    - [ ] Write integration tests

The insight: Context window = RAM (volatile, limited); filesystem = disk (persistent, unlimited). Anything important gets written to disk.
Expanded pattern (planning-with-files): task_plan.md for phases, findings.md for research, progress.md for session logs. Auto-recovers after /clear.
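Because the file is plain markdown, progress can be read back with a few lines of code. The sketch below assumes the `- [ ]` / `- [x]` checkbox layout shown in the typical structure; `summarize` is a hypothetical helper, not part of any tool named here.

```python
import re
from typing import Optional, Tuple

CHECKBOX = re.compile(r"^\s*- \[( |x)\] (.+)$")


def summarize(tasks_md: str) -> Tuple[int, int, Optional[str]]:
    """Return (done, total, next_open_item) for a TASKS.md-style checklist."""
    done, total, next_open = 0, 0, None
    for line in tasks_md.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue  # headings and prose are ignored
        total += 1
        if match.group(1) == "x":
            done += 1
        elif next_open is None:
            next_open = match.group(2)
    return done, total, next_open


sample = """## Feature: User Authentication
- [x] Set up auth middleware
- [x] Implement login endpoint
- [ ] Add password reset flow
- [ ] Write integration tests
"""
print(summarize(sample))  # (2, 4, 'Add password reset flow')
```

A check like this is useful for a morning review: it shows how far an unattended run got without opening the session transcript.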
Strengths:
- Version-controlled, human-readable, editable in Obsidian
- Works with any AI tool (CC, Cursor, Gemini, Codex)
- Survives context window resets
- Simple, no dependencies
Weaknesses:
- No real-time cross-session coordination
- Manual status tracking (no automatic dependency resolution)
Fit for you: Adopt this for every project. It complements Superpowers (which generates plans) and headless mode (which consumes them).
6. Ralph Wiggum Plugin (Autonomous Loops)
What it is: An official Anthropic plugin that creates autonomous loops. A Stop hook intercepts Claude’s exit and re-feeds the original prompt. Each iteration sees modified files and git history from previous runs.
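The control flow is simple enough to model outside the plugin. The sketch below only illustrates the loop idea (same prompt, persistent state, iteration cap); it is not the plugin's hook code, and `ralph_loop`, `run_agent`, and `stub_agent` are hypothetical names.

```python
from typing import Callable


def ralph_loop(prompt: str, run_agent: Callable[[str], bool], max_iterations: int) -> int:
    """Re-feed the same prompt until the agent reports completion or the cap is hit.

    `run_agent` stands in for one agent run and returns True when the work is
    finished. The return value is the number of iterations executed.
    """
    for iteration in range(1, max_iterations + 1):
        if run_agent(prompt):
            return iteration
    return max_iterations


# Stub agent: finishes one item per run. State lives outside the loop, which
# mirrors how each real iteration sees files changed by earlier ones.
remaining = ["refactor", "tests", "docs"]


def stub_agent(prompt: str) -> bool:
    remaining.pop(0)
    return not remaining


runs = ralph_loop("Read TASKS.md and work through all items.", stub_agent, max_iterations=50)
print(runs)  # 3
```

The iteration cap matters as much as the loop itself: without it, a prompt whose success criteria can never be met would run, and bill, indefinitely.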
Real-world results:
- YC hackathon teams shipped 6+ repos overnight for ~$297 in API costs
- Geoffrey Huntley ran a 3-month loop that built a complete programming language
How to use:

    claude -p "Read TASKS.md and work through all items. After each, commit with a descriptive message." \
      --allowedTools "Edit,Read,Bash,Write,Glob,Grep" \
      --max-iterations 50

Best for: Batch operations with well-defined success criteria, such as large refactors, test coverage, and documentation generation.
Fit for you: Install this for overnight autonomous execution. Pair with a well-written TASKS.md and headless mode.
7. Other Notable Approaches
cc-sdd (gotalab/cc-sdd): Spec-driven development enforcing requirements -> design -> tasks pipeline. One-command install, supports CC/Codex/Cursor/Gemini CLI. Kiro-compatible.
GitHub Spec Kit: GitHub’s open-source toolkit for spec-driven development across coding agents.
Anthropic’s Recommended Workflow: Four phases: (1) Explore the codebase, (2) Plan with extended thinking (“think hard” / “ultrathink”), (3) Implement one task at a time, (4) Commit. Maintain CLAUDE.md in git documenting mistakes. Use Boris Cherny’s approach: plan first, parallel instances, share learnings, rigorously verify.
Recommendation: Your Optimal Stack
For Interactive Development Sessions

    Superpowers (brainstorm -> plan -> execute with TDD + subagent review)
      + TASKS.md per project (persistent plan)
      + CLAUDE.md per project (persistent context)
      + /now slash command (cross-project dispatch)

Workflow:
- Open `_Now.md`, identify the project/task
- Start CC in the project directory
- `/superpowers:brainstorm` for new features (or skip for small tasks)
- `/superpowers:write-plan` to generate granular TASKS.md
- `/superpowers:execute-plan` for autonomous execution with verification
- Write completion report to `_Next_YYYY-MM-DD_ShortDesc.md`
For Overnight Autonomous Runs

    Ralph Wiggum plugin (autonomous loops)
      + Well-defined TASKS.md (written during the day)
      + Headless mode with --allowedTools
      + Sandboxed environment (Docker/WSL)

Workflow:
- During the day: create detailed TASKS.md with clear completion criteria
- Before bed: launch headless CC with Ralph Wiggum loop
- Morning: review git log, test results, and TASKS.md status
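A small launcher script can make the "before bed" step repeatable. The sketch below uses only the `claude -p` and `--allowedTools` flags quoted in this report; the helper names and the `overnight.log` path are hypothetical, and the script assumes it is started inside the sandboxed project directory.

```python
import shlex
import subprocess
from typing import List


def headless_command(prompt: str, allowed_tools: List[str]) -> List[str]:
    """Build the argv for a headless run, using only flags shown in this report."""
    return ["claude", "-p", prompt, "--allowedTools", ",".join(allowed_tools)]


def launch(argv: List[str], log_path: str = "overnight.log") -> int:
    """Run the command, capture all output to a log for morning review, return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log:
        return subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT).returncode


cmd = headless_command(
    "Read TASKS.md and complete all tasks",
    ["Edit", "Read", "Bash", "Write", "Glob", "Grep"],
)
print(shlex.join(cmd))  # inspect the exact command before calling launch(cmd)
```

Passing the command as a list rather than a shell string avoids quoting problems when the prompt itself contains quotes or commas.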
For KB Work (This Repo)

    _Now.md (cross-project task dispatch)
      + Direct CC interaction (no Superpowers overhead for markdown work)
      + /now + /ptr slash commands for quick sessions

Superpowers is overkill for markdown KB work. Reserve it for code projects.
_Now.md / _Next.md Naming Convention Analysis
Verdict: Keep _Now/_Next. Do NOT Rename to _ToAI/_FromAI.
Why _Now/_Next is better:
- Describes temporal state (active vs. completed), which is the dimension that matters
- `_ToAI`/`_FromAI` implies a mailbox metaphor that misrepresents the actual relationship: you don’t “send” _Now.md to AI, you point AI at it as context
- `_Now`/`_Next` works equally well when you read the files (unlike `_ToAI`, which becomes nonsensical when you open it yourself)
Do NOT Adopt Paired Indices (_ToAI2 -> _FromAI2)
Your actual workflow is one-to-many: one _Now.md spawns multiple _NextN.md reports over time. Paired indices force one-to-one correspondence that doesn’t match reality. Added bookkeeping for zero benefit.
Two Small Improvements
- Date-stamp Next files: Use `_Next_YYYY-MM-DD_ShortDesc.md` instead of `_Next2.md`. Example: `_Next_2026-02-08_AIWorkflowResearch.md`. Self-documenting, sorts chronologically, scales to hundreds of files.
- Add source reference at top of each Next file: `Source: _Now.md -- "Spec-driven workflow research"`. Gives traceability without paired indices.
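Both improvements can be generated rather than typed by hand. A minimal sketch, where `next_report` is a hypothetical helper that follows the filename pattern and source line described above:

```python
from datetime import date
from typing import Tuple


def next_report(short_desc: str, source_task: str, on: date) -> Tuple[str, str]:
    """Return (filename, first_line) for a completion report."""
    filename = f"_Next_{on.isoformat()}_{short_desc}.md"
    first_line = f'Source: _Now.md -- "{source_task}"'
    return filename, first_line


name, header = next_report(
    "AIWorkflowResearch", "Spec-driven workflow research", date(2026, 2, 8)
)
print(name)    # _Next_2026-02-08_AIWorkflowResearch.md
print(header)  # Source: _Now.md -- "Spec-driven workflow research"
```

ISO dates sort chronologically as plain strings, so a directory listing shows the reports in order.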
What Changes with Spec-Driven Adoption
When you adopt Superpowers/TASKS.md for projects, _Now.md becomes a lightweight dispatcher:

    - Monorepo: see TASKS.md in monorepo repo
    - KB: reorganize 04_AI section
    - Python utils: fix scaffolder bug

Detailed task tracking moves into project-level files. _Now.md stays as the cross-project “what am I working on?” view. _NextN.md captures cross-cutting outcomes (lessons learned, CC bugs found, workflow improvements).
Implementation Priority
Tonight (if you want an overnight run)
- Install Superpowers: `/plugin marketplace add obra/superpowers-marketplace`
- Create a TASKS.md in your monorepo with 3-5 well-defined tasks
- Run headless: `claude -p "Read TASKS.md and complete all tasks" --allowedTools "Edit,Read,Bash,Write,Glob,Grep"`
This Week
- Install Ralph Wiggum plugin for loop-based autonomous execution
- Update your `/ptr` slash command to include completion report naming convention
- Try one full Superpowers brainstorm -> plan -> execute cycle on a real feature
- Create project-level CLAUDE.md for your monorepo (via `/init`)
- Set up headless mode scripts for routine batch tasks
- Review results after 5-10 sessions and adjust
- Review results after 5-10 sessions and adjust
Appendix: Key Resources
- Superpowers (47.7K stars)
- GSD (12.4K stars)
- Ralph Wiggum Plugin (Official)
- cc-sdd (Spec-driven development)
- Claude Code Best Practices (Anthropic)
- Conductor (Gemini CLI)
- conductor_cc (CC port)
- planning-with-files
- GitHub Spec Kit