Spec-Driven AI Workflow Research Report

Source: _Now.md — “Research the most effective spec-driven task-based AI workflow” + “_Now/_Next naming convention upgrade”

Executive Summary

After researching GSD, Superpowers, Conductor, CC’s native task system, the TASKS.md pattern, Ralph Wiggum loops, and several other approaches, my recommendation is a layered approach:

Install Superpowers as your primary workflow engine (brainstorm, plan, execute with TDD and subagents)
Add the Ralph Wiggum plugin for overnight autonomous loops
Keep _Now.md/_Next.md with minor naming improvements (no rename to _ToAI/_FromAI)
Adopt TASKS.md per-project as the lightweight spec/plan file that Superpowers and headless mode both consume

This gives you structured spec-driven development for interactive sessions + autonomous overnight execution, without adopting heavy enterprise workflow overhead.

Framework Comparison

1. GSD (Get Shit Done)

What it is: A meta-prompting and context engineering layer on top of Claude Code (also supports OpenCode and Gemini CLI). Created by TACHES (a solo developer who doesn’t write code — Claude does it all). Install: npx get-shit-done-cc@latest.

Stats: 12.4K GitHub stars, 1.2K forks, 750+ commits, MIT license, active Discord, v1.9.6. Trusted by engineers at Amazon, Google, Shopify, Webflow.

The Six-Step Cycle (per milestone/phase):

Initialize (/gsd:new-project) — Spawns 4 parallel research agents (stack, features, architecture, pitfalls). Produces PROJECT.md, REQUIREMENTS.md, ROADMAP.md.
Discuss (/gsd:discuss-phase N) — Captures implementation decisions before planning. Output: {phase}-CONTEXT.md.
Plan (/gsd:plan-phase N) — Research agents investigate, planner creates 2-3 atomic task plans in XML with <verify>/<done> blocks. Checker verifies. They loop until plans pass.
Execute (/gsd:execute-phase N) — Each task runs in a fresh 200K-token sub-agent context. Atomic git commits per task. Output: {phase}-N-SUMMARY.md.
Verify (/gsd:verify-work N) — Automated UAT against phase goals. Spawns debugger agents for failures.
Complete (/gsd:complete-milestone) — Archives, tags release, moves to next milestone.
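
To make the plan format in the Plan step more concrete, here is a minimal sketch of reading an XML task plan and flagging tasks that lack a verification or completion entry. Only the <verify> and <done> elements are named in this report; the <plan>, <task>, <name>, and <action> elements, the sample content, and both function names are assumptions for illustration, not GSD's actual schema or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan content. Only <verify> and <done> come from the report;
# <plan>, <task>, <name>, and <action> are illustrative assumptions.
SAMPLE_PLAN = """
<plan>
  <task>
    <name>Add login endpoint</name>
    <action>Create POST /login handler</action>
    <verify>pytest tests/test_login.py</verify>
    <done>Valid credentials return a session token</done>
  </task>
  <task>
    <name>Add logout endpoint</name>
    <action>Create POST /logout handler</action>
    <verify>pytest tests/test_logout.py</verify>
    <done>Session token is invalidated</done>
  </task>
</plan>
"""


def load_tasks(plan_xml: str) -> list[dict[str, str]]:
    """Return one dict per <task>, keyed by child element name."""
    root = ET.fromstring(plan_xml.strip())
    tasks = []
    for task in root.findall("task"):
        tasks.append({child.tag: (child.text or "").strip() for child in task})
    return tasks


def missing_checks(tasks: list[dict[str, str]]) -> list[str]:
    """Names of tasks that lack a verify or done entry."""
    return [
        t.get("name", "?")
        for t in tasks
        if not t.get("verify") or not t.get("done")
    ]


if __name__ == "__main__":
    for t in load_tasks(SAMPLE_PLAN):
        print(f"{t['name']}: verify with `{t['verify']}`")
```

A checker step could reject any plan for which missing_checks returns a non-empty list, which is one way to picture the plan/check loop described above.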

Spec lives in .planning/ directory:

.planning/
  PROJECT.md, REQUIREMENTS.md, ROADMAP.md, STATE.md, config.json
  research/, {phase}-CONTEXT.md, {phase}-N-PLAN.md, todos/

Modes: mode: yolo (autonomous) vs mode: interactive. Also: /gsd:quick skips research/plan verification for small tasks. /gsd:map-codebase for brownfield analysis. /gsd:pause-work / /gsd:resume-work for session persistence.

Strengths:

Context rot solved architecturally — sub-agents get fresh 200K windows, orchestrator stays at 30-40% utilization
Atomic git commits per task — git bisect works, rollbacks are surgical (critical for overnight review)
Built specifically for solopreneurs — no enterprise theater
Users have completed 23-plan projects using the “Lean Orchestrator” pattern
Quick mode for ad-hoc small tasks

Weaknesses:

Token consumption: v1.5.27 saw 4x increase; one bug fix spawned 100+ agents consuming 10K tokens in 60 seconds (Issue #120)
Commands may break after CC updates (Issue #218) — deep coupling to CC internals
Does NOT merge its methodology into existing CLAUDE.md files (Issue #50)
Discuss/verify phases are interactive by design — true overnight requires running only the execute phase unattended
Smaller community than Superpowers (12.4K vs 47.7K stars)

Fit for you: The most mature spec-driven system (Superpowers, by contrast, is the most mature skills framework). The architecture is excellent, but token cost amplification and fragility under CC updates are real concerns at your volume of 439 sessions/month. It works best as a daytime-planning, overnight-execution workflow. Consider it alongside Superpowers rather than instead of it. A gsd-autopilot fork exists for continuous execution but is underdocumented.

2. Superpowers (by Obra)

What it is: The most mature and widely adopted skills framework for Claude Code. An agentic development methodology with 14 composable skills that activate automatically based on context.

How it works — 3 phases:

Phase 1 — Brainstorm (/superpowers:brainstorm): Socratic questioning, one question at a time, multiple-choice when possible. Explores 2-3 approaches with trade-offs. YAGNI ruthlessly applied. Output: docs/plans/YYYY-MM-DD-<topic>-design.md

Phase 2 — Plan (/superpowers:write-plan): Decomposes design into extremely granular tasks (2-5 minutes each). Each task follows TDD: write failing test -> verify it fails -> implement minimal code -> verify it passes -> commit. Includes exact file paths, code examples, pytest commands, and git commit syntax.

Phase 3 — Execute (/superpowers:execute-plan): Two modes:

Subagent mode: Fresh agent per task, full context provided upfront. Two-stage review: (1) spec compliance, (2) code quality. Both must pass before next task.
Parallel mode: Independent tasks run concurrently across multiple agents.
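
The subagent-mode gate can be sketched as a plain loop: each task is implemented, then must pass a spec review and a quality review before the next task starts. This is only an illustration of the control flow, not Superpowers' implementation; PlanTask, run_with_two_stage_review, the callable parameters, and the max_attempts retry limit are all hypothetical names and assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PlanTask:
    name: str
    result: str = ""


def run_with_two_stage_review(
    tasks: list[PlanTask],
    implement: Callable[[PlanTask], str],
    spec_review: Callable[[PlanTask], bool],
    quality_review: Callable[[PlanTask], bool],
    max_attempts: int = 3,  # assumption: the report does not specify retries
) -> tuple[list[str], Optional[str]]:
    """Run tasks in order; return (completed names, name of blocking task or None)."""
    completed: list[str] = []
    for task in tasks:
        for _ in range(max_attempts):
            task.result = implement(task)
            # Stage 1 checks the spec; stage 2 runs only if stage 1 passes.
            if spec_review(task) and quality_review(task):
                completed.append(task.name)
                break
        else:
            # Every attempt failed review: stop instead of building on it.
            return completed, task.name
    return completed, None
```

Stopping at the first task that cannot pass review keeps later tasks from being built on unreviewed work, which is the property the two-stage gate is meant to provide.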

The 14 Core Skills:

| Skill | Purpose |
|---|---|
| `brainstorming` | Socratic requirements refinement before coding |
| `writing-plans` | Granular task decomposition with TDD steps |
| `executing-plans` | Step-by-step plan execution with checkpoints |
| `subagent-driven-development` | Fresh subagent per task + two-stage review |
| `dispatching-parallel-agents` | Concurrent execution for independent tasks |
| `test-driven-development` | Enforces RED-GREEN-REFACTOR; deletes code written before tests |
| `systematic-debugging` | 4-phase root cause analysis before any fix |
| `verification-before-completion` | Evidence-based checks before marking done |
| `using-git-worktrees` | Isolated branches for clean main |
| `requesting-code-review` | Reviews implementation against spec |
| `receiving-code-review` | Handles review feedback |
| `finishing-a-development-branch` | Merge and cleanup |
| `using-superpowers` | Meta-skill |
| `writing-skills` | Author new custom skills |

Stats: ~47.7K GitHub stars, 270+ commits, MIT license. Officially accepted into Anthropic’s Claude plugins marketplace (Jan 15, 2026). Companion repos: superpowers-lab (experimental) and superpowers-marketplace (20+ community skills).

Strengths:

Most mature and widely adopted CC skills framework — official Anthropic recognition
Structurally prevents Claude from cutting corners (TDD enforcement, two-stage review)
2+ hour autonomous sessions reported with plan adherence
Replaces the “senior dev + PM + QA” team a solopreneur doesn’t have
Free, MIT license, active community

Weaknesses:

Overhead: 10-20 minute brainstorm + planning phase before coding starts (overkill for small fixes)
Opinionated: enforces TDD — if your project doesn’t use tests, the workflow fights you
Token cost: subagent dispatch + two-stage review consumes more tokens than vanilla CC
Quality still depends on your input during requirements refinement

Fit for you: Excellent. This is the primary recommendation for structured work. Skip it for quick tasks and work directly; for any task longer than 30 minutes, Superpowers provides genuine structural value.

Installation:

/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace

3. Conductor (Gemini CLI / CC Ports)

What it is: A context-driven development extension originally for Gemini CLI, with two Claude Code ports. Enforces: Context -> Spec -> Plan -> Implement with phase checkpoints.

How it works:

Setup — Creates persistent context files: product.md, tech-stack.md, workflow.md, code_styleguides/
New Track — Creates spec.md (requirements) + plan.md (phased task list) per feature
Implement — Works through plan.md sequentially; marks tasks [~] in progress, [x] done; commits with conventional format
Phase Checkpoints — Pauses at phase boundaries for human verification (git diff, test coverage, manual verification plan)
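
The status markers in the Implement step lend themselves to a small helper. The sketch below flips one task line in a plan.md-style text between states. The [~] and [x] markers come from the description above; the [ ] marker for pending tasks, the "- " bullet prefix, and the function name are assumptions, and this is not code from either Conductor port.

```python
import re

# Matches "- [ ] text", "- [~] text", or "- [x] text".
TASK_LINE = re.compile(r"^(\s*-\s*\[)([ ~x])(\]\s+)(.*)$")


def set_task_status(plan_text: str, task_title: str, marker: str) -> str:
    """Return plan_text with the first task titled task_title set to marker."""
    if marker not in (" ", "~", "x"):
        raise ValueError(f"unknown marker: {marker!r}")
    lines = []
    found = False
    for line in plan_text.splitlines():
        match = TASK_LINE.match(line)
        if match and not found and match.group(4).strip() == task_title:
            line = f"{match.group(1)}{marker}{match.group(3)}{match.group(4)}"
            found = True
        lines.append(line)
    if not found:
        raise KeyError(task_title)
    # Preserve a trailing newline if the input had one.
    return "\n".join(lines) + ("\n" if plan_text.endswith("\n") else "")
```

Because the state lives in the markdown file itself, a new session can resume by reading the markers rather than relying on chat history.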

CC Ports:

conductor_cc (pilotparpikhodjaev/conductor_cc) — Direct port with /conductor:setup, /conductor:implement, etc. Closest to original.
claude-conductor (superbasicstudio/claude-conductor) — Documentation framework (CONDUCTOR.md, ARCHITECTURE.md, JOURNAL.md). Good for context, but lacks orchestration. 298 stars.

Strengths:

Treats context as a managed artifact (versioned markdown files)
Every task tied to a commit SHA (full auditability)
Logical revert by track/phase/task (not just commit hash)
Resumable across sessions (state in markdown, not chat)

Weaknesses:

Phase checkpoints block autonomous execution — designed for human-in-the-loop, not overnight runs
Token-heavy (loads product.md, tech-stack.md, spec.md, plan.md at every operation)
CC ports are immature community projects
No parallel tracks — sequential execution only
TDD mandated with 80% coverage targets

Fit for you: The pattern is valuable (persistent context + spec + plan as markdown files), but the specific plugins are immature. Better to adopt the pattern within your existing workflow (CLAUDE.md + TASKS.md) than to install a fragile plugin. Not recommended as a primary tool, but extract the pattern.

4. CC Native Task System (TaskCreate/TaskUpdate/TaskList/TaskGet)

What it is: Claude Code’s built-in task management, upgraded from simple TodoWrite to a full four-tool system.

How it works:

TaskCreate — defines task with subject, description, activeForm (spinner text)
TaskUpdate — claim with owner, change status, manage dependencies (addBlockedBy/addBlocks)
TaskList — shows all tasks with status/owner/blockers
TaskGet — full details for a specific task
Tasks persist in ~/.claude/tasks/ and broadcast updates across sessions
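
The dependency handling can be pictured with a small in-memory model. This is a sketch of the blocked/unblocked idea only, not the Claude Code task tools or their storage format; the TrackedTask class, the status strings, and ready_tasks are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedTask:
    id: str
    subject: str
    status: str = "pending"  # assumed values: "pending", "in_progress", "completed"
    blocked_by: set[str] = field(default_factory=set)


def ready_tasks(tasks: dict[str, TrackedTask]) -> list[str]:
    """IDs of pending tasks whose blockers are all completed, sorted."""
    ready = []
    for task in tasks.values():
        if task.status != "pending":
            continue
        if all(tasks[b].status == "completed" for b in task.blocked_by):
            ready.append(task.id)
    return sorted(ready)
```

Completing one task changes which tasks ready_tasks returns, which is the chain behaviour that makes the system useful for coordinating several agents.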

Strengths:

Real-time cross-session coordination (Session A completes -> Session B sees immediately)
Dependency resolution (blocked/unblocked chains)
Perfect for multi-agent swarm execution
Built-in, no plugins needed

Weaknesses:

Tasks live outside your repo (not version-controlled)
Optimized for multi-agent coordination, not solo planning
Not human-editable in Obsidian
No persistent plan — pure runtime orchestration

Fit for you: Useful as the execution layer when Superpowers dispatches sub-agents. Not a replacement for TASKS.md as a planning tool. Use indirectly (via Superpowers), not directly.

5. TASKS.md Pattern

What it is: A markdown file in the project root as the single source of truth for what needs to happen.

Typical structure:

## Feature: User Authentication
- [x] Set up auth middleware
- [x] Implement login endpoint
- [ ] Add password reset flow
- [ ] Write integration tests

The insight: Context window = RAM (volatile, limited); filesystem = disk (persistent, unlimited). Anything important gets written to disk.
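
Because the file is plain markdown, reading its state back takes very little code. The sketch below parses the structure shown above into sections and lists the open items; the function names and the "Unsorted" fallback section are assumptions, and real TASKS.md files may use other layouts.

```python
import re

HEADING = re.compile(r"^##\s+(.*)$")
CHECKBOX = re.compile(r"^\s*-\s*\[([ xX])\]\s+(.*)$")

SAMPLE = """\
## Feature: User Authentication
- [x] Set up auth middleware
- [x] Implement login endpoint
- [ ] Add password reset flow
- [ ] Write integration tests
"""


def parse_tasks(markdown: str) -> dict[str, list[tuple[bool, str]]]:
    """Map each '## ' heading to its (done, title) checkbox items."""
    sections: dict[str, list[tuple[bool, str]]] = {}
    current = "Unsorted"  # assumed bucket for items before any heading
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1).strip()
            sections.setdefault(current, [])
            continue
        box = CHECKBOX.match(line)
        if box:
            done = box.group(1).lower() == "x"
            sections.setdefault(current, []).append((done, box.group(2).strip()))
    return sections


def open_items(sections: dict[str, list[tuple[bool, str]]]) -> list[tuple[str, str]]:
    """(section, title) pairs for every unchecked item, in file order."""
    return [
        (section, title)
        for section, items in sections.items()
        for done, title in items
        if not done
    ]


if __name__ == "__main__":
    for section, title in open_items(parse_tasks(SAMPLE)):
        print(f"{section}: {title}")
```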

Expanded pattern (planning-with-files): task_plan.md for phases, findings.md for research, progress.md for session logs. Auto-recovers after /clear.

Strengths:

Version-controlled, human-readable, editable in Obsidian
Works with any AI tool (CC, Cursor, Gemini, Codex)
Survives context window resets
Simple, no dependencies

Weaknesses:

No real-time cross-session coordination
Manual status tracking (no automatic dependency resolution)

Fit for you: Adopt this for every project. It complements Superpowers (which generates plans) and headless mode (which consumes them).

6. Ralph Wiggum Plugin (Autonomous Loops)

What it is: An official Anthropic plugin that creates autonomous loops. A Stop hook intercepts Claude’s exit and re-feeds the original prompt. Each iteration sees modified files and git history from previous runs.

Real-world results:

YC hackathon teams shipped 6+ repos overnight for ~$297 in API costs
Geoffrey Huntley ran a 3-month loop that built a complete programming language

How to use:

claude -p "Read TASKS.md and work through all items. After each, commit with a descriptive message." \
  --allowedTools "Edit,Read,Bash,Write,Glob,Grep" \
  --max-iterations 50

Best for: Batch operations with well-defined success criteria — large refactors, test coverage, documentation generation.
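
The loop idea, repeating the same prompt until the work is done or an iteration cap is hit, can be sketched in a few lines. This is a plain-Python analogy of the control flow, not the plugin's implementation (which the description above attributes to a Stop hook); run_loop and its parameters are hypothetical.

```python
from typing import Callable


def run_loop(
    run_once: Callable[[int], None],
    is_done: Callable[[], bool],
    max_iterations: int = 50,
) -> int:
    """Repeat run_once until is_done() is true or the cap is reached.

    Returns the number of iterations actually run.
    """
    iterations = 0
    while iterations < max_iterations and not is_done():
        iterations += 1
        run_once(iterations)
    return iterations


if __name__ == "__main__":
    remaining = ["refactor module", "add tests", "update docs"]
    used = run_loop(
        run_once=lambda i: print(f"iteration {i}: {remaining.pop(0)}"),
        is_done=lambda: not remaining,
    )
    print(f"finished after {used} iterations")
```

The is_done check is where well-defined success criteria matter: without one the loop only ends at the cap, which is why the iteration limit acts as a cost safeguard.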

Fit for you: Install this for overnight autonomous execution. Pair with a well-written TASKS.md and headless mode.

7. Other Notable Approaches

cc-sdd (gotalab/cc-sdd): Spec-driven development enforcing requirements -> design -> tasks pipeline. One-command install, supports CC/Codex/Cursor/Gemini CLI. Kiro-compatible.

GitHub Spec Kit: GitHub’s open-source toolkit for spec-driven development across coding agents.

Anthropic’s Recommended Workflow: Four phases: (1) Explore the codebase, (2) Plan with extended thinking (“think hard” / “ultrathink”), (3) Implement one task at a time, (4) Commit. Maintain CLAUDE.md in git documenting mistakes. Use Boris Cherny’s approach: plan first, parallel instances, share learnings, rigorously verify.

Recommendation: Your Optimal Stack

For Interactive Development Sessions

Superpowers (brainstorm -> plan -> execute with TDD + subagent review)
    + TASKS.md per project (persistent plan)
    + CLAUDE.md per project (persistent context)
    + /now slash command (cross-project dispatch)

Workflow:

Open _Now.md, identify the project/task
Start CC in the project directory
/superpowers:brainstorm for new features (or skip for small tasks)
/superpowers:write-plan to generate granular TASKS.md
/superpowers:execute-plan for autonomous execution with verification
Write completion report to _Next_YYYY-MM-DD_ShortDesc.md

For Overnight Autonomous Runs

Ralph Wiggum plugin (autonomous loops)
    + Well-defined TASKS.md (written during the day)
    + Headless mode with --allowedTools
    + Sandboxed environment (Docker/WSL)

Workflow:

During the day: create detailed TASKS.md with clear completion criteria
Before bed: launch headless CC with Ralph Wiggum loop
Morning: review git log, test results, and TASKS.md status
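
A small launcher script is one way to make the "before bed" step repeatable. The sketch below builds the headless command from the flags used elsewhere in this report (-p and --allowedTools) and writes all output to a log file for the morning review. The function names, the log file, and the default tool list are assumptions; adjust the flags to whatever your installed CLI version supports.

```python
import subprocess
from pathlib import Path

# Tool list copied from the headless commands in this report.
DEFAULT_TOOLS = ("Edit", "Read", "Bash", "Write", "Glob", "Grep")


def build_headless_command(prompt: str, allowed_tools=DEFAULT_TOOLS) -> list[str]:
    """Return the argv list for a headless run; nothing is executed here."""
    return ["claude", "-p", prompt, "--allowedTools", ",".join(allowed_tools)]


def launch_overnight(project_dir: Path, log_path: Path, prompt: str) -> int:
    """Run the command in project_dir, sending stdout and stderr to log_path."""
    with log_path.open("w", encoding="utf-8") as log:
        completed = subprocess.run(
            build_headless_command(prompt),
            cwd=project_dir,
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,  # inspect the return code instead of raising
        )
    return completed.returncode
```

Passing the command as a list avoids shell quoting problems in the prompt text, and keeping the log next to the repository makes it easy to read alongside git log.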

For KB Work (This Repo)

_Now.md (cross-project task dispatch)
    + Direct CC interaction (no Superpowers overhead for markdown work)
    + /now + /ptr slash commands for quick sessions

Superpowers is overkill for markdown KB work. Reserve it for code projects.

_Now.md / _Next.md Naming Convention Analysis

Verdict: Keep `_Now`/`_Next`. Do NOT Rename to `_ToAI`/`_FromAI`.

Why _Now/_Next is better:

Describes temporal state (active vs. completed), which is the dimension that matters
_ToAI/_FromAI implies a mailbox metaphor that misrepresents the actual relationship — you don’t “send” _Now.md to AI, you point AI at it as context
_Now/_Next works equally well when you read the files (unlike _ToAI which becomes nonsensical when you open it yourself)

Do NOT Adopt Paired Indices (`_ToAI2` -> `_FromAI2`)

Your actual workflow is one-to-many: one _Now.md spawns multiple _NextN.md reports over time. Paired indices force one-to-one correspondence that doesn’t match reality. Added bookkeeping for zero benefit.

Two Small Improvements

Date-stamp Next files: Use _Next_YYYY-MM-DD_ShortDesc.md instead of _Next2.md. Example: _Next_2026-02-08_AIWorkflowResearch.md. Self-documenting, sorts chronologically, scales to hundreds of files.
Add source reference at top of each Next file: Source: _Now.md -- "Spec-driven workflow research". Gives traceability without paired indices.
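
Both improvements are easy to automate. The following sketch generates a report filename in the _Next_YYYY-MM-DD_ShortDesc.md form and the matching source line; the function names and the CamelCase rule for the short description are assumptions chosen to reproduce the example above.

```python
import re
from datetime import date


def next_report_name(short_desc: str, on: date) -> str:
    """Build a name like _Next_2026-02-08_AIWorkflowResearch.md."""
    # Keep letters and digits only, capitalising the first letter of each word.
    words = re.findall(r"[A-Za-z0-9]+", short_desc)
    camel = "".join(word[0].upper() + word[1:] for word in words)
    return f"_Next_{on.isoformat()}_{camel}.md"


def source_header(now_file: str, task_title: str) -> str:
    """Build the traceability line placed at the top of a Next file."""
    return f'Source: {now_file} -- "{task_title}"'


if __name__ == "__main__":
    print(next_report_name("AI workflow research", date(2026, 2, 8)))
    print(source_header("_Now.md", "Spec-driven workflow research"))
```

Because the date is in ISO format, a plain alphabetical sort of the filenames is also a chronological sort.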

What Changes with Spec-Driven Adoption

When you adopt Superpowers/TASKS.md for projects, _Now.md becomes a lightweight dispatcher:

- Monorepo: see TASKS.md in monorepo repo
- KB: reorganize 04_AI section
- Python utils: fix scaffolder bug

Detailed task tracking moves into project-level files. _Now.md stays as the cross-project “what am I working on?” view. _NextN.md captures cross-cutting outcomes (lessons learned, CC bugs found, workflow improvements).

Implementation Priority

Tonight (if you want an overnight run)

Install Superpowers: /plugin marketplace add obra/superpowers-marketplace, then /plugin install superpowers@superpowers-marketplace
Create a TASKS.md in your monorepo with 3-5 well-defined tasks
Run headless: claude -p "Read TASKS.md and complete all tasks" --allowedTools "Edit,Read,Bash,Write,Glob,Grep"

This Week

Install Ralph Wiggum plugin for loop-based autonomous execution
Update your /ptr slash command to include completion report naming convention
Try one full Superpowers brainstorm -> plan -> execute cycle on a real feature

Next

Create project-level CLAUDE.md for your monorepo (via /init)
Set up headless mode scripts for routine batch tasks
Review results after 5-10 sessions and adjust
