
Master workflow for AI-assisted development. Planning first, verification always, multi-tool when valuable.

  1. Plan first, code later — clear roadmap prevents scope creep
  2. Verify before declaring done — run thorough tests, grep for stale refs, check outputs
  3. One task at a time — fix -> verify -> next (not batch-and-pray)
  4. Short, focused sessions — ~40 messages or one feature per session
  5. Commit after each confirmed step — maintain clean history
  6. Review for lessons — at end of each task, capture what should improve
  • Start with a clear PRD (Product Requirements Doc)
    • Include sequence diagrams for user flows and data flow
  • Ask AI for suggestions on the best way to proceed
  • Ask AI to improve your project outline
  • Ask: “Do you completely understand? What could be clarified?”
  • Use structured docs before opening the IDE
  • Enter Plan Mode: Shift+Tab twice
  • Review proposed architecture before approving build
  • For complex work, use extended thinking: “think hard” or “ultrathink”
  • Break into atomic tasks with clear completion criteria
  • Work through task list one item at a time
  • For each item: implement -> verify -> next
  • Use Haiku 4.5 for implementation (cheaper), Opus for review
  • Always maintain ChangeLog.md: have AI write its changes there

Phase 4: Verification/Testing (Autonomous AI-Driven)


Objective: World-class reliability through autonomous AI testing, visual regression, and cross-platform verification.

4.1 Unit + Component Testing

  • Vitest for business logic and Astro component testing
  • Use Astro Container API for rendering components in isolation
  • @testing-library/dom for DOM assertions
  • Command: npm run test:unit (vitest)
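The list above can be sketched as a single Vitest file using the Container API; `Card.astro` and its `title` prop are hypothetical stand-ins for your own component, and the sketch assumes Astro 4.9+ (where the experimental Container API landed) plus Vitest installed:

```ts
// tests/card.test.ts (sketch only; assumes Astro >= 4.9 and Vitest)
import { experimental_AstroContainer as AstroContainer } from 'astro/container';
import { expect, test } from 'vitest';
import Card from '../src/components/Card.astro'; // hypothetical component

test('Card renders its title', async () => {
  // Render the component in isolation, no dev server required
  const container = await AstroContainer.create();
  const html = await container.renderToString(Card, {
    props: { title: 'Hello' }, // hypothetical prop
  });
  expect(html).toContain('Hello');
});
```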

4.2 E2E + Cross-Browser Testing

  • Playwright for full browser automation across Chromium, Firefox, WebKit
  • Configure playwright.config.ts projects: Desktop Chrome, Desktop Firefox, Desktop Safari (WebKit), Mobile Chrome (Pixel 7), Mobile Safari (iPhone 15)
  • Use aria-label and text-based locators (not brittle CSS selectors)
  • AI Enhancement: ZeroStep ai() calls for natural-language assertions that survive UI redesigns
  • Command: npm run test:e2e (playwright test)
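The five-project matrix above might look like this in playwright.config.ts; the device names come from Playwright's built-in device registry:

```ts
// playwright.config.ts (sketch of the project matrix described above)
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'Desktop Chrome', use: { ...devices['Desktop Chrome'] } },
    { name: 'Desktop Firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'Desktop Safari', use: { ...devices['Desktop Safari'] } }, // WebKit
    { name: 'Mobile Chrome', use: { ...devices['Pixel 7'] } },
    { name: 'Mobile Safari', use: { ...devices['iPhone 15'] } },
  ],
});
```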

4.3 Visual Regression

  • Playwright’s built-in expect(page).toHaveScreenshot() on every target viewport
  • Threshold: maxDiffPixelRatio: 0.001 (0.1%)
  • Mask dynamic content regions (dates, animations) to prevent false positives
  • Upgrade path: Lost Pixel (7K free screenshots/month) when managing 3+ sites
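A minimal spec combining the threshold and masking advice above; `.post-date` is a hypothetical example of a dynamic region, and the sketch assumes baseline screenshots are already committed:

```ts
// home.visual.spec.ts (sketch; assumes an existing Playwright project)
import { test, expect } from '@playwright/test';

test('home page matches baseline', async ({ page }) => {
  await page.goto('/');
  await expect(page).toHaveScreenshot('home.png', {
    maxDiffPixelRatio: 0.001,           // the 0.1% threshold from above
    mask: [page.locator('.post-date')], // hypothetical dynamic region
    animations: 'disabled',             // freeze CSS animations for stability
  });
});
```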

4.4 Accessibility + SEO Validation

  • Accessibility: @axe-core/playwright scans on every page, fail on any serious violations
  • SEO: Lighthouse CI in GitHub Actions — enforce minimum scores (Performance 90+, SEO 95+, Accessibility 95+)
  • Scripted checks: Assert <title>, <meta description>, <h1>, alt attributes, canonical links exist on every page
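One way to sketch the axe scan and the scripted SEO checks in a single Playwright spec (assumes @axe-core/playwright is installed; selectors are illustrative):

```ts
// a11y-seo.spec.ts (sketch; assumes @axe-core/playwright is installed)
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('no serious a11y violations, baseline SEO tags present', async ({ page }) => {
  await page.goto('/');

  // Fail only on serious/critical axe violations, per the policy above
  const results = await new AxeBuilder({ page }).analyze();
  const serious = results.violations.filter(
    (v) => v.impact === 'serious' || v.impact === 'critical',
  );
  expect(serious).toEqual([]);

  // Scripted SEO checks: title, meta description, h1, alt text, canonical
  await expect(page).toHaveTitle(/.+/);
  await expect(page.locator('meta[name="description"]')).toHaveAttribute('content', /.+/);
  await expect(page.locator('h1')).toHaveCount(1);
  await expect(page.locator('img:not([alt])')).toHaveCount(0);
  await expect(page.locator('link[rel="canonical"]')).toHaveAttribute('href', /.+/);
});
```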

4.5 AI-Autonomous Execution (The Loop)

  • Playwright MCP enables Claude Code to directly control browsers and run tests
  • Feedback Loop: When tests fail, output pipes to Claude Code for autonomous diagnosis and fix
  • Protocol:
    1. Claude runs npm run test:e2e -- --reporter=json
    2. On failure: reads trace, analyzes screenshots, fixes source code
    3. Re-runs only failed tests until green
    4. Documents fixes in ChangeLog.md
  • Playwright Agents: Use Planner (strategy), Generator (code), Healer (self-fix) for autonomous test maintenance
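Step 1 of the protocol produces a JSON report; a small parser like the sketch below is one way to pull out the failures to re-feed to the agent. Field names follow Playwright's JSON reporter; `sampleReport` is a hypothetical, heavily trimmed example (real reports nest suites arbitrarily deep, which the recursion handles):

```typescript
// Sketch: collect failing spec titles from a Playwright JSON report.
type Spec = { title: string; ok: boolean };
type Suite = { title: string; suites?: Suite[]; specs?: Spec[] };

// Hypothetical trimmed report, standing in for e2e-report.json
const sampleReport: { suites: Suite[] } = {
  suites: [
    {
      title: 'home.spec.ts',
      specs: [
        { title: 'hero renders', ok: true },
        { title: 'nav links work', ok: false },
      ],
    },
  ],
};

function failedSpecs(suite: Suite): string[] {
  const failures: string[] = [];
  for (const spec of suite.specs ?? []) {
    // Specs whose 'ok' flag is false are the ones to re-run and fix
    if (!spec.ok) failures.push(`${suite.title} > ${spec.title}`);
  }
  for (const child of suite.suites ?? []) {
    failures.push(...failedSpecs(child)); // recurse into nested suites
  }
  return failures;
}

const failures = sampleReport.suites.flatMap((s) => failedSpecs(s));
console.log(failures); // ["home.spec.ts > nav links work"]
```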

4.6 Real Device Validation (Release Gates)

  • Daily development: Playwright emulation (free, fast, sufficient for iteration)
  • Pre-release: LambdaTest real device testing on Safari iOS + Chrome Android
  • Trigger: Only before production deployments, not on every commit

4.7 “Definition of Done” Checklist

  • npm run build passes (Astro type checking)
  • npm run test:unit passes (Vitest — logic verification)
  • npm run test:e2e passes on Chromium, Firefox, and WebKit
  • Visual regression diffs < 0.1%
  • Zero axe-core accessibility violations (serious/critical)
  • Lighthouse SEO ≥ 95, Performance ≥ 90, Accessibility ≥ 95
  • No console.error or 404 network requests in test traces

4.8 General Verification (Always Apply)

  • Never accept “done” without evidence
  • Run deployment/build scripts after every change
  • Grep across ALL file types for stale references
  • Diff final file contents to confirm no residual issues
  • For critical work: have a second AI model review the results
  • Ask AI to identify lessons
  • Continually update LESSONS.md at both the project and global level, as appropriate, to improve future productivity
  • Update all docs, logs
  • Commit to Git with descriptive messages

| Model | Use For | Cost |
| --- | --- | --- |
| Claude Code (Opus) | Complex architecture, security, business logic | $20/mo |
| Claude Code (Haiku) | Implementation, routine tasks | Cheaper |
| Cursor | IDE-integrated dev, multi-file visual editing | $20/mo |
| Gemini (free tier) | Prototyping, scaffolding, Google integrations | Free |
| GLM-4.7 | Daily coding at ~Claude level | $3-6/mo |
  • Send same prompt to 3 models (Claude, ChatGPT, Gemini)
  • Synthesize the best insights — cancels out individual weaknesses
  • For critical decisions: always get a second AI opinion
  • Gemini for 80% daily tasks (scaffolding, integration, deployment)
  • Claude for 20% that matters most (business logic, security, complex features)
  • Describe feature before bed; AI builds overnight; review in morning
  • Best with local models for cost and privacy
  • Spawn sub-agents per language domain (Python, TS, YAML)
  • Parent agent coordinates and runs integration tests after all complete
  • Use tmux for multiple sessions

Paste into CC at start of sessions:

After making all changes, before telling me you're done:
1) grep the entire project for any remaining references to old values
2) run the build/deploy script and show output
3) cat each modified file so I can visually confirm
Only then report completion.

Work through TASKS.md one item at a time. For each item:
implement the fix, then immediately verify it works.
Do NOT move to the next item until current one is confirmed working.
Show status update after each item.

Layer 1: Cross-Project Dispatcher

  • File: _Now.md in KB root
  • Purpose: Lightweight “what am I working on?” view
  • Format: Simple list pointing to project TASKS.md files
  • DO NOT put task details here — keep it lightweight

Layer 2: Per-Project Spec

  • File: TASKS.md in each project root
  • Purpose: Detailed, version-controlled task breakdown
  • Format: Markdown checklist with completion criteria
  • Tool Integration: Consumed by Superpowers, ralph-loop, headless mode
  • Templates: See TASKS_md_Template.md for three templates
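As a hedged illustration of the Layer 2 format (task names and criteria are hypothetical; see TASKS_md_Template.md for the actual templates):

```markdown
# TASKS.md

## Feature: Contact form (hypothetical example)
- [ ] 1. Build form component
      Done when: form renders on /contact with client-side validation
- [ ] 2. Wire submission endpoint
      Done when: POST returns 200 and the e2e test passes
- [ ] 3. Update ChangeLog.md
      Done when: entry committed referencing this task
```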

Layer 3: Session Insights

  • File: _Next_YYYY-MM-DD_ShortDesc.md in KB _WorkingOn/
  • Purpose: Capture cross-cutting insights, lessons learned
  • Naming: Date-stamped for chronological sorting
  • Include source reference: Source: _Now.md -- "Brief task description"
| Scenario | Layer 1 (_Now) | Layer 2 (TASKS.md) | Layer 3 (_Next) |
| --- | --- | --- | --- |
| Quick KB edit (5-10 min) | ✅ Point to task | ❌ Skip | ✅ Capture insights if useful |
| Small bug fix (15-30 min) | ✅ Point to project | ✅ Detail spec | ✅ Capture insights |
| New feature (1-4 hours) | ✅ Point to project | ✅ Full Superpowers workflow | ✅ Document lessons |
| Overnight run | ✅ Note it’s running | ✅ Clear completion criteria | ✅ Review results next day |

Superpowers (Recommended - Primary Workflow)

  • The most mature skills framework for Claude Code (47.7K stars, official Anthropic recognition)
  • Three-phase workflow: brainstorm → plan → execute
  • Enforces TDD with two-stage review (spec compliance + code quality)
  • Prevents Claude from cutting corners or premature “done”
  • When to use: Any task > 30 minutes, new features, complex work
  • When to skip: Quick fixes, markdown KB work, trivial tasks
  • Install: /plugin marketplace add obra/superpowers-marketplace
  • https://github.com/obra/superpowers

Superpowers Workflow:

  1. /superpowers:brainstorm — Socratic requirements refinement
  2. /superpowers:write-plan — Generate granular TASKS.md with TDD steps
  3. /superpowers:execute-plan — Fresh subagent per task, verified before next

Ralph Loop (Overnight Autonomous Execution)

  • Official Anthropic plugin for autonomous iteration
  • Re-feeds original prompt after each completion, sees git history
  • Best for: Batch operations, large refactors, well-defined tasks
  • Usage: Headless mode with --max-iterations N
  • YC hackathon teams shipped 6+ repos overnight (~$297 API costs)
  • Install: /plugin install ralph-loop@claude-plugins-official
  • https://github.com/anthropics/claude-code/tree/main/plugins/ralph-loop

Headless Command:

claude -p "Read TASKS.md and complete all items" \
--allowedTools "Edit,Read,Bash,Write,Glob,Grep" \
--max-iterations 50
Get Shit Done

  • Spec-driven framework with six-step cycle per milestone
  • Spawns parallel research agents, creates atomic task plans
  • Each task runs in fresh 200K-token sub-agent context
  • Atomic git commits per task (surgical rollbacks)
  • Strengths: Solves context rot, good for solopreneurs
  • Weaknesses: Token consumption can spike, CC update fragility
  • Fit: Consider alongside Superpowers for daytime planning + overnight execution
  • https://github.com/glittercowboy/get-shit-done
TASKS.md (Universal)

  • Single markdown file in project root as source of truth
  • Version-controlled, human-readable, survives context resets
  • Works with ANY AI tool (CC, Cursor, Gemini, Codex)
  • Complements all frameworks above (they generate or consume it)
  • Universal adoption: Use for every project, regardless of framework choice