# AI Dev Workflow

Master workflow for AI-assisted development. Planning first, verification always, multi-tool when valuable.
## Core Principles

- Plan first, code later — clear roadmap prevents scope creep
- Verify before declaring done — run thorough tests, grep for stale refs, check outputs
- One task at a time — fix -> verify -> next (not batch-and-pray)
- Short, focused sessions — ~40 messages or one feature per session
- Commit after each confirmed step — maintain clean history
- Review for lessons — at end of each task, capture what should improve
## Workflow Phases

### Phase 1: Preparation

- Start with a clear PRD (Product Requirements Doc)
- Include sequence diagrams for user flows and data flow
- Ask AI for suggestions on the best way to proceed
- Ask AI to improve your project outline
- Ask: “Do you completely understand? What could be clarified?”
- Use structured docs before opening the IDE
### Phase 2: Planning (CC Plan Mode)

- Enter Plan Mode: Shift+Tab twice
- Review proposed architecture before approving build
- For complex work, use extended thinking: “think hard” or “ultrathink”
- Break into atomic tasks with clear completion criteria
### Phase 3: Implementation

- Work through the task list one item at a time
- For each item: implement -> verify -> next
- Use Haiku 4.5 for implementation (cheaper), Opus for review
- Always maintain ChangeLog.md: have AI write its changes there
### Phase 4: Verification/Testing (Autonomous AI-Driven)

Objective: World-class reliability through autonomous AI testing, visual regression, and cross-platform verification.
#### 4.1 Unit + Component Testing

- Vitest for business logic and Astro component testing
- Use the Astro Container API for rendering components in isolation
- `@testing-library/dom` for DOM assertions
- Command: `npm run test:unit` (Vitest)
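The setup above might look like the following Vitest spec, a minimal sketch assuming an Astro project with Vitest configured; `Card.astro` and its `title` prop are hypothetical names, not from the source:

```typescript
// Card.test.ts — sketch: render an Astro component in isolation with the
// (experimental) Container API and assert on its HTML output.
import { experimental_AstroContainer as AstroContainer } from "astro/container";
import { expect, test } from "vitest";
import Card from "../src/components/Card.astro"; // hypothetical component

test("Card renders its title prop", async () => {
  // Container renders the component standalone, outside any page or layout
  const container = await AstroContainer.create();
  const html = await container.renderToString(Card, {
    props: { title: "Hello" },
  });
  expect(html).toContain("Hello");
});
```

Run with `npm run test:unit`; the Container API keeps component tests fast because no browser or dev server is involved.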
#### 4.2 E2E + Cross-Browser Testing

- Playwright for full browser automation across Chromium, Firefox, WebKit
- Configure `playwright.config.ts` projects: Desktop Chrome, Desktop Firefox, Desktop Safari (WebKit), Mobile Chrome (Pixel 7), Mobile Safari (iPhone 15)
- Use `aria-label` and text-based locators (not brittle CSS selectors)
- AI Enhancement: ZeroStep `ai()` calls for natural-language assertions that survive UI redesigns
- Command: `npm run test:e2e` (`playwright test`)
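The project matrix described above could be expressed in `playwright.config.ts` roughly as follows (a config sketch; `baseURL` and port are assumptions):

```typescript
// playwright.config.ts — one project per browser/device target listed above.
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  use: { baseURL: "http://localhost:4321" }, // assumed Astro dev-server port
  projects: [
    { name: "Desktop Chrome", use: { ...devices["Desktop Chrome"] } },
    { name: "Desktop Firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "Desktop Safari", use: { ...devices["Desktop Safari"] } }, // WebKit
    { name: "Mobile Chrome", use: { ...devices["Pixel 7"] } },
    { name: "Mobile Safari", use: { ...devices["iPhone 15"] } },
  ],
});
```

Each project runs the whole suite, so one `npm run test:e2e` covers all five targets.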
#### 4.3 Visual Regression

- Playwright’s built-in `expect(page).toHaveScreenshot()` on every target viewport
- Threshold: `maxDiffPixelRatio: 0.001` (0.1%)
- Mask dynamic content regions (dates, animations) to prevent false positives
- Upgrade path: Lost Pixel (7K free screenshots/month) when managing 3+ sites
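Combining the threshold and masking guidance above, a spec might look like this sketch (the `.post-date` selector is an assumption for illustration):

```typescript
// visual.spec.ts — baseline screenshot comparison with the 0.1% threshold
// and masked dynamic regions described above.
import { expect, test } from "@playwright/test";

test("home page matches baseline", async ({ page }) => {
  await page.goto("/");
  await expect(page).toHaveScreenshot("home.png", {
    maxDiffPixelRatio: 0.001,           // fail if >0.1% of pixels differ
    mask: [page.locator(".post-date")], // hide dynamic content (dates)
    animations: "disabled",             // freeze CSS animations before capture
  });
});
```

The first run writes the baseline image; later runs diff against it per project, so each browser/viewport keeps its own baseline.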
#### 4.4 Accessibility + SEO Validation

- Accessibility: `@axe-core/playwright` scans on every page; fail on any serious violations
- SEO: Lighthouse CI in GitHub Actions — enforce minimum scores (Performance 90+, SEO 95+, Accessibility 95+)
- Scripted checks: Assert `<title>`, `<meta name="description">`, `<h1>`, `alt` attributes, and canonical links exist on every page
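The scripted checks in the last bullet can be a small pure function run against each rendered page's HTML. This is a sketch (the `auditHtml` helper is hypothetical, and regex checks are a simplification of real HTML parsing):

```typescript
// Returns the names of SEO essentials missing from a rendered HTML string.
// An empty array means the page passed all scripted checks.
function auditHtml(html: string): string[] {
  const missing: string[] = [];
  const checks: Array<[string, RegExp]> = [
    ["title", /<title>[^<]+<\/title>/i],
    ["meta description", /<meta[^>]+name=["']description["'][^>]*>/i],
    ["h1", /<h1[\s>]/i],
    ["canonical link", /<link[^>]+rel=["']canonical["'][^>]*>/i],
  ];
  for (const [name, re] of checks) {
    if (!re.test(html)) missing.push(name);
  }
  // Every <img> must carry an alt attribute.
  const imgs = html.match(/<img\b[^>]*>/gi) ?? [];
  for (const img of imgs) {
    if (!/\balt=/.test(img)) {
      missing.push("img alt");
      break;
    }
  }
  return missing;
}
```

In a Playwright test this could be called as `auditHtml(await page.content())` with an assertion that the result is empty.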
#### 4.5 AI-Autonomous Execution (The Loop)
- Playwright MCP enables Claude Code to directly control browsers and run tests
- Feedback Loop: When tests fail, output pipes to Claude Code for autonomous diagnosis and fix
- Protocol:
  1. Claude runs `npm run test:e2e -- --reporter=json`
  2. On failure: reads the trace, analyzes screenshots, fixes source code
  3. Re-runs only the failed tests until green
  4. Documents fixes in ChangeLog.md
- Playwright Agents: Use Planner (strategy), Generator (code), Healer (self-fix) for autonomous test maintenance
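The "re-run only failed tests" step implies parsing the JSON reporter output for failed spec titles (which can then be fed to `--grep`). A sketch, using a simplified, assumed shape of Playwright's JSON-reporter output:

```typescript
// Walk a (simplified) Playwright JSON-reporter tree and collect the titles
// of specs whose tests failed or timed out, so only those are re-run.
interface Spec { title: string; tests?: { results?: { status: string }[] }[] }
interface Suite { title: string; specs?: Spec[]; suites?: Suite[] }
interface ReporterJson { suites?: Suite[] }

function failedTitles(report: ReporterJson): string[] {
  const out: string[] = [];
  const walk = (suite: Suite): void => {
    for (const spec of suite.specs ?? []) {
      const failed = (spec.tests ?? []).some((t) =>
        (t.results ?? []).some(
          (r) => r.status === "failed" || r.status === "timedOut",
        ),
      );
      if (failed) out.push(spec.title);
    }
    (suite.suites ?? []).forEach(walk);
  };
  (report.suites ?? []).forEach(walk);
  return out;
}
```

Claude (or a wrapper script) could then re-run the failures with `npx playwright test --grep "<title>"` until the list comes back empty.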
#### 4.6 Real Device Validation (Release Gates)
- Daily development: Playwright emulation (free, fast, sufficient for iteration)
- Pre-release: LambdaTest real device testing on Safari iOS + Chrome Android
- Trigger: Only before production deployments, not on every commit
#### 4.7 “Definition of Done” Checklist

- `npm run build` passes (Astro type checking)
- `npm run test:unit` passes (Vitest — logic verification)
- `npm run test:e2e` passes on Chromium, Firefox, and WebKit
- Visual regression diffs < 0.1%
- Zero axe-core accessibility violations (serious/critical)
- Lighthouse SEO ≥ 95, Performance ≥ 90, Accessibility ≥ 95
- No `console.error` or 404 network requests in test traces
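The Lighthouse score gates above can be enforced declaratively in a Lighthouse CI config. A sketch, assuming a `lighthouserc.json` at the project root and a static Astro build in `./dist`:

```json
{
  "ci": {
    "collect": { "staticDistDir": "./dist" },
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "categories:seo": ["error", { "minScore": 0.95 }],
        "categories:accessibility": ["error", { "minScore": 0.95 }]
      }
    }
  }
}
```

With this in place, `lhci autorun` in the GitHub Actions job fails the build whenever any category drops below its gate.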
#### 4.8 General Verification (Always Apply)
- Never accept “done” without evidence
- Run deployment/build scripts after every change
- Grep across ALL file types for stale references
- Diff final file contents to confirm no residual issues
- For critical work: have a second AI model review the results
### Phase 5: Review and Learn

- Ask AI to identify lessons
- Continually update LESSONS.md, at project and global level as appropriate, to increase future productivity
### Phase 6: Document and Commit

- Update all docs and logs
- Commit to Git with descriptive messages
## Multi-Tool Strategy

### Model Allocation

| Model | Use For | Cost |
|---|---|---|
| Claude Code (Opus) | Complex architecture, security, business logic | $20/mo |
| Claude Code (Haiku) | Implementation, routine tasks | Cheaper |
| Cursor | IDE-integrated dev, multi-file visual editing | $20/mo |
| Gemini (free tier) | Prototyping, scaffolding, Google integrations | Free |
| GLM-4.7 | Daily coding at ~Claude level | $3-6/mo |
### Multi-Prompting

- Send the same prompt to 3 models (Claude, ChatGPT, Gemini)
- Synthesize the best insights — cancels out individual weaknesses
- For critical decisions: always get a second AI opinion
### 80/20 Rule

- Gemini for the 80% of daily tasks (scaffolding, integration, deployment)
- Claude for the 20% that matters most (business logic, security, complex features)
## Agentic Development Patterns

### Ralph Loop

- Generate markdown specs -> run tasks manually once -> refine prompts -> wrap in a loop
- Each loop: one task -> finish -> commit -> update plans -> exit
- Use only where completion criteria are clear and verifiable
- Set strict spending limits
- Mistakes to avoid: https://medium.com/ai-software-engineer/7-ralph-loop-mistakes-that-are-burning-your-tokens-and-wasting-your-time-b506c384face
### Async Development

- Describe a feature before bed; AI builds overnight; review in the morning
- Best with local models for cost and privacy
### Parallel Agents

- Spawn sub-agents per language domain (Python, TS, YAML)
- Parent agent coordinates and runs integration tests after all complete
- Use tmux for multiple sessions
## Verify-Before-Done Protocol

Paste into CC at the start of sessions:

```text
After making all changes, before telling me you're done:
1) grep the entire project for any remaining references to old values
2) run the build/deploy script and show output
3) cat each modified file so I can visually confirm
Only then report completion.
```

## Task List Atomic Workflow

```text
Work through TASKS.md one item at a time. For each item:
implement the fix, then immediately verify it works.
Do NOT move to the next item until the current one is confirmed working.
Show a status update after each item.
```

## Task Management System

### Three-Layer Architecture
#### Layer 1: Cross-Project Dispatcher

- File: `_Now.md` in KB root
- Purpose: Lightweight “what am I working on?” view
- Format: Simple list pointing to project TASKS.md files
- DO NOT put task details here — keep it lightweight
#### Layer 2: Per-Project Spec

- File: `TASKS.md` in each project root
- Purpose: Detailed, version-controlled task breakdown
- Format: Markdown checklist with completion criteria
- Tool Integration: Consumed by Superpowers, ralph-loop, headless mode
- Templates: See `TASKS_md_Template.md` for three templates
#### Layer 3: Session Insights

- File: `_Next_YYYY-MM-DD_ShortDesc.md` in the KB `_WorkingOn/` folder
- Purpose: Capture cross-cutting insights and lessons learned
- Naming: Date-stamped for chronological sorting
- Include a source reference: `Source: _Now.md -- "Brief task description"`
### When to Use Each Layer

| Scenario | Layer 1 (_Now) | Layer 2 (TASKS.md) | Layer 3 (_Next) |
|---|---|---|---|
| Quick KB edit (5-10 min) | ✅ Point to task | ❌ Skip | ✅ Capture insights if useful |
| Small bug fix (15-30 min) | ✅ Point to project | ✅ Detail spec | ✅ Capture insights |
| New feature (1-4 hours) | ✅ Point to project | ✅ Full Superpowers workflow | ✅ Document lessons |
| Overnight run | ✅ Note it’s running | ✅ Clear completion criteria | ✅ Review results next day |
## Key Frameworks

### Superpowers (Recommended - Primary Workflow)

- The most mature skills framework for Claude Code (47.7K stars, official Anthropic recognition)
- Three-phase workflow: brainstorm → plan → execute
- Enforces TDD with two-stage review (spec compliance + code quality)
- Prevents Claude from cutting corners or premature “done”
- When to use: Any task > 30 minutes, new features, complex work
- When to skip: Quick fixes, markdown KB work, trivial tasks
- Install: `/plugin marketplace add obra/superpowers-marketplace`
- https://github.com/obra/superpowers

Superpowers Workflow:

- `/superpowers:brainstorm` — Socratic requirements refinement
- `/superpowers:write-plan` — Generate a granular TASKS.md with TDD steps
- `/superpowers:execute-plan` — Fresh subagent per task, verified before the next
### Ralph Loop (Overnight Autonomous Execution)

- Official Anthropic plugin for autonomous iteration
- Re-feeds original prompt after each completion, sees git history
- Best for: Batch operations, large refactors, well-defined tasks
- Usage: Headless mode with `--max-iterations N`
- YC hackathon teams shipped 6+ repos overnight (~$297 API costs)
- Install: `/plugin install ralph-loop@claude-plugins-official`
- https://github.com/anthropics/claude-code/tree/main/plugins/ralph-loop
Headless Command:

```shell
claude -p "Read TASKS.md and complete all items" \
  --allowedTools "Edit,Read,Bash,Write,Glob,Grep" \
  --max-iterations 50
```

### GSD (Get Shit Done)

- Spec-driven framework with a six-step cycle per milestone
- Spawns parallel research agents, creates atomic task plans
- Each task runs in fresh 200K-token sub-agent context
- Atomic git commits per task (surgical rollbacks)
- Strengths: Solves context rot, good for solopreneurs
- Weaknesses: Token consumption can spike, CC update fragility
- Fit: Consider alongside Superpowers for daytime planning + overnight execution
- https://github.com/glittercowboy/get-shit-done
## TASKS.md Pattern (Always Adopt)

- Single markdown file in project root as source of truth
- Version-controlled, human-readable, survives context resets
- Works with ANY AI tool (CC, Cursor, Gemini, Codex)
- Complements all frameworks above (they generate or consume it)
- Universal adoption: Use for every project, regardless of framework choice
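A minimal sketch of what such a TASKS.md might look like (the feature and criteria are illustrative, not from the source):

```markdown
# TASKS.md — Contact Form Feature

## In Progress
- [ ] Add client-side form validation
  - Done when: invalid email shows an inline error; `npm run test:unit` passes

## Backlog
- [ ] Wire up form POST endpoint
  - Done when: `npm run test:e2e` is green on Chromium, Firefox, and WebKit

## Done
- [x] Scaffold ContactForm component
```

Because it is plain markdown, every tool in the stack can read it, and each checked box maps to one atomic commit.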