
Master workflow for AI-assisted development. Planning first, verification always, multi-tool when valuable.

  1. Plan first, code later — clear roadmap prevents scope creep
  2. Verify before declaring done — run thorough tests, grep for stale refs, check outputs
  3. One task at a time — fix -> verify -> next (not batch-and-pray)
  4. Short, focused sessions — ~40 messages or one feature per session
  5. Commit after each confirmed step — maintain clean history
  6. Review for lessons — at end of each task, capture what should improve
  • Start with a clear PRD (Product Requirements Doc)
    • Include sequence diagrams for user flows and data flow
  • Ask AI for suggestions on the best way to proceed
  • Ask AI to improve your project outline
  • Ask: “Do you completely understand? What could be clarified?”
  • Use structured docs before opening the IDE
  • Enter Plan Mode: Shift+Tab twice
  • Review proposed architecture before approving build
  • For complex work, use extended thinking: “think hard” or “ultrathink”
  • Break into atomic tasks with clear completion criteria
  • Work through task list one item at a time
  • For each item: implement -> verify -> next
  • Use Haiku 4.5 for implementation (cheaper), Opus for review
  • Always maintain ChangeLog.md: have AI write its changes there

Phase 4: Verification/Testing (Autonomous AI-Driven)


Objective: World-class reliability through autonomous AI testing, visual regression, and cross-platform verification.

4.1 Unit + Component Testing

  • Vitest for business logic and Astro component testing
  • Use Astro Container API for rendering components in isolation
  • @testing-library/dom for DOM assertions
  • Command: npm run test:unit (vitest)
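The list above can be sketched as a single Vitest file using the Container API; `Card.astro` and its `title` prop are hypothetical stand-ins for your own component, and the sketch assumes Astro 4.9+ (where the experimental Container API landed) plus Vitest installed:

```ts
// tests/card.test.ts (sketch only; assumes Astro >= 4.9 and Vitest)
import { experimental_AstroContainer as AstroContainer } from 'astro/container';
import { expect, test } from 'vitest';
import Card from '../src/components/Card.astro'; // hypothetical component

test('Card renders its title', async () => {
  // Render the component in isolation, no dev server required
  const container = await AstroContainer.create();
  const html = await container.renderToString(Card, {
    props: { title: 'Hello' }, // hypothetical prop
  });
  expect(html).toContain('Hello');
});
```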

4.2 E2E + Cross-Browser Testing

  • Playwright for full browser automation across Chromium, Firefox, WebKit
  • Configure playwright.config.ts projects: Desktop Chrome, Desktop Firefox, Desktop Safari (WebKit), Mobile Chrome (Pixel 7), Mobile Safari (iPhone 15)
  • Use aria-label and text-based locators (not brittle CSS selectors)
  • AI Enhancement: ZeroStep ai() calls for natural-language assertions that survive UI redesigns
  • Command: npm run test:e2e (playwright test)
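The five-project matrix above might look like this in playwright.config.ts; the device names come from Playwright's built-in device registry:

```ts
// playwright.config.ts (sketch of the project matrix described above)
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'Desktop Chrome', use: { ...devices['Desktop Chrome'] } },
    { name: 'Desktop Firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'Desktop Safari', use: { ...devices['Desktop Safari'] } }, // WebKit
    { name: 'Mobile Chrome', use: { ...devices['Pixel 7'] } },
    { name: 'Mobile Safari', use: { ...devices['iPhone 15'] } },
  ],
});
```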

4.3 Visual Regression

  • Playwright’s built-in expect(page).toHaveScreenshot() on every target viewport
  • Threshold: maxDiffPixelRatio: 0.001 (0.1%)
  • Mask dynamic content regions (dates, animations) to prevent false positives
  • Upgrade path: Lost Pixel (7K free screenshots/month) when managing 3+ sites
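A minimal spec combining the threshold and masking advice above; `.post-date` is a hypothetical example of a dynamic region, and the sketch assumes baseline screenshots are already committed:

```ts
// home.visual.spec.ts (sketch; assumes an existing Playwright project)
import { test, expect } from '@playwright/test';

test('home page matches baseline', async ({ page }) => {
  await page.goto('/');
  await expect(page).toHaveScreenshot('home.png', {
    maxDiffPixelRatio: 0.001,           // the 0.1% threshold from above
    mask: [page.locator('.post-date')], // hypothetical dynamic region
    animations: 'disabled',             // freeze CSS animations for stability
  });
});
```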

4.4 Accessibility + SEO Validation

  • Accessibility: @axe-core/playwright scans on every page, fail on any serious violations
  • SEO: Lighthouse CI in GitHub Actions — enforce minimum scores (Performance 90+, SEO 95+, Accessibility 95+)
  • Scripted checks: Assert <title>, <meta description>, <h1>, alt attributes, canonical links exist on every page
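One way to sketch the axe scan and the scripted SEO checks in a single Playwright spec (assumes @axe-core/playwright is installed; selectors are illustrative):

```ts
// a11y-seo.spec.ts (sketch; assumes @axe-core/playwright is installed)
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('no serious a11y violations, baseline SEO tags present', async ({ page }) => {
  await page.goto('/');

  // Fail only on serious/critical axe violations, per the policy above
  const results = await new AxeBuilder({ page }).analyze();
  const serious = results.violations.filter(
    (v) => v.impact === 'serious' || v.impact === 'critical',
  );
  expect(serious).toEqual([]);

  // Scripted SEO checks: title, meta description, h1, alt text, canonical
  await expect(page).toHaveTitle(/.+/);
  await expect(page.locator('meta[name="description"]')).toHaveAttribute('content', /.+/);
  await expect(page.locator('h1')).toHaveCount(1);
  await expect(page.locator('img:not([alt])')).toHaveCount(0);
  await expect(page.locator('link[rel="canonical"]')).toHaveAttribute('href', /.+/);
});
```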

4.5 AI-Autonomous Execution (The Loop)

  • Playwright MCP enables Claude Code to directly control browsers and run tests
  • Feedback Loop: When tests fail, output pipes to Claude Code for autonomous diagnosis and fix
  • Protocol:
    1. Claude runs npm run test:e2e -- --reporter=json
    2. On failure: reads trace, analyzes screenshots, fixes source code
    3. Re-runs only failed tests until green
    4. Documents fixes in ChangeLog.md
  • Playwright Agents: Use Planner (strategy), Generator (code), Healer (self-fix) for autonomous test maintenance
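Step 1 of the protocol produces a JSON report; a small parser like the sketch below is one way to pull out the failures to re-feed to the agent. Field names follow Playwright's JSON reporter; `sampleReport` is a hypothetical, heavily trimmed example (real reports nest suites arbitrarily deep, which the recursion handles):

```typescript
// Sketch: collect failing spec titles from a Playwright JSON report.
type Spec = { title: string; ok: boolean };
type Suite = { title: string; suites?: Suite[]; specs?: Spec[] };

// Hypothetical trimmed report, standing in for e2e-report.json
const sampleReport: { suites: Suite[] } = {
  suites: [
    {
      title: 'home.spec.ts',
      specs: [
        { title: 'hero renders', ok: true },
        { title: 'nav links work', ok: false },
      ],
    },
  ],
};

function failedSpecs(suite: Suite): string[] {
  const failures: string[] = [];
  for (const spec of suite.specs ?? []) {
    // Specs whose 'ok' flag is false are the ones to re-run and fix
    if (!spec.ok) failures.push(`${suite.title} > ${spec.title}`);
  }
  for (const child of suite.suites ?? []) {
    failures.push(...failedSpecs(child)); // recurse into nested suites
  }
  return failures;
}

const failures = sampleReport.suites.flatMap((s) => failedSpecs(s));
console.log(failures); // ["home.spec.ts > nav links work"]
```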

4.6 Real Device Validation (Release Gates)

  • Daily development: Playwright emulation (free, fast, sufficient for iteration)
  • Pre-release: LambdaTest real device testing on Safari iOS + Chrome Android
  • Trigger: Only before production deployments, not on every commit

4.7 “Definition of Done” Checklist

  • npm run build passes (Astro type checking)
  • npm run test:unit passes (Vitest — logic verification)
  • npm run test:e2e passes on Chromium, Firefox, and WebKit
  • Visual regression diffs < 0.1%
  • Zero axe-core accessibility violations (serious/critical)
  • Lighthouse SEO ≥ 95, Performance ≥ 90, Accessibility ≥ 95
  • No console.error or 404 network requests in test traces

4.8 General Verification (Always Apply)

  • Never accept “done” without evidence
  • Run deployment/build scripts after every change
  • Grep across ALL file types for stale references
  • Diff final file contents to confirm no residual issues
  • For critical work: have a second AI model review the results
  • Ask AI to identify lessons
  • Continually update LESSONS.md at both the project and global level, as appropriate, to improve future productivity
  • Update all docs, logs
  • Commit to Git with descriptive messages

| Model | Use For | Cost |
| --- | --- | --- |
| Claude Code (Opus) | Complex architecture, security, business logic | $20/mo |
| Claude Code (Haiku) | Implementation, routine tasks | Cheaper |
| Cursor | IDE-integrated dev, multi-file visual editing | $20/mo |
| Gemini (free tier) | Prototyping, scaffolding, Google integrations | Free |
| GLM-4.7 | Daily coding at ~Claude level | $3-6/mo |
  • Send same prompt to 3 models (Claude, ChatGPT, Gemini)
  • Synthesize the best insights — cancels out individual weaknesses
  • For critical decisions: always get a second AI opinion
  • Gemini for 80% daily tasks (scaffolding, integration, deployment)
  • Claude for 20% that matters most (business logic, security, complex features)
  • Describe feature before bed; AI builds overnight; review in morning
  • Best with local models for cost and privacy
  • Spawn sub-agents per language domain (Python, TS, YAML)
  • Parent agent coordinates and runs integration tests after all complete
  • Use tmux for multiple sessions

Paste into CC at start of sessions:

After making all changes, before telling me you're done:
1) grep the entire project for any remaining references to old values
2) run the build/deploy script and show output
3) cat each modified file so I can visually confirm
Only then report completion.

Work through TASKS.md one item at a time. For each item:
implement the fix, then immediately verify it works.
Do NOT move to the next item until current one is confirmed working.
Show status update after each item.

Layer 1: Cross-Project Dispatcher

  • File: _Now.md in KB root
  • Purpose: Lightweight “what am I working on?” view
  • Format: Simple list pointing to project TASKS.md files
  • DO NOT put task details here — keep it lightweight

Layer 2: Per-Project Spec

  • File: TASKS.md in each project root
  • Purpose: Detailed, version-controlled task breakdown
  • Format: Markdown checklist with completion criteria
  • Tool Integration: Consumed by Superpowers, ralph-loop, headless mode
  • Templates: See TASKS_md_Template.md for three templates
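As a hedged illustration of the Layer 2 format (task names and criteria are hypothetical; see TASKS_md_Template.md for the actual templates):

```markdown
# TASKS.md

## Feature: Contact form (hypothetical example)
- [ ] 1. Build form component
      Done when: form renders on /contact with client-side validation
- [ ] 2. Wire submission endpoint
      Done when: POST returns 200 and the e2e test passes
- [ ] 3. Update ChangeLog.md
      Done when: entry committed referencing this task
```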

Layer 3: Session Insights

  • File: _Next_YYYY-MM-DD_ShortDesc.md in KB _WorkingOn/
  • Purpose: Capture cross-cutting insights, lessons learned
  • Naming: Date-stamped for chronological sorting
  • Include source reference: Source: _Now.md -- "Brief task description"
| Scenario | Layer 1 (_Now) | Layer 2 (TASKS.md) | Layer 3 (_Next) |
| --- | --- | --- | --- |
| Quick KB edit (5-10 min) | ✅ Point to task | ❌ Skip | ✅ Capture insights if useful |
| Small bug fix (15-30 min) | ✅ Point to project | ✅ Detail spec | ✅ Capture insights |
| New feature (1-4 hours) | ✅ Point to project | ✅ Full Superpowers workflow | ✅ Document lessons |
| Overnight run | ✅ Note it’s running | ✅ Clear completion criteria | ✅ Review results next day |

Superpowers (Recommended - Primary Workflow)

  • The most mature skills framework for Claude Code (47.7K stars, official Anthropic recognition)
  • Three-phase workflow: brainstorm → plan → execute
  • Enforces TDD with two-stage review (spec compliance + code quality)
  • Prevents Claude from cutting corners or premature “done”
  • When to use: Any task > 30 minutes, new features, complex work
  • When to skip: Quick fixes, markdown KB work, trivial tasks
  • Install: /plugin marketplace add obra/superpowers-marketplace
  • https://github.com/obra/superpowers

Superpowers Workflow:

  1. /superpowers:brainstorm — Socratic requirements refinement
  2. /superpowers:write-plan — Generate granular TASKS.md with TDD steps
  3. /superpowers:execute-plan — Fresh subagent per task, verified before next

Ralph Loop (Overnight Autonomous Execution)

  • Official Anthropic plugin for autonomous iteration
  • Re-feeds original prompt after each completion, sees git history
  • Best for: Batch operations, large refactors, well-defined tasks
  • Usage: Headless mode with --max-iterations N
  • YC hackathon teams shipped 6+ repos overnight (~$297 API costs)
  • Install: /plugin install ralph-loop@claude-plugins-official
  • https://github.com/anthropics/claude-code/tree/main/plugins/ralph-loop

Headless Command:

claude -p "Read TASKS.md and complete all items" \
--allowedTools "Edit,Read,Bash,Write,Glob,Grep" \
--max-iterations 50
Get Shit Done

  • Spec-driven framework with six-step cycle per milestone
  • Spawns parallel research agents, creates atomic task plans
  • Each task runs in fresh 200K-token sub-agent context
  • Atomic git commits per task (surgical rollbacks)
  • Strengths: Solves context rot, good for solopreneurs
  • Weaknesses: Token consumption can spike, CC update fragility
  • Fit: Consider alongside Superpowers for daytime planning + overnight execution
  • https://github.com/glittercowboy/get-shit-done
TASKS.md (Universal)

  • Single markdown file in project root as source of truth
  • Version-controlled, human-readable, survives context resets
  • Works with ANY AI tool (CC, Cursor, Gemini, Codex)
  • Complements all frameworks above (they generate or consume it)
  • Universal adoption: Use for every project, regardless of framework choice