testing

Testing Rules

Philosophy: Diagnose, Don’t Brute-Force

When tests fail, the goal is NEVER to tweak code blindly until the suite is green. Every failure is one of:

- A BUG in the code: the code doesn’t do what it should
- A GAP in the code: the code is missing something the test expects
- A BUG in the test: the test itself is wrong or outdated
- An ENVIRONMENTAL issue: missing env var, wrong config, external dependency

Diagnose which one it is, then fix the ROOT CAUSE.

7-Step Diagnostic Process

1. READ the failing test. Understand what it expects and why. Read the ENTIRE test file.
2. READ the code under test line by line. Trace the actual execution path for the failing case.
3. READ imports and dependencies. Check shared state, utilities, side effects.
4. DIAGNOSE the category. Is it a code bug, a code gap, a test bug, or an environmental issue?
5. VERIFY framework behavior. If unclear, search the official docs for the test framework.
6. APPLY the root-cause fix. Fix the actual problem, not the symptom.
7. RE-RUN and verify. Run the specific test, then the FULL suite to catch regressions.

Maximum 3 diagnostic cycles per failure. If still stuck after 3 attempts, escalate to the user with your findings.
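The cycle cap can be pictured as a bounded loop. The following is only a sketch: `FailureCategory`, `CycleResult`, and `runDiagnosticCycles` are hypothetical names, and `runCycle` stands in for one full pass through the seven steps.

```typescript
// The four failure categories named in the Philosophy section.
type FailureCategory = "code-bug" | "code-gap" | "test-bug" | "environment";

// Hypothetical record of one pass through the seven steps.
interface CycleResult {
  category: FailureCategory;
  fixed: boolean; // true once the specific test AND the full suite pass
  notes: string;
}

interface Outcome {
  status: "fixed" | "escalate";
  cycles: number;
  findings: CycleResult[];
}

const MAX_CYCLES = 3;

// Runs at most MAX_CYCLES diagnostic passes, then hands the collected
// findings back so they can be escalated to the user.
function runDiagnosticCycles(
  runCycle: (attempt: number) => CycleResult,
): Outcome {
  const findings: CycleResult[] = [];
  for (let attempt = 1; attempt <= MAX_CYCLES; attempt++) {
    const result = runCycle(attempt);
    findings.push(result);
    if (result.fixed) {
      return { status: "fixed", cycles: attempt, findings };
    }
  }
  return { status: "escalate", cycles: MAX_CYCLES, findings };
}
```

Keeping every cycle's findings, rather than only the last one, means the escalation carries the full diagnostic history.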

Test Requirements by Task Type

Task Type	Test Requirement
Bug fix	Regression test proving the bug is fixed
New feature	Unit tests for core logic + integration tests for API/UI
Refactor	Existing tests must still pass (no behavior change)
API endpoint	Request/response validation, error cases, auth checks
Schema/content change	Build validation passes
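As an illustration of the "Bug fix" row, here is a minimal regression-test sketch. The function `pageCount` and its bug (a `Math.floor` that dropped the last partial page) are hypothetical. Node's built-in `node:test` and `node:assert` are used only to keep the example dependency-free; the same shape should carry over to Jest or Vitest.

```typescript
import { describe, it } from "node:test";
import { strictEqual, throws } from "node:assert";

// Hypothetical function under test. The assumed bug: an earlier version
// used Math.floor, so a trailing partial page was silently dropped.
function pageCount(totalItems: number, pageSize: number): number {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }
  return Math.ceil(totalItems / pageSize);
}

describe("pageCount", () => {
  // Regression test: fails against the floor-based version, passes after the fix.
  it("counts a trailing partial page", () => {
    strictEqual(pageCount(11, 10), 2);
  });

  // Guards against the fix changing behavior that was already correct.
  it("keeps exact multiples unchanged", () => {
    strictEqual(pageCount(20, 10), 2);
  });

  it("rejects a non-positive page size", () => {
    throws(() => pageCount(10, 0), RangeError);
  });
});
```

The regression test pins the exact input that exposed the bug, so reintroducing the faulty rounding turns the suite red again.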

Rules

- Every new feature or bugfix requires tests where applicable.
- Run the full test suite before committing. Fix all failures.
- NEVER skip, disable, or delete tests to make a commit pass.
- NEVER use test.skip() or test.todo() without a tracked TODO explaining why.
- Test edge cases: empty inputs, null/undefined, boundary values, error paths.
- Snapshot tests are a last resort. Prefer explicit assertions.
- Mock external services in unit tests.
- Use describe blocks to group related tests logically.
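Several of these rules can be combined in one small unit test file. The sketch below is an assumption-laden example, not a prescribed layout: `UserService`, `displayName`, and `fakeService` are hypothetical, the external service is replaced by a hand-written fake passed in as an argument, and `node:test` stands in for whatever framework the project uses.

```typescript
import { describe, it } from "node:test";
import { strictEqual, rejects } from "node:assert";

// Hypothetical external dependency, injected so tests can replace it.
interface UserService {
  fetchName(id: string): Promise<string | null>;
}

async function displayName(id: string, service: UserService): Promise<string> {
  if (id.trim() === "") throw new TypeError("id must not be empty");
  const name = await service.fetchName(id);
  return name ?? "Unknown user";
}

// Hand-rolled fake: unit tests never touch the network.
function fakeService(names: Record<string, string>): UserService {
  return { fetchName: async (id) => (id in names ? names[id] : null) };
}

describe("displayName", () => {
  describe("happy path", () => {
    it("returns the name from the service", async () => {
      strictEqual(await displayName("u1", fakeService({ u1: "Ada" })), "Ada");
    });
  });

  describe("edge cases", () => {
    it("falls back when the service returns null", async () => {
      strictEqual(await displayName("missing", fakeService({})), "Unknown user");
    });

    it("rejects an empty id", async () => {
      await rejects(displayName("  ", fakeService({})), TypeError);
    });

    it("propagates service errors", async () => {
      const failing: UserService = {
        fetchName: async () => {
          throw new Error("503 from upstream");
        },
      };
      await rejects(displayName("u1", failing), /503/);
    });
  });
});
```

Each test makes an explicit assertion on an outcome, and the nested describe blocks separate normal behavior from empty-input, null, and error-path cases.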

Anti-Patterns

Anti-Pattern	Why It’s Wrong	Do Instead
Changing code randomly until tests pass	Hides real bug, creates new ones	Follow 7-step diagnostic process
Deleting a failing test	Removes safety net	Fix the root cause
Adding `// @ts-ignore` to pass type tests	Masks type errors	Fix the type issue
Testing implementation details	Breaks on refactor	Test behavior and outcomes
No assertions in test	False confidence	Every test must assert something
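For the `// @ts-ignore` row, a short sketch of what "fix the type issue" might look like. `ApiUser` and `emailDomain` are hypothetical names; the anti-pattern is shown only as a comment.

```typescript
// Hypothetical API shape: email may be absent.
interface ApiUser {
  id: number;
  email?: string;
}

// Anti-pattern (kept as a comment on purpose):
//   // @ts-ignore
//   const domain = user.email.split("@")[1];
// The compiler error is real: user.email can be undefined at runtime.

// Fix: handle the undefined case so the types and the behavior agree.
function emailDomain(user: ApiUser): string | null {
  if (user.email === undefined) return null;
  const at = user.email.lastIndexOf("@");
  return at === -1 ? null : user.email.slice(at + 1);
}
```

The fixed version forces the caller to handle `null`, which is the case the suppressed error was pointing at.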

Visual Testing (Frontend Only)

This section applies when the project has a frontend stack, detected from stack.json, package.json deps (react, next, astro, svelte, vue), or file patterns (src/components/, *.tsx, *.jsx).

Pure backend/CLI projects: Skip this section entirely.
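One way the detection could work is sketched below, under stated assumptions: `hasFrontendStack` is a hypothetical helper, it receives an already-parsed package.json and a list of repo-relative file paths, and it leaves out stack.json because that file's schema is not described here.

```typescript
// Dependency names and file patterns taken from the rule above.
const FRONTEND_DEPS = ["react", "next", "astro", "svelte", "vue"];
const FRONTEND_FILE_PATTERNS = [/(^|\/)src\/components\//, /\.tsx$/, /\.jsx$/];

// Only the package.json fields this check needs.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns true when either signal (a known dep or a matching file) is present.
function hasFrontendStack(pkg: PackageJson, filePaths: string[]): boolean {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  const hasDep = FRONTEND_DEPS.some((name) => name in deps);
  const hasFile = filePaths.some((path) =>
    FRONTEND_FILE_PATTERNS.some((pattern) => pattern.test(path)),
  );
  return hasDep || hasFile;
}
```

For example, `hasFrontendStack({ dependencies: { express: "^4.0.0" } }, ["src/server.ts"])` finds neither signal, so the visual testing rules would be skipped for that project.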

Expanded TDD Cycle

When frontend work is detected, the TDD cycle becomes:

RED → GREEN → VISUAL → REFACTOR

RED — failing functional test (behavior, not appearance)
GREEN — implementation passes functional test
VISUAL — capture/verify visual baseline
- Playwright toHaveScreenshot() for automated regression
- Cross-browser: Chromium + Firefox + WebKit (minimum)
- Viewports: mobile (375px), tablet (768px), desktop (1280px)
- Deterministic: animations: 'disabled', fonts loaded, time frozen
- mask option for dynamic elements (timestamps, avatars, ads)
REFACTOR — clean up with both functional + visual tests as safety net
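The browser and viewport requirements of the VISUAL step amount to a 3 x 3 project matrix. Below is a rough sketch of generating it as plain data. `buildVisualProjects` and `VisualProject` are hypothetical names, the viewport heights are assumed values (the rule only gives widths), and the objects are shaped to match Playwright's `projects` entries (`name`, `use.browserName`, `use.viewport`).

```typescript
type BrowserName = "chromium" | "firefox" | "webkit";

interface VisualProject {
  name: string;
  use: {
    browserName: BrowserName;
    viewport: { width: number; height: number };
  };
}

const BROWSERS: BrowserName[] = ["chromium", "firefox", "webkit"];

// Widths come from the rule above; heights are assumptions.
const VIEWPORTS = [
  { label: "mobile", width: 375, height: 667 },
  { label: "tablet", width: 768, height: 1024 },
  { label: "desktop", width: 1280, height: 720 },
];

// Builds one project per browser/viewport pair (9 in total).
function buildVisualProjects(): VisualProject[] {
  const projects: VisualProject[] = [];
  for (const browserName of BROWSERS) {
    for (const { label, width, height } of VIEWPORTS) {
      projects.push({
        name: `${browserName}-${label}`,
        use: { browserName, viewport: { width, height } },
      });
    }
  }
  return projects;
}

const visualProjects = buildVisualProjects();
```

In a `playwright.config.ts`, the result would typically be passed as `projects: visualProjects` to `defineConfig`, alongside `expect: { toHaveScreenshot: { animations: 'disabled' } }`, so that each `toHaveScreenshot()` call keeps a separate baseline per project.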

Visual Approval Workflow

Scenario	Action
Intentional visual change	`npx playwright test --update-snapshots`
Unintentional visual diff	Treat as RED — it’s a regression, fix it
New component (no baseline)	First run creates baseline, commit snapshots

Local vs CI

Context	Scope
Local development	Selective — only changed components’ visual tests
CI pipeline	Full visual suite across all browsers + viewports
Pre-commit	Affected visual tests only
Pre-push	Full visual suite

Interactive Verification

When claude --chrome is available:

- Use for live design verification during development
- Complementary to automated tests, not a replacement
- Good for subjective quality checks automated tests can’t catch

Deterministic Rendering Checklist

Before capturing screenshots:

- animations: 'disabled' in Playwright config
- Fonts loaded (page.waitForLoadState('networkidle') or a font-face check such as awaiting document.fonts.ready)
- Time frozen (page.clock.setFixedTime() for timestamps)
- Dynamic content masked (mask: [page.locator('.avatar')])
- Viewport set explicitly (page.setViewportSize())
- Color scheme set (page.emulateMedia({ colorScheme: 'light' }))
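Put together, the checklist might look like the following Playwright spec. This is a sketch under assumptions: the `/profile` route, the `.avatar` selector, the frozen date, and the screenshot name are placeholders, and a `baseURL` is assumed to be configured. `page.clock` requires a Playwright version that includes the Clock API.

```typescript
import { test, expect } from "@playwright/test";

test("profile card matches the visual baseline", async ({ page }) => {
  // Viewport and color scheme set explicitly, not inherited from defaults.
  await page.setViewportSize({ width: 1280, height: 720 });
  await page.emulateMedia({ colorScheme: "light" });

  // Freeze Date.now() and new Date() before the page loads (arbitrary date).
  await page.clock.setFixedTime(new Date("2024-01-01T00:00:00Z"));

  // Hypothetical route; relies on baseURL from the Playwright config.
  await page.goto("/profile");

  // Font-face check: resolve only after the browser reports fonts as ready.
  await page.evaluate(async () => {
    await document.fonts.ready;
  });

  // Animations disabled and dynamic content masked for a stable comparison.
  await expect(page).toHaveScreenshot("profile-card.png", {
    animations: "disabled",
    mask: [page.locator(".avatar")],
  });
});
```

The clock is fixed before navigation so that any timestamp rendered during page load already uses the frozen time.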

Integration

Skill	How it uses this rule
`test-driven-development`	Adds VISUAL step after GREEN when frontend detected
`verification-before-completion`	Adds visual regression check to completion gate
`playwright`	Viewport presets + cross-browser config templates