Skip to content

verification-before-completion

Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

ModelSourceCategory
sonnetcoreWorkflow

Tools: Read, Glob, Grep, Bash

Claiming work is complete without verification is dishonesty, not efficiency.

Mandatory Announcement — FIRST OUTPUT before anything else:

┏━ 🚀 verification-before-completion ━━━━━━━━━━━━┓
┃ [one-line description of what you're verifying] ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

No exceptions. Box frame first, then work.

Core principle: Evidence before claims, always.

Violating the letter of this rule is violating the spirit of this rule.

Full Reference

Claiming work is complete without verification is dishonesty, not efficiency.

Mandatory Announcement — FIRST OUTPUT before anything else:

┏━ 🚀 verification-before-completion ━━━━━━━━━━━━┓
┃ [one-line description of what you're verifying] ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

No exceptions. Box frame first, then work.

Core principle: Evidence before claims, always.

Violating the letter of this rule is violating the spirit of this rule.

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

If you haven’t run the verification command in this message, you cannot claim it passes.

BEFORE claiming any status or expressing satisfaction:
1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete) with run_in_background: true. Poll with TaskOutput.
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
- If NO: State actual status with evidence
- If YES: State claim WITH evidence
5. ONLY THEN: Make the claim
Skip any step = lying, not verifying

Before claiming completion, verify your context hasn’t degraded by answering these four probes. If you can’t answer any probe confidently, your context may be stale — re-read relevant files before claiming success.

ProbeQuestionWhat It Tests
RecallWhat was the original requirement or error?Can you trace back to the starting point?
ArtifactWhich files have been created or modified?Do you have an accurate artifact trail?
ContinuationWhat should happen next after this work?Can you hand off to the next step?
DecisionWhat key decisions were made and why?Are design choices preserved?
Probe FailureRecovery Action
Can’t recall original requirementRe-read the task description, plan file, or issue
Can’t list modified filesRun git diff --name-only to reconstruct
Can’t identify next stepsRe-read the implementation plan
Can’t explain decisionsFlag to user — context has degraded, may need review
  • Run probes BEFORE the Gate Function (probes first, then verification commands)
  • If any probe fails, do NOT proceed to running verification commands
  • Probes are self-administered — no external tool needed, just honest self-assessment
  • A failed probe is not failure — it’s an early warning to re-establish context
ClaimRequiresNot Sufficient
Tests passTest command output: 0 failuresPrevious run, “should pass”
Linter cleanLinter output: 0 errorsPartial check, extrapolation
Build succeedsBuild command: exit 0Linter passing, logs look good
Bug fixedTest original symptom: passesCode changed, assumed fixed
Regression test worksRed-green cycle verifiedTest passes once
Agent completedVCS diff shows changesAgent reports “success”
Requirements metLine-by-line checklistTests passing
Visual unchangedVisual test output: 0 diffs”I didn’t change any styles”
  • Using “should”, “probably”, “seems to”
  • Expressing satisfaction before verification (“Great!”, “Perfect!”, “Done!”, etc.)
  • About to commit/push/PR without verification
  • Trusting agent success reports
  • Relying on partial verification
  • Thinking “just this once”
  • Tired and wanting work over
  • ANY wording implying success without having run verification
ExcuseReality
”Should work now”RUN the verification
”I’m confident”Confidence ≠ evidence
”Just this once”No exceptions
”Linter passed”Linter ≠ compiler
”Agent said success”Verify independently
”I’m tired”Exhaustion ≠ excuse
”Partial check is enough”Partial proves nothing
”Different words so rule doesn’t apply”Spirit over letter

Tests:

✅ [Run test command] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct"

Regression tests (TDD Red-Green):

✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
❌ "I've written a regression test" (without red-green verification)

Build:

✅ [Run build] [See: exit 0] "Build passes"
❌ "Linter passed" (linter doesn't check compilation)

Requirements:

✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
❌ "Tests pass, phase complete"

Agent delegation:

✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
❌ Trust agent report

Visual regression (frontend projects):

✅ [Run visual tests] [See: 0 screenshot diffs] "No visual regressions"
❌ "I only changed logic, visuals are fine" (visual side effects happen)

When to run visual verification:

  • ANY change to .tsx, .jsx, .svelte, .vue, .astro files
  • ANY change to CSS/Tailwind classes
  • ANY change to component props that affect rendering
  • Skip for pure backend/API/CLI changes

From 24 failure memories:

  • your human partner said “I don’t believe you” - trust broken
  • Undefined functions shipped - would crash
  • Missing requirements shipped - incomplete features
  • Time wasted on false completion → redirect → rework
  • Violates: “Honesty is a core value. If you lie, you’ll be replaced.”

ALWAYS before:

  • ANY variation of success/completion claims
  • ANY expression of satisfaction
  • ANY positive statement about work state
  • Committing, PR creation, task completion
  • Moving to next task
  • Delegating to agents

Rule applies to:

  • Exact phrases
  • Paraphrases and synonyms
  • Implications of success
  • ANY communication suggesting completion/correctness

No shortcuts for verification.

Run the command. Read the output. THEN claim the result.

This is non-negotiable.