observation-masking
Token-efficient tool output management — decides when to keep tool output in context vs offload to files, using write-to-file + return-reference patterns. Use when tool outputs are large, when context is filling up, or when building token-efficient agent workflows.
| Model | Source |
|---|---|
| sonnet | pack: context-engineering |
Full Reference
Observation Masking
Tool outputs are context. Every byte of stdout, every JSON blob, every file read that lands in the assistant turn costs tokens — and stays in context for every future turn. Observation masking is the discipline of deciding what deserves context residency and what gets offloaded.
Two patterns cover 95% of cases: write-to-file + return-reference and selective extraction. The decision between them — and whether to mask at all — lives in reference/decision-matrix.md.
Quick Reference
| Item | Value |
|---|---|
| Core question | Does this output need to be in the window for future turns? |
| Primary pattern | Write full output to file → return file path as reference |
| Secondary pattern | Extract only the needed fields → discard the rest |
| Trigger threshold | >500 tokens of tool output that won’t be re-referenced |
| Context budget signal | >60% context used → apply masking aggressively |
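The token thresholds above require an estimate before the output enters context. A common rough heuristic — an assumption here, not something this skill mandates — is about 4 characters per token:

```shell
# Rough token estimate for a tool output file.
# The ~4 chars/token ratio is a heuristic assumption, not an exact count.
estimate_tokens() {
  chars=$(wc -c < "$1")
  echo $(( chars / 4 ))
}

# Illustrative sample file (path is arbitrary):
printf 'PORT=3000\nDB_URL=postgres://localhost/app\n' > /tmp/sample-output.txt
estimate_tokens /tmp/sample-output.txt   # small file, well under the 500-token threshold
```

Anything the helper scores above ~500 tokens is a candidate for masking; above the 60% context-budget signal, the threshold drops.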
Reference Index
| I want to… | File |
|---|---|
| See write-to-file, selective extraction, and summary-in-context patterns with code | reference/patterns.md |
| Decide whether to mask, extract, or keep based on output size and reuse | reference/decision-matrix.md |
Usage: Read the reference file matching your current task from the index above. Each file is self-contained with inline examples and gotchas.
Core Mental Model
Every tool observation enters the context window as a message. Without masking, large outputs accumulate:
```
Turn 1: bash output → 2,000 tokens (stays forever)
Turn 2: file read   → 3,500 tokens (stays forever)
Turn 3: API call    → 1,800 tokens (stays forever)
Turn 4: you're now 7,300 tokens poorer for data you looked at once
```

With masking:
```
Turn 1: bash output → written to .claude/tmp/scan-output.txt → "Saved to .claude/tmp/scan-output.txt (2,847 lines)"
Turn 2: file read   → extract 3 relevant lines → "Found: PORT=3000, DB_URL=..., NODE_ENV=production"
Turn 3: API call    → written to .claude/tmp/api-response.json → "200 OK — 47 records. Saved to .claude/tmp/api-response.json"
Turn 4: context budget: intact
```

Decision Flow
```
Tool is about to produce output
  │
  ▼
Is output large? (>500 tokens estimated)
  ├── No  → Keep in context (default)
  └── Yes → Will I reference specific fields later?
        ├── Yes, known fields  → Selective extraction
        ├── Yes, unknown scope → Write-to-file + reference
        └── No                 → Write-to-file + discard (summary only)
```

Full decision logic with size thresholds and reuse scoring → reference/decision-matrix.md.
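The decision flow above can be sketched as a small shell helper. The function name and the `reuse` labels are illustrative assumptions, not part of the skill's interface:

```shell
# Sketch of the decision flow as a classifier (names are illustrative).
# tokens: estimated size of the output
# reuse:  "known-fields" | "unknown-scope" | "none"
mask_strategy() {
  tokens=$1
  reuse=$2
  if [ "$tokens" -le 500 ]; then
    echo "keep-in-context"              # small output: masking overhead not worth it
  elif [ "$reuse" = "known-fields" ]; then
    echo "selective-extraction"         # pull the fields, discard the rest
  elif [ "$reuse" = "unknown-scope" ]; then
    echo "write-to-file-reference"      # preserve everything, reference by path
  else
    echo "write-to-file-discard"        # summary only; raw data on disk
  fi
}

mask_strategy 120 none            # → keep-in-context
mask_strategy 2000 known-fields   # → selective-extraction
mask_strategy 2000 none           # → write-to-file-discard
```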
Patterns at a Glance
Write-to-file + return-reference — full output preserved on disk, tiny reference in context:
```shell
# Instead of reading a 500-line config dump into context:
some-tool --verbose > .claude/tmp/tool-output.txt && echo "Saved to .claude/tmp/tool-output.txt"
```

Selective extraction — pull only what matters, discard the rest:
```shell
# Instead of keeping a full JSON blob:
curl -s api/endpoint | jq '{id: .id, status: .status, error: .error}'
```

Summary-in-context + detail-in-file — human-readable summary stays, raw data offloaded:
```shell
npm audit --json > .claude/tmp/audit.json && npm audit 2>&1 | tail -5
```

Full implementations with edge cases → reference/patterns.md.
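The write-to-file pattern above can be wrapped in a reusable helper so any command returns a one-line reference instead of its full output. The helper name and the `.claude/tmp` location mirror the examples in this skill, but the exact wrapper is a sketch, not a prescribed implementation:

```shell
# Run a command, save its full output to a file, print only a short reference.
# Helper name is illustrative; .claude/tmp matches the paths used in this skill.
run_masked() {
  out=".claude/tmp/$1"
  shift
  mkdir -p .claude/tmp
  "$@" > "$out" 2>&1
  lines=$(wc -l < "$out" | tr -d ' ')
  echo "Saved to $out ($lines lines)"
}

run_masked scan-output.txt ls -la /etc
# Context receives one line; the full listing stays on disk for later grep/head.
```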
When to Apply
Apply observation masking when:
- A tool output exceeds ~500 tokens and won’t be re-read verbatim
- Context usage is above 60% (aggressive masking mode)
- Building multi-turn agent workflows where context accumulates over many steps
- Scanning/auditing tools that produce exhaustive output (dependency trees, lint reports, audit logs)
- API responses containing nested data where only 2-3 fields matter
Do NOT apply when:
- Output is small (<200 tokens) — masking overhead exceeds benefit
- Output will be referenced multiple times in the same session
- Output needs to be in context for the model to reason over it holistically
- Writing a test/implementation where the full content is the work product
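The apply/don't-apply criteria above can be enforced at runtime with a guard that only offloads when the output is large enough to be worth it. The threshold, the ~4 chars/token estimate, and the destination path are all illustrative assumptions:

```shell
# Mask only when worth it: small outputs stay inline, large ones go to disk.
# Threshold (200 tokens) and ~4 chars/token are heuristic assumptions.
maybe_mask() {
  tmp=$(mktemp)
  "$@" > "$tmp" 2>&1
  tokens=$(( $(wc -c < "$tmp") / 4 ))
  if [ "$tokens" -lt 200 ]; then
    cat "$tmp"                          # small: keep in context (masking overhead exceeds benefit)
    rm -f "$tmp"
  else
    mkdir -p .claude/tmp
    dest=".claude/tmp/masked-$$.txt"    # illustrative naming scheme
    mv "$tmp" "$dest"
    echo "Saved to $dest (~$tokens tokens)"
  fi
}

maybe_mask echo "short output"          # small, so printed inline
```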
Announcement
```
┏━ ⚡ observation-masking ━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ [one-line description of what’s being managed]  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```