observation-masking
Token-efficient tool output management — decides when to keep tool output in context vs offload to files, using write-to-file + return-reference patterns. Use when tool outputs are large, when context is filling up, or when building token-efficient agent workflows.
| Model | Source |
|---|---|
| sonnet | pack: context-engineering |
Full Reference
Observation Masking
Tool outputs are context. Every byte of stdout, every JSON blob, every file read that lands in the assistant turn costs tokens — and stays in context for every future turn. Observation masking is the discipline of deciding what deserves context residency and what gets offloaded.
Two patterns cover 95% of cases: write-to-file + return-reference and selective extraction. The decision between them — and whether to mask at all — lives in reference/decision-matrix.md.
Quick Reference
| Item | Value |
|---|---|
| Core question | Does this output need to be in the window for future turns? |
| Primary pattern | Write full output to file → return file path as reference |
| Secondary pattern | Extract only the needed fields → discard the rest |
| Trigger threshold | >500 tokens of tool output that won’t be re-referenced |
| Context budget signal | >60% context used → apply masking aggressively |
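The token thresholds above require an estimate before the output enters context. A common rough heuristic — an assumption here, not something this skill mandates — is about 4 characters per token:

```shell
# Rough token estimate for a tool output file.
# The ~4 chars/token ratio is a heuristic assumption, not an exact count.
estimate_tokens() {
  chars=$(wc -c < "$1")
  echo $(( chars / 4 ))
}

# Illustrative sample file (path is arbitrary):
printf 'PORT=3000\nDB_URL=postgres://localhost/app\n' > /tmp/sample-output.txt
estimate_tokens /tmp/sample-output.txt   # small file, well under the 500-token threshold
```

Anything the helper scores above ~500 tokens is a candidate for masking; above the 60% context-budget signal, the threshold drops.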
Reference Index
| I want to… | File |
|---|---|
| See write-to-file, selective extraction, and summary-in-context patterns with code | reference/patterns.md |
| Decide whether to mask, extract, or keep based on output size and reuse | reference/decision-matrix.md |
Usage: Read the reference file matching your current task from the index above. Each file is self-contained with inline examples and gotchas.
Core Mental Model
Every tool observation enters the context window as a message. Without masking, large outputs accumulate:
```
Turn 1: bash output → 2,000 tokens (stays forever)
Turn 2: file read   → 3,500 tokens (stays forever)
Turn 3: API call    → 1,800 tokens (stays forever)
Turn 4: you're now 7,300 tokens poorer for data you looked at once
```

With masking:
```
Turn 1: bash output → written to .claude/tmp/scan-output.txt → "Saved to .claude/tmp/scan-output.txt (2,847 lines)"
Turn 2: file read   → extract 3 relevant lines → "Found: PORT=3000, DB_URL=..., NODE_ENV=production"
Turn 3: API call    → written to .claude/tmp/api-response.json → "200 OK — 47 records. Saved to .claude/tmp/api-response.json"
Turn 4: context budget: intact
```

Decision Flow
```
Tool is about to produce output
  │
  ▼
Is output large? (>500 tokens estimated)
  ├── No  → Keep in context (default)
  └── Yes → Will I reference specific fields later?
        ├── Yes, known fields  → Selective extraction
        ├── Yes, unknown scope → Write-to-file + reference
        └── No                 → Write-to-file + discard (summary only)
```

Full decision logic with size thresholds and reuse scoring → reference/decision-matrix.md.
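The decision flow above can be sketched as a small shell helper. The function name and the `reuse` labels are illustrative assumptions, not part of the skill's interface:

```shell
# Sketch of the decision flow as a classifier (names are illustrative).
# tokens: estimated size of the output
# reuse:  "known-fields" | "unknown-scope" | "none"
mask_strategy() {
  tokens=$1
  reuse=$2
  if [ "$tokens" -le 500 ]; then
    echo "keep-in-context"              # small output: masking overhead not worth it
  elif [ "$reuse" = "known-fields" ]; then
    echo "selective-extraction"         # pull the fields, discard the rest
  elif [ "$reuse" = "unknown-scope" ]; then
    echo "write-to-file-reference"      # preserve everything, reference by path
  else
    echo "write-to-file-discard"        # summary only; raw data on disk
  fi
}

mask_strategy 120 none            # → keep-in-context
mask_strategy 2000 known-fields   # → selective-extraction
mask_strategy 2000 none           # → write-to-file-discard
```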
Patterns at a Glance
Write-to-file + return-reference — full output preserved on disk, tiny reference in context:
```shell
# Instead of reading a 500-line config dump into context:
some-tool --verbose > .claude/tmp/tool-output.txt && echo "Saved to .claude/tmp/tool-output.txt"
```

Selective extraction — pull only what matters, discard the rest:
```shell
# Instead of keeping a full JSON blob:
curl -s api/endpoint | jq '{id: .id, status: .status, error: .error}'
```

Summary-in-context + detail-in-file — human-readable summary stays, raw data offloaded:
```shell
npm audit --json > .claude/tmp/audit.json && npm audit 2>&1 | tail -5
```

Full implementations with edge cases → reference/patterns.md.
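The write-to-file pattern above can be wrapped in a reusable helper so any command returns a one-line reference instead of its full output. The helper name and the `.claude/tmp` location mirror the examples in this skill, but the exact wrapper is a sketch, not a prescribed implementation:

```shell
# Run a command, save its full output to a file, print only a short reference.
# Helper name is illustrative; .claude/tmp matches the paths used in this skill.
run_masked() {
  out=".claude/tmp/$1"
  shift
  mkdir -p .claude/tmp
  "$@" > "$out" 2>&1
  lines=$(wc -l < "$out" | tr -d ' ')
  echo "Saved to $out ($lines lines)"
}

run_masked scan-output.txt ls -la /etc
# Context receives one line; the full listing stays on disk for later grep/head.
```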
When to Apply
Apply observation masking when:
- A tool output exceeds ~500 tokens and won’t be re-read verbatim
- Context usage is above 60% (aggressive masking mode)
- Building multi-turn agent workflows where context accumulates over many steps
- Scanning/auditing tools that produce exhaustive output (dependency trees, lint reports, audit logs)
- API responses containing nested data where only 2-3 fields matter
Do NOT apply when:
- Output is small (<200 tokens) — masking overhead exceeds benefit
- Output will be referenced multiple times in the same session
- Output needs to be in context for the model to reason over it holistically
- Writing a test/implementation where the full content is the work product
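The apply/don't-apply criteria above can be enforced at runtime with a guard that only offloads when the output is large enough to be worth it. The threshold, the ~4 chars/token estimate, and the destination path are all illustrative assumptions:

```shell
# Mask only when worth it: small outputs stay inline, large ones go to disk.
# Threshold (200 tokens) and ~4 chars/token are heuristic assumptions.
maybe_mask() {
  tmp=$(mktemp)
  "$@" > "$tmp" 2>&1
  tokens=$(( $(wc -c < "$tmp") / 4 ))
  if [ "$tokens" -lt 200 ]; then
    cat "$tmp"                          # small: keep in context (masking overhead exceeds benefit)
    rm -f "$tmp"
  else
    mkdir -p .claude/tmp
    dest=".claude/tmp/masked-$$.txt"    # illustrative naming scheme
    mv "$tmp" "$dest"
    echo "Saved to $dest (~$tokens tokens)"
  fi
}

maybe_mask echo "short output"          # small, so printed inline
```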
Announcement
```
┏━ ⚡ observation-masking ━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ [one-line description of what’s being managed]  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```