
macos-agent-builder

Use when building AI-driven macOS automation agents — configuring Peekaboo agent mode, selecting AI models, composing MCP tool chains, or exporting agent configurations for Claude Desktop/Cursor.

Model: sonnet
Source: pack: macos-automation
Full Reference

Guide for building AI-powered macOS desktop automation agents. Covers Peekaboo’s built-in agent mode, custom MCP tool chains, model selection, and agent configuration export for Claude Desktop, Cursor, and other MCP clients.

Mandatory Announcement — FIRST OUTPUT before anything else:

┏━ 🧠 macos-agent-builder ━━━━━━━━━━━━━━━━━━━━━━━┓
┃ [one-line description of agent being designed] ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

No exceptions. Box frame first, then work.

The fastest path: Peekaboo’s agent command turns a natural-language (NL) task description into a tool chain out of the box.

peekaboo agent "Fill out the registration form with test data" --model claude-sonnet-4.5

  • Free-form natural language task description
  • Multi-model support (see Model Selection Guide)
  • Interactive chat mode (auto TTY detection)
  • Session persistence (--list-sessions, --resume)
  • Audio input (--audio, --audio-file, --realtime)

Best for: Quick tasks, interactive exploration, prototyping automations.
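When the agent command is launched from another program rather than typed by hand, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of Peekaboo: `build_agent_command` is a hypothetical helper, and it assumes that `--max-steps` takes its value as a separate argument. Nothing is executed; the function only returns the argv list.

```python
import shlex
from typing import List, Optional


def build_agent_command(
    task: str,
    model: str = "claude-sonnet-4.5",
    max_steps: Optional[int] = None,
    dry_run: bool = False,
) -> List[str]:
    """Return an argv list for `peekaboo agent`. Hypothetical helper; runs nothing."""
    argv = ["peekaboo", "agent", task, "--model", model]
    if max_steps is not None:
        # Assumption: the step cap is passed as `--max-steps <n>`.
        argv += ["--max-steps", str(max_steps)]
    if dry_run:
        # --dry-run shows planned actions without executing them.
        argv.append("--dry-run")
    return argv


if __name__ == "__main__":
    cmd = build_agent_command(
        "Fill out the registration form with test data", dry_run=True
    )
    # shlex.join quotes the task so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems in the task description.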

Expose only selected tools through MCP server configuration. This gives more control over what the AI can do.

{
  "mcpServers": {
    "peekaboo": {
      "command": "npx",
      "args": ["-y", "@steipete/peekaboo", "mcp"],
      "env": {
        "PEEKABOO_ALLOW_TOOLS": "see,click,type,press,hotkey,list,image"
      }
    },
    "macos-automator": {
      "command": "npx",
      "args": ["-y", "@steipete/macos-automator-mcp@latest"]
    }
  }
}

Best for: Production agents, restricted environments, specific tool needs.
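If several agents need slightly different tool sets, the configuration above can be generated instead of edited by hand. This is a sketch under stated assumptions: `peekaboo_server` and `build_mcp_config` are hypothetical helper names, and the package names and the `PEEKABOO_ALLOW_TOOLS` variable are copied from the configuration shown above.

```python
import json
from typing import Dict, Iterable, Optional


def peekaboo_server(allow_tools: Optional[Iterable[str]] = None) -> Dict:
    """Build the Peekaboo MCP server entry, optionally with a tool allowlist."""
    server: Dict = {"command": "npx", "args": ["-y", "@steipete/peekaboo", "mcp"]}
    if allow_tools is not None:
        # The allowlist is a comma-separated string, as in the JSON above.
        server["env"] = {"PEEKABOO_ALLOW_TOOLS": ",".join(allow_tools)}
    return server


def build_mcp_config(
    allow_tools: Optional[Iterable[str]] = None,
    with_automator: bool = False,
) -> Dict:
    """Return a dict in the `mcpServers` shape used by MCP clients."""
    servers = {"peekaboo": peekaboo_server(allow_tools)}
    if with_automator:
        servers["macos-automator"] = {
            "command": "npx",
            "args": ["-y", "@steipete/macos-automator-mcp@latest"],
        }
    return {"mcpServers": servers}


if __name__ == "__main__":
    config = build_mcp_config(["see", "click", "type"], with_automator=True)
    print(json.dumps(config, indent=2))
```

Omitting `allow_tools` leaves out the `env` key entirely, which corresponds to the unfiltered default.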

Option 3: Hybrid (Peekaboo Agent + Script Tools)


Combine Peekaboo’s visual capabilities with macos-automator’s script execution.

Configure Peekaboo as one MCP server and macos-automator as a separate MCP server. The AI client then has both visual and scripting tools available.

Best for: Complex workflows needing both GUI interaction and scripted data extraction.

| Model | Speed | Reasoning | Vision | Best For |
| --- | --- | --- | --- | --- |
| gpt-5.1 | Fast | Good | Yes | Simple UI tasks, form filling, navigation |
| claude-sonnet-4.5 | Medium | Excellent | Yes | Complex multi-step, error recovery, data extraction |
| gemini-3-flash | Fast | Good | Excellent | Vision-heavy tasks, screen reading, visual verification |
| ollama (local) | Varies | Model-dependent | Model-dependent | Privacy-sensitive, offline, no API cost |
| grok-3 | Fast | Good | Yes | General purpose, experimental |

Recommendation: Start with claude-sonnet-4.5 for complex tasks. Switch to gemini-3-flash for vision-heavy tasks. Use gpt-5.1 for speed on simple tasks.
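The recommendation above can be written down as a small decision function. This is only an illustrative sketch: `recommend_model` is a hypothetical name, and the priority order (offline first, then vision, then complexity) is an assumption rather than something the guide prescribes.

```python
def recommend_model(
    complex_task: bool = False,
    vision_heavy: bool = False,
    offline: bool = False,
) -> str:
    """Map coarse task traits to a model name from the table above."""
    if offline:
        # Privacy-sensitive or offline work: a local Ollama model.
        return "ollama"
    if vision_heavy:
        return "gemini-3-flash"
    if complex_task:
        return "claude-sonnet-4.5"
    # Simple UI tasks where speed matters most.
    return "gpt-5.1"


if __name__ == "__main__":
    print(recommend_model(complex_task=True))
```

Note that "ollama" here stands for whichever local model is installed, not for a specific model identifier.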

Provider Configuration:

| Provider | Env Var | Models |
| --- | --- | --- |
| OpenAI | OPENAI_API_KEY | gpt-5.1, gpt-4o, gpt-4o-mini |
| Anthropic | ANTHROPIC_API_KEY | claude-sonnet-4.5, claude-haiku-4.5 |
| xAI | GROK_API_KEY | grok-3, grok-3-mini |
| Google | GEMINI_API_KEY | gemini-3-flash, gemini-3-pro |
| Ollama | PEEKABOO_OLLAMA_BASE_URL | Any installed model |
| Custom | ~/.peekaboo/credentials | OpenRouter, Groq, Together AI |
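A quick pre-flight check can catch a missing API key before an agent run starts. The sketch below encodes the provider table as a lookup; `MODEL_ENV_VARS` and `missing_credential` are hypothetical names, and the check covers only the models listed in the table (Ollama and custom providers are left out because they are not configured through a single API key variable).

```python
import os
from typing import Mapping, Optional

# Model name -> environment variable, taken from the provider table above.
MODEL_ENV_VARS = {
    "gpt-5.1": "OPENAI_API_KEY",
    "gpt-4o": "OPENAI_API_KEY",
    "gpt-4o-mini": "OPENAI_API_KEY",
    "claude-sonnet-4.5": "ANTHROPIC_API_KEY",
    "claude-haiku-4.5": "ANTHROPIC_API_KEY",
    "grok-3": "GROK_API_KEY",
    "grok-3-mini": "GROK_API_KEY",
    "gemini-3-flash": "GEMINI_API_KEY",
    "gemini-3-pro": "GEMINI_API_KEY",
}


def missing_credential(
    model: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the name of the unset variable for `model`, or None if nothing is missing.

    Models not in the table return None because this sketch cannot check them.
    """
    env = os.environ if env is None else env
    var = MODEL_ENV_VARS.get(model)
    if var is None:
        return None
    # An empty string counts as unset.
    return None if env.get(var) else var


if __name__ == "__main__":
    missing = missing_credential("claude-sonnet-4.5")
    print(f"Set {missing} first" if missing else "Credentials look present")
```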

Default: no filtering. The AI has access to every tool.

peekaboo agent "task description" --model claude-sonnet-4.5

Restricted Set (recommended for production)

Allow only specific tools:

PEEKABOO_ALLOW_TOOLS="see,click,type,press,hotkey,scroll,list,image" peekaboo agent "task"

Or deny dangerous tools:

PEEKABOO_DISABLE_TOOLS="app quit,clean,config" peekaboo agent "task"

Or via config.json:

{
  "tools": {
    "allow": ["see", "click", "type", "press", "hotkey", "scroll", "list", "image"],
    "deny": []
  }
}

Read-only + basic interaction:

PEEKABOO_ALLOW_TOOLS="see,list,image"

GOTCHA [silent-bug]: Tool filtering is ALL-or-NOTHING for subcommands. allow: ["window"] allows ALL window subcommands (close, minimize, maximize, etc.). There’s no per-subcommand filtering.
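Because of this gotcha, it can be worth auditing an allowlist before deploying it. The sketch below flags allowlisted tools that bring subcommands with them. `KNOWN_SUBCOMMANDS` and `audit_allowlist` are hypothetical names, and the table contains only the subcommands this guide mentions (`window` close/minimize/maximize and `app quit`); the real sets are likely larger.

```python
from typing import Dict, Iterable, List

# Assumption: only subcommands named in this guide are listed here.
KNOWN_SUBCOMMANDS: Dict[str, List[str]] = {
    "window": ["close", "minimize", "maximize"],
    "app": ["quit"],
}


def audit_allowlist(allow_tools: Iterable[str]) -> List[str]:
    """Return one warning per allowlisted tool that implies subcommands."""
    warnings = []
    for tool in allow_tools:
        name = tool.strip()
        subcommands = KNOWN_SUBCOMMANDS.get(name)
        if subcommands:
            warnings.append(f"'{name}' also enables: {', '.join(subcommands)}")
    return warnings


if __name__ == "__main__":
    allow = "see,click,type,window,app".split(",")
    for warning in audit_allowlist(allow):
        print(warning)
```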

{
  "mcpServers": {
    "peekaboo": {
      "command": "npx",
      "args": ["-y", "@steipete/peekaboo", "mcp"],
      "env": {
        "PEEKABOO_ALLOW_TOOLS": "see,click,type,press,hotkey,scroll,list,image,window,app,menu"
      }
    }
  }
}

Config location: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "peekaboo": {
      "command": "npx",
      "args": ["-y", "@steipete/peekaboo", "mcp"]
    }
  }
}

Config location: ~/.cursor/mcp.json
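Both clients read an `mcpServers` object from a JSON file, so exporting a configuration amounts to merging one entry into that file without discarding servers that are already there. The following sketch shows one way to do that; `add_server` is a hypothetical helper, and the two path constants simply restate the config locations given above.

```python
import json
from pathlib import Path
from typing import Dict

CLAUDE_DESKTOP_CONFIG = (
    Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
)
CURSOR_CONFIG = Path.home() / ".cursor/mcp.json"


def add_server(config_path: Path, name: str, server: Dict) -> Dict:
    """Merge one MCP server entry into a client config file and return the result.

    Existing servers and unrelated top-level keys are preserved; an entry
    with the same name is overwritten.
    """
    config: Dict = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[name] = server
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    entry = {"command": "npx", "args": ["-y", "@steipete/peekaboo", "mcp"]}
    add_server(CURSOR_CONFIG, "peekaboo", entry)
```

Running the example writes to the real Cursor config file, so back it up first or point `add_server` at a scratch path while testing.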

Combine Peekaboo (GUI) + macos-automator (scripts) + applescript-mcp (raw bridge):

{
  "mcpServers": {
    "peekaboo-gui": {
      "command": "npx",
      "args": ["-y", "@steipete/peekaboo", "mcp"],
      "env": {
        "PEEKABOO_ALLOW_TOOLS": "see,click,type,press,hotkey,scroll,list,image,window,app,menu,dialog"
      }
    },
    "macos-scripts": {
      "command": "npx",
      "args": ["-y", "@steipete/macos-automator-mcp@latest"]
    },
    "raw-applescript": {
      "command": "npx",
      "args": ["@peakmojo/applescript-mcp"]
    }
  }
}

See → identify form fields → type data into each → click submit → verify success

Model: gpt-5.1 (fast, simple task)
Tools: see, click, type, press

See → identify data elements → copy/read values → format output

Model: claude-sonnet-4.5 (data reasoning)
Tools: see, list + macos-automator for script-accessible data

Launch app → navigate to feature → perform action → verify result

Model: claude-sonnet-4.5 (multi-step reasoning)
Tools: see, click, type, press, hotkey, app, menu, window

Capture → diff against baseline → alert on change → repeat

Model: gemini-3-flash (fast vision)
Tools: see, image, capture
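The capture → diff → alert step of the monitoring pattern can be illustrated without any Peekaboo calls. In this sketch the screenshot source is injected as a plain function returning image bytes, so how the capture is taken is left open. `fingerprint` and `check_once` are hypothetical names, and the comparison is an exact byte hash, which is a deliberate simplification: any pixel change counts as a change.

```python
import hashlib
from typing import Callable, Optional


def fingerprint(image_bytes: bytes) -> str:
    """Hash a capture so it can be compared cheaply against a baseline."""
    return hashlib.sha256(image_bytes).hexdigest()


def check_once(
    capture: Callable[[], bytes],
    baseline: Optional[str],
    on_change: Callable[[str, str], None],
) -> str:
    """Take one capture, alert if it differs from the baseline, return its fingerprint.

    With no baseline (first run) nothing is reported; the returned
    fingerprint becomes the baseline for the next call.
    """
    current = fingerprint(capture())
    if baseline is not None and current != baseline:
        on_change(baseline, current)
    return current


if __name__ == "__main__":
    frames = iter([b"frame-a", b"frame-a", b"frame-b"])
    baseline = None
    for _ in range(3):
        baseline = check_once(
            lambda: next(frames),
            baseline,
            lambda old, new: print("screen changed"),
        )
```

A production monitor would more likely use a tolerance-based image diff or hand both captures to the vision model, since exact hashing also fires on clock ticks and cursor blinks.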

Load .peekaboo.json → execute steps → handle errors → report

No AI needed — use peekaboo run script.peekaboo.json

  • Always filter for production agents — don’t expose app quit, clean, config commands
  • Use PEEKABOO_ALLOW_TOOLS allowlist (safer than denylist)
  • Peekaboo --wait-for defaults to 5000ms — increase for slow apps
  • macos-automator timeout_seconds defaults to 60 — increase for long scripts
  • Agent --max-steps caps total actions — set based on task complexity
  • Max 3 retries per action (See→Act→Verify loop)
  • Fail loudly after retry exhaustion — don’t loop forever
  • Log each action for debugging
  • Peekaboo agent interactive chat mode: auto TTY detection
  • /help for available commands during agent session
  • Esc cancels current action
  • Ctrl+D exits agent mode
  • --dry-run shows planned actions without executing
  • Never automate password entry without explicit user consent
  • Financial transactions should always have human confirmation
  • File deletion operations should use --dry-run first
  • System settings changes should be reversible
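The error-handling points in the list above (at most 3 retries per action, fail loudly once they are exhausted, log each action) can be sketched as a small See→Act→Verify wrapper. This is an illustration, not Peekaboo’s implementation: `run_with_verify` and `ActionFailed` are hypothetical names, the action and the verification are passed in as plain callables, and "3 retries" is interpreted here as 3 attempts in total.

```python
import logging
from typing import Callable

log = logging.getLogger("agent")


class ActionFailed(RuntimeError):
    """Raised when an action still fails verification after all attempts."""


def run_with_verify(
    name: str,
    act: Callable[[], None],
    verify: Callable[[], bool],
    max_attempts: int = 3,
) -> int:
    """Run `act`, check the result with `verify`, and retry up to `max_attempts`.

    Returns the attempt number that succeeded; raises ActionFailed otherwise,
    so the caller never loops forever.
    """
    for attempt in range(1, max_attempts + 1):
        log.info("action=%s attempt=%d/%d", name, attempt, max_attempts)
        act()
        if verify():
            return attempt
        log.warning("action=%s attempt=%d not verified", name, attempt)
    raise ActionFailed(f"{name!r} failed verification after {max_attempts} attempts")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    state = {"clicks": 0}

    def click() -> None:
        state["clicks"] += 1

    # The simulated UI only responds on the second click.
    print(run_with_verify("click submit", click, lambda: state["clicks"] >= 2))
```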

| Need | Skill |
| --- | --- |
| Peekaboo agent mode reference | peekaboo (agent.md) |
| MCP server setup | peekaboo (mcp.md) |
| Script execution tools | macos-automator (tools.md) |
| Multi-step workflow patterns | macos-workflow |
| Script generation | applescript-forge |