Macuse

UI Automation

The UI Automation toolboxes (UI Viewer and UI Controller) use the macOS Accessibility API to inspect and control any macOS application. This lets AI assistants work with apps that don't have dedicated integrations.

Requirements

  • Permission: Accessibility
  • Permission: Screen Recording (required for screenshots taken by ui_viewer_capture_snapshot)

Ref ID System

Elements are assigned role-based short IDs for stable targeting. Each ID is a one-letter prefix for the element's role (usually, but not always, the role's first letter) followed by a sequential counter:

Prefix  Role
B       Button
T       Text field, text area, search field
L       Link
S       Slider
C       Checkbox
R       Radio button
M       Menu item
P       Popup button, combo box
I       Image
X       Static text
V       List, table, outline
G       Tab group
D       Disclosure triangle

For example, B1 is the first button, T2 is the second text field, L1 is the first link.
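The prefix-plus-counter scheme can be sketched in a few lines. This is a minimal illustration, not Macuse's implementation: it assumes elements arrive in traversal order tagged with standard macOS accessibility role strings (AXButton, AXStaticText, and so on), that each prefix keeps its own counter (so a text field and a text area share the T sequence), and the names ROLE_PREFIXES and assign_ref_ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical mapping from macOS accessibility role strings to ref prefixes,
# following the prefix table above. Search fields report role AXTextField
# (with subrole AXSearchField), so they fall under T as well.
ROLE_PREFIXES: Dict[str, str] = {
    "AXButton": "B",
    "AXTextField": "T",
    "AXTextArea": "T",
    "AXLink": "L",
    "AXSlider": "S",
    "AXCheckBox": "C",
    "AXRadioButton": "R",
    "AXMenuItem": "M",
    "AXPopUpButton": "P",
    "AXComboBox": "P",
    "AXImage": "I",
    "AXStaticText": "X",
    "AXList": "V",
    "AXTable": "V",
    "AXOutline": "V",
    "AXTabGroup": "G",
    "AXDisclosureTriangle": "D",
}


def assign_ref_ids(roles: List[str]) -> List[Optional[str]]:
    """Return a ref ID per role in traversal order, or None if the role is unmapped.

    Assumes one counter per prefix, so T1 may be a text field and T2 a text area.
    """
    counters: Dict[str, int] = defaultdict(int)
    refs: List[Optional[str]] = []
    for role in roles:
        prefix = ROLE_PREFIXES.get(role)
        if prefix is None:
            refs.append(None)  # roles outside the table get no short ID here
            continue
        counters[prefix] += 1
        refs.append(f"{prefix}{counters[prefix]}")
    return refs


print(assign_ref_ids(["AXButton", "AXTextField", "AXGroup", "AXTextArea", "AXButton", "AXLink"]))
# prints ['B1', 'T1', None, 'T2', 'B2', 'L1']
```

Keying the counter on the prefix rather than the raw role is what makes IDs like T2 unambiguous across text fields, text areas, and search fields.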

Call ui_viewer_capture_snapshot or ui_viewer_get_ui_tree to get a refs map, then use these IDs with any UI controller tool via the ref_id parameter.
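A sketch of that two-step flow: look up an element in a refs map, then pass its ID as ref_id to a controller tool. The refs map shape (role and name per ID) and the representation of a tool call as a tool name plus a JSON arguments object are assumptions for illustration; find_ref and click_call are hypothetical helpers, and only the tool names and the ref_id parameter come from this page.

```python
import json

# Hypothetical refs map as it might come back from ui_viewer_capture_snapshot
# or ui_viewer_get_ui_tree; the real field names may differ.
refs = {
    "B1": {"role": "button", "name": "Cancel"},
    "B2": {"role": "button", "name": "Submit"},
    "T1": {"role": "text field", "name": "Email"},
}


def find_ref(refs: dict, role: str, name: str) -> str:
    """Return the first ref ID whose role and name match, or raise KeyError."""
    for ref_id, info in refs.items():
        if info["role"] == role and info["name"] == name:
            return ref_id
    raise KeyError(f"no {role} named {name!r}")


def click_call(ref_id: str) -> dict:
    """Build a tool call that targets an element by ref_id."""
    return {"name": "ui_controller_click", "arguments": {"ref_id": ref_id}}


call = click_call(find_ref(refs, "button", "Submit"))
print(json.dumps(call))
# prints {"name": "ui_controller_click", "arguments": {"ref_id": "B2"}}
```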

When multiple elements share the same role and name, an nth field disambiguates them.
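One way an nth field could be computed is sketched below. It assumes duplicates are numbered in traversal order starting from 0 and that elements with a unique role and name get no nth at all; both the numbering base and that rule are assumptions, and add_nth is a hypothetical helper.

```python
from collections import Counter
from typing import List


def add_nth(elements: List[dict]) -> List[dict]:
    """Attach an 'nth' index to elements whose (role, name) pair repeats.

    Elements with a unique (role, name) pair are left without nth.
    The index is 0-based and follows traversal order (an assumption).
    """
    totals = Counter((e["role"], e["name"]) for e in elements)
    seen: Counter = Counter()
    result = []
    for e in elements:
        key = (e["role"], e["name"])
        entry = dict(e)  # copy so the caller's dicts are not mutated
        if totals[key] > 1:
            entry["nth"] = seen[key]
            seen[key] += 1
        result.append(entry)
    return result


items = [
    {"role": "button", "name": "Delete"},
    {"role": "button", "name": "Save"},
    {"role": "button", "name": "Delete"},
]
for item in add_nth(items):
    print(item)
```

With this rule the two Delete buttons become nth 0 and nth 1, while the single Save button needs no disambiguation.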

This approach is inspired by agent-browser from Vercel Labs.

UI Viewer Tools

Inspect application interfaces without making changes.

Tool                          Description
ui_viewer_capture_snapshot    Capture a window screenshot with element detection and role-based ref IDs
ui_viewer_list_apps           List running macOS applications
ui_viewer_get_frontmost       Get frontmost app and active window
ui_viewer_get_ui_tree         Get accessibility UI tree with stable refs
ui_viewer_get_visible_text    Extract visible text from app UI
ui_viewer_find_elements       Find elements matching XPath query
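XPath is a standard query language for tree-structured documents, which is what makes it a natural fit for an accessibility tree. The snippet below illustrates the idea with Python's xml.etree.ElementTree, which supports a subset of XPath, on a hypothetical XML rendering of a window's tree. The tag names (AX roles) and the title attribute are assumptions about how ui_viewer_find_elements exposes elements, and the real tool may support fuller XPath than ElementTree does.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML serialization of a window's accessibility tree: tags are
# AX roles, attributes carry element properties. The real schema may differ.
TREE = """
<AXWindow title="Untitled">
  <AXGroup>
    <AXTextField title="Search"/>
    <AXButton title="Save"/>
    <AXButton title="Cancel"/>
  </AXGroup>
  <AXButton title="Help"/>
</AXWindow>
"""

root = ET.fromstring(TREE.strip())

# Every button anywhere under the window, in document order.
all_buttons = [b.get("title") for b in root.findall(".//AXButton")]

# Only buttons whose title attribute is "Save".
save_buttons = root.findall(".//AXButton[@title='Save']")

print(all_buttons)        # ['Save', 'Cancel', 'Help']
print(len(save_buttons))  # 1
```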

capture_snapshot

The ui_viewer_capture_snapshot tool captures a window screenshot and detects UI elements, returning role-based short IDs (B1, T2, L3...) for use with UI controller tools.

Key options:

  • annotate — Draw element bounding boxes and ID labels on the screenshot
  • interactive_only — Only include interactive elements in refs (default: true)
  • include_menu_bar — Include menu bar items in results (default: false)
  • include_screenshot — Set to false for faster element-only detection without screenshot overhead
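The options above can be modeled as a small typed record, which makes the documented defaults explicit. This is a hypothetical wrapper, not part of Macuse: the interactive_only and include_menu_bar defaults come from the list above, while the annotate and include_screenshot defaults, and the rule that annotation requires a screenshot, are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass
class CaptureSnapshotArgs:
    """Hypothetical typed wrapper for ui_viewer_capture_snapshot options.

    The interactive_only and include_menu_bar defaults are documented;
    the annotate and include_screenshot defaults are assumptions.
    """

    annotate: bool = False           # assumed default
    interactive_only: bool = True    # documented default
    include_menu_bar: bool = False   # documented default
    include_screenshot: bool = True  # assumed: screenshot on unless disabled

    def __post_init__(self) -> None:
        # Labels are drawn on the screenshot, so annotating without one is
        # rejected here (an assumption; the real tool may just ignore it).
        if self.annotate and not self.include_screenshot:
            raise ValueError("annotate requires include_screenshot=True")


# Fast, element-only detection without screenshot overhead.
fast = CaptureSnapshotArgs(include_screenshot=False)

# Annotated screenshot that also lists menu bar items.
annotated = CaptureSnapshotArgs(annotate=True, include_menu_bar=True)

print(asdict(fast))
print(asdict(annotated))
```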

UI Controller Tools

Interact with applications by clicking, typing, and more.

Tool                          Description
ui_controller_manage_app      Manage application lifecycle: open (launch) or close (quit)
ui_controller_click           Click by ref_id, xpath, fuzzy text query, or screen coordinates. Supports single, double, and right click.
ui_controller_type_text       Type text into input fields
ui_controller_press_key       Press keyboard keys or shortcuts
ui_controller_select_menu     Select menu items
ui_controller_scroll          Scroll within an app: target a scrollable element or scroll at the active window center
ui_controller_drag            Drag from one element or coordinate to another, with modifier key support
ui_controller_manage_window   List, close, minimize, restore, fullscreen, focus, move, or resize windows
ui_controller_file_dialog     Drive macOS Open/Save file dialogs: navigate folders, set filename, confirm or cancel
ui_controller_dock            Interact with the macOS Dock: list items, launch apps, or right-click for context menus
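Because ui_controller_click accepts several kinds of target, a caller has to pick exactly one. The sketch below shows that validation with a hypothetical click_args helper. Only ref_id is confirmed as a parameter name on this page; xpath, query, x/y, and click_type are assumed names, and the single/double/right values mirror the click kinds listed in the table.

```python
from typing import Optional, Tuple

CLICK_TYPES = {"single", "double", "right"}


def click_args(
    ref_id: Optional[str] = None,
    xpath: Optional[str] = None,
    query: Optional[str] = None,
    point: Optional[Tuple[int, int]] = None,
    click_type: str = "single",
) -> dict:
    """Build arguments for ui_controller_click using exactly one target.

    Only ref_id is confirmed by the docs; xpath, query, x/y and click_type
    are hypothetical parameter names used for illustration.
    """
    targets = {"ref_id": ref_id, "xpath": xpath, "query": query, "point": point}
    given = [k for k, v in targets.items() if v is not None]
    if len(given) != 1:
        raise ValueError(f"expected exactly one target, got {given or 'none'}")
    if click_type not in CLICK_TYPES:
        raise ValueError(f"unsupported click type: {click_type!r}")

    args: dict = {"click_type": click_type}
    if point is not None:
        args["x"], args["y"] = point  # screen coordinates
    else:
        key = given[0]
        args[key] = targets[key]
    return args


print(click_args(ref_id="B1"))
print(click_args(query="Save", click_type="double"))
```

Preferring ref_id when a fresh snapshot is available keeps clicks tied to a specific detected element; fuzzy queries and coordinates are fallbacks when no ref exists.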

Example Usage

  • "Take a screenshot of Safari and show me all interactive elements"
  • "Click the Submit button in Safari"
  • "Click the element labeled 'Save' in Figma"
  • "Type 'hello world' in the search field"
  • "Scroll down in the document"
  • "Drag the file icon to the trash"
  • "Press Command+S to save the document"
  • "Select Copy from the Edit menu"
  • "List all windows in VS Code, then minimize the second one"
  • "Save the file as report.pdf in my Documents folder"
  • "Launch Slack from the Dock"
  • "Find all buttons in the current app"

