UI Automation
The UI Automation toolboxes (UI Viewer and UI Controller) provide control over any macOS application using the Accessibility API. This enables AI assistants to interact with apps that don't have dedicated integrations.
Requirements
- Permission: Accessibility
- Permission: Screen Recording (required for capture_snapshot screenshots)
Ref ID System
Elements are assigned role-based short IDs for stable targeting. Each ID consists of a one-letter prefix for the element's role (usually, but not always, the role's first letter) followed by a sequential counter:
| Prefix | Role |
|---|---|
| B | Button |
| T | Text field, text area, search field |
| L | Link |
| S | Slider |
| C | Checkbox |
| R | Radio button |
| M | Menu item |
| P | Popup button, combo box |
| I | Image |
| X | Static text |
| V | List, table, outline |
| G | Tab group |
| D | Disclosure triangle |
For example, B1 is the first button, T2 is the second text field, L1 is the first link.
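The prefix-plus-counter scheme can be sketched in a few lines of Python. This is an illustration only, not the toolbox's implementation: the role strings (such as "button") and the assign_ref_ids helper are hypothetical names, and the sketch assumes that roles sharing a prefix also share one counter.

```python
from collections import defaultdict

# Role-to-prefix mapping taken from the table above.
# The role strings themselves are assumed names for illustration.
ROLE_PREFIXES = {
    "button": "B",
    "text field": "T",
    "text area": "T",
    "search field": "T",
    "link": "L",
    "slider": "S",
    "checkbox": "C",
    "radio button": "R",
    "menu item": "M",
    "popup button": "P",
    "combo box": "P",
    "image": "I",
    "static text": "X",
    "list": "V",
    "table": "V",
    "outline": "V",
    "tab group": "G",
    "disclosure triangle": "D",
}


def assign_ref_ids(roles):
    """Return one ref ID per role, numbering each prefix from 1.

    Roles without a known prefix get None.
    """
    counters = defaultdict(int)  # one counter per prefix (assumption)
    ref_ids = []
    for role in roles:
        prefix = ROLE_PREFIXES.get(role)
        if prefix is None:
            ref_ids.append(None)
            continue
        counters[prefix] += 1
        ref_ids.append(f"{prefix}{counters[prefix]}")
    return ref_ids


print(assign_ref_ids(["button", "text field", "link", "search field"]))
# ['B1', 'T1', 'L1', 'T2']
```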
Call ui_viewer_capture_snapshot or ui_viewer_get_ui_tree to get a refs map, then use these IDs with any UI controller tool via the ref_id parameter.
When multiple elements share the same role and name, an nth field disambiguates them.
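One way to picture the nth field: group elements by role and name, and attach an index only where a group has more than one member. The sketch below assumes elements are plain dictionaries with role and name keys and that nth counts from 0; neither detail is confirmed by this page, and add_nth is a hypothetical helper.

```python
from collections import Counter


def add_nth(elements):
    """Return copies of elements, adding 'nth' where (role, name) repeats.

    Assumption: nth is 0-based and only present on ambiguous elements.
    """
    totals = Counter((e["role"], e["name"]) for e in elements)
    seen = Counter()
    result = []
    for e in elements:
        key = (e["role"], e["name"])
        entry = dict(e)  # copy so the input is left untouched
        if totals[key] > 1:
            entry["nth"] = seen[key]
            seen[key] += 1
        result.append(entry)
    return result


elements = [
    {"role": "button", "name": "Delete"},
    {"role": "button", "name": "Save"},
    {"role": "button", "name": "Delete"},
]
for entry in add_nth(elements):
    print(entry)
```

Here the two "Delete" buttons receive nth values 0 and 1, while the unique "Save" button gets no nth field.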
This approach is inspired by agent-browser from Vercel Labs.
UI Viewer Tools
Inspect application interfaces without making changes.
| Tool | Description |
|---|---|
| ui_viewer_capture_snapshot | Capture a window screenshot with element detection and role-based ref IDs |
| ui_viewer_list_apps | List running macOS applications |
| ui_viewer_get_frontmost | Get frontmost app and active window |
| ui_viewer_get_ui_tree | Get accessibility UI tree with stable refs |
| ui_viewer_get_visible_text | Extract visible text from app UI |
| ui_viewer_find_elements | Find elements matching XPath query |
capture_snapshot
The ui_viewer_capture_snapshot tool captures a window screenshot and detects UI elements, returning role-based short IDs (B1, T2, L3...) for use with UI controller tools.
Key options:
- annotate: Draw element bounding boxes and ID labels on the screenshot
- interactive_only: Only include interactive elements in refs (default: true)
- include_menu_bar: Include menu bar items in results (default: false)
- include_screenshot: Set to false for faster element-only detection without screenshot overhead
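As a rough illustration of how these options combine, the sketch below builds an arguments dictionary for a ui_viewer_capture_snapshot call. The option names come from the list above; the snapshot_args helper and the idea of passing the options as a flat dictionary are assumptions, since the exact call format depends on the client in use. Options left unset are omitted so that the tool's own defaults apply.

```python
def snapshot_args(annotate=None, interactive_only=None,
                  include_menu_bar=None, include_screenshot=None):
    """Build an arguments dict containing only explicitly set options.

    Hypothetical helper: unset options are left out so the tool
    falls back to its documented defaults.
    """
    options = {
        "annotate": annotate,
        "interactive_only": interactive_only,
        "include_menu_bar": include_menu_bar,
        "include_screenshot": include_screenshot,
    }
    return {name: value for name, value in options.items() if value is not None}


# Fast element-only detection, skipping the screenshot.
fast = snapshot_args(include_screenshot=False)

# Annotated screenshot that also lists non-interactive elements.
full = snapshot_args(annotate=True, interactive_only=False)

print(fast)
print(full)
```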
UI Controller Tools
Interact with applications by clicking, typing, and more.
| Tool | Description |
|---|---|
| ui_controller_manage_app | Manage application lifecycle: open (launch) or close (quit) |
| ui_controller_click | Click by ref_id, xpath, fuzzy text query, or screen coordinates. Supports single, double, and right click. |
| ui_controller_type_text | Type text into input fields |
| ui_controller_press_key | Press keyboard keys or shortcuts |
| ui_controller_select_menu | Select menu items |
| ui_controller_scroll | Scroll within an app: target a scrollable element or scroll at the active window center |
| ui_controller_drag | Drag from one element or coordinate to another, with modifier key support |
| ui_controller_manage_window | List, close, minimize, restore, fullscreen, focus, move, or resize windows |
| ui_controller_file_dialog | Drive macOS Open/Save file dialogs: navigate folders, set filename, confirm or cancel |
| ui_controller_dock | Interact with the macOS Dock: list items, launch apps, or right-click for context menus |
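The click tool accepts several ways of naming a target. The sketch below shows one way a client might build arguments for ui_controller_click while enforcing a single targeting mode. Only the ref_id parameter name appears in this documentation; xpath, query, point, and click_type are hypothetical names chosen for illustration, and the rule that exactly one target must be given is likewise an assumption.

```python
def click_args(ref_id=None, xpath=None, query=None, point=None, button="single"):
    """Build click arguments, requiring exactly one targeting mode.

    Hypothetical parameter names except ref_id; point is an (x, y) tuple.
    """
    if button not in ("single", "double", "right"):
        raise ValueError(f"unsupported click type: {button!r}")
    targets = {"ref_id": ref_id, "xpath": xpath, "query": query, "point": point}
    chosen = {name: value for name, value in targets.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("specify exactly one of ref_id, xpath, query, point")
    return {**chosen, "click_type": button}


# Click the first button reported by a previous snapshot.
print(click_args(ref_id="B1"))

# Right-click whatever best matches the text "Save".
print(click_args(query="Save", button="right"))
```

Targeting by ref_id is likely the most predictable choice right after a snapshot, since the ID refers to an element the viewer has already resolved; the other modes are useful when no snapshot is available.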
Example Usage
- "Take a screenshot of Safari and show me all interactive elements"
- "Click the Submit button in Safari"
- "Click the element labeled 'Save' in Figma"
- "Type 'hello world' in the search field"
- "Scroll down in the document"
- "Drag the file icon to the trash"
- "Press Command+S to save the document"
- "Select Copy from the Edit menu"
- "List all windows in VS Code, then minimize the second one"
- "Save the file as report.pdf in my Documents folder"
- "Launch Slack from the Dock"
- "Find all buttons in the current app"
Related Workflows
Ready-to-use prompts for UI automation:
- Batch Data Entry - Automate repetitive form filling
- Menu Navigation - Execute complex menu commands