# Send Windows Control to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "windows-control",
    "name": "Windows Control",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/Spliff7777/windows-control",
    "canonicalUrl": "https://clawhub.ai/Spliff7777/windows-control",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/windows-control",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=windows-control",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "package.json",
      "SKILL.md",
      "scripts/click.py",
      "scripts/click_element.py",
      "scripts/click_text.py",
      "scripts/close_window.py"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "windows-control",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-11T02:42:43.700Z",
      "expiresAt": "2026-05-18T02:42:43.700Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=windows-control",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=windows-control",
        "contentDisposition": "attachment; filename=\"windows-control-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "windows-control"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/windows-control"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/windows-control",
    "downloadUrl": "https://openagent3.xyz/downloads/windows-control",
    "agentUrl": "https://openagent3.xyz/skills/windows-control/agent",
    "manifestUrl": "https://openagent3.xyz/skills/windows-control/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/windows-control/agent.md"
  }
}
```
## Documentation

### Windows Control Skill

Full desktop automation for Windows. Control mouse, keyboard, and screen like a human user.

### Quick Start

All scripts are in `skills/windows-control/scripts/`; the commands below assume that directory is the working directory.
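To call the scripts from Python instead of a shell, a thin wrapper can assemble the command line. The sketch below assumes the `py` launcher is on `PATH` and that the process runs from the workspace root that contains `skills/windows-control/scripts/`. `build_command` and `run_script` are hypothetical helper names, not part of the package.

```python
import subprocess
from pathlib import PurePosixPath

# Path taken from the Quick Start note; adjust it if your workspace differs.
SCRIPTS_DIR = PurePosixPath("skills/windows-control/scripts")


def build_command(script: str, *args: object) -> list[str]:
    """Return the argv list for one script call, e.g. click.py 500 300."""
    return ["py", str(SCRIPTS_DIR / script), *(str(a) for a in args)]


def run_script(script: str, *args: object, timeout: float = 60.0) -> str:
    """Run a script and return its stdout.

    subprocess raises CalledProcessError on a non-zero exit code and
    TimeoutExpired if the script runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        build_command(script, *args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(build_command("click.py", 500, 300, "right"))
```

Passing arguments as a list avoids shell quoting problems with window titles that contain spaces.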

### Screenshot

py screenshot.py > output.b64

Returns base64 PNG of entire screen.
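Because the output is base64 text, it has to be decoded before an image viewer can open it. The following is a minimal sketch, assuming the redirected file contains only the base64 payload plus optional whitespace. `decode_screenshot` and `save_screenshot` are hypothetical names. The UTF-16 branch is there because Windows PowerShell 5.1 writes `>` redirects as UTF-16 with a byte-order mark, while cmd.exe and PowerShell 7 do not.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_screenshot(raw: bytes) -> bytes:
    """Decode the redirected output of screenshot.py into PNG bytes."""
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        text = raw.decode("utf-16")      # redirect written as UTF-16 with BOM
    else:
        text = raw.decode("utf-8-sig")   # plain ASCII/UTF-8, optional BOM
    # Drop newlines and spaces, then reject anything that is not base64.
    png = base64.b64decode("".join(text.split()), validate=True)
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return png


def save_screenshot(b64_path: str, png_path: str) -> None:
    """Convert a file such as output.b64 into a viewable .png file."""
    Path(png_path).write_bytes(decode_screenshot(Path(b64_path).read_bytes()))
```

For example, `save_screenshot("output.b64", "screen.png")` would turn the file from the command above into an image.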

### Click

py click.py 500 300              # Left click at (500, 300)
py click.py 500 300 right        # Right click
py click.py 500 300 left 2       # Double click

### Type Text

py type_text.py "Hello World"

Types text at the current cursor position, with a 10 ms delay between keystrokes.

### Press Keys

py key_press.py "enter"
py key_press.py "ctrl+s"
py key_press.py "alt+tab"
py key_press.py "ctrl+shift+esc"
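How `key_press.py` parses its argument is not documented here. As an illustration only, the sketch below shows one way a `+`-separated combo could be mapped onto pyautogui, which the Safety Features section suggests the scripts use. `split_combo` and `press_combo` are hypothetical names; `pyautogui.press` and `pyautogui.hotkey` are existing pyautogui functions.

```python
def split_combo(combo: str) -> list[str]:
    """Split a combo such as "ctrl+shift+esc" into lowercase key names."""
    keys = [part.strip().lower() for part in combo.split("+")]
    if not all(keys):
        raise ValueError(f"malformed key combo: {combo!r}")
    return keys


def press_combo(combo: str) -> None:
    """Send the combo with pyautogui (needs a real desktop session)."""
    import pyautogui  # imported lazily so split_combo works without it

    keys = split_combo(combo)
    if len(keys) == 1:
        pyautogui.press(keys[0])   # single key, e.g. "enter"
    else:
        pyautogui.hotkey(*keys)    # hold modifiers in order, then release
```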

### Move Mouse

py mouse_move.py 500 300

Moves the mouse to the given coordinates with a smooth 0.2 s animation.

### Scroll

py scroll.py up 5      # Scroll up 5 notches
py scroll.py down 10   # Scroll down 10 notches

### Window Management (NEW!)

py focus_window.py "Chrome"           # Bring window to front
py minimize_window.py "Notepad"       # Minimize window
py maximize_window.py "VS Code"       # Maximize window
py close_window.py "Calculator"       # Close window
py get_active_window.py               # Get title of active window

### Advanced Actions (NEW!)

# Click by text (No coordinates needed!)
py click_text.py "Save"               # Click "Save" button anywhere
py click_text.py "Submit" "Chrome"    # Click "Submit" in Chrome only

# Drag and Drop
py drag.py 100 100 500 300            # Drag from (100,100) to (500,300)

# Robust Automation (Wait/Find)
py wait_for_text.py "Ready" "App" 30  # Wait up to 30s for text
py wait_for_window.py "Notepad" 10    # Wait for window to appear
py find_text.py "Login" "Chrome"      # Get coordinates of text
py list_windows.py                    # List all open windows
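The wait scripts above take a timeout in seconds. For conditions they do not cover, the same poll-until-timeout idea can be written in a few lines. This is a generic sketch, not the package's implementation; `wait_until` is a hypothetical helper.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout. The condition
    is always checked at least once, even with timeout=0.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A condition could, for instance, call `get_active_window.py` and compare the returned title with the one you expect.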

### Read Window Text

py read_window.py "Notepad"           # Read all text from Notepad
py read_window.py "Visual Studio"     # Read text from VS Code
py read_window.py "Chrome"            # Read text from browser

Uses Windows UI Automation to extract the actual text (not OCR), which is much faster and more accurate than reading screenshots.

### Read UI Elements (NEW!)

py read_ui_elements.py "Chrome"               # All interactive elements
py read_ui_elements.py "Chrome" --buttons-only  # Just buttons
py read_ui_elements.py "Chrome" --links-only    # Just links
py read_ui_elements.py "Chrome" --json          # JSON output

Returns buttons, links, tabs, checkboxes, dropdowns with coordinates for clicking.

### Read Webpage Content (NEW!)

py read_webpage.py                     # Read active browser
py read_webpage.py "Chrome"            # Target Chrome specifically
py read_webpage.py "Chrome" --buttons  # Include buttons
py read_webpage.py "Chrome" --links    # Include links with coords
py read_webpage.py "Chrome" --full     # All elements (inputs, images)
py read_webpage.py "Chrome" --json     # JSON output

Enhanced browser content extraction with headings, text, buttons, and links.

### Handle Dialogs (NEW!)

# List all open dialogs
py handle_dialog.py list

# Read current dialog content
py handle_dialog.py read
py handle_dialog.py read --json

# Click button in dialog
py handle_dialog.py click "OK"
py handle_dialog.py click "Save"
py handle_dialog.py click "Yes"

# Type into dialog text field
py handle_dialog.py type "myfile.txt"
py handle_dialog.py type "C:\\path\\to\\file" --field 0

# Dismiss dialog (auto-finds OK/Close/Cancel)
py handle_dialog.py dismiss

# Wait for dialog to appear
py handle_dialog.py wait --timeout 10
py handle_dialog.py wait "Save As" --timeout 5

Handles Save/Open dialogs, message boxes, alerts, confirmations, etc.

### Click Element by Name (NEW!)

py click_element.py "Save"                    # Click "Save" anywhere
py click_element.py "OK" --window "Notepad"   # In specific window
py click_element.py "Submit" --type Button    # Only buttons
py click_element.py "File" --type MenuItem    # Menu items
py click_element.py --list                    # List clickable elements
py click_element.py --list --window "Chrome"  # List in specific window

Click buttons, links, menu items by name without needing coordinates.

### Read Screen Region (OCR - Optional)

py read_region.py 100 100 500 300     # Read text from coordinates

Note: Requires Tesseract OCR installation. Use read_window.py instead for better results.

### Workflow Pattern

1. Read window: extract text from a specific window (fast, accurate).
2. Read UI elements: get buttons and links with coordinates.
3. Screenshot (if needed): see the visual layout.
4. Act: click an element by name or coordinates.
5. Handle dialogs: interact with popups and save dialogs.
6. Read window: verify the changes.
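The read, act, verify loop above can be sketched as a small function. The runner is injected so the logic can be tried without a Windows desktop; in real use it would be a function that executes the named script and returns its stdout. `click_and_verify` and `Runner` are hypothetical names, and the assumption that `read_window.py` prints the window text to stdout is mine.

```python
from typing import Callable

# A runner takes a script name plus arguments and returns the script's stdout.
Runner = Callable[..., str]


def click_and_verify(run: Runner, window: str, element: str, expected: str) -> bool:
    """Read the window, click an element by name, then re-read to confirm."""
    before = run("read_window.py", window)
    if expected in before:
        return True  # already in the desired state, nothing to click
    run("click_element.py", element, "--window", window)
    after = run("read_window.py", window)
    return expected in after
```

Returning a boolean instead of raising lets the caller decide whether to retry, take a screenshot, or hand off to dialog handling.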

### Screen Coordinates

- Origin (0, 0) is the top-left corner.
- Example screen size: 2560x1440 (check your own with a screenshot).
- Use coordinates taken from screenshot analysis.
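If the screenshot is analyzed at a reduced size, coordinates read from it must be scaled back to screen pixels before clicking. The sketch below shows that arithmetic plus a bounds check. Both function names are hypothetical, and the 2560x1440 figure is only the example size mentioned above.

```python
def in_bounds(x: int, y: int, width: int, height: int) -> bool:
    """True if (x, y) is a valid pixel on a width x height screen."""
    return 0 <= x < width and 0 <= y < height


def scale_point(
    x: float,
    y: float,
    image_size: tuple[int, int],
    screen_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a point in a resized screenshot to absolute screen coordinates."""
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    return round(x * scr_w / img_w), round(y * scr_h / img_h)
```

For example, a point at (640, 360) in a 1280x720 copy of a 2560x1440 screen maps to (1280, 720).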

### Open Notepad and type

# Press Windows key
py key_press.py "win"

# Type "notepad"
py type_text.py "notepad"

# Press Enter
py key_press.py "enter"

# Wait a moment, then type
py type_text.py "Hello from AI!"

# Save
py key_press.py "ctrl+s"
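The "wait a moment" step is the fragile part of this sequence. One option is to replace the pause with `wait_for_window.py`, as in the sketch below. It assumes the workspace root is the working directory and that each script exits with a non-zero code on failure, which this document does not confirm. `notepad_steps` and `run_steps` are hypothetical names.

```python
import subprocess

SCRIPTS = "skills/windows-control/scripts"


def notepad_steps(text: str, wait_seconds: int = 10) -> list[list[str]]:
    """Build the command list for the Notepad example, with an explicit wait."""

    def cmd(script: str, *args: object) -> list[str]:
        return ["py", f"{SCRIPTS}/{script}", *map(str, args)]

    return [
        cmd("key_press.py", "win"),                          # open the Start menu
        cmd("type_text.py", "notepad"),                      # search for Notepad
        cmd("key_press.py", "enter"),                        # launch it
        cmd("wait_for_window.py", "Notepad", wait_seconds),  # wait instead of sleeping
        cmd("type_text.py", text),
        cmd("key_press.py", "ctrl+s"),
    ]


def run_steps(steps: list[list[str]]) -> None:
    """Run each step in order and stop at the first failing command."""
    for step in steps:
        subprocess.run(step, check=True)
```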

### Click in VS Code

# Read current VS Code content
py read_window.py "Visual Studio Code"

# Click at specific location (e.g., file explorer)
py click.py 50 100

# Type filename
py type_text.py "test.js"

# Press Enter
py key_press.py "enter"

# Verify new file opened
py read_window.py "Visual Studio Code"

### Monitor Notepad changes

# Read current content
py read_window.py "Notepad"

# User types something...

# Read updated content (no screenshot needed!)
py read_window.py "Notepad"
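To see what changed between the two reads, the captured text can be compared line by line. This is a small sketch using Python's standard `difflib`; `text_changes` is a hypothetical helper that takes the output of the two `read_window.py` calls as strings.

```python
import difflib


def text_changes(before: str, after: str) -> list[str]:
    """Return only the added ("+ ") and removed ("- ") lines between two reads."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

An empty list means the window text did not change between the two reads.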

### Text Reading Methods

Method 1: Windows UI Automation (BEST)

- Use read_window.py for any window
- Use read_ui_elements.py for buttons/links with coordinates
- Use read_webpage.py for browser content with structure
- Gets actual text data (not image-based)

Method 2: Click by Name (NEW)

- Use click_element.py to click buttons/links by name
- No coordinates needed; it finds elements automatically
- Works across all windows or targets a specific window

Method 3: Dialog Handling (NEW)

- Use handle_dialog.py for popups, save dialogs, alerts
- Read dialog content, click buttons, type text
- Auto-dismiss with common buttons (OK, Cancel, etc.)

Method 4: Screenshot + Vision (Fallback)

- Take a full screenshot
- The AI reads the text visually
- Slower, but works for any content

Method 5: OCR (Optional)

- Use read_region.py with Tesseract
- Requires additional installation
- Good for images/PDFs with text

### Safety Features

- `pyautogui.FAILSAFE = True` (move the mouse to the top-left corner to abort)
- Small delays between actions
- Smooth mouse movements (not instant jumps)
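For custom scripts that should follow the same conventions, the settings above correspond to existing pyautogui options: `FAILSAFE`, `PAUSE`, and the `duration` argument of `moveTo`. The sketch below passes the module in as a parameter so it can be exercised without a display. Whether the packaged scripts use `PAUSE` for their delays is an assumption, and both function names are hypothetical.

```python
def configure_safety(gui, pause: float = 0.1) -> None:
    """Apply the safety settings to a pyautogui-like module object."""
    gui.FAILSAFE = True  # top-left corner aborts, as noted above
    gui.PAUSE = pause    # seconds pyautogui sleeps after each call


def smooth_move(gui, x: int, y: int, duration: float = 0.2) -> None:
    """Glide the pointer to (x, y) instead of jumping there instantly."""
    gui.moveTo(x, y, duration=duration)
```

In real use this would be called as `import pyautogui; configure_safety(pyautogui)` followed by `smooth_move(pyautogui, 500, 300)`.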

### Requirements

- Python 3.11+
- pyautogui (installed ✅)
- pillow (installed ✅)

### Tips

- Always take a screenshot first to see the current state.
- Coordinates are absolute (relative to the screen, not to a window).
- Wait briefly after clicks so the UI can update.
- Prefer actions that can be undone with Ctrl+Z when possible.

- Status: ✅ READY FOR USE (v2.0 - Dialog & UI Elements)
- Created: 2026-02-01
- Updated: 2026-02-02
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: Spliff7777
- Version: 1.0.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-11T02:42:43.700Z
- Expires at: 2026-05-18T02:42:43.700Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/windows-control)
- [Send to Agent page](https://openagent3.xyz/skills/windows-control/agent)
- [JSON manifest](https://openagent3.xyz/skills/windows-control/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/windows-control/agent.md)
- [Download page](https://openagent3.xyz/downloads/windows-control)