Requirements
- Target platform
- OpenClaw
- Install method
- Manual import
- Extraction
- Extract archive
- Prerequisites
- OpenClaw
- Primary doc
- SKILL.md
Advanced desktop automation with mouse, keyboard, and screen control
Advanced desktop automation with mouse, keyboard, and screen control
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
The most advanced desktop automation skill for OpenClaw. Provides pixel-perfect mouse control, lightning-fast keyboard input, screen capture, window management, and clipboard operations.
โ Absolute positioning - Move to exact coordinates โ Relative movement - Move from current position โ Smooth movement - Natural, human-like mouse paths โ Click types - Left, right, middle, double, triple clicks โ Drag & drop - Drag from point A to point B โ Scroll - Vertical and horizontal scrolling โ Position tracking - Get current mouse coordinates
โ Text typing - Fast, accurate text input โ Hotkeys - Execute keyboard shortcuts (Ctrl+C, Win+R, etc.) โ Special keys - Enter, Tab, Escape, Arrow keys, F-keys โ Key combinations - Multi-key press combinations โ Hold & release - Manual key state control โ Typing speed - Configurable WPM (instant to human-like)
โ Screenshot - Capture entire screen or regions โ Image recognition - Find elements on screen (via OpenCV) โ Color detection - Get pixel colors at coordinates โ Multi-monitor - Support for multiple displays
โ Window list - Get all open windows โ Activate window - Bring window to front โ Window info - Get position, size, title โ Minimize/Maximize - Control window states
โ Failsafe - Move mouse to corner to abort โ Pause control - Emergency stop mechanism โ Approval mode - Require confirmation for actions โ Bounds checking - Prevent out-of-screen operations โ Logging - Track all automation actions
First, install required dependencies: pip install pyautogui pillow opencv-python pygetwindow
from skills.desktop_control import DesktopController # Initialize controller dc = DesktopController(failsafe=True) # Mouse operations dc.move_mouse(500, 300) # Move to coordinates dc.click() # Left click at current position dc.click(100, 200, button="right") # Right click at position # Keyboard operations dc.type_text("Hello from OpenClaw!") dc.hotkey("ctrl", "c") # Copy dc.press("enter") # Screen operations screenshot = dc.screenshot() position = dc.get_mouse_position()
move_mouse(x, y, duration=0, smooth=True) Move mouse to absolute screen coordinates. Parameters: x (int): X coordinate (pixels from left) y (int): Y coordinate (pixels from top) duration (float): Movement time in seconds (0 = instant, 0.5 = smooth) smooth (bool): Use bezier curve for natural movement Example: # Instant movement dc.move_mouse(1000, 500) # Smooth 1-second movement dc.move_mouse(1000, 500, duration=1.0) move_relative(x_offset, y_offset, duration=0) Move mouse relative to current position. Parameters: x_offset (int): Pixels to move horizontally (positive = right) y_offset (int): Pixels to move vertically (positive = down) duration (float): Movement time in seconds Example: # Move 100px right, 50px down dc.move_relative(100, 50, duration=0.3) click(x=None, y=None, button='left', clicks=1, interval=0.1) Perform mouse click. Parameters: x, y (int, optional): Coordinates to click (None = current position) button (str): 'left', 'right', 'middle' clicks (int): Number of clicks (1 = single, 2 = double) interval (float): Delay between multiple clicks Example: # Simple left click dc.click() # Double-click at specific position dc.click(500, 300, clicks=2) # Right-click dc.click(button='right') drag(start_x, start_y, end_x, end_y, duration=0.5, button='left') Drag and drop operation. Parameters: start_x, start_y (int): Starting coordinates end_x, end_y (int): Ending coordinates duration (float): Drag duration button (str): Mouse button to use Example: # Drag file from desktop to folder dc.drag(100, 100, 500, 500, duration=1.0) scroll(clicks, direction='vertical', x=None, y=None) Scroll mouse wheel. Parameters: clicks (int): Scroll amount (positive = up/left, negative = down/right) direction (str): 'vertical' or 'horizontal' x, y (int, optional): Position to scroll at Example: # Scroll down 5 clicks dc.scroll(-5) # Scroll up 10 clicks dc.scroll(10) # Horizontal scroll dc.scroll(5, direction='horizontal') get_mouse_position() Get current mouse coordinates. Returns: (x, y) tuple Example: x, y = dc.get_mouse_position() print(f"Mouse is at: {x}, {y}")
type_text(text, interval=0, wpm=None) Type text with configurable speed. Parameters: text (str): Text to type interval (float): Delay between keystrokes (0 = instant) wpm (int, optional): Words per minute (overrides interval) Example: # Instant typing dc.type_text("Hello World") # Human-like typing at 60 WPM dc.type_text("Hello World", wpm=60) # Slow typing with 0.1s between keys dc.type_text("Hello World", interval=0.1) press(key, presses=1, interval=0.1) Press and release a key. Parameters: key (str): Key name (see Key Names section) presses (int): Number of times to press interval (float): Delay between presses Example: # Press Enter dc.press('enter') # Press Space 3 times dc.press('space', presses=3) # Press Down arrow dc.press('down') hotkey(*keys, interval=0.05) Execute keyboard shortcut. Parameters: *keys (str): Keys to press together interval (float): Delay between key presses Example: # Copy (Ctrl+C) dc.hotkey('ctrl', 'c') # Paste (Ctrl+V) dc.hotkey('ctrl', 'v') # Open Run dialog (Win+R) dc.hotkey('win', 'r') # Save (Ctrl+S) dc.hotkey('ctrl', 's') # Select All (Ctrl+A) dc.hotkey('ctrl', 'a') key_down(key) / key_up(key) Manually control key state. Example: # Hold Shift dc.key_down('shift') dc.type_text("hello") # Types "HELLO" dc.key_up('shift') # Hold Ctrl and click (for multi-select) dc.key_down('ctrl') dc.click(100, 100) dc.click(200, 100) dc.key_up('ctrl')
screenshot(region=None, filename=None) Capture screen or region. Parameters: region (tuple, optional): (left, top, width, height) for partial capture filename (str, optional): Path to save image Returns: PIL Image object Example: # Full screen img = dc.screenshot() # Save to file dc.screenshot(filename="screenshot.png") # Capture specific region img = dc.screenshot(region=(100, 100, 500, 300)) get_pixel_color(x, y) Get color of pixel at coordinates. Returns: RGB tuple (r, g, b) Example: r, g, b = dc.get_pixel_color(500, 300) print(f"Color at (500, 300): RGB({r}, {g}, {b})") find_on_screen(image_path, confidence=0.8) Find image on screen (requires OpenCV). Parameters: image_path (str): Path to template image confidence (float): Match threshold (0-1) Returns: (x, y, width, height) or None Example: # Find button on screen location = dc.find_on_screen("button.png") if location: x, y, w, h = location # Click center of found image dc.click(x + w//2, y + h//2) get_screen_size() Get screen resolution. Returns: (width, height) tuple Example: width, height = dc.get_screen_size() print(f"Screen: {width}x{height}")
get_all_windows() List all open windows. Returns: List of window titles Example: windows = dc.get_all_windows() for title in windows: print(f"Window: {title}") activate_window(title_substring) Bring window to front by title. Parameters: title_substring (str): Part of window title to match Example: # Activate Chrome dc.activate_window("Chrome") # Activate VS Code dc.activate_window("Visual Studio Code") get_active_window() Get currently focused window. Returns: Window title (str) Example: active = dc.get_active_window() print(f"Active window: {active}")
copy_to_clipboard(text) Copy text to clipboard. Example: dc.copy_to_clipboard("Hello from OpenClaw!") get_from_clipboard() Get text from clipboard. Returns: str Example: text = dc.get_from_clipboard() print(f"Clipboard: {text}")
'a' through 'z'
'0' through '9'
'f1' through 'f24'
'enter' / 'return' 'esc' / 'escape' 'space' / 'spacebar' 'tab' 'backspace' 'delete' / 'del' 'insert' 'home' 'end' 'pageup' / 'pgup' 'pagedown' / 'pgdn'
'up' / 'down' / 'left' / 'right'
'ctrl' / 'control' 'shift' 'alt' 'win' / 'winleft' / 'winright' 'cmd' / 'command' (Mac)
'capslock' 'numlock' 'scrolllock'
'.' / ',' / '?' / '!' / ';' / ':' '[' / ']' / '{' / '}' '(' / ')' '+' / '-' / '*' / '/' / '='
Move mouse to any corner of the screen to abort all automation. # Enable failsafe (enabled by default) dc = DesktopController(failsafe=True)
# Pause all automation for 2 seconds dc.pause(2.0) # Check if automation is safe to proceed if dc.is_safe(): dc.click(500, 500)
Require user confirmation before actions: dc = DesktopController(require_approval=True) # This will ask for confirmation dc.click(500, 500) # Prompt: "Allow click at (500, 500)? [y/n]"
dc = DesktopController() # Click name field dc.click(300, 200) dc.type_text("John Doe", wpm=80) # Tab to next field dc.press('tab') dc.type_text("john@example.com", wpm=80) # Tab to password dc.press('tab') dc.type_text("SecurePassword123", wpm=60) # Submit form dc.press('enter')
# Capture specific area region = (100, 100, 800, 600) # left, top, width, height img = dc.screenshot(region=region) # Save with timestamp import datetime timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") img.save(f"capture_{timestamp}.png")
# Hold Ctrl and click multiple files dc.key_down('ctrl') dc.click(100, 200) # First file dc.click(100, 250) # Second file dc.click(100, 300) # Third file dc.key_up('ctrl') # Copy selected files dc.hotkey('ctrl', 'c')
# Activate Calculator dc.activate_window("Calculator") time.sleep(0.5) # Type calculation dc.type_text("5+3=", interval=0.2) time.sleep(0.5) # Take screenshot of result dc.screenshot(filename="calculation_result.png")
# Drag file from source to destination dc.drag( start_x=200, start_y=300, # File location end_x=800, end_y=500, # Folder location duration=1.0 # Smooth 1-second drag )
Use instant movements for speed: duration=0 Batch operations instead of individual calls Cache screen positions instead of recalculating Disable failsafe for maximum performance (use with caution) Use hotkeys instead of menu navigation
Screen coordinates start at (0, 0) in top-left corner Multi-monitor setups may have negative coordinates for secondary displays Windows DPI scaling may affect coordinate accuracy Failsafe corners are: (0,0), (width-1, 0), (0, height-1), (width-1, height-1) Some applications may block simulated input (games, secure apps)
Check DPI scaling settings Verify screen resolution matches expectations Use get_screen_size() to confirm dimensions
Ensure target application has focus Some apps require admin privileges Try increasing interval for reliability
Increase screen border tolerance Move mouse away from corners during normal use Disable if needed: DesktopController(failsafe=False)
Run Python with administrator privileges for some operations Some secure applications block automation
PyAutoGUI - Core automation engine Pillow - Image processing OpenCV (optional) - Image recognition PyGetWindow - Window management Install all: pip install pyautogui pillow opencv-python pygetwindow Built for OpenClaw - The ultimate desktop automation companion ๐ฆ
Code helpers, APIs, CLIs, browser automation, testing, and developer operations.
Largest current source with strong distribution and engagement signals.