{
  "schemaVersion": "1.0",
  "item": {
    "slug": "error-recovery-automation",
    "name": "Error Recovery Automation",
    "source": "tencent",
    "type": "skill",
    "category": "Developer Tools",
    "sourceUrl": "https://clawhub.ai/konscious0beast/error-recovery-automation",
    "canonicalUrl": "https://clawhub.ai/konscious0beast/error-recovery-automation",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/error-recovery-automation",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=error-recovery-automation",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/error-recovery-automation"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/error-recovery-automation",
    "agentPageUrl": "https://openagent3.xyz/skills/error-recovery-automation/agent",
    "manifestUrl": "https://openagent3.xyz/skills/error-recovery-automation/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/error-recovery-automation/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Error Recovery Automation Skill",
        "body": "This skill provides patterns for automating the detection and recovery of common OpenClaw errors: gateway unresponsiveness, browser service failures, cron scheduler issues, and other recurring problems. It builds on health‑monitoring and system‑diagnostics by adding automated recovery workflows that can be triggered by cron jobs, heartbeat checks, or external monitoring."
      },
      {
        "title": "When to use",
        "body": "A service (gateway, browser, cron) fails intermittently and you want to automate its restart.\nYou are setting up proactive monitoring and need a recovery plan beyond just detection.\nYou want to reduce the manual steps required when an “Is everything running?” (“Läuft alles?”) check reveals a failure.\nYou need to ensure critical OpenClaw components stay running with minimal user intervention.\nYou are asked to “create a skill for error recovery automation” (this is that skill)."
      },
      {
        "title": "1. Error Detection Patterns",
        "body": "Before automating recovery, you must reliably detect the error. Use these detection methods:\n\nGateway unresponsive:\n\nopenclaw gateway status returns a non‑zero exit code or shows \"running\": false.\nGateway logs (~/.openclaw/logs/gateway.err.log) contain recent CRITICAL or ERROR entries.\nThe HTTP health endpoint (if configured) returns a non‑2xx status.\n\nBrowser service unavailable:\n\nopenclaw browser --browser-profile openclaw status --json shows \"running\": false or reports that CDP is not ready.\nBrowser logs contain connection timeouts or Chrome process failures.\nA simple curl request to the CDP endpoint fails.\n\nCron scheduler not running:\n\nopenclaw cron status shows \"running\": false or returns an error.\nCron logs show no recent activity.\nScheduled jobs are not triggered (check openclaw cron list for missed runs).\n\nMemory search disabled:\n\nThe memory_search tool returns “disabled” or a native‑module error.\nopenclaw doctor --fix reports a better‑sqlite3 mismatch.\n\nPermission errors:\n\nFile operations fail with EACCES/EPERM.\nLogs report permission denied on specific paths (archive, logs, config)."
      },
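      {
        "title": "Detection Sketch: Log and HTTP Health Checks",
        "body": "The recovery templates in this skill only use status commands, so the log and HTTP checks listed under Gateway unresponsive have no code. A minimal sketch of both, assuming the log path given above; LOG_FILE, WINDOW_MINUTES, and HEALTH_URL are hypothetical settings, and HEALTH_URL should be set only if your gateway actually exposes an HTTP health endpoint. The log check is a heuristic: it scans the last 200 lines of a log modified within the window, so an older error in that tail can still match.\n\n#!/bin/bash\n# Sketch only: LOG_FILE, WINDOW_MINUTES, and HEALTH_URL are hypothetical settings\nLOG_FILE=\"${LOG_FILE:-$HOME/.openclaw/logs/gateway.err.log}\"\nWINDOW_MINUTES=\"${WINDOW_MINUTES:-30}\"\nHEALTH_URL=\"${HEALTH_URL:-}\"   # leave empty if no HTTP health endpoint is configured\n\n# True if the log changed within the window and its tail contains CRITICAL or ERROR\nrecent_log_errors() {\n  [ -f \"$LOG_FILE\" ] || return 1\n  [ -n \"$(find \"$LOG_FILE\" -mmin -\"$WINDOW_MINUTES\" 2>/dev/null)\" ] || return 1\n  tail -n 200 \"$LOG_FILE\" | grep -Eq 'CRITICAL|ERROR'\n}\n\n# True if the endpoint answers with a 2xx status; skipped when HEALTH_URL is empty\nhttp_healthy() {\n  [ -n \"$HEALTH_URL\" ] || return 0\n  local code\n  code=$(curl -sS --max-time 5 -o /dev/null -w '%{http_code}' \"$HEALTH_URL\") || return 1\n  case \"$code\" in 2??) return 0 ;; *) return 1 ;; esac\n}\n\nstatus=0\nif recent_log_errors; then echo \"Gateway log has recent CRITICAL/ERROR entries\"; status=1; fi\nif ! http_healthy; then echo \"Gateway health endpoint did not return 2xx\"; status=1; fi\nexit \"$status\"\n\nA non-zero exit means at least one signal points to an unhealthy gateway, so the script can be used to decide whether to run the gateway recovery template."
      },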
      {
        "title": "2. Automated Recovery Steps",
        "body": "For each error type, define a recovery script that attempts to restore service automatically. The script should:\n\nDetect the error (using the patterns above).\nAttempt recovery (restart service, fix permissions, rebuild module).\nVerify recovery (re‑run detection after a short wait, including after the final attempt).\nReport outcome (exit code 0 for success, non‑zero for persistent failure).\n\nGateway Recovery Script Template\n\n#!/bin/bash\nset -e\n\nSERVICE=\"gateway\"\nMAX_ATTEMPTS=2\nSLEEP_SECONDS=5\n\nlog() { echo \"[$(date +'%Y-%m-%d %H:%M:%S')] $*\"; }\n\n# Unhealthy if status exits non-zero or reports \"running\": false\ncheck() {\n  local out\n  out=$(openclaw gateway status 2>&1) || return 1\n  ! grep -Eq '\"running\": *false' <<<\"$out\"\n}\n\nrestart() {\n  openclaw gateway restart || log \"restart command failed\"\n  sleep \"$SLEEP_SECONDS\"\n}\n\nattempt=0\nwhile [ \"$attempt\" -lt \"$MAX_ATTEMPTS\" ]; do\n  if check; then\n    log \"$SERVICE is healthy\"\n    exit 0\n  fi\n  log \"$SERVICE is unhealthy, restarting (attempt $((attempt+1))/$MAX_ATTEMPTS)...\"\n  restart\n  # Not ((attempt++)): it returns status 1 when attempt is 0, which aborts the script under set -e\n  attempt=$((attempt+1))\ndone\n\n# Verify the last restart before reporting failure\nif check; then\n  log \"$SERVICE recovered\"\n  exit 0\nfi\nlog \"$SERVICE could not be recovered after $MAX_ATTEMPTS attempts\"\nexit 1\n\nBrowser Service Recovery Script Template\n\n#!/bin/bash\nset -e\n\nSERVICE=\"browser\"\nPROFILE=\"openclaw\"\nMAX_ATTEMPTS=2\nSLEEP_SECONDS=8\n\nlog() { echo \"[$(date +'%Y-%m-%d %H:%M:%S')] $*\"; }\n\n# Matches both \"running\":true and \"running\": true\ncheck() {\n  openclaw browser --browser-profile \"$PROFILE\" status --json 2>&1 | grep -Eq '\"running\": *true'\n}\n\nrestart() {\n  # stop can fail if the browser is already down; do not let set -e abort here\n  openclaw browser --browser-profile \"$PROFILE\" stop || true\n  sleep 2\n  openclaw browser --browser-profile \"$PROFILE\" start || log \"start command failed\"\n  sleep \"$SLEEP_SECONDS\"\n}\n\nattempt=0\nwhile [ \"$attempt\" -lt \"$MAX_ATTEMPTS\" ]; do\n  if check; then\n    log \"$SERVICE ($PROFILE) is healthy\"\n    exit 0\n  fi\n  log \"$SERVICE ($PROFILE) is unhealthy, restarting (attempt $((attempt+1))/$MAX_ATTEMPTS)...\"\n  restart\n  attempt=$((attempt+1))\ndone\n\nif check; then\n  log \"$SERVICE ($PROFILE) recovered\"\n  exit 0\nfi\nlog \"$SERVICE ($PROFILE) could not be recovered after $MAX_ATTEMPTS attempts\"\nexit 1\n\nCron Scheduler Recovery Script Template\n\n#!/bin/bash\nset -e\n\nSERVICE=\"cron\"\nMAX_ATTEMPTS=1\nSLEEP_SECONDS=3\n\nlog() { echo \"[$(date +'%Y-%m-%d %H:%M:%S')] $*\"; }\n\ncheck() {\n  openclaw cron status 2>&1 | grep -Eq '\"running\": *true'\n}\n\nrestart() {\n  # Cron is restarted automatically when the gateway restarts,\n  # so a stopped scheduler is recovered by restarting the gateway.\n  openclaw gateway restart || log \"gateway restart failed\"\n  sleep \"$SLEEP_SECONDS\"\n}\n\nattempt=0\nwhile [ \"$attempt\" -lt \"$MAX_ATTEMPTS\" ]; do\n  if check; then\n    log \"$SERVICE scheduler is running\"\n    exit 0\n  fi\n  log \"$SERVICE scheduler is not running, restarting gateway (attempt $((attempt+1))/$MAX_ATTEMPTS)...\"\n  restart\n  attempt=$((attempt+1))\ndone\n\nif check; then\n  log \"$SERVICE scheduler recovered\"\n  exit 0\nfi\nlog \"$SERVICE scheduler still not running after $MAX_ATTEMPTS attempts\"\nexit 1\n\nMemory Search Recovery Script Template\n\n#!/bin/bash\nset -e\n\nSERVICE=\"memory_search\"\nMAX_ATTEMPTS=1\n\nlog() { echo \"[$(date +'%Y-%m-%d %H:%M:%S')] $*\"; }\n\ncheck() {\n  local out\n  out=$(openclaw memory search --query \"test\" 2>&1 || true)\n  # Healthy only if neither error marker appears (grep -q -v would succeed on any other line)\n  ! grep -q \"disabled\\|Module did not self-register\" <<<\"$out\"\n}\n\nrestart() {\n  # Rebuild better-sqlite3 inside the global openclaw package (assumes a global npm install)\n  (cd \"$(npm root -g)/openclaw\" && npm rebuild better-sqlite3) || log \"rebuild failed\"\n  # Restart gateway to pick up the rebuilt module\n  openclaw gateway restart || log \"gateway restart failed\"\n  sleep 5\n}\n\nattempt=0\nwhile [ \"$attempt\" -lt \"$MAX_ATTEMPTS\" ]; do\n  if check; then\n    log \"$SERVICE is functional\"\n    exit 0\n  fi\n  log \"$SERVICE is disabled, rebuilding native module (attempt $((attempt+1))/$MAX_ATTEMPTS)...\"\n  restart\n  attempt=$((attempt+1))\ndone\n\nif check; then\n  log \"$SERVICE recovered\"\n  exit 0\nfi\nlog \"$SERVICE could not be recovered after $MAX_ATTEMPTS attempts\"\nexit 1"
      },
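      {
        "title": "Permission Check Sketch",
        "body": "The recovery steps above mention fixing permissions, but no template covers the permission errors from the detection list. A minimal report-only sketch follows; CHECK_PATHS is a hypothetical list, so replace it with the archive, logs, and config paths your setup actually uses. It deliberately does not run chmod or chown: changing ownership automatically falls under the unsafe actions listed in the anti-patterns, so a non-zero exit should lead to escalation instead.\n\n#!/bin/bash\n# Report-only sketch: CHECK_PATHS is hypothetical; adjust it to your setup\nCHECK_PATHS=(\"$HOME/.openclaw\" \"$HOME/.openclaw/logs\")\n\nstatus=0\nfor p in \"${CHECK_PATHS[@]}\"; do\n  # Flag paths that exist but are not writable by the current user\n  if [ -e \"$p\" ] && [ ! -w \"$p\" ]; then\n    owner=$(ls -ld \"$p\" | awk '{print $3}')\n    echo \"Not writable by $(id -un): $p (owner: $owner)\"\n    status=1\n  fi\ndone\nexit \"$status\""
      },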
      {
        "title": "3. Integration with Cron for Automated Recovery",
        "body": "Once you have a recovery script, schedule it as a cron job at an interval that matches how often the service tends to fail (e.g., every 30 minutes for the browser, every hour for the gateway). Use an isolated agent session to execute the script and announce failures.\n\nExample cron job for browser recovery:\n\nopenclaw cron add \\\n  --name \"Browser-Recovery-Automation\" \\\n  --schedule 'every 30 minutes' \\\n  --session isolated \\\n  --payload '{\"kind\":\"agentTurn\",\"message\":\"Run browser recovery automation script\",\"model\":\"default\",\"thinking\":\"low\"}' \\\n  --delivery '{\"mode\":\"announce\",\"channel\":\"telegram\"}'\n\nAgent behavior inside the isolated session: the agent reads the script (or inline logic) and runs it via exec. If the script exits with 0, the agent reports success (or stays silent if you only want failure announcements, as in the examples below); if it exits non‑zero, the cron delivery forwards the failure message.\n\nAlternative: You can embed the recovery logic directly in the agent’s response (without a separate script) for simplicity, but a script is easier to test and reuse."
      },
      {
        "title": "4. Escalation When Automation Fails",
        "body": "If automated recovery still fails after the maximum number of attempts, escalate:\n\nLog the failure in memory/YYYY-MM-DD.md with the tag error-recovery-failed.\nAdd a task to inbox/agent-aufgaben.md (the agent's task list; \"Aufgaben\" means \"tasks\") for manual diagnosis.\nSend a high‑priority notification to the user (if supported).\nFall back to a safe state (e.g., disable the problematic component if possible).\n\nExample escalation snippet:\n\n# Capture the script's exit code; \"|| rc=$?\" also keeps set -e from aborting here\nrc=0\n./scripts/browser-recovery.sh || rc=$?\n\nif [ \"$rc\" -ne 0 ]; then\n  echo \"Browser recovery failed (exit code $rc). Adding manual diagnosis task.\"\n  today=$(date +%Y-%m-%d)\n  # Append a row to agent-aufgaben.md (adjust the task number 99 to your table)\n  echo \"| 99 | Diagnose browser recovery failure – automated recovery failed after 2 attempts | ⬜ |\" >> inbox/agent-aufgaben.md\n  # Store the failure in today's memory file\n  {\n    echo \"## [error] Browser recovery automation failed\"\n    echo \"Date: $today\"\n    echo \"Tags: error, browser, error-recovery-failed\"\n    echo \"Browser recovery script exited with code $rc. Manual intervention required.\"\n  } >> \"memory/$today.md\"\nfi"
      },
      {
        "title": "5. Testing Recovery Scripts",
        "body": "Before deploying a recovery script as a cron job, test it manually:\n\nSimulate the failure (e.g., kill the gateway process, stop the browser service).\nRun the recovery script and verify that it detects the failure and restarts the service.\nCheck that the service is functional after recovery.\nReview the logs for unintended side effects.\n\nExample test command:\n\n# Stop browser service\nopenclaw browser --browser-profile openclaw stop\n\n# Run recovery script and show its exit code (0 = recovered)\n./scripts/browser-recovery.sh; echo \"exit code: $?\"\n\n# Verify browser is running\nopenclaw browser --browser-profile openclaw status --json | grep -E '\"running\": *true'"
      },
      {
        "title": "Example 1: Gateway Recovery Automation",
        "body": "Script: scripts/gateway-recovery.sh (see template above). Cron schedule: every 1 hour. Announce only on failure."
      },
      {
        "title": "Example 2: Browser Recovery Automation",
        "body": "Script: scripts/browser-recovery.sh (see template above). Cron schedule: every 30 minutes. Announce only on failure."
      },
      {
        "title": "Example 3: Combined Health‑Check + Recovery",
        "body": "A single script that checks multiple services and recovers any that are unhealthy. Useful for a comprehensive “keep‑alive” cron job.\n\n#!/bin/bash\nset -e\n\nlog() { echo \"[$(date +'%Y-%m-%d %H:%M:%S')] $*\"; }\n\ngateway_ok() { openclaw gateway status > /dev/null 2>&1; }\nbrowser_ok() { openclaw browser --browser-profile openclaw status --json 2>&1 | grep -Eq '\"running\": *true'; }\n\n# Check gateway first: the browser depends on it\nif ! gateway_ok; then\n  log \"Gateway unhealthy, restarting...\"\n  openclaw gateway restart || true\n  sleep 5\nfi\n\n# Check browser\nif ! browser_ok; then\n  log \"Browser unhealthy, restarting...\"\n  openclaw browser --browser-profile openclaw stop || true\n  sleep 2\n  openclaw browser --browser-profile openclaw start || true\n  sleep 8\nfi\n\n# Re-verify before reporting success\nif gateway_ok && browser_ok; then\n  log \"All services healthy\"\n  exit 0\nfi\nlog \"At least one service is still unhealthy after recovery\"\nexit 1\n\nSchedule this script every 30 minutes with an isolated agentTurn job."
      },
      {
        "title": "Anti‑Patterns",
        "body": "Over‑aggressive recovery: Restarting a service too frequently can cause instability. Set reasonable intervals (≥30 minutes) and maximum attempts (≤2).\nSilent recovery: If recovery succeeds but you never hear about it, you might not know the service was failing. At minimum, log recovery events to memory/ files.\nNo verification: Restarting a service without verifying it actually recovered can mask deeper issues. Always re‑check after restart.\nHard‑coded assumptions: Avoid assuming a specific Node version, path, or user ID. Use environment variables or detect them at runtime.\nIgnoring dependencies: Browser depends on gateway; restarting browser while gateway is down will fail. Check dependencies in order.\nAutomating unsafe actions: Do not automate deletion of logs, modification of critical configs, or any irreversible action without a rollback plan."
      },
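      {
        "title": "Cooldown Guard Sketch",
        "body": "The over-aggressive recovery anti-pattern can also be enforced in the script itself, not only through the cron interval. A minimal sketch, assuming a hypothetical state file that stores the Unix timestamp of the last automated restart; STATE_FILE and COOLDOWN_SECONDS are illustrative names, not OpenClaw settings. In the recovery templates, the same test can guard the call to restart().\n\n#!/bin/bash\n# Sketch only: STATE_FILE and COOLDOWN_SECONDS are hypothetical settings\nSTATE_FILE=\"${STATE_FILE:-$HOME/.openclaw/recovery-last-restart}\"\nCOOLDOWN_SECONDS=\"${COOLDOWN_SECONDS:-1800}\"   # 30 minutes\n\n# Succeeds only if no restart was recorded within the cooldown window\ncooldown_elapsed() {\n  local now last\n  now=$(date +%s)\n  last=$(cat \"$STATE_FILE\" 2>/dev/null || true)\n  last=${last:-0}\n  [ $((now - last)) -ge \"$COOLDOWN_SECONDS\" ]\n}\n\n# Records the current time as the last restart\nrecord_restart() {\n  mkdir -p \"$(dirname \"$STATE_FILE\")\"\n  date +%s > \"$STATE_FILE\"\n}\n\nif cooldown_elapsed; then\n  record_restart\n  openclaw gateway restart\nelse\n  echo \"Skipping restart: last automated restart was less than $COOLDOWN_SECONDS seconds ago\"\nfi"
      },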
      {
        "title": "Related Patterns",
        "body": "Health‑Monitoring skill – proactive health checks and monitoring.\nSystem‑Diagnostics skill – diagnosing root causes of failures.\nCron‑Job Creation playbook – creating scheduled jobs.\nGateway Health Check and Recovery playbook – specific to gateway.\nBrowser Service Health Monitoring and Recovery playbook – specific to browser.\nMaintenance Execution playbook – incorporating recovery into regular maintenance."
      },
      {
        "title": "References",
        "body": "scripts/gateway-recovery.sh (template)\nscripts/browser-recovery.sh (template)\nscripts/cron-recovery.sh (template)\nskills/health-monitoring/SKILL.md\nskills/system-diagnostics/SKILL.md\ndocs/MAINTENANCE.md\nmemory/patterns/playbooks.md\nopenclaw cron --help\nopenclaw gateway --help\nopenclaw browser --help"
      },
      {
        "title": "Skill Integration",
        "body": "When an OpenClaw error occurs (gateway, browser, cron, memory search), read this skill to create or run an automated recovery script. Store successful recovery patterns in memory/patterns/tools.md. Update pending.md if automation fails and manual intervention is needed.\n\nThis skill increases autonomy by providing standardized, automated recovery workflows for common failures, reducing the need for manual intervention and increasing system resilience."
      }
    ]
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/konscious0beast/error-recovery-automation",
    "publisherUrl": "https://clawhub.ai/konscious0beast/error-recovery-automation",
    "owner": "konscious0beast",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/error-recovery-automation",
    "downloadUrl": "https://openagent3.xyz/downloads/error-recovery-automation",
    "agentUrl": "https://openagent3.xyz/skills/error-recovery-automation/agent",
    "manifestUrl": "https://openagent3.xyz/skills/error-recovery-automation/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/error-recovery-automation/agent.md"
  }
}