{
  "schemaVersion": "1.0",
  "item": {
    "slug": "playwright-scraper-skill",
    "name": "Playwright Scraper Skill",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/waisimon/playwright-scraper-skill",
    "canonicalUrl": "https://clawhub.ai/waisimon/playwright-scraper-skill",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/playwright-scraper-skill",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=playwright-scraper-skill",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "CHANGELOG.md",
      "CONTRIBUTING.md",
      "INSTALL.md",
      "README.md",
      "README_ZH.md",
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/playwright-scraper-skill"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/playwright-scraper-skill",
    "agentPageUrl": "https://openagent3.xyz/skills/playwright-scraper-skill/agent",
    "manifestUrl": "https://openagent3.xyz/skills/playwright-scraper-skill/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/playwright-scraper-skill/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Playwright Scraper Skill",
        "body": "A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level."
      },
      {
        "title": "🎯 Use Case Matrix",
        "body": "Target Website\tAnti-Bot Level\tRecommended Method\tScript\nRegular Sites\tLow\tweb_fetch tool\tN/A (built-in)\nDynamic Sites\tMedium\tPlaywright Simple\tscripts/playwright-simple.js\nCloudflare Protected\tHigh\tPlaywright Stealth ⭐\tscripts/playwright-stealth.js\nYouTube\tSpecial\tdeep-scraper\tInstall separately\nReddit\tSpecial\treddit-scraper\tInstall separately"
      },
      {
        "title": "📦 Installation",
        "body": "cd playwright-scraper-skill\nnpm install\nnpx playwright install chromium"
      },
      {
        "title": "1️⃣ Simple Sites (No Anti-Bot)",
        "body": "Use OpenClaw's built-in web_fetch tool:\n\n# Invoke directly in OpenClaw\nHey, fetch me the content from https://example.com"
      },
      {
        "title": "2️⃣ Dynamic Sites (Requires JavaScript)",
        "body": "Use Playwright Simple:\n\nnode scripts/playwright-simple.js \"https://example.com\"\n\nExample output:\n\n{\n  \"url\": \"https://example.com\",\n  \"title\": \"Example Domain\",\n  \"content\": \"...\",\n  \"elapsedSeconds\": \"3.45\"\n}"
      },
      {
        "title": "3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)",
        "body": "Use Playwright Stealth:\n\nnode scripts/playwright-stealth.js \"https://m.discuss.com.hk/#hot\"\n\nFeatures:\n\nHide automation markers (navigator.webdriver = false)\nRealistic User-Agent (iPhone, Android)\nRandom delays to mimic human behavior\nScreenshot and HTML saving support"
      },
      {
        "title": "4️⃣ YouTube Video Transcripts",
        "body": "Use deep-scraper (install separately):\n\n# Install deep-scraper skill\nnpx clawhub install deep-scraper\n\n# Use it\ncd skills/deep-scraper\nnode assets/youtube_handler.js \"https://www.youtube.com/watch?v=VIDEO_ID\""
      },
      {
        "title": "scripts/playwright-simple.js",
        "body": "Use Case: Regular dynamic websites\nSpeed: Fast (3-5 seconds)\nAnti-Bot: None\nOutput: JSON (title, content, URL)"
      },
      {
        "title": "scripts/playwright-stealth.js ⭐",
        "body": "Use Case: Sites with Cloudflare or anti-bot protection\nSpeed: Medium (5-20 seconds)\nAnti-Bot: Medium-High (hides automation, realistic UA)\nOutput: JSON + Screenshot + HTML file\nVerified: 100% success rate on Discuss.com.hk (2026-02-07 test)"
      },
      {
        "title": "1. Try web_fetch First",
        "body": "If the site doesn't load content dynamically, use OpenClaw's web_fetch tool; it's the fastest option."
      },
      {
        "title": "2. Need JavaScript? Use Playwright Simple",
        "body": "If you need to wait for JavaScript rendering, use playwright-simple.js."
      },
      {
        "title": "3. Getting Blocked? Use Stealth",
        "body": "If you encounter 403 errors or Cloudflare challenges, use playwright-stealth.js."
      },
      {
        "title": "4. Special Sites Need Specialized Skills",
        "body": "YouTube → deep-scraper\nReddit → reddit-scraper\nTwitter → bird skill"
      },
      {
        "title": "🔧 Customization",
        "body": "All scripts support environment variables:\n\n# Set screenshot path\nSCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL\n\n# Set wait time (milliseconds)\nWAIT_TIME=10000 node scripts/playwright-simple.js URL\n\n# Enable headful mode (show browser)\nHEADLESS=false node scripts/playwright-stealth.js URL\n\n# Save HTML\nSAVE_HTML=true node scripts/playwright-stealth.js URL\n\n# Custom User-Agent\nUSER_AGENT=\"Mozilla/5.0 ...\" node scripts/playwright-stealth.js URL"
      },
      {
        "title": "📊 Performance Comparison",
        "body": "Method\tSpeed\tAnti-Bot\tSuccess Rate (Discuss.com.hk)\nweb_fetch\t⚡ Fastest\t❌ None\t0%\nPlaywright Simple\t🚀 Fast\t⚠️ Low\t20%\nPlaywright Stealth\t⏱️ Medium\t✅ Medium\t100% ✅\nPuppeteer Stealth\t⏱️ Medium\t✅ Medium-High\t~80%\nCrawlee (deep-scraper)\t🐢 Slow\t❌ Detected\t0%\nChaser (Rust)\t⏱️ Medium\t❌ Detected\t0%"
      },
      {
        "title": "🛡️ Anti-Bot Techniques Summary",
        "body": "Lessons learned from our testing:"
      },
      {
        "title": "✅ Effective Anti-Bot Measures",
        "body": "Hide navigator.webdriver: essential\nRealistic User-Agent: use real device strings (iPhone, Android)\nMimic human behavior: random delays, scrolling\nAvoid framework signatures: Crawlee and Selenium are easily detected\nUse addInitScript (Playwright): inject patches before the page's own scripts run"
      },
      {
        "title": "❌ Ineffective Anti-Bot Measures",
        "body": "Changing only the User-Agent: not enough on its own\nUsing high-level frameworks (Crawlee): more easily detected\nDocker isolation: doesn't help with Cloudflare"
      },
      {
        "title": "Issue: 403 Forbidden",
        "body": "Solution: Use playwright-stealth.js"
      },
      {
        "title": "Issue: Cloudflare Challenge Page",
        "body": "Solution:\n\nIncrease the wait time to 10-15 seconds (e.g. WAIT_TIME=15000)\nTry headless: false (headful mode sometimes has a higher success rate)\nConsider using proxy IPs"
      },
      {
        "title": "Issue: Blank Page",
        "body": "Solution:\n\nIncrease the waitForTimeout delay\nUse waitUntil: 'networkidle' or 'domcontentloaded'\nCheck whether the page requires login"
      },
      {
        "title": "2026-02-07 Discuss.com.hk Test Conclusions",
        "body": "✅ Pure Playwright + Stealth succeeded (5s, 200 OK)\n❌ Crawlee (deep-scraper) failed (403)\n❌ Chaser (Rust) failed (Cloudflare)\n❌ Puppeteer standard failed (403)\n\nBest Solution: Pure Playwright + anti-bot techniques (framework-independent)"
      },
      {
        "title": "🚧 Future Improvements",
        "body": "Add proxy IP rotation\nImplement cookie management (maintain login state)\nAdd CAPTCHA handling (2captcha / Anti-Captcha)\nBatch scraping (parallel URLs)\nIntegration with OpenClaw's browser tool"
      },
      {
        "title": "📚 References",
        "body": "Playwright Official Docs\npuppeteer-extra-plugin-stealth\ndeep-scraper skill"
      }
    ],
    "body": "Playwright Scraper Skill\n\nA Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level.\n\n🎯 Use Case Matrix\nTarget Website\tAnti-Bot Level\tRecommended Method\tScript\nRegular Sites\tLow\tweb_fetch tool\tN/A (built-in)\nDynamic Sites\tMedium\tPlaywright Simple\tscripts/playwright-simple.js\nCloudflare Protected\tHigh\tPlaywright Stealth ⭐\tscripts/playwright-stealth.js\nYouTube\tSpecial\tdeep-scraper\tInstall separately\nReddit\tSpecial\treddit-scraper\tInstall separately\n\n📦 Installation\ncd playwright-scraper-skill\nnpm install\nnpx playwright install chromium\n\n🚀 Quick Start\n1️⃣ Simple Sites (No Anti-Bot)\n\nUse OpenClaw's built-in web_fetch tool:\n\n# Invoke directly in OpenClaw\nHey, fetch me the content from https://example.com\n\n2️⃣ Dynamic Sites (Requires JavaScript)\n\nUse Playwright Simple:\n\nnode scripts/playwright-simple.js \"https://example.com\"\n\nExample output:\n\n{\n  \"url\": \"https://example.com\",\n  \"title\": \"Example Domain\",\n  \"content\": \"...\",\n  \"elapsedSeconds\": \"3.45\"\n}\n\n3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)\n\nUse Playwright Stealth:\n\nnode scripts/playwright-stealth.js \"https://m.discuss.com.hk/#hot\"\n\nFeatures:\n\nHide automation markers (navigator.webdriver = false)\nRealistic User-Agent (iPhone, Android)\nRandom delays to mimic human behavior\nScreenshot and HTML saving support\n\n4️⃣ YouTube Video Transcripts\n\nUse deep-scraper (install separately):\n\n# Install deep-scraper skill\nnpx clawhub install deep-scraper\n\n# Use it\ncd skills/deep-scraper\nnode assets/youtube_handler.js \"https://www.youtube.com/watch?v=VIDEO_ID\"\n\n📖 Script Descriptions\nscripts/playwright-simple.js\nUse Case: Regular dynamic websites\nSpeed: Fast (3-5 seconds)\nAnti-Bot: None\nOutput: JSON (title, content, URL)\n\nscripts/playwright-stealth.js ⭐\nUse Case: Sites with Cloudflare or anti-bot protection\nSpeed: Medium (5-20 seconds)\nAnti-Bot: Medium-High (hides automation, realistic UA)\nOutput: JSON + Screenshot + HTML file\nVerified: 100% success rate on Discuss.com.hk (2026-02-07 test)\n\n🎓 Best Practices\n1. Try web_fetch First\n\nIf the site doesn't load content dynamically, use OpenClaw's web_fetch tool; it's the fastest option.\n\n2. Need JavaScript? Use Playwright Simple\n\nIf you need to wait for JavaScript rendering, use playwright-simple.js.\n\n3. Getting Blocked? Use Stealth\n\nIf you encounter 403 errors or Cloudflare challenges, use playwright-stealth.js.\n\n4. Special Sites Need Specialized Skills\nYouTube → deep-scraper\nReddit → reddit-scraper\nTwitter → bird skill\n\n🔧 Customization\n\nAll scripts support environment variables:\n\n# Set screenshot path\nSCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL\n\n# Set wait time (milliseconds)\nWAIT_TIME=10000 node scripts/playwright-simple.js URL\n\n# Enable headful mode (show browser)\nHEADLESS=false node scripts/playwright-stealth.js URL\n\n# Save HTML\nSAVE_HTML=true node scripts/playwright-stealth.js URL\n\n# Custom User-Agent\nUSER_AGENT=\"Mozilla/5.0 ...\" node scripts/playwright-stealth.js URL\n\n📊 Performance Comparison\nMethod\tSpeed\tAnti-Bot\tSuccess Rate (Discuss.com.hk)\nweb_fetch\t⚡ Fastest\t❌ None\t0%\nPlaywright Simple\t🚀 Fast\t⚠️ Low\t20%\nPlaywright Stealth\t⏱️ Medium\t✅ Medium\t100% ✅\nPuppeteer Stealth\t⏱️ Medium\t✅ Medium-High\t~80%\nCrawlee (deep-scraper)\t🐢 Slow\t❌ Detected\t0%\nChaser (Rust)\t⏱️ Medium\t❌ Detected\t0%\n\n🛡️ Anti-Bot Techniques Summary\n\nLessons learned from our testing:\n\n✅ Effective Anti-Bot Measures\nHide navigator.webdriver: essential\nRealistic User-Agent: use real device strings (iPhone, Android)\nMimic human behavior: random delays, scrolling\nAvoid framework signatures: Crawlee and Selenium are easily detected\nUse addInitScript (Playwright): inject patches before the page's own scripts run\n\n❌ Ineffective Anti-Bot Measures\nChanging only the User-Agent: not enough on its own\nUsing high-level frameworks (Crawlee): more easily detected\nDocker isolation: doesn't help with Cloudflare\n\n🔍 Troubleshooting\nIssue: 403 Forbidden\n\nSolution: Use playwright-stealth.js\n\nIssue: Cloudflare Challenge Page\n\nSolution:\n\nIncrease the wait time to 10-15 seconds (e.g. WAIT_TIME=15000)\nTry headless: false (headful mode sometimes has a higher success rate)\nConsider using proxy IPs\n\nIssue: Blank Page\n\nSolution:\n\nIncrease the waitForTimeout delay\nUse waitUntil: 'networkidle' or 'domcontentloaded'\nCheck whether the page requires login\n\n📝 Memory & Experience\n2026-02-07 Discuss.com.hk Test Conclusions\n✅ Pure Playwright + Stealth succeeded (5s, 200 OK)\n❌ Crawlee (deep-scraper) failed (403)\n❌ Chaser (Rust) failed (Cloudflare)\n❌ Puppeteer standard failed (403)\n\nBest Solution: Pure Playwright + anti-bot techniques (framework-independent)\n\n🚧 Future Improvements\nAdd proxy IP rotation\nImplement cookie management (maintain login state)\nAdd CAPTCHA handling (2captcha / Anti-Captcha)\nBatch scraping (parallel URLs)\nIntegration with OpenClaw's browser tool\n\n📚 References\nPlaywright Official Docs\npuppeteer-extra-plugin-stealth\ndeep-scraper skill"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/waisimon/playwright-scraper-skill",
    "publisherUrl": "https://clawhub.ai/waisimon/playwright-scraper-skill",
    "owner": "waisimon",
    "version": "1.2.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/playwright-scraper-skill",
    "downloadUrl": "https://openagent3.xyz/downloads/playwright-scraper-skill",
    "agentUrl": "https://openagent3.xyz/skills/playwright-scraper-skill/agent",
    "manifestUrl": "https://openagent3.xyz/skills/playwright-scraper-skill/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/playwright-scraper-skill/agent.md"
  }
}