Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level.
| Target Website | Anti-Bot Level | Recommended Method | Script |
|---|---|---|---|
| Regular Sites | Low | web_fetch tool | N/A (built-in) |
| Dynamic Sites | Medium | Playwright Simple | scripts/playwright-simple.js |
| Cloudflare Protected | High | Playwright Stealth ⭐ | scripts/playwright-stealth.js |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |
    cd playwright-scraper-skill
    npm install
    npx playwright install chromium
Use OpenClaw's built-in web_fetch tool:

    # Invoke directly in OpenClaw
    Hey, fetch me the content from https://example.com
Use Playwright Simple:

    node scripts/playwright-simple.js "https://example.com"

Example output:

    {
      "url": "https://example.com",
      "title": "Example Domain",
      "content": "...",
      "elapsedSeconds": "3.45"
    }
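To consume that output from another Node.js program, something like the following could work. It is a minimal sketch that assumes the script prints exactly one JSON object to stdout, as in the example output; `parseScrapeOutput` and `scrape` are hypothetical helper names, not part of the skill.

```javascript
const { execFile } = require('node:child_process');

// Parse the JSON printed by the scraper. The example output shows
// elapsedSeconds as a string, so it is converted to a number here.
function parseScrapeOutput(stdout) {
  const result = JSON.parse(stdout);
  return { ...result, elapsedSeconds: Number(result.elapsedSeconds) };
}

// Run the script as a child process and resolve with the parsed result.
// Assumption: stdout contains only the JSON object and nothing else.
function scrape(url, script = 'scripts/playwright-simple.js') {
  return new Promise((resolve, reject) => {
    execFile('node', [script, url], { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
      if (err) return reject(err);
      try {
        resolve(parseScrapeOutput(stdout));
      } catch (parseError) {
        reject(parseError);
      }
    });
  });
}
```

If the script also writes log lines to stdout, the parser would need to isolate the JSON portion first.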
Use Playwright Stealth:

    node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

Features:
- Hide automation markers (navigator.webdriver = false)
- Realistic User-Agent (iPhone, Android)
- Random delays to mimic human behavior
- Screenshot and HTML saving support
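The first feature, hiding the automation marker, is typically done with Playwright's `addInitScript`, which runs a function in the page before any site script. The sketch below illustrates the idea; it is not the actual contents of playwright-stealth.js, and `hideWebdriverFlag` and `applyStealth` are hypothetical names.

```javascript
// Runs inside the browser page, before the site's own scripts.
// Playwright serializes this function, so it must not reference
// variables from the Node.js scope.
function hideWebdriverFlag() {
  Object.defineProperty(navigator, 'webdriver', { get: () => false });
}

// `context` is assumed to be a Playwright BrowserContext.
// Registering on the context applies the script to every page it opens.
async function applyStealth(context) {
  await context.addInitScript(hideWebdriverFlag);
}
```

With a real browser, `applyStealth` would be called on the result of `browser.newContext({ userAgent })`, where `userAgent` is a mobile User-Agent string, before opening any page.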
Use deep-scraper (install separately):

    # Install deep-scraper skill
    npx clawhub install deep-scraper

    # Use it
    cd skills/deep-scraper
    node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"
Playwright Simple (scripts/playwright-simple.js):
- Use Case: Regular dynamic websites
- Speed: Fast (3-5 seconds)
- Anti-Bot: None
- Output: JSON (title, content, URL)
Playwright Stealth (scripts/playwright-stealth.js):
- Use Case: Sites with Cloudflare or anti-bot protection
- Speed: Medium (5-20 seconds)
- Anti-Bot: Medium-High (hides automation, realistic UA)
- Output: JSON + Screenshot + HTML file
- Verified: 100% success on Discuss.com.hk
If the site doesn't have dynamic loading, use OpenClaw's web_fetch tool, which is the fastest option.
If you need to wait for JavaScript rendering, use playwright-simple.js.
If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.
- YouTube → deep-scraper
- Reddit → reddit-scraper
- Twitter → bird skill
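The selection rules above reduce to a small decision function. This is an illustrative sketch: `chooseScraper`, its input shape, and the hostname list are assumptions made for the example, not part of the skill.

```javascript
// Hostnames mapped to the dedicated skills named above (assumed domains).
const SPECIAL_SITES = {
  'youtube.com': 'deep-scraper',
  'reddit.com': 'reddit-scraper',
  'twitter.com': 'bird',
};

// needsJavaScript: the content only appears after client-side rendering.
// blocked: a previous attempt returned 403 or a Cloudflare challenge.
function chooseScraper({ url, needsJavaScript = false, blocked = false }) {
  const host = new URL(url).hostname.replace(/^www\./, '');
  for (const [domain, skill] of Object.entries(SPECIAL_SITES)) {
    if (host === domain || host.endsWith('.' + domain)) return skill;
  }
  if (blocked) return 'scripts/playwright-stealth.js';
  if (needsJavaScript) return 'scripts/playwright-simple.js';
  return 'web_fetch';
}
```

Dedicated skills are checked first, then the function escalates from the cheapest tool to the stealth script only when the site requires it.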
All scripts support environment variables:

    # Set screenshot path
    SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

    # Set wait time (milliseconds)
    WAIT_TIME=10000 node scripts/playwright-simple.js URL

    # Enable headful mode (show browser)
    HEADLESS=false node scripts/playwright-stealth.js URL

    # Save HTML
    SAVE_HTML=true node scripts/playwright-stealth.js URL

    # Custom User-Agent
    USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
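Inside a script, variables like these are usually read once at startup and turned into typed options. The sketch below shows one way this might look; how the bundled scripts actually parse them is not documented here, and `readConfig` and the 5000 ms default are assumptions.

```javascript
// Convert the environment variables listed above into typed options.
// The fallback values are illustrative assumptions, not documented defaults.
function readConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    screenshotPath: env.SCREENSHOT_PATH || null,
    waitTimeMs: Number.isFinite(waitTime) && waitTime >= 0 ? waitTime : 5000,
    headless: env.HEADLESS !== 'false', // headless unless explicitly disabled
    saveHtml: env.SAVE_HTML === 'true',
    userAgent: env.USER_AGENT || null,
  };
}
```

Passing `env` as a parameter keeps the function easy to exercise without touching the real process environment.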
| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|---|---|---|---|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% ✅ |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |
Lessons learned from our testing:
What works:
- Hide navigator.webdriver: Essential
- Realistic User-Agent: Use real devices (iPhone, Android)
- Mimic Human Behavior: Random delays, scrolling
- Avoid Framework Signatures: Crawlee, Selenium are easily detected
- Use addInitScript (Playwright): Inject before page load

What does not work:
- Only changing User-Agent: Not enough
- Using high-level frameworks (Crawlee): More easily detected
- Docker isolation: Doesn't help with Cloudflare
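The "mimic human behavior" point (random delays and scrolling) could be sketched as follows. The delay ranges and scroll distances are illustrative assumptions, and `randomInt`, `humanPause`, and `humanScroll` are hypothetical helpers rather than functions from the skill.

```javascript
// Inclusive random integer; rng is injectable so the helper is testable.
function randomInt(min, max, rng = Math.random) {
  return Math.floor(rng() * (max - min + 1)) + min;
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Pause for a random duration and report how long the pause was.
async function humanPause(minMs = 500, maxMs = 2000, rng = Math.random) {
  const ms = randomInt(minMs, maxMs, rng);
  await sleep(ms);
  return ms;
}

// `page` is assumed to be a Playwright Page; only page.mouse.wheel is used.
// Scrolls down in uneven steps with short pauses in between.
async function humanScroll(page, steps = 3, rng = Math.random) {
  for (let i = 0; i < steps; i++) {
    await page.mouse.wheel(0, randomInt(200, 600, rng));
    await humanPause(300, 900, rng);
  }
}
```

Fixed intervals are easy to fingerprint, which is why both the pause length and the scroll distance vary on every step.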
Solution: Use playwright-stealth.js
Solution:
- Increase wait time (10-15 seconds)
- Try headless: false (headful mode sometimes has a higher success rate)
- Consider using proxy IPs
Solution:
- Increase waitForTimeout
- Use waitUntil: 'networkidle' or 'domcontentloaded'
- Check if login is required
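The wait-related fixes above can be combined into a retry loop that gives the page more time on each attempt. This is a sketch under stated assumptions: `loadWithRetry` is a hypothetical helper, `page` is assumed to be a Playwright Page, and the wait schedule and minimum HTML length are illustrative values.

```javascript
async function loadWithRetry(page, url, options = {}) {
  const {
    waits = [5000, 10000, 15000], // extra settle time per attempt, in ms
    waitUntil = 'networkidle',    // or 'domcontentloaded'
    minHtmlLength = 500,          // below this, treat the page as not rendered
  } = options;

  let lastStatus = null;
  for (const waitMs of waits) {
    // goto resolves with the main response, or null in some cases.
    const response = await page.goto(url, { waitUntil });
    await page.waitForTimeout(waitMs);
    lastStatus = response ? response.status() : null;
    const html = await page.content();
    const ok = lastStatus !== null && lastStatus < 400;
    if (ok && html.length >= minHtmlLength) {
      return { status: lastStatus, html, waitMs };
    }
  }
  throw new Error(
    `Page did not load after ${waits.length} attempts (last status: ${lastStatus})`
  );
}
```

If every attempt fails with a 403, longer waits are unlikely to help, and the stealth script or a login check is the more useful next step.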
- ✅ Pure Playwright + Stealth succeeded (5s, 200 OK)
- ❌ Crawlee (deep-scraper) failed (403)
- ❌ Chaser (Rust) failed (Cloudflare)
- ❌ Puppeteer standard failed (403)

Best Solution: Pure Playwright + anti-bot techniques (framework-independent)
- Add proxy IP rotation
- Implement cookie management (maintain login state)
- Add CAPTCHA handling (2captcha / Anti-Captcha)
- Batch scraping (parallel URLs)
- Integration with OpenClaw's browser tool
- Playwright Official Docs
- puppeteer-extra-plugin-stealth
- deep-scraper skill