Tencent SkillHub · Developer Tools

Playwright Scraper Skill 1.2.0

Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.

⬇ 0 downloads ★ 0 stars Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
CHANGELOG.md, CONTRIBUTING.md, INSTALL.md, README.md, README_ZH.md, SKILL.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.0

Documentation

ClawHub primary doc: SKILL.md (24 sections)

Playwright Scraper Skill

A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level.

🎯 Use Case Matrix

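| Target Website       | Anti-Bot Level | Recommended Method   | Script                        |
|----------------------|----------------|----------------------|-------------------------------|
| Regular Sites        | Low            | web_fetch tool       | N/A (built-in)                |
| Dynamic Sites        | Medium         | Playwright Simple    | scripts/playwright-simple.js  |
| Cloudflare Protected | High           | Playwright Stealth ⭐ | scripts/playwright-stealth.js |
| YouTube              | Special        | deep-scraper         | Install separately            |
| Reddit               | Special        | reddit-scraper       | Install separately            |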

📦 Installation

    cd playwright-scraper-skill
    npm install
    npx playwright install chromium

1️⃣ Simple Sites (No Anti-Bot)

Use OpenClaw's built-in web_fetch tool:

    # Invoke directly in OpenClaw
    Hey, fetch me the content from https://example.com

2️⃣ Dynamic Sites (Requires JavaScript)

Use Playwright Simple:

    node scripts/playwright-simple.js "https://example.com"

Example output:

    {
      "url": "https://example.com",
      "title": "Example Domain",
      "content": "...",
      "elapsedSeconds": "3.45"
    }
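The render-then-extract flow behind output of this shape can be sketched as follows. This is a minimal illustration, not the contents of scripts/playwright-simple.js: the function name scrapeSimple, the waitMs default, and the injectable launchBrowser option are all assumptions made so the flow can be exercised without a real browser.

```javascript
// Hypothetical sketch; the packaged script may be structured differently.
// `launchBrowser` is an assumed injection point, not an option of the real script.
async function scrapeSimple(url, { waitMs = 3000, launchBrowser } = {}) {
  // Load Playwright lazily, and only when no launcher was injected.
  const launch =
    launchBrowser ?? (() => require('playwright').chromium.launch({ headless: true }));
  const started = Date.now();
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Give client-side JavaScript time to render before reading the page.
    await page.waitForTimeout(waitMs);
    const title = await page.title();
    const content = await page.innerText('body');
    return {
      url,
      title,
      content,
      // Formatted as a string to match the example output above.
      elapsedSeconds: ((Date.now() - started) / 1000).toFixed(2),
    };
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```

Calling scrapeSimple("https://example.com") and printing the result with JSON.stringify would give an object with the same four keys as the example output.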

3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use Playwright Stealth:

    node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

Features:
  • Hide automation markers (navigator.webdriver = false)
  • Realistic User-Agent (iPhone, Android)
  • Random delays to mimic human behavior
  • Screenshot and HTML saving support
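The first three features listed above could be combined roughly as in the sketch below. It is an illustration under stated assumptions, not the code in scripts/playwright-stealth.js: the helper names, the delay range, and the two User-Agent strings are example values chosen here.

```javascript
// Hypothetical sketch of the listed techniques; the packaged script may differ.
// Example mobile User-Agent strings (illustrative values, not taken from the skill).
const MOBILE_USER_AGENTS = [
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1',
  'Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36',
];

// Pick a delay in [minMs, maxMs) so page interactions are not evenly spaced.
function randomDelayMs(minMs = 500, maxMs = 2000, rand = Math.random) {
  return Math.floor(minMs + rand() * (maxMs - minMs));
}

// Pick one of the example User-Agent strings.
function pickUserAgent(rand = Math.random) {
  return MOBILE_USER_AGENTS[Math.floor(rand() * MOBILE_USER_AGENTS.length)];
}

async function openStealthPage(url) {
  const { chromium } = require('playwright'); // loaded lazily
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({ userAgent: pickUserAgent() });
  // addInitScript runs in every page of the context before the site's own scripts,
  // so the override is in place when detection code reads navigator.webdriver.
  await context.addInitScript(() => {
    Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => false });
  });
  const page = await context.newPage();
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  await page.waitForTimeout(randomDelayMs());
  return { browser, page }; // the caller is responsible for browser.close()
}
```

The random source is passed in as a parameter only so the helpers can be checked deterministically; in normal use the defaults apply.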

4️⃣ YouTube Video Transcripts

Use deep-scraper (install separately):

    # Install deep-scraper skill
    npx clawhub install deep-scraper

    # Use it
    cd skills/deep-scraper
    node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"

scripts/playwright-simple.js

  • Use Case: Regular dynamic websites
  • Speed: Fast (3-5 seconds)
  • Anti-Bot: None
  • Output: JSON (title, content, URL)

scripts/playwright-stealth.js ⭐

  • Use Case: Sites with Cloudflare or anti-bot protection
  • Speed: Medium (5-20 seconds)
  • Anti-Bot: Medium-High (hides automation, realistic UA)
  • Output: JSON + Screenshot + HTML file
  • Verified: 100% success on Discuss.com.hk

1. Try web_fetch First

If the site doesn't load its content dynamically, use OpenClaw's web_fetch tool; it is the fastest option.

2. Need JavaScript? Use Playwright Simple

If you need to wait for JavaScript rendering, use playwright-simple.js.

3. Getting Blocked? Use Stealth

If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.

4. Special Sites Need Specialized Skills

  • YouTube → deep-scraper
  • Reddit → reddit-scraper
  • Twitter → bird skill
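The four rules above amount to a small decision procedure, which can be sketched as a single function. The function name chooseMethod and its needsJavaScript and blocked flags are hypothetical; the skill itself does not ship such a helper, and the hostname matching is a simplifying assumption.

```javascript
// Hypothetical helper that encodes the selection rules above; not part of the skill.
const SPECIAL_SITES = {
  'youtube.com': 'deep-scraper',
  'reddit.com': 'reddit-scraper',
  'twitter.com': 'bird',
};

function chooseMethod({ url, needsJavaScript = false, blocked = false }) {
  const host = new URL(url).hostname;
  // Rule 4: special sites get their dedicated skill (matches subdomains too).
  for (const [domain, skill] of Object.entries(SPECIAL_SITES)) {
    if (host === domain || host.endsWith('.' + domain)) return skill;
  }
  // Rule 3: a 403 or a Cloudflare challenge means stealth is needed.
  if (blocked) return 'scripts/playwright-stealth.js';
  // Rule 2: content rendered by JavaScript needs a real browser.
  if (needsJavaScript) return 'scripts/playwright-simple.js';
  // Rule 1: otherwise the built-in tool is the fastest choice.
  return 'web_fetch';
}
```

For instance, chooseMethod({ url: "https://m.discuss.com.hk/#hot", blocked: true }) selects the stealth script, while a plain static page falls through to web_fetch.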

🔧 Customization

All scripts support environment variables:

    # Set screenshot path
    SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

    # Set wait time (milliseconds)
    WAIT_TIME=10000 node scripts/playwright-simple.js URL

    # Enable headful mode (show browser)
    HEADLESS=false node scripts/playwright-stealth.js URL

    # Save HTML
    SAVE_HTML=true node scripts/playwright-stealth.js URL

    # Custom User-Agent
    USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
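A script might turn these variables into a typed configuration object along the following lines. This is a sketch only: the variable names come from the list above, but the function name readConfig, the property names, and every default value (including the 5000 ms wait) are assumptions, since the packaged scripts' defaults are not stated here.

```javascript
// Hypothetical sketch; defaults below are assumed, not taken from the skill.
const DEFAULT_WAIT_MS = 5000;

function readConfig(env = process.env) {
  const parsedWait = Number.parseInt(env.WAIT_TIME, 10);
  return {
    // Where to write the screenshot; null means "use the script's own default".
    screenshotPath: env.SCREENSHOT_PATH ?? null,
    // Fall back to the default when WAIT_TIME is missing or not a number.
    waitTimeMs: Number.isFinite(parsedWait) && parsedWait >= 0 ? parsedWait : DEFAULT_WAIT_MS,
    // Headless unless HEADLESS is exactly the string "false".
    headless: env.HEADLESS !== 'false',
    // HTML is saved only when SAVE_HTML is exactly the string "true".
    saveHtml: env.SAVE_HTML === 'true',
    userAgent: env.USER_AGENT ?? null,
  };
}
```

Environment variables are always strings, which is why the booleans are compared against "false" and "true" rather than coerced.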

📊 Performance Comparison

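| Method                 | Speed      | Anti-Bot       | Success Rate (Discuss.com.hk) |
|------------------------|------------|----------------|-------------------------------|
| web_fetch              | ⚡ Fastest  | ❌ None         | 0%                            |
| Playwright Simple      | 🚀 Fast    | ⚠️ Low         | 20%                           |
| Playwright Stealth     | ⏱️ Medium  | ✅ Medium       | 100% ✅                        |
| Puppeteer Stealth      | ⏱️ Medium  | ✅ Medium-High  | ~80%                          |
| Crawlee (deep-scraper) | 🐢 Slow    | ❌ Detected     | 0%                            |
| Chaser (Rust)          | ⏱️ Medium  | ❌ Detected     | 0%                            |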

🛡️ Anti-Bot Techniques Summary

Lessons learned from our testing:

✅ Effective Anti-Bot Measures

  • Hide navigator.webdriver: Essential
  • Realistic User-Agent: Use real devices (iPhone, Android)
  • Mimic Human Behavior: Random delays, scrolling
  • Avoid Framework Signatures: Crawlee, Selenium are easily detected
  • Use addInitScript (Playwright): Inject before page load

❌ Ineffective Anti-Bot Measures

  • Only changing User-Agent: Not enough
  • Using high-level frameworks (Crawlee): More easily detected
  • Docker isolation: Doesn't help with Cloudflare

Issue: 403 Forbidden

Solution: Use playwright-stealth.js

Issue: Cloudflare Challenge Page

Solution:
  • Increase wait time (10-15 seconds)
  • Try headless: false (headful mode sometimes has a higher success rate)
  • Consider using proxy IPs
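The first two suggestions can be applied in order of cost: wait longer first, then switch to headful mode. The sketch below is one way to structure that escalation. Everything in it is an assumption for illustration: fetchPage stands for whatever function the caller uses to load a page and report its status and HTML, the attempt list is an example schedule, and the "Just a moment" check is only a heuristic for spotting a challenge page.

```javascript
// Hypothetical escalation sketch; `fetchPage(url, attempt)` is supplied by the caller
// and is assumed to resolve to { status, html }.
const DEFAULT_ATTEMPTS = [
  { waitMs: 10000, headless: true },
  { waitMs: 15000, headless: true },
  { waitMs: 15000, headless: false }, // headful as the last resort
];

// Heuristic: treat 403/503 or a challenge interstitial title as "still blocked".
function looksChallenged({ status, html = '' }) {
  return status === 403 || status === 503 || /just a moment/i.test(html);
}

async function fetchWithEscalation(fetchPage, url, attempts = DEFAULT_ATTEMPTS) {
  let last = null;
  for (const attempt of attempts) {
    last = await fetchPage(url, attempt);
    // Stop at the first attempt that gets past the challenge.
    if (!looksChallenged(last)) return { ...last, attempt, challenged: false };
  }
  // Every attempt was blocked; report the final result so the caller can try a proxy.
  return { ...last, attempt: attempts[attempts.length - 1], challenged: true };
}
```

If all attempts come back challenged, that is the point at which the third suggestion (a proxy IP) becomes worth trying; Playwright's launch options accept a proxy setting for that purpose.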

Issue: Blank Page

Solution:
  • Increase waitForTimeout
  • Use waitUntil: 'networkidle' or 'domcontentloaded'
  • Check if login is required
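The first two suggestions can be combined in one navigation helper, sketched below under stated assumptions. The function name gotoWithFallback and its timeoutMs and settleMs options are hypothetical; page is expected to be a Playwright Page, and the fallback relies on Playwright's timeout errors carrying the name "TimeoutError".

```javascript
// Hypothetical helper; `page` is assumed to be a Playwright Page object.
async function gotoWithFallback(page, url, { timeoutMs = 30000, settleMs = 5000 } = {}) {
  let waitUntil = 'networkidle';
  try {
    // Prefer networkidle so late XHR-driven content has a chance to arrive.
    await page.goto(url, { waitUntil, timeout: timeoutMs });
  } catch (err) {
    if (err.name !== 'TimeoutError') throw err;
    // Pages that poll continuously may never go idle; settle for the DOM being parsed.
    waitUntil = 'domcontentloaded';
    await page.goto(url, { waitUntil, timeout: timeoutMs });
  }
  // Extra fixed wait for client-side rendering.
  await page.waitForTimeout(settleMs);
  const text = (await page.innerText('body')).trim();
  // `blank: true` suggests the remaining cause may be a login wall or a block.
  return { waitUntil, blank: text.length === 0 };
}
```

A result with blank set to true after both strategies is a hint to check the third item, whether the page requires a login.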

2026-02-07 Discuss.com.hk Test Conclusions

  • ✅ Pure Playwright + Stealth succeeded (5s, 200 OK)
  • ❌ Crawlee (deep-scraper) failed (403)
  • ❌ Chaser (Rust) failed (Cloudflare)
  • ❌ Puppeteer standard failed (403)

Best Solution: Pure Playwright + anti-bot techniques (framework-independent)

🚧 Future Improvements

  • Add proxy IP rotation
  • Implement cookie management (maintain login state)
  • Add CAPTCHA handling (2captcha / Anti-Captcha)
  • Batch scraping (parallel URLs)
  • Integration with OpenClaw's browser tool

📚 References

  • Playwright Official Docs
  • puppeteer-extra-plugin-stealth
  • deep-scraper skill
