# Send OpenClaw Scrapling to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring out the install steps manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access (a scripted version is sketched below).
- Paste one of the prompts below and point your agent at the extracted folder.
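
If you prefer to script the download and extraction, here is a minimal Python sketch. It assumes the public download URL from the manifest below and a local folder name of your choosing; adjust both to your setup.

```python
# Download the ZIP and extract it for the agent (sketch; the URL comes from
# the manifest's links.downloadUrl, the folder name is an assumption).
import io
import urllib.request
import zipfile

url = "https://openagent3.xyz/downloads/openclaw-scrapling"
with urllib.request.urlopen(url) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))
archive.extractall("openclaw-scrapling")  # point your agent at this folder
```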
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "openclaw-scrapling",
    "name": "OpenClaw Scrapling",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/cryptos3c/openclaw-scrapling",
    "canonicalUrl": "https://clawhub.ai/cryptos3c/openclaw-scrapling",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/openclaw-scrapling",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=openclaw-scrapling",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "INSTALLATION_GUIDE.md",
      "QUICK_START.md",
      "README.md",
      "SKILL.md",
      "examples/README.md",
      "examples/adaptive.py"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "openclaw-scrapling",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-06T12:24:47.303Z",
      "expiresAt": "2026-05-13T12:24:47.303Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=openclaw-scrapling",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=openclaw-scrapling",
        "contentDisposition": "attachment; filename=\"openclaw-scrapling-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "openclaw-scrapling"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/openclaw-scrapling"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/openclaw-scrapling",
    "downloadUrl": "https://openagent3.xyz/downloads/openclaw-scrapling",
    "agentUrl": "https://openagent3.xyz/skills/openclaw-scrapling/agent",
    "manifestUrl": "https://openagent3.xyz/skills/openclaw-scrapling/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/openclaw-scrapling/agent.md"
  }
}
```
## Documentation

### Scrapling Web Scraping Skill

Use Scrapling to scrape modern websites, including sites with anti-bot protection and JavaScript-rendered content, with adaptive element tracking that survives redesigns.

### When to Use This Skill

- User asks to scrape a website or extract data from a URL
- Need to bypass Cloudflare, bot detection, or anti-scraping measures
- Need to handle JavaScript-rendered/dynamic content (React, Vue, etc.)
- Website requires login or session management
- Website structure changes frequently (adaptive selectors)
- Need to scrape multiple pages with rate limiting

### Commands

All commands use the `scrape.py` script in this skill's directory.

### Basic HTTP Scraping (Fast)

```bash
python scrape.py \
  --url "https://example.com" \
  --selector ".product" \
  --output products.json
```

Use when: Static HTML, no JavaScript, no bot protection

### Stealth Mode (Bypass Anti-Bot)

```bash
python scrape.py \
  --url "https://nopecha.com/demo/cloudflare" \
  --stealth \
  --selector "#content" \
  --output data.json
```

Use when: Cloudflare protection, bot detection, fingerprinting

Features:

- Bypasses Cloudflare Turnstile automatically
- Browser fingerprint spoofing
- Headless browser mode

### Dynamic/JavaScript Content

```bash
python scrape.py \
  --url "https://spa-website.com" \
  --dynamic \
  --selector ".loaded-content" \
  --wait-for ".loaded-content" \
  --output data.json
```

Use when: React/Vue/Angular apps, lazy-loaded content, AJAX

Features:

- Full Playwright browser automation
- Waits for elements to load
- Network idle detection

### Adaptive Selectors (Survives Website Changes)

```bash
# First time - save the selector pattern
python scrape.py \
  --url "https://example.com" \
  --selector ".product-card" \
  --adaptive-save \
  --output products.json

# Later, if website structure changes
python scrape.py \
  --url "https://example.com" \
  --adaptive \
  --output products.json
```

Use when: Website frequently redesigns, need robust scraping

How it works:

- First run: saves element patterns/structure
- Later runs: uses similarity algorithms to relocate moved elements
- Auto-updates the selector cache
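
As a rough illustration of the relocation idea (a toy sketch, not Scrapling's actual algorithm; all names here are made up), a relocator can score candidate elements by how much of the saved pattern they retain:

```python
# Toy similarity score for relocating a saved element after a redesign.
# Illustrative only - Scrapling's real matching is more sophisticated.
def similarity(saved: dict, candidate: dict) -> float:
    """Fraction of saved traits (tag, classes, text) the candidate keeps."""
    traits = ("tag", "classes", "text")
    hits = sum(1 for t in traits if saved.get(t) == candidate.get(t))
    return hits / len(traits)

saved_pattern = {"tag": "div", "classes": "product-card", "text": "Widget"}
candidates = [
    {"tag": "div", "classes": "product-tile", "text": "Widget"},  # class renamed
    {"tag": "span", "classes": "nav-item", "text": "Home"},
]
best = max(candidates, key=lambda c: similarity(saved_pattern, c))
print(best)  # the renamed product card still scores highest (2/3 traits kept)
```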

### Session Management (Login Required)

```bash
# Login and save session
python scrape.py \
  --url "https://example.com/dashboard" \
  --stealth \
  --login \
  --username "user@example.com" \
  --password "password123" \
  --session-name "my-session" \
  --selector ".protected-data" \
  --output data.json

# Reuse saved session (no login needed)
python scrape.py \
  --url "https://example.com/another-page" \
  --stealth \
  --session-name "my-session" \
  --selector ".more-data" \
  --output more_data.json
```

Use when: Content requires authentication, multi-step scraping

### Extract Specific Data Types

Text only:

```bash
python scrape.py \
  --url "https://example.com" \
  --selector ".content" \
  --extract text \
  --output content.txt
```

Markdown:

```bash
python scrape.py \
  --url "https://docs.example.com" \
  --selector "article" \
  --extract markdown \
  --output article.md
```

Attributes:

```bash
# Extract href links
python scrape.py \
  --url "https://example.com" \
  --selector "a.product-link" \
  --extract attr:href \
  --output links.json
```

Multiple fields:

```bash
python scrape.py \
  --url "https://example.com/products" \
  --selector ".product" \
  --fields "title:.title::text,price:.price::text,link:a::attr(href)" \
  --output products.json
```
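
The --fields value is a comma-separated list of name:selector pairs. If you need to handle the same format in your own code, here is a hypothetical parser (my own sketch, not part of the skill): it splits each pair on the first colon so `::text` and `::attr(...)` survive intact, and it assumes the selectors themselves contain no commas.

```python
# Hypothetical parser for the --fields mini-format shown above.
# Split on commas, then on the FIRST colon, so "::text"/"::attr(...)" survive.
def parse_fields(spec: str) -> dict:
    fields = {}
    for pair in spec.split(","):
        name, selector = pair.split(":", 1)
        fields[name.strip()] = selector.strip()
    return fields

spec = "title:.title::text,price:.price::text,link:a::attr(href)"
print(parse_fields(spec))
# {'title': '.title::text', 'price': '.price::text', 'link': 'a::attr(href)'}
```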

### Advanced Options

Proxy support:

```bash
python scrape.py \
  --url "https://example.com" \
  --proxy "http://user:pass@proxy.com:8080" \
  --selector ".content"
```

Rate limiting:

```bash
python scrape.py \
  --url "https://example.com" \
  --selector ".content" \
  --delay 2  # 2 seconds between requests
```
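
In a custom script, the same effect is just a pause between fetches. A minimal sketch using the Fetcher API from the Python section below (the URLs are placeholders):

```python
# Polite crawl: a fixed delay between requests (2 s mirrors --delay 2).
import time
from scrapling.fetchers import Fetcher

urls = ["https://example.com/page1", "https://example.com/page2"]
for url in urls:
    page = Fetcher.get(url)
    print(page.css(".content::text").getall())
    time.sleep(2)  # wait before the next request
```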

Custom headers:

```bash
python scrape.py \
  --url "https://api.example.com" \
  --headers '{"Authorization": "Bearer token123"}' \
  --selector "body"
```

Screenshot (for debugging):

```bash
python scrape.py \
  --url "https://example.com" \
  --stealth \
  --screenshot debug.png
```

### Python API (For Custom Scripts)

You can also use Scrapling directly in Python scripts:

```python
from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher

# Basic HTTP request
page = Fetcher.get('https://example.com')
products = page.css('.product')
for product in products:
    title = product.css('.title::text').get()
    price = product.css('.price::text').get()
    print(f"{title}: {price}")

# Stealth mode (bypass anti-bot)
page = StealthyFetcher.fetch('https://protected-site.com', headless=True)
data = page.css('.content').getall()

# Dynamic content (full browser)
page = DynamicFetcher.fetch('https://spa-app.com', network_idle=True)
items = page.css('.loaded-item').getall()

# Sessions (login)
from scrapling.fetchers import StealthySession

with StealthySession(headless=True) as session:
    # Login
    login_page = session.fetch('https://example.com/login')
    login_page.fill('#username', 'user@example.com')
    login_page.fill('#password', 'password123')
    login_page.click('#submit')

    # Access protected content
    protected_page = session.fetch('https://example.com/dashboard')
    data = protected_page.css('.private-data').getall()
```

### Output Formats

- JSON (default): `--output data.json`
- JSONL (streaming): `--output data.jsonl`
- CSV: `--output data.csv`
- TXT (text only): `--output data.txt`
- MD (markdown): `--output data.md`
- HTML (raw): `--output data.html`
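
The practical difference between the first two: JSON is a single document, while JSONL holds one record per line, so results can be appended as they arrive. A minimal sketch with placeholder data:

```python
# JSON vs JSONL: one array document vs one object per line (append-friendly).
import json

items = [{"title": "Widget", "price": "$9"}, {"title": "Gadget", "price": "$12"}]

with open("data.json", "w") as f:
    json.dump(items, f, indent=2)

with open("data.jsonl", "w") as f:
    for item in items:
        f.write(json.dumps(item) + "\n")
```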

### Selector Types

Scrapling supports multiple selector formats:

CSS selectors:

--selector ".product"
--selector "div.container > p.text"
--selector "a[href*='product']"

XPath selectors:

--selector "//div[@class='product']"
--selector "//a[contains(@href, 'product')]"

Pseudo-elements (like Scrapy):

--selector ".product::text"          # Text content
--selector "a::attr(href)"           # Attribute value
--selector ".price::text::strip"     # Text with whitespace removed

Combined selectors:

--selector ".product .title::text"   # Nested elements

### Troubleshooting

Issue: "Element not found"

Try --dynamic if content is JavaScript-loaded
Use --wait-for SELECTOR to wait for element
Use --screenshot to debug what's visible

Issue: "Cloudflare blocking"

Use --stealth mode
Add --solve-cloudflare flag (enabled by default in stealth)
Try --delay 2 to slow down requests

Issue: "Login not working"

Use --headless false to see browser interaction
Check credentials are correct
Website might use CAPTCHA (manual intervention needed)

Issue: "Selector broke after website update"

Use --adaptive mode to auto-relocate elements
Re-run with --adaptive-save to update saved patterns

### Scrape Hacker News Front Page

```bash
python scrape.py \
  --url "https://news.ycombinator.com" \
  --selector ".athing" \
  --fields "title:.titleline>a::text,link:.titleline>a::attr(href)" \
  --output hn_stories.json
```

### Scrape Protected Site with Login

```bash
python scrape.py \
  --url "https://example.com/data" \
  --stealth \
  --login \
  --username "user@example.com" \
  --password "secret" \
  --session-name "example-session" \
  --selector ".data-table tr" \
  --output protected_data.json
```

### Monitor Price Changes

```bash
# Save initial selector pattern
python scrape.py \
  --url "https://store.com/product/123" \
  --selector ".price" \
  --adaptive-save \
  --output price.txt

# Later, check price (even if page redesigned)
python scrape.py \
  --url "https://store.com/product/123" \
  --adaptive \
  --output price_new.txt
```

### Scrape Dynamic JavaScript App

```bash
python scrape.py \
  --url "https://react-app.com/data" \
  --dynamic \
  --wait-for ".loaded-content" \
  --selector ".item" \
  --fields "name:.name::text,value:.value::text" \
  --output app_data.json
```

### Notes

- First run: Scrapling downloads browsers (~500 MB). This is automatic.
- Sessions: saved in the `sessions/` directory, reusable across runs.
- Adaptive cache: saved in `selector_cache.json`, auto-updated.
- Rate limiting: always respect robots.txt and add delays for ethical scraping.
- Legal: use only on sites you have permission to scrape.

### Dependencies

Installed automatically when the skill is installed:

- `scrapling[all]` - main library with all features
- `pyyaml` - for config file support
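
A quick way to confirm the environment is ready is an import check; note that the pyyaml package imports as yaml:

```python
# Quick import check; pyyaml's import name differs from its package name.
import importlib

for package, module in (("scrapling[all]", "scrapling"), ("pyyaml", "yaml")):
    try:
        importlib.import_module(module)
        print(f"{package}: OK")
    except ImportError:
        print(f"{package}: missing")
```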

### Skill Structure

```text
scrapling/
├── SKILL.md            # This file
├── scrape.py           # Main CLI script
├── requirements.txt    # Python dependencies
├── sessions/           # Saved browser sessions
├── selector_cache.json # Adaptive selector patterns
└── examples/           # Example scripts
    ├── basic.py
    ├── stealth.py
    ├── dynamic.py
    └── adaptive.py
```

### Advanced: Custom Python Scripts

For complex scraping tasks, you can create custom Python scripts in this directory:

```python
# custom_scraper.py
from scrapling.spiders import Spider, Response
import json

class MySpider(Spider):
    name = "custom"
    start_urls = ["https://example.com/page1"]

    async def parse(self, response: Response):
        for item in response.css('.product'):
            yield {
                "title": item.css('.title::text').get(),
                "price": item.css('.price::text').get()
            }

        # Follow pagination
        next_page = response.css('.next-page::attr(href)').get()
        if next_page:
            yield response.follow(next_page)

# Run spider
result = MySpider().start()
with open('output.json', 'w') as f:
    json.dump(result.items, f, indent=2)
```

Run with:

```bash
python custom_scraper.py
```

Questions? Check Scrapling docs: https://scrapling.readthedocs.io
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: cryptos3c
- Version: 1.0.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-06T12:24:47.303Z
- Expires at: 2026-05-13T12:24:47.303Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/openclaw-scrapling)
- [Send to Agent page](https://openagent3.xyz/skills/openclaw-scrapling/agent)
- [JSON manifest](https://openagent3.xyz/skills/openclaw-scrapling/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/openclaw-scrapling/agent.md)
- [Download page](https://openagent3.xyz/downloads/openclaw-scrapling)