โ† All skills
Tencent SkillHub · Developer Tools

Smart Web Scraper

Extract structured data from any web page. Supports CSS selectors, auto-detection of tables and lists, and JSON/CSV output formats. Use when asked to scrape a web page.

Free · 0 downloads · 0 stars · 0 installs · Score: 0 · High Signal


Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
README.md, SKILL.md, _meta.json, scripts/scraper.py

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of working through the steps manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.0

Documentation

Primary doc: SKILL.md (9 sections)

Smart Web Scraper

Extract structured data from web pages into clean JSON or CSV.

Quick Start

```shell
# Scrape a page, extract all text content
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com"

# Extract specific elements with a CSS selector
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com/products" -s ".product-card"

# Auto-detect and extract tables
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py tables "https://example.com/pricing"

# Extract all links from a page
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py links "https://example.com"

# Extract structured data (title, meta, headings, links)
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py structure "https://example.com"

# Output as JSON
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s ".item" -f json

# Output as CSV
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s "table tr" -f csv

# Save to file
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s ".product" -f json -o products.json

# Multi-page scrape (follow pagination)
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py crawl "https://example.com/page/1" --pages 5 -s ".article"
```

Commands

| Command | Args | Description |
| --- | --- | --- |
| `extract` | `<url> [-s selector] [-f format] [-o file]` | Extract content, optionally filtered by CSS selector |
| `tables` | `<url> [-f format] [-o file]` | Auto-detect and extract all HTML tables |
| `links` | `<url> [--external] [--internal]` | Extract all links (href + text) |
| `structure` | `<url>` | Extract page structure: title, meta, headings, images, links |
| `crawl` | `<url> --pages N [-s selector] [-f format] [-o file]` | Follow pagination links, extract from multiple pages |
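The core of the `extract` command can be sketched with only the standard library. This is a simplified stand-in that matches elements by class name rather than full CSS selectors; the actual scripts/scraper.py is not reproduced here and uses BeautifulSoup:

```python
from html.parser import HTMLParser


class ClassExtractor(HTMLParser):
    """Collect the text of elements whose class attribute matches a target.

    Simplified stand-in for `extract -s .classname`; void tags like <br>
    are not special-cased, and only class selectors are supported.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0        # >0 while inside a matching element
        self.records = []
        self._buf = []
        self._tag = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1   # nested tag inside a matching element
        elif self.class_name in classes:
            self.depth = 1
            self._tag = tag
            self._buf = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace, then emit one record per element
                text = " ".join("".join(self._buf).split())
                self.records.append(
                    {"text": text, "tag": self._tag, "class": self.class_name}
                )

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def extract(html, class_name):
    parser = ClassExtractor(class_name)
    parser.feed(html)
    return parser.records
```

The record shape mirrors the JSON output shown in the "Extract product listings" example below.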

Output Formats

| Format | Flag | Description |
| --- | --- | --- |
| Text | `-f text` | Plain text (default) |
| JSON | `-f json` | Structured JSON array |
| CSV | `-f csv` | Comma-separated values |
| Markdown | `-f md` | Markdown-formatted |
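A minimal sketch of how the four formats could be rendered from a list of extracted records; the real `-f` handling in scripts/scraper.py may differ:

```python
import csv
import io
import json


def render(records, fmt="text"):
    """Render a list of dicts in one of the skill's output formats.

    Field order follows the keys of the first record; `text` falls back
    to joining each record's "text" field.
    """
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
        return out.getvalue()
    if fmt == "md":
        header = " | ".join(records[0])
        sep = " | ".join("---" for _ in records[0])
        rows = [
            " | ".join(str(r.get(k, "")) for k in records[0]) for r in records
        ]
        return "\n".join([header, sep] + rows)
    return "\n".join(r.get("text", "") for r in records)
```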

Extract product listings

```shell
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://shop.example.com" -s ".product" -f json
```

Output:

```json
[
  {"text": "Widget Pro - $29.99", "tag": "div", "class": "product"},
  {"text": "Widget Max - $49.99", "tag": "div", "class": "product"}
]
```

Extract pricing table

```shell
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py tables "https://example.com/pricing" -f csv
```

Get all external links

```shell
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py links "https://example.com" --external
```
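The internal/external split presumably compares each link's host against the page's own; a sketch of that logic with `urllib.parse`, resolving relative hrefs first:

```python
from urllib.parse import urljoin, urlparse


def split_links(page_url, hrefs):
    """Partition hrefs into (internal, external) lists of absolute URLs.

    A guess at what the --internal/--external filters do: a link is
    internal when its resolved host matches the page's host.
    """
    host = urlparse(page_url).netloc
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        target = urlparse(absolute).netloc
        (internal if target == host else external).append(absolute)
    return internal, external
```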

Rate Limiting

  • Default: 1 request per second (respectful crawling)
  • Override with --delay 0.5 (seconds between requests)
  • Respects robots.txt by default (override with --ignore-robots)
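Both behaviours are straightforward to sketch with the standard library. The user agent string and the in-memory robots.txt below are placeholders; scraper.py presumably fetches `/robots.txt` from the target site itself:

```python
import time
import urllib.robotparser

# Assumed UA string; the skill only says it uses a "standard browser User-Agent"
USER_AGENT = "Mozilla/5.0 (compatible; scraper)"


def make_robots_checker(robots_txt):
    """Build an allow/deny checker from robots.txt text."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return lambda url: rp.can_fetch(USER_AGENT, url)


class Throttle:
    """Enforce a minimum delay between requests (the --delay behaviour)."""

    def __init__(self, delay=1.0):
        self.delay = delay
        self.last = 0.0

    def wait(self):
        # Sleep only for the remainder of the delay window
        elapsed = time.monotonic() - self.last
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)
        self.last = time.monotonic()
```

Calling `Throttle(1.0).wait()` before every fetch yields the documented one-request-per-second default.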

Notes

  • Requires beautifulsoup4 and lxml (auto-installed by uv run --with)
  • Uses a standard browser User-Agent to avoid blocks
  • Handles redirects, encoding detection, and error pages gracefully
  • No JavaScript rendering (use for static HTML pages)

Category context

Code helpers, APIs, CLIs, browser automation, testing, and developer operations.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 docs · 1 script · 1 config
  • SKILL.md Primary doc
  • README.md Docs
  • scripts/scraper.py Scripts
  • _meta.json Config