Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Fetch and save web content using only Python stdlib with URL and path validation, basic HTML-to-markdown conversion, and no API keys or external dependencies.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Fetch web content without API keys or external dependencies. Uses Python standard library only.
```
url_fetcher.py fetch <url>
url_fetcher.py fetch --markdown <url> [output_file]
```

Examples:

```bash
# Fetch and preview
url_fetcher.py fetch https://example.com

# Fetch and save HTML
url_fetcher.py fetch https://example.com ~/workspace/page.html

# Fetch and convert to basic markdown
url_fetcher.py fetch --markdown https://example.com ~/workspace/page.md
```
- No dependencies: uses the Python stdlib (urllib) only
- No API keys: completely free to use
- URL validation: blocks localhost and internal networks
- Basic markdown conversion: extracts content from HTML
- Path validation: safe file writes only (workspace, home, /tmp)
- Error handling: timeouts and network errors are handled
- Content aggregation: collect pages for processing
- Research collection: save articles/pages locally
- Simple scraping: extract text from web pages
- Markdown conversion: basic HTML to text/markdown
- No-API alternatives: when you can't use paid APIs
- Basic markdown: simple regex-based conversion (not a full parser)
- No JavaScript: only fetches static HTML
- Rate limiting: no built-in rate limiting (add your own if needed; see the sketch below)
- Bot detection: some sites may block the default User-Agent
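Because there is no built-in rate limiting, pace requests yourself. A minimal sketch of client-side pacing around the CLI (the wrapper function and one-second delay are illustrative, not part of the tool):

```python
import subprocess
import time

def fetch_all(urls, delay_seconds=1.0):
    """Fetch each URL via the CLI, pausing between requests to be polite to servers."""
    for url in urls:
        subprocess.run(
            ["python3", "url_fetcher.py", "fetch", "--markdown", url],
            check=True,
        )
        time.sleep(delay_seconds)  # crude rate limit; tune per site
```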
URL validation:
- Allows: http/https URLs
- Blocks: file://, data://, javascript: URLs
- Blocks: localhost, 127.0.0.1, ::1 (internal networks)
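The actual checks live in url_fetcher.py; as a sketch, stdlib-only validation along these lines could look like this (the `validate_url` name is illustrative, not the tool's API):

```python
from urllib.parse import urlparse

BLOCKED_HOSTS = {"localhost", "127.0.0.1", "::1"}

def validate_url(url: str) -> bool:
    """Allow only http/https URLs that do not target internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # rejects file://, data://, javascript: URLs
    if parsed.hostname in BLOCKED_HOSTS:
        return False  # rejects localhost / loopback targets
    return True
```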
Path validation:
- Allows: workspace, home directory, /tmp
- Blocks: system paths (/etc, /usr, /var, etc.)
- Blocks: sensitive dotfiles (~/.ssh, ~/.bashrc, etc.)
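Again as a sketch only (assumes Python 3.9+ for `Path.is_relative_to`; the function name is illustrative):

```python
from pathlib import Path

ALLOWED_ROOTS = [Path.home() / "workspace", Path.home(), Path("/tmp")]

def validate_output_path(path: str) -> bool:
    """Allow writes only under the workspace, home directory, or /tmp, never into dotfiles."""
    resolved = Path(path).expanduser().resolve()
    if any(part.startswith(".") for part in resolved.parts):
        return False  # blocks ~/.ssh, ~/.bashrc and similar
    return any(resolved.is_relative_to(root.resolve()) for root in ALLOWED_ROOTS)
```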
Error handling:
- Timeout after 10 seconds
- HTTP error handling
- Network error handling
- Character encoding handling
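For orientation, a stdlib fetch with those guarantees looks roughly like this (a sketch, not the tool's actual code):

```python
import urllib.error
import urllib.request

def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL with a 10-second timeout, surfacing HTTP and network errors clearly."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
    except urllib.error.HTTPError as exc:
        raise RuntimeError(f"HTTP {exc.code}: {exc.reason}") from exc
    except (urllib.error.URLError, TimeoutError) as exc:
        raise RuntimeError(f"Network error or timeout: {exc}") from exc
    # Decode as UTF-8 and drop undecodable bytes
    return raw.decode("utf-8", errors="ignore")
```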
```bash
# Fetch multiple articles
url_fetcher.py fetch https://example.com/article1.md ~/workspace/research/article1.md
url_fetcher.py fetch https://example.com/article2.md ~/workspace/research/article2.md

# Convert to markdown for reading
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/research/article.md
```
```bash
# Fetch pages for processing
url_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html

# Extract text
url_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md
```
```bash
# Just preview content (no file save)
url_fetcher.py fetch https://example.com
```
```bash
#!/bin/bash
# batch_fetch.sh
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)

OUTPUT_DIR="$HOME/workspace/fetched"
mkdir -p "$OUTPUT_DIR"

for url in "${URLS[@]}"; do
  # Derive a flat filename from the URL (strip scheme, replace slashes)
  filename=$(echo "$url" | sed 's|https\?://||; s|/|_|g')
  url_fetcher.py fetch --markdown "$url" "$OUTPUT_DIR/$filename.md"
  sleep 1  # Be nice to servers
done
```
Combine with research-assistant:

```bash
# Fetch article
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/article.md
# Extract key points
# Then use research-assistant to organize findings
```

Combine with task-runner:

```bash
# Add task to fetch content
task_runner.py add "Fetch article on topic X" "research"
# Fetch when ready
url_fetcher.py fetch https://example.com/topic-x.md ~/workspace/research/topic-x.md
```
Error: Request timeout after 10s
Solution: The server is slow or unreachable. Try again later or check the URL.
Error: HTTP 403: Forbidden
Solution: The site blocks automated requests. Try:
- Adding a delay between requests
- Using a different User-Agent (modify the source; see the sketch below)
- Respecting robots.txt
- Using an API if one is available
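If you do modify the source to send a different User-Agent, the stdlib pattern is a `Request` with a headers dict (the header value below is just an example):

```python
import urllib.request

def fetch_with_user_agent(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL while presenting a custom User-Agent header."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; url-fetcher)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```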
Error: Special characters appear garbled or missing
Solution: The tool decodes responses as UTF-8 and ignores decoding errors, so some characters may be lost.
Note: Basic markdown extraction
Solution: This tool uses simple regexes for HTML-to-Markdown conversion. For better results:
- Use a dedicated markdown parser
- Post-process the output
- Use a paid API with better parsing
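To see why the output can be rough, here is a sketch of regex-only conversion in the same spirit (not the tool's actual implementation):

```python
import re

def html_to_text(html: str) -> str:
    """Very rough HTML-to-text conversion using regexes instead of a real parser."""
    html = re.sub(r"<(script|style)[^>]*>.*?</\1>", "", html, flags=re.DOTALL | re.IGNORECASE)
    html = re.sub(r"<br\s*/?>", "\n", html, flags=re.IGNORECASE)
    html = re.sub(r"</p\s*>", "\n\n", html, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", html)              # strip any remaining tags
    return re.sub(r"\n{3,}", "\n\n", text).strip()   # collapse extra blank lines
```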
- Be respectful: add delays between requests (don't hammer servers)
- Check robots.txt: respect the site's crawling policies
- Rate limit yourself: don't fetch too fast
- Validate URLs: only fetch from trusted sources
- Save safely: always use path-validated outputs
- Preview first: use preview mode before saving
```python
from pathlib import Path
import subprocess

def fetch_and_process(url):
    """Fetch a URL as markdown and return the saved content."""
    output = Path.home() / "workspace" / "fetched" / "page.md"
    output.parent.mkdir(parents=True, exist_ok=True)

    # Fetch via the CLI; check=True raises if the fetch fails
    subprocess.run([
        "python3", "/path/to/url_fetcher.py",
        "fetch", "--markdown", url, str(output),
    ], check=True)

    # Process content
    content = output.read_text()
    return content
```
```bash
# Function for fetching
fetch_content() {
    local url="$1"
    local output="$2"
    python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \
        fetch --markdown "$url" "$output"
}

# Usage
fetch_content "https://example.com" ~/workspace/example.md
```
For full-featured scraping:
- requests + beautifulsoup4 (requires pip install; see the sketch below)
- The scrapy framework (requires pip install)
- Paid APIs (Firecrawl, Apify)

For better markdown:
- The markdownify library (requires pip install)
- AI-based parsing (OpenAI, Anthropic APIs)

For complex workflows:
- Browser automation (OpenClaw browser tool)
- Headless Chrome (Puppeteer, Playwright)
- Scraping APIs (Zyte, ScraperAPI)
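For comparison, a typical requests + beautifulsoup4 fetch looks like this (requires `pip install requests beautifulsoup4`, which is outside this skill's no-dependency constraint):

```python
import requests
from bs4 import BeautifulSoup

def fetch_text(url: str) -> str:
    """Fetch a page and extract visible text with a real HTML parser."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator="\n", strip=True)
```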
This skill requires:
- Python 3 (included with OpenClaw)
- No API keys
- No external packages
- No paid services
- No rate limiting (other than what you add)

Perfect for autonomous agents with budget constraints.
If you improve this skill, please:
- Test with security-checker
- Document new features
- Publish to ClawHub with credit
Use freely in your OpenClaw skills and workflows.