โ† All skills
Tencent SkillHub · Developer Tools

URL Fetcher

Fetch and save web content using only the Python standard library, with URL and path validation, basic HTML-to-markdown conversion, and no API keys or external dependencies.



Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILL.md, scripts/url_fetcher.py

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief rather than working through the steps manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.0

Documentation

Primary doc: SKILL.md (24 sections)

URL Fetcher

Fetch web content without API keys or external dependencies. Uses Python standard library only.

Quick Start

url_fetcher.py fetch <url>
url_fetcher.py fetch --markdown <url> [output_file]

Examples:

# Fetch and preview
url_fetcher.py fetch https://example.com

# Fetch and save HTML
url_fetcher.py fetch https://example.com ~/workspace/page.html

# Fetch and convert to basic markdown
url_fetcher.py fetch --markdown https://example.com ~/workspace/page.md

Features

  • No dependencies - uses the Python stdlib (urllib) only
  • No API keys - completely free to use
  • URL validation - blocks localhost/internal networks
  • Basic markdown conversion - extracts content from HTML
  • Path validation - safe file writes only (workspace, home, /tmp)
  • Error handling - timeouts and network errors are handled

When to Use

  • Content aggregation - collect pages for processing
  • Research collection - save articles/pages locally
  • Simple scraping - extract text from web pages
  • Markdown conversion - basic HTML to text/markdown
  • No-API alternatives - when you can't use paid APIs

Limitations

  • Basic markdown - simple regex-based conversion (not a full parser)
  • No JavaScript - only fetches static HTML
  • Rate limiting - none built in (add your own if needed)
  • Bot detection - some sites may block the default User-Agent

URL Validation

  • ✅ Allows: http/https URLs
  • ❌ Blocks: file://, data:, javascript: URLs
  • ❌ Blocks: localhost, 127.0.0.1, ::1 (internal networks)
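These checks can be sketched with the stdlib's urllib.parse. This is an illustration, not the skill's actual code; BLOCKED_HOSTS and is_allowed_url are hypothetical names:

```python
from urllib.parse import urlparse

# Loopback / internal hosts that should never be fetched.
BLOCKED_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_allowed_url(url: str) -> bool:
    """Allow only http/https URLs that do not target internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # rejects file:, data:, javascript:, etc.
    host = (parsed.hostname or "").lower()
    return host not in BLOCKED_HOSTS
```

A production version would also resolve the hostname and reject private IP ranges (10.x, 192.168.x, etc.), since a public name can point at an internal address.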

File Path Validation

  • ✅ Allows: workspace, home directory, /tmp
  • ❌ Blocks: system paths (/etc, /usr, /var, etc.)
  • ❌ Blocks: sensitive dotfiles (~/.ssh, ~/.bashrc, etc.)
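The same idea applied to output paths, as a minimal sketch (a hypothetical helper, not the script's real whitelist; Path.is_relative_to needs Python 3.9+):

```python
from pathlib import Path

# Whitelisted output roots (assumption: mirrors the skill's policy).
ALLOWED_ROOTS = [p.resolve() for p in
                 (Path.home() / "workspace", Path.home(), Path("/tmp"))]
# Sensitive dotfiles/directories that must never be written.
BLOCKED_NAMES = {".ssh", ".bashrc", ".bash_history", ".gnupg"}

def is_safe_path(path_str: str) -> bool:
    """Allow writes only under whitelisted roots, never into sensitive dotfiles."""
    path = Path(path_str).expanduser().resolve()
    if any(part in BLOCKED_NAMES for part in path.parts):
        return False
    return any(path.is_relative_to(root) for root in ALLOWED_ROOTS)
```

Resolving both the candidate path and the roots keeps the comparison honest when symlinks (e.g. /tmp on macOS) are involved.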

Error Handling

  • Timeout after 10 seconds
  • HTTP error handling
  • Network error handling
  • Character encoding handling
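A hedged sketch of how such a fetch might look with urllib (illustrative only; the function name and error messages are not taken from the skill's source):

```python
import urllib.request
import urllib.error

def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL, decoding as UTF-8 and dropping undecodable bytes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="ignore")
    except urllib.error.HTTPError as e:   # must come before URLError (its parent)
        raise RuntimeError(f"HTTP {e.code}: {e.reason}") from e
    except urllib.error.URLError as e:    # DNS failures, refused connections, etc.
        raise RuntimeError(f"Network error: {e.reason}") from e
```

Catching HTTPError before URLError matters: HTTPError subclasses URLError, so reversing the order would swallow HTTP status errors.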

Collecting Research

# Fetch multiple articles
url_fetcher.py fetch https://example.com/article1.md ~/workspace/research/article1.md
url_fetcher.py fetch https://example.com/article2.md ~/workspace/research/article2.md

# Convert to markdown for reading
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/research/article.md

Content Aggregation

# Fetch pages for processing
url_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html

# Extract text
url_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md

Quick Preview

# Just preview content (no file save)
url_fetcher.py fetch https://example.com

Batch Fetching

#!/bin/bash
# batch_fetch.sh

URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)

OUTPUT_DIR="$HOME/workspace/fetched"
mkdir -p "$OUTPUT_DIR"

for url in "${URLS[@]}"; do
  # Build a safe filename: drop the scheme, turn slashes into underscores
  filename=$(echo "$url" | sed -e 's|^[a-z]*://||' -e 's|/|_|g')
  url_fetcher.py fetch --markdown "$url" "$OUTPUT_DIR/$filename.md"
  sleep 1  # Be nice to servers
done

Integration with Other Skills

Combine with research-assistant:

# Fetch article
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/article.md
# Extract key points
# Then use research-assistant to organize findings

Combine with task-runner:

# Add task to fetch content
task_runner.py add "Fetch article on topic X" "research"
# Fetch when ready
url_fetcher.py fetch https://example.com/topic-x.md ~/workspace/research/topic-x.md

Connection Timeout

Error: Request timeout after 10s
Solution: The server is slow or unreachable. Try again later or check the URL.

HTTP 403/429 Errors

Error: HTTP 403: Forbidden
Solution: The site blocks automated requests. Try:
  • Adding a delay between requests
  • Using a different User-Agent (modify the source)
  • Respecting robots.txt
  • Using an official API if one is available
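For the User-Agent change, a small sketch of what a modified fetch might look like (hypothetical; the header string is only an example, and some sites block non-browser clients regardless):

```python
import urllib.request

def fetch_with_ua(url: str,
                  user_agent: str = "Mozilla/5.0 (compatible; url-fetcher)") -> bytes:
    """Fetch with a custom User-Agent header to avoid trivial bot blocks."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```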

Encoding Issues

Error: Special characters are garbled or missing
Solution: The tool decodes as UTF-8 and ignores undecodable bytes, so some characters may be lost.

Markdown Quality

Note: Markdown extraction is basic.
Solution: This tool uses simple regexes for HTML→MD conversion. For better results:
  • Use a dedicated markdown parser
  • Post-process the output
  • Use a paid API with better parsing
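To illustrate what "simple regex conversion" means, here is a rough, hypothetical sketch of the technique and its limits (real HTML needs an actual parser; this is not the skill's code):

```python
import re

def html_to_markdown(html: str) -> str:
    """Very rough regex-based HTML-to-markdown conversion (illustrative only)."""
    # Drop script/style blocks entirely
    text = re.sub(r"(?is)<(script|style)[^>]*>.*?</\1>", "", html)
    # <h1>..</h1> -> "# .."
    text = re.sub(r"(?is)<h([1-6])[^>]*>(.*?)</h\1>",
                  lambda m: "#" * int(m.group(1)) + " " + m.group(2).strip() + "\n",
                  text)
    # <a href="url">text</a> -> [text](url)
    text = re.sub(r"(?is)<a[^>]*href=[\"']([^\"']+)[\"'][^>]*>(.*?)</a>",
                  r"[\2](\1)", text)
    # Block-level open tags become newlines; everything else is stripped
    text = re.sub(r"(?i)<(p|br|div)[^>]*>", "\n", text)
    text = re.sub(r"<[^>]+>", "", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

This sort of conversion breaks on nested tags, attributes containing `>`, and malformed markup, which is exactly why the limitations above recommend a dedicated parser for anything beyond quick extraction.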

Best Practices

  • Be respectful - add delays between requests (don't hammer servers)
  • Check robots.txt - respect the site's crawling policies
  • Rate limit yourself - don't fetch too fast
  • Validate URLs - only fetch from trusted sources
  • Save safely - always use path-validated outputs
  • Preview first - use preview mode before saving

Python Integration

from pathlib import Path
import subprocess

def fetch_and_process(url):
    """Fetch a URL as markdown and return the saved file's content."""
    output = Path.home() / "workspace" / "fetched" / "page.md"
    output.parent.mkdir(parents=True, exist_ok=True)

    # Fetch (check=True raises if the fetch fails, instead of reading a stale file)
    subprocess.run([
        "python3", "/path/to/url_fetcher.py",
        "fetch", "--markdown", url, str(output),
    ], check=True)

    # Process content
    return output.read_text()

Bash Integration

# Function for fetching
fetch_content() {
    local url="$1"
    local output="$2"
    python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \
        fetch --markdown "$url" "$output"
}

# Usage
fetch_content "https://example.com" ~/workspace/example.md

When You Need More Features

For full-featured scraping:
  • requests + beautifulsoup4 (requires pip install)
  • The scrapy framework (requires pip install)
  • Paid APIs (Firecrawl, Apify)

For better markdown:
  • The markdownify library (requires pip install)
  • AI-based parsing (OpenAI, Anthropic APIs)

For complex workflows:
  • Browser automation (OpenClaw browser tool)
  • Headless Chrome (Puppeteer, Playwright)
  • Scraping APIs (Zyte, ScraperAPI)

Zero-Cost Advantage

This skill requires:
  • ✅ Python 3 (included with OpenClaw)
  • ✅ No API keys
  • ✅ No external packages
  • ✅ No paid services
  • ✅ No rate limiting (other than what you add)

Perfect for autonomous agents with budget constraints.

Contributing

If you improve this skill, please:
  • Test with security-checker
  • Document new features
  • Publish to ClawHub with credit

License

Use freely in your OpenClaw skills and workflows.

Category context

Code helpers, APIs, CLIs, browser automation, testing, and developer operations.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
1 doc, 1 script
  • SKILL.md (primary doc)
  • scripts/url_fetcher.py (script)