Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Fetch and save web content using only Python stdlib with URL and path validation, basic HTML-to-markdown conversion, and no API keys or external dependencies.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Fetch web content without API keys or external dependencies. Uses Python standard library only.
```
url_fetcher.py fetch <url>
url_fetcher.py fetch --markdown <url> [output_file]
```

Examples:

```bash
# Fetch and preview
url_fetcher.py fetch https://example.com

# Fetch and save HTML
url_fetcher.py fetch https://example.com ~/workspace/page.html

# Fetch and convert to basic markdown
url_fetcher.py fetch --markdown https://example.com ~/workspace/page.md
```
- No dependencies: uses the Python stdlib (urllib) only
- No API keys: completely free to use
- URL validation: blocks localhost and internal networks
- Basic markdown conversion: extracts content from HTML
- Path validation: safe file writes only (workspace, home, /tmp)
- Error handling: timeouts and network errors are handled
- Content aggregation: collect pages for processing
- Research collection: save articles/pages locally
- Simple scraping: extract text from web pages
- Markdown conversion: basic HTML to text/markdown
- No-API alternatives: when you can't use paid APIs
- Basic markdown: simple regex-based conversion (not a full parser)
- No JavaScript: only fetches static HTML
- Rate limiting: no built-in rate limiting (add your own if needed; see the sketch below)
- Bot detection: some sites may block the default User-Agent
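Because there is no built-in rate limiting, pace requests yourself. A minimal sketch of client-side pacing around the CLI (the wrapper function and one-second delay are illustrative, not part of the tool):

```python
import subprocess
import time

def fetch_all(urls, delay_seconds=1.0):
    """Fetch each URL via the CLI, pausing between requests to be polite to servers."""
    for url in urls:
        subprocess.run(
            ["python3", "url_fetcher.py", "fetch", "--markdown", url],
            check=True,
        )
        time.sleep(delay_seconds)  # crude rate limit; tune per site
```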
URL validation:
- Allows: http/https URLs
- Blocks: file://, data://, javascript: URLs
- Blocks: localhost, 127.0.0.1, ::1 (internal networks)
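The actual checks live in url_fetcher.py; as a sketch, stdlib-only validation along these lines could look like this (the `validate_url` name is illustrative, not the tool's API):

```python
from urllib.parse import urlparse

BLOCKED_HOSTS = {"localhost", "127.0.0.1", "::1"}

def validate_url(url: str) -> bool:
    """Allow only http/https URLs that do not target internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # rejects file://, data://, javascript: URLs
    if parsed.hostname in BLOCKED_HOSTS:
        return False  # rejects localhost / loopback targets
    return True
```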
Path validation:
- Allows: workspace, home directory, /tmp
- Blocks: system paths (/etc, /usr, /var, etc.)
- Blocks: sensitive dotfiles (~/.ssh, ~/.bashrc, etc.)
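Again as a sketch only (assumes Python 3.9+ for `Path.is_relative_to`; the function name is illustrative):

```python
from pathlib import Path

ALLOWED_ROOTS = [Path.home() / "workspace", Path.home(), Path("/tmp")]

def validate_output_path(path: str) -> bool:
    """Allow writes only under the workspace, home directory, or /tmp, never into dotfiles."""
    resolved = Path(path).expanduser().resolve()
    if any(part.startswith(".") for part in resolved.parts):
        return False  # blocks ~/.ssh, ~/.bashrc and similar
    return any(resolved.is_relative_to(root.resolve()) for root in ALLOWED_ROOTS)
```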
Error handling:
- Timeout after 10 seconds
- HTTP error handling
- Network error handling
- Character encoding handling
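For orientation, a stdlib fetch with those guarantees looks roughly like this (a sketch, not the tool's actual code):

```python
import urllib.error
import urllib.request

def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL with a 10-second timeout, surfacing HTTP and network errors clearly."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
    except urllib.error.HTTPError as exc:
        raise RuntimeError(f"HTTP {exc.code}: {exc.reason}") from exc
    except (urllib.error.URLError, TimeoutError) as exc:
        raise RuntimeError(f"Network error or timeout: {exc}") from exc
    # Decode as UTF-8 and drop undecodable bytes
    return raw.decode("utf-8", errors="ignore")
```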
```bash
# Fetch multiple articles
url_fetcher.py fetch https://example.com/article1.md ~/workspace/research/article1.md
url_fetcher.py fetch https://example.com/article2.md ~/workspace/research/article2.md

# Convert to markdown for reading
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/research/article.md
```
```bash
# Fetch pages for processing
url_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html

# Extract text
url_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md
```
```bash
# Just preview content (no file save)
url_fetcher.py fetch https://example.com
```
```bash
#!/bin/bash
# batch_fetch.sh
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)

OUTPUT_DIR="$HOME/workspace/fetched"
mkdir -p "$OUTPUT_DIR"

for url in "${URLS[@]}"; do
  # Derive a flat filename from the URL (strip scheme, replace slashes)
  filename=$(echo "$url" | sed 's|https\?://||; s|/|_|g')
  url_fetcher.py fetch --markdown "$url" "$OUTPUT_DIR/$filename.md"
  sleep 1  # Be nice to servers
done
```
Combine with research-assistant:

```bash
# Fetch article
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/article.md
# Extract key points
# Then use research-assistant to organize findings
```

Combine with task-runner:

```bash
# Add task to fetch content
task_runner.py add "Fetch article on topic X" "research"
# Fetch when ready
url_fetcher.py fetch https://example.com/topic-x.md ~/workspace/research/topic-x.md
```
Error: Request timeout after 10s
Solution: The server is slow or unreachable. Try again later or check the URL.
Error: HTTP 403: Forbidden
Solution: The site blocks automated requests. Try:
- Adding a delay between requests
- Using a different User-Agent (modify the source; see the sketch below)
- Respecting robots.txt
- Using an API if one is available
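If you do modify the source to send a different User-Agent, the stdlib pattern is a `Request` with a headers dict (the header value below is just an example):

```python
import urllib.request

def fetch_with_user_agent(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL while presenting a custom User-Agent header."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; url-fetcher)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```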
Error: Special characters appear garbled or missing
Solution: The tool decodes responses as UTF-8 and ignores decoding errors, so some characters may be lost.
Note: Basic markdown extraction
Solution: This tool uses simple regexes for HTML-to-Markdown conversion. For better results:
- Use a dedicated markdown parser
- Post-process the output
- Use a paid API with better parsing
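To see why the output can be rough, here is a sketch of regex-only conversion in the same spirit (not the tool's actual implementation):

```python
import re

def html_to_text(html: str) -> str:
    """Very rough HTML-to-text conversion using regexes instead of a real parser."""
    html = re.sub(r"<(script|style)[^>]*>.*?</\1>", "", html, flags=re.DOTALL | re.IGNORECASE)
    html = re.sub(r"<br\s*/?>", "\n", html, flags=re.IGNORECASE)
    html = re.sub(r"</p\s*>", "\n\n", html, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", html)              # strip any remaining tags
    return re.sub(r"\n{3,}", "\n\n", text).strip()   # collapse extra blank lines
```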
- Be respectful: add delays between requests (don't hammer servers)
- Check robots.txt: respect the site's crawling policies
- Rate limit yourself: don't fetch too fast
- Validate URLs: only fetch from trusted sources
- Save safely: always use path-validated outputs
- Preview first: use preview mode before saving
```python
from pathlib import Path
import subprocess

def fetch_and_process(url):
    """Fetch a URL as markdown and return the saved content."""
    output = Path.home() / "workspace" / "fetched" / "page.md"
    output.parent.mkdir(parents=True, exist_ok=True)

    # Fetch via the CLI; check=True raises if the fetch fails
    subprocess.run([
        "python3", "/path/to/url_fetcher.py",
        "fetch", "--markdown", url, str(output),
    ], check=True)

    # Process content
    content = output.read_text()
    return content
```
```bash
# Function for fetching
fetch_content() {
    local url="$1"
    local output="$2"
    python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \
        fetch --markdown "$url" "$output"
}

# Usage
fetch_content "https://example.com" ~/workspace/example.md
```
For full-featured scraping:
- requests + beautifulsoup4 (requires pip install; see the sketch below)
- The scrapy framework (requires pip install)
- Paid APIs (Firecrawl, Apify)

For better markdown:
- The markdownify library (requires pip install)
- AI-based parsing (OpenAI, Anthropic APIs)

For complex workflows:
- Browser automation (OpenClaw browser tool)
- Headless Chrome (Puppeteer, Playwright)
- Scraping APIs (Zyte, ScraperAPI)
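For comparison, a typical requests + beautifulsoup4 fetch looks like this (requires `pip install requests beautifulsoup4`, which is outside this skill's no-dependency constraint):

```python
import requests
from bs4 import BeautifulSoup

def fetch_text(url: str) -> str:
    """Fetch a page and extract visible text with a real HTML parser."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator="\n", strip=True)
```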
This skill requires:
- Python 3 (included with OpenClaw)
- No API keys
- No external packages
- No paid services
- No rate limiting (other than what you add)

Perfect for autonomous agents with budget constraints.
If you improve this skill, please:
- Test with security-checker
- Document new features
- Publish to ClawHub with credit
Use freely in your OpenClaw skills and workflows.