{
  "schemaVersion": "1.0",
  "item": {
    "slug": "url-fetcher",
    "name": "URL Fetcher",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/johstracke/url-fetcher",
    "canonicalUrl": "https://clawhub.ai/johstracke/url-fetcher",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/url-fetcher",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=url-fetcher",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md",
      "scripts/url_fetcher.py"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-07T17:22:31.273Z",
      "expiresAt": "2026-05-14T17:22:31.273Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=afrexai-annual-report",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=afrexai-annual-report",
        "contentDisposition": "attachment; filename=\"afrexai-annual-report-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/url-fetcher"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/url-fetcher",
    "agentPageUrl": "https://openagent3.xyz/skills/url-fetcher/agent",
    "manifestUrl": "https://openagent3.xyz/skills/url-fetcher/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/url-fetcher/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "URL Fetcher",
        "body": "Fetch web content without API keys or external dependencies. Uses Python standard library only."
      },
      {
        "title": "Quick Start",
        "body": "url_fetcher.py fetch <url>\nurl_fetcher.py fetch --markdown <url> [output_file]\n\nExamples:\n\n# Fetch and preview\nurl_fetcher.py fetch https://example.com\n\n# Fetch and save HTML\nurl_fetcher.py fetch https://example.com ~/workspace/page.html\n\n# Fetch and convert to basic markdown\nurl_fetcher.py fetch --markdown https://example.com ~/workspace/page.md"
      },
      {
        "title": "Features",
        "body": "No dependencies - Uses Python stdlib (urllib) only\nNo API keys - Completely free to use\nURL validation - Blocks localhost/internal networks\nBasic markdown conversion - Extract content from HTML\nPath validation - Safe file writes only (workspace, home, /tmp)\nError handling - Timeout and network error handling"
      },
      {
        "title": "When to Use",
        "body": "Content aggregation - Collect pages for processing\nResearch collection - Save articles/pages locally\nSimple scraping - Extract text from web pages\nMarkdown conversion - Basic HTML to text/markdown\nNo-API alternatives - When you can't use paid APIs"
      },
      {
        "title": "Limitations",
        "body": "Basic markdown - Simple regex-based conversion (not a full parser)\nNo JavaScript - Only fetches static HTML\nRate limiting - No built-in rate limiting (add your own if needed)\nBot detection - Some sites may block the default User-Agent"
      },
      {
        "title": "URL Validation",
        "body": "✅ Allows: http/https URLs\n❌ Blocks: file://, data://, javascript: URLs\n❌ Blocks: localhost, 127.0.0.1, ::1 (internal networks)"
      },
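      {
        "title": "URL Validation (illustrative sketch)",
        "body": "Illustrative only - the shipped url_fetcher.py may implement these checks differently. The scheme and host rules above can be sketched as:\n\nfrom urllib.parse import urlparse\n\ndef is_allowed_url(url):\n    parsed = urlparse(url)\n    # Only http/https; rejects file://, data:, javascript: schemes\n    if parsed.scheme not in (\"http\", \"https\"):\n        return False\n    # Reject localhost and loopback addresses\n    host = (parsed.hostname or \"\").lower()\n    if host in (\"localhost\", \"127.0.0.1\", \"::1\"):\n        return False\n    return True"
      },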
      {
        "title": "File Path Validation",
        "body": "✅ Allows: workspace, home directory, /tmp\n❌ Blocks: system paths (/etc, /usr, /var, etc.)\n❌ Blocks: sensitive dotfiles (~/.ssh, ~/.bashrc, etc.)"
      },
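      {
        "title": "File Path Validation (illustrative sketch)",
        "body": "Illustrative only - the shipped url_fetcher.py may implement these checks differently. The allowed-root rule above can be sketched as:\n\nfrom pathlib import Path\n\ndef is_safe_output(path):\n    resolved = Path(path).expanduser().resolve()\n    # Reject sensitive dotfiles and dot-directories (e.g. ~/.ssh)\n    if any(part.startswith(\".\") for part in resolved.parts[1:]):\n        return False\n    # Allow only the home directory and /tmp subtrees\n    roots = (Path.home().resolve(), Path(\"/tmp\").resolve())\n    return any(root == resolved or root in resolved.parents for root in roots)"
      },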
      {
        "title": "Error Handling",
        "body": "Timeout after 10 seconds\nHTTP error handling\nNetwork error handling\nCharacter encoding handling"
      },
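      {
        "title": "Fetch and Error Handling (illustrative sketch)",
        "body": "Illustrative only - the actual implementation lives in scripts/url_fetcher.py. A stdlib-only fetch with the 10-second timeout and error handling described above might look like:\n\nimport urllib.request\nimport urllib.error\n\ndef fetch(url, timeout=10):\n    try:\n        with urllib.request.urlopen(url, timeout=timeout) as resp:\n            # Decode as UTF-8, ignoring undecodable bytes\n            return resp.read().decode(\"utf-8\", errors=\"ignore\")\n    except urllib.error.HTTPError as e:\n        raise RuntimeError(f\"HTTP {e.code}: {e.reason}\")\n    except urllib.error.URLError as e:\n        raise RuntimeError(f\"Network error: {e.reason}\")"
      },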
      {
        "title": "Collecting Research",
        "body": "# Fetch multiple articles\nurl_fetcher.py fetch https://example.com/article1.md ~/workspace/research/article1.md\nurl_fetcher.py fetch https://example.com/article2.md ~/workspace/research/article2.md\n\n# Convert to markdown for reading\nurl_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/research/article.md"
      },
      {
        "title": "Content Aggregation",
        "body": "# Fetch pages for processing\nurl_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html\n\n# Extract text\nurl_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md"
      },
      {
        "title": "Quick Preview",
        "body": "# Just preview content (no file save)\nurl_fetcher.py fetch https://example.com"
      },
      {
        "title": "Batch Fetching",
        "body": "#!/bin/bash\n# batch_fetch.sh\n\nURLS=(\n    \"https://example.com/page1\"\n    \"https://example.com/page2\"\n    \"https://example.com/page3\"\n)\n\nOUTPUT_DIR=\"$HOME/workspace/fetched\"\nmkdir -p \"$OUTPUT_DIR\"\n\nfor url in \"${URLS[@]}\"; do\n    filename=$(echo $url | sed 's|/||g')\n    url_fetcher.py fetch --markdown \"$url\" \"$OUTPUT_DIR/$filename.md\"\n    sleep 1  # Be nice to servers\ndone"
      },
      {
        "title": "Integration with Other Skills",
        "body": "Combine with research-assistant:\n\n# Fetch article\nurl_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/article.md\n\n# Extract key points\n# Then use research-assistant to organize findings\n\nCombine with task-runner:\n\n# Add task to fetch content\ntask_runner.py add \"Fetch article on topic X\" \"research\"\n\n# Fetch when ready\nurl_fetcher.py fetch https://example.com/topic-x.md ~/workspace/research/topic-x.md"
      },
      {
        "title": "Connection Timeout",
        "body": "Error: Request timeout after 10s\n\nSolution: The server is slow or unreachable. Try again later or check the URL."
      },
      {
        "title": "HTTP 403/429 Errors",
        "body": "Error: HTTP 403: Forbidden\n\nSolution: The site blocks automated requests. Try:\n\nAdd delay between requests\nUse a different User-Agent (modify source)\nRespect robots.txt\nConsider using an API if available"
      },
      {
        "title": "Encoding Issues",
        "body": "Error with special characters\n\nSolution: The tool uses UTF-8 with error-ignore. Some characters may be lost."
      },
      {
        "title": "Markdown Quality",
        "body": "Note: Basic markdown extraction\n\nSolution: This tool uses simple regex for HTML→MD conversion. For better results:\n\nUse dedicated markdown parsers\nOr post-process the output\nOr use a paid API with better parsing"
      },
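      {
        "title": "Regex Conversion (illustrative sketch)",
        "body": "Illustrative only - the shipped converter may differ. A regex-based HTML-to-text pass of the kind described above is roughly:\n\nimport re\n\ndef html_to_text(html):\n    # Drop script/style blocks entirely\n    html = re.sub(r\"(?is)<(script|style)[^>]*>.*?</\\\\1>\", \"\", html)\n    # Strip remaining tags and collapse whitespace\n    text = re.sub(r\"<[^>]+>\", \" \", html)\n    return re.sub(r\"\\\\s+\", \" \", text).strip()\n\nThis is why nested layout, tables, and JavaScript-rendered content do not survive conversion."
      },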
      {
        "title": "Best Practices",
        "body": "Be respectful - Add delays between requests (don't hammer servers)\nCheck robots.txt - Respect site's crawling policies\nRate limit yourself - Don't fetch too fast\nValidate URLs - Only fetch from trusted sources\nSave safely - Always use path-validated outputs\nPreview first - Use preview mode before saving"
      },
      {
        "title": "Python Integration",
        "body": "from pathlib import Path\nimport subprocess\n\ndef fetch_and_process(url):\n    \"\"\"Fetch URL and process\"\"\"\n    output = Path.home() / \"workspace\" / \"fetched\" / \"page.md\"\n    output.parent.mkdir(parents=True, exist_ok=True)\n    \n    # Fetch\n    subprocess.run([\n        \"python3\",\n        \"/path/to/url_fetcher.py\",\n        \"fetch\",\n        \"--markdown\",\n        url,\n        str(output)\n    ])\n    \n    # Process content\n    content = output.read_text()\n    return content"
      },
      {
        "title": "Bash Integration",
        "body": "# Function for fetching\nfetch_content() {\n    local url=\"$1\"\n    local output=\"$2\"\n    python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \\\n        fetch --markdown \"$url\" \"$output\"\n}\n\n# Usage\nfetch_content \"https://example.com\" ~/workspace/example.md"
      },
      {
        "title": "When You Need More Features",
        "body": "For full-featured scraping:\n\nUse requests + beautifulsoup4 (requires pip install)\nOr use scrapy framework (requires pip install)\nOr use paid APIs (Firecrawl, Apify)\n\nFor better markdown:\n\nmarkdownify library (requires pip install)\nOr use AI-based parsing (OpenAI, Anthropic APIs)\n\nFor complex workflows:\n\nBrowser automation (OpenClaw browser tool)\nHeadless Chrome (Puppeteer, Playwright)\nOr use scraping APIs (Zyte, ScraperAPI)"
      },
      {
        "title": "Zero-Cost Advantage",
        "body": "This skill requires:\n\n✅ Python 3 (included with OpenClaw)\n✅ No API keys\n✅ No external packages\n✅ No paid services\n✅ No rate limiting (other than what you add)\n\nPerfect for autonomous agents with budget constraints."
      },
      {
        "title": "Contributing",
        "body": "If you improve this skill, please:\n\nTest with security-checker\nDocument new features\nPublish to ClawHub with credit"
      },
      {
        "title": "License",
        "body": "Use freely in your OpenClaw skills and workflows."
      }
    ]
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/johstracke/url-fetcher",
    "publisherUrl": "https://clawhub.ai/johstracke/url-fetcher",
    "owner": "johstracke",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/url-fetcher",
    "downloadUrl": "https://openagent3.xyz/downloads/url-fetcher",
    "agentUrl": "https://openagent3.xyz/skills/url-fetcher/agent",
    "manifestUrl": "https://openagent3.xyz/skills/url-fetcher/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/url-fetcher/agent.md"
  }
}