{
  "schemaVersion": "1.0",
  "item": {
    "slug": "anycrawl",
    "name": "AnyCrawl-API",
    "source": "tencent",
    "type": "skill",
    "category": "Developer Tools",
    "sourceUrl": "https://clawhub.ai/techlaai/anycrawl",
    "canonicalUrl": "https://clawhub.ai/techlaai/anycrawl",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/anycrawl",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=anycrawl",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "index.js",
      "package.json",
      "README.md",
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-23T16:43:11.935Z",
      "expiresAt": "2026-04-30T16:43:11.935Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=4claw-imageboard",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=4claw-imageboard",
        "contentDisposition": "attachment; filename=\"4claw-imageboard-1.0.1.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/anycrawl"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/anycrawl",
    "agentPageUrl": "https://openagent3.xyz/skills/anycrawl/agent",
    "manifestUrl": "https://openagent3.xyz/skills/anycrawl/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/anycrawl/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "AnyCrawl Skill",
        "body": "AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling."
      },
      {
        "title": "Method 1: Environment variable (Recommended)",
        "body": "export ANYCRAWL_API_KEY=\"your-api-key\"\n\nMake it permanent by adding to ~/.bashrc or ~/.zshrc:\n\necho 'export ANYCRAWL_API_KEY=\"your-api-key\"' >> ~/.bashrc\nsource ~/.bashrc\n\nGet your API key at: https://anycrawl.dev"
      },
      {
        "title": "Method 2: OpenClaw gateway config",
        "body": "openclaw config.patch --set ANYCRAWL_API_KEY=\"your-api-key\""
      },
      {
        "title": "1. anycrawl_scrape",
        "body": "Scrape a single URL and convert to LLM-ready structured data.\n\nParameters:\n\nurl (string, required): URL to scrape\nengine (string, optional): Scraping engine - \"cheerio\" (default), \"playwright\", \"puppeteer\"\nformats (array, optional): Output formats - [\"markdown\"], [\"html\"], [\"text\"], [\"json\"], [\"screenshot\"]\ntimeout (number, optional): Timeout in milliseconds (default: 30000)\nwait_for (number, optional): Delay before extraction in ms (browser engines only)\nwait_for_selector (string/object/array, optional): Wait for CSS selectors\ninclude_tags (array, optional): Include only these HTML tags (e.g., [\"h1\", \"p\", \"article\"])\nexclude_tags (array, optional): Exclude these HTML tags\nproxy (string, optional): Proxy URL (e.g., \"http://proxy:port\")\njson_options (object, optional): JSON extraction with schema/prompt\nextract_source (string, optional): \"markdown\" (default) or \"html\"\n\nExamples:\n\n// Basic scrape with default cheerio\nanycrawl_scrape({ url: \"https://example.com\" })\n\n// Scrape SPA with Playwright\nanycrawl_scrape({ \n  url: \"https://spa-example.com\",\n  engine: \"playwright\",\n  formats: [\"markdown\", \"screenshot\"]\n})\n\n// Extract structured JSON\nanycrawl_scrape({\n  url: \"https://product-page.com\",\n  engine: \"cheerio\",\n  json_options: {\n    schema: {\n      type: \"object\",\n      properties: {\n        product_name: { type: \"string\" },\n        price: { type: \"number\" },\n        description: { type: \"string\" }\n      },\n      required: [\"product_name\", \"price\"]\n    },\n    user_prompt: \"Extract product details from this page\"\n  }\n})"
      },
      {
        "title": "2. anycrawl_search",
        "body": "Search Google and return structured results.\n\nParameters:\n\nquery (string, required): Search query\nengine (string, optional): Search engine - \"google\" (default)\nlimit (number, optional): Max results per page (default: 10)\noffset (number, optional): Number of results to skip (default: 0)\npages (number, optional): Number of pages to retrieve (default: 1, max: 20)\nlang (string, optional): Language locale (e.g., \"en\", \"zh\", \"vi\")\nsafe_search (number, optional): 0 (off), 1 (medium), 2 (high)\nscrape_options (object, optional): Scrape each result URL with these options\n\nExamples:\n\n// Basic search\nanycrawl_search({ query: \"OpenAI ChatGPT\" })\n\n// Multi-page search in Vietnamese\nanycrawl_search({ \n  query: \"hướng dẫn Node.js\",\n  pages: 3,\n  lang: \"vi\"\n})\n\n// Search and auto-scrape results\nanycrawl_search({\n  query: \"best AI tools 2026\",\n  limit: 5,\n  scrape_options: {\n    engine: \"cheerio\",\n    formats: [\"markdown\"]\n  }\n})"
      },
      {
        "title": "3. anycrawl_crawl_start",
        "body": "Start crawling an entire website (async job).\n\nParameters:\n\nurl (string, required): Seed URL to start crawling\nengine (string, optional): \"cheerio\" (default), \"playwright\", \"puppeteer\"\nstrategy (string, optional): \"all\", \"same-domain\" (default), \"same-hostname\", \"same-origin\"\nmax_depth (number, optional): Max depth from seed URL (default: 10)\nlimit (number, optional): Max pages to crawl (default: 100)\ninclude_paths (array, optional): Path patterns to include (e.g., [\"/blog/*\"])\nexclude_paths (array, optional): Path patterns to exclude (e.g., [\"/admin/*\"])\nscrape_paths (array, optional): Only scrape URLs matching these patterns\nscrape_options (object, optional): Per-page scrape options\n\nExamples:\n\n// Crawl entire website\nanycrawl_crawl_start({ \n  url: \"https://docs.example.com\",\n  engine: \"cheerio\",\n  max_depth: 5,\n  limit: 50\n})\n\n// Crawl only blog posts\nanycrawl_crawl_start({\n  url: \"https://example.com\",\n  strategy: \"same-domain\",\n  include_paths: [\"/blog/*\"],\n  exclude_paths: [\"/blog/tags/*\"],\n  scrape_options: {\n    formats: [\"markdown\"]\n  }\n})\n\n// Crawl product pages only\nanycrawl_crawl_start({\n  url: \"https://shop.example.com\",\n  strategy: \"same-domain\",\n  scrape_paths: [\"/products/*\"],\n  limit: 200\n})"
      },
      {
        "title": "4. anycrawl_crawl_status",
        "body": "Check crawl job status.\n\nParameters:\n\njob_id (string, required): Crawl job ID\n\nExample:\n\nanycrawl_crawl_status({ job_id: \"7a2e165d-8f81-4be6-9ef7-23222330a396\" })"
      },
      {
        "title": "5. anycrawl_crawl_results",
        "body": "Get crawl results (paginated).\n\nParameters:\n\njob_id (string, required): Crawl job ID\nskip (number, optional): Number of results to skip (default: 0)\n\nExample:\n\n// Get first 100 results\nanycrawl_crawl_results({ job_id: \"xxx\", skip: 0 })\n\n// Get next 100 results\nanycrawl_crawl_results({ job_id: \"xxx\", skip: 100 })"
      },
      {
        "title": "6. anycrawl_crawl_cancel",
        "body": "Cancel a running crawl job.\n\nParameters:\n\njob_id (string, required): Crawl job ID"
      },
      {
        "title": "7. anycrawl_search_and_scrape",
        "body": "Quick helper: Search Google then scrape top results.\n\nParameters:\n\nquery (string, required): Search query\nmax_results (number, optional): Max results to scrape (default: 3)\nscrape_engine (string, optional): Engine for scraping (default: \"cheerio\")\nformats (array, optional): Output formats (default: [\"markdown\"])\nlang (string, optional): Search language\n\nExample:\n\nanycrawl_search_and_scrape({\n  query: \"latest AI news\",\n  max_results: 5,\n  formats: [\"markdown\"]\n})"
      },
      {
        "title": "Engine Selection Guide",
        "body": "Engine\tBest For\tSpeed\tJS Rendering\ncheerio\tStatic HTML, news, blogs\t⚡ Fastest\t❌ No\nplaywright\tSPAs, complex web apps\t🐢 Slower\t✅ Yes\npuppeteer\tChrome-specific, metrics\t🐢 Slower\t✅ Yes"
      },
      {
        "title": "Response Format",
        "body": "All responses follow this structure:\n\n{\n  \"success\": true,\n  \"data\": { ... },\n  \"message\": \"Optional message\"\n}\n\nError response:\n\n{\n  \"success\": false,\n  \"error\": \"Error type\",\n  \"message\": \"Human-readable message\"\n}"
      },
      {
        "title": "Common Error Codes",
        "body": "400 - Bad Request (validation errors)\n401 - Unauthorized (invalid API key)\n402 - Payment Required (insufficient credits)\n404 - Not Found\n429 - Rate limit exceeded\n500 - Internal server error"
      },
      {
        "title": "API Limits",
        "body": "Rate limits apply based on your plan\nCrawl jobs expire after 24 hours\nMax crawl limit: depends on credits"
      },
      {
        "title": "Links",
        "body": "API Docs: https://docs.anycrawl.dev\nWebsite: https://anycrawl.dev\nPlayground: https://anycrawl.dev/playground"
      }
    ],
    "body": "AnyCrawl Skill\n\nAnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.\n\nSetup\nMethod 1: Environment variable (Recommended)\nexport ANYCRAWL_API_KEY=\"your-api-key\"\n\n\nMake it permanent by adding to ~/.bashrc or ~/.zshrc:\n\necho 'export ANYCRAWL_API_KEY=\"your-api-key\"' >> ~/.bashrc\nsource ~/.bashrc\n\n\nGet your API key at: https://anycrawl.dev\n\nMethod 2: OpenClaw gateway config\nopenclaw config.patch --set ANYCRAWL_API_KEY=\"your-api-key\"\n\nFunctions\n1. anycrawl_scrape\n\nScrape a single URL and convert to LLM-ready structured data.\n\nParameters:\n\nurl (string, required): URL to scrape\nengine (string, optional): Scraping engine - \"cheerio\" (default), \"playwright\", \"puppeteer\"\nformats (array, optional): Output formats - [\"markdown\"], [\"html\"], [\"text\"], [\"json\"], [\"screenshot\"]\ntimeout (number, optional): Timeout in milliseconds (default: 30000)\nwait_for (number, optional): Delay before extraction in ms (browser engines only)\nwait_for_selector (string/object/array, optional): Wait for CSS selectors\ninclude_tags (array, optional): Include only these HTML tags (e.g., [\"h1\", \"p\", \"article\"])\nexclude_tags (array, optional): Exclude these HTML tags\nproxy (string, optional): Proxy URL (e.g., \"http://proxy:port\")\njson_options (object, optional): JSON extraction with schema/prompt\nextract_source (string, optional): \"markdown\" (default) or \"html\"\n\nExamples:\n\n// Basic scrape with default cheerio\nanycrawl_scrape({ url: \"https://example.com\" })\n\n// Scrape SPA with Playwright\nanycrawl_scrape({ \n  url: \"https://spa-example.com\",\n  engine: \"playwright\",\n  formats: [\"markdown\", \"screenshot\"]\n})\n\n// Extract structured JSON\nanycrawl_scrape({\n  url: \"https://product-page.com\",\n  engine: \"cheerio\",\n  json_options: {\n    schema: {\n      type: \"object\",\n      properties: {\n        product_name: { type: \"string\" 
},\n        price: { type: \"number\" },\n        description: { type: \"string\" }\n      },\n      required: [\"product_name\", \"price\"]\n    },\n    user_prompt: \"Extract product details from this page\"\n  }\n})\n\n2. anycrawl_search\n\nSearch Google and return structured results.\n\nParameters:\n\nquery (string, required): Search query\nengine (string, optional): Search engine - \"google\" (default)\nlimit (number, optional): Max results per page (default: 10)\noffset (number, optional): Number of results to skip (default: 0)\npages (number, optional): Number of pages to retrieve (default: 1, max: 20)\nlang (string, optional): Language locale (e.g., \"en\", \"zh\", \"vi\")\nsafe_search (number, optional): 0 (off), 1 (medium), 2 (high)\nscrape_options (object, optional): Scrape each result URL with these options\n\nExamples:\n\n// Basic search\nanycrawl_search({ query: \"OpenAI ChatGPT\" })\n\n// Multi-page search in Vietnamese\nanycrawl_search({ \n  query: \"hướng dẫn Node.js\",\n  pages: 3,\n  lang: \"vi\"\n})\n\n// Search and auto-scrape results\nanycrawl_search({\n  query: \"best AI tools 2026\",\n  limit: 5,\n  scrape_options: {\n    engine: \"cheerio\",\n    formats: [\"markdown\"]\n  }\n})\n\n3. 
anycrawl_crawl_start\n\nStart crawling an entire website (async job).\n\nParameters:\n\nurl (string, required): Seed URL to start crawling\nengine (string, optional): \"cheerio\" (default), \"playwright\", \"puppeteer\"\nstrategy (string, optional): \"all\", \"same-domain\" (default), \"same-hostname\", \"same-origin\"\nmax_depth (number, optional): Max depth from seed URL (default: 10)\nlimit (number, optional): Max pages to crawl (default: 100)\ninclude_paths (array, optional): Path patterns to include (e.g., [\"/blog/*\"])\nexclude_paths (array, optional): Path patterns to exclude (e.g., [\"/admin/*\"])\nscrape_paths (array, optional): Only scrape URLs matching these patterns\nscrape_options (object, optional): Per-page scrape options\n\nExamples:\n\n// Crawl entire website\nanycrawl_crawl_start({ \n  url: \"https://docs.example.com\",\n  engine: \"cheerio\",\n  max_depth: 5,\n  limit: 50\n})\n\n// Crawl only blog posts\nanycrawl_crawl_start({\n  url: \"https://example.com\",\n  strategy: \"same-domain\",\n  include_paths: [\"/blog/*\"],\n  exclude_paths: [\"/blog/tags/*\"],\n  scrape_options: {\n    formats: [\"markdown\"]\n  }\n})\n\n// Crawl product pages only\nanycrawl_crawl_start({\n  url: \"https://shop.example.com\",\n  strategy: \"same-domain\",\n  scrape_paths: [\"/products/*\"],\n  limit: 200\n})\n\n4. anycrawl_crawl_status\n\nCheck crawl job status.\n\nParameters:\n\njob_id (string, required): Crawl job ID\n\nExample:\n\nanycrawl_crawl_status({ job_id: \"7a2e165d-8f81-4be6-9ef7-23222330a396\" })\n\n5. anycrawl_crawl_results\n\nGet crawl results (paginated).\n\nParameters:\n\njob_id (string, required): Crawl job ID\nskip (number, optional): Number of results to skip (default: 0)\n\nExample:\n\n// Get first 100 results\nanycrawl_crawl_results({ job_id: \"xxx\", skip: 0 })\n\n// Get next 100 results\nanycrawl_crawl_results({ job_id: \"xxx\", skip: 100 })\n\n6. 
anycrawl_crawl_cancel\n\nCancel a running crawl job.\n\nParameters:\n\njob_id (string, required): Crawl job ID\n\n7. anycrawl_search_and_scrape\n\nQuick helper: Search Google then scrape top results.\n\nParameters:\n\nquery (string, required): Search query\nmax_results (number, optional): Max results to scrape (default: 3)\nscrape_engine (string, optional): Engine for scraping (default: \"cheerio\")\nformats (array, optional): Output formats (default: [\"markdown\"])\nlang (string, optional): Search language\n\nExample:\n\nanycrawl_search_and_scrape({\n  query: \"latest AI news\",\n  max_results: 5,\n  formats: [\"markdown\"]\n})\n\nEngine Selection Guide\nEngine\tBest For\tSpeed\tJS Rendering\ncheerio\tStatic HTML, news, blogs\t⚡ Fastest\t❌ No\nplaywright\tSPAs, complex web apps\t🐢 Slower\t✅ Yes\npuppeteer\tChrome-specific, metrics\t🐢 Slower\t✅ Yes\nResponse Format\n\nAll responses follow this structure:\n\n{\n  \"success\": true,\n  \"data\": { ... },\n  \"message\": \"Optional message\"\n}\n\n\nError response:\n\n{\n  \"success\": false,\n  \"error\": \"Error type\",\n  \"message\": \"Human-readable message\"\n}\n\nCommon Error Codes\n400 - Bad Request (validation errors)\n401 - Unauthorized (invalid API key)\n402 - Payment Required (insufficient credits)\n404 - Not Found\n429 - Rate limit exceeded\n500 - Internal server error\nAPI Limits\nRate limits apply based on your plan\nCrawl jobs expire after 24 hours\nMax crawl limit: depends on credits\nLinks\nAPI Docs: https://docs.anycrawl.dev\nWebsite: https://anycrawl.dev\nPlayground: https://anycrawl.dev/playground"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/techlaai/anycrawl",
    "publisherUrl": "https://clawhub.ai/techlaai/anycrawl",
    "owner": "techlaai",
    "version": "1.0.1",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/anycrawl",
    "downloadUrl": "https://openagent3.xyz/downloads/anycrawl",
    "agentUrl": "https://openagent3.xyz/skills/anycrawl/agent",
    "manifestUrl": "https://openagent3.xyz/skills/anycrawl/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/anycrawl/agent.md"
  }
}