Requirements
- Target platform
- OpenClaw
- Install method
- Manual import
- Extraction
- Extract archive
- Prerequisites
- OpenClaw
- Primary doc
- SKILL.md
Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via AnyCrawl API.
Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via AnyCrawl API.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.
export ANYCRAWL_API_KEY="your-api-key" Make it permanent by adding to ~/.bashrc or ~/.zshrc: echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc source ~/.bashrc Get your API key at: https://anycrawl.dev
openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"
Scrape a single URL and convert to LLM-ready structured data. Parameters: url (string, required): URL to scrape engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer" formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"] timeout (number, optional): Timeout in milliseconds (default: 30000) wait_for (number, optional): Delay before extraction in ms (browser engines only) wait_for_selector (string/object/array, optional): Wait for CSS selectors include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"]) exclude_tags (array, optional): Exclude these HTML tags proxy (string, optional): Proxy URL (e.g., "http://proxy:port") json_options (object, optional): JSON extraction with schema/prompt extract_source (string, optional): "markdown" (default) or "html" Examples: // Basic scrape with default cheerio anycrawl_scrape({ url: "https://example.com" }) // Scrape SPA with Playwright anycrawl_scrape({ url: "https://spa-example.com", engine: "playwright", formats: ["markdown", "screenshot"] }) // Extract structured JSON anycrawl_scrape({ url: "https://product-page.com", engine: "cheerio", json_options: { schema: { type: "object", properties: { product_name: { type: "string" }, price: { type: "number" }, description: { type: "string" } }, required: ["product_name", "price"] }, user_prompt: "Extract product details from this page" } })
Search Google and return structured results. Parameters: query (string, required): Search query engine (string, optional): Search engine - "google" (default) limit (number, optional): Max results per page (default: 10) offset (number, optional): Number of results to skip (default: 0) pages (number, optional): Number of pages to retrieve (default: 1, max: 20) lang (string, optional): Language locale (e.g., "en", "zh", "vi") safe_search (number, optional): 0 (off), 1 (medium), 2 (high) scrape_options (object, optional): Scrape each result URL with these options Examples: // Basic search anycrawl_search({ query: "OpenAI ChatGPT" }) // Multi-page search in Vietnamese anycrawl_search({ query: "hướng dẫn Node.js", pages: 3, lang: "vi" }) // Search and auto-scrape results anycrawl_search({ query: "best AI tools 2026", limit: 5, scrape_options: { engine: "cheerio", formats: ["markdown"] } })
Start crawling an entire website (async job). Parameters: url (string, required): Seed URL to start crawling engine (string, optional): "cheerio" (default), "playwright", "puppeteer" strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin" max_depth (number, optional): Max depth from seed URL (default: 10) limit (number, optional): Max pages to crawl (default: 100) include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"]) exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"]) scrape_paths (array, optional): Only scrape URLs matching these patterns scrape_options (object, optional): Per-page scrape options Examples: // Crawl entire website anycrawl_crawl_start({ url: "https://docs.example.com", engine: "cheerio", max_depth: 5, limit: 50 }) // Crawl only blog posts anycrawl_crawl_start({ url: "https://example.com", strategy: "same-domain", include_paths: ["/blog/*"], exclude_paths: ["/blog/tags/*"], scrape_options: { formats: ["markdown"] } }) // Crawl product pages only anycrawl_crawl_start({ url: "https://shop.example.com", strategy: "same-domain", scrape_paths: ["/products/*"], limit: 200 })
Check crawl job status. Parameters: job_id (string, required): Crawl job ID Example: anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })
Get crawl results (paginated). Parameters: job_id (string, required): Crawl job ID skip (number, optional): Number of results to skip (default: 0) Example: // Get first 100 results anycrawl_crawl_results({ job_id: "xxx", skip: 0 }) // Get next 100 results anycrawl_crawl_results({ job_id: "xxx", skip: 100 })
Cancel a running crawl job. Parameters: job_id (string, required): Crawl job ID
Quick helper: Search Google then scrape top results. Parameters: query (string, required): Search query max_results (number, optional): Max results to scrape (default: 3) scrape_engine (string, optional): Engine for scraping (default: "cheerio") formats (array, optional): Output formats (default: ["markdown"]) lang (string, optional): Search language Example: anycrawl_search_and_scrape({ query: "latest AI news", max_results: 5, formats: ["markdown"] })
EngineBest ForSpeedJS RenderingcheerioStatic HTML, news, blogs⚡ Fastest❌ NoplaywrightSPAs, complex web apps🐢 Slower✅ YespuppeteerChrome-specific, metrics🐢 Slower✅ Yes
All responses follow this structure: { "success": true, "data": { ... }, "message": "Optional message" } Error response: { "success": false, "error": "Error type", "message": "Human-readable message" }
400 - Bad Request (validation errors) 401 - Unauthorized (invalid API key) 402 - Payment Required (insufficient credits) 404 - Not Found 429 - Rate limit exceeded 500 - Internal server error
Rate limits apply based on your plan Crawl jobs expire after 24 hours Max crawl limit: depends on credits
API Docs: https://docs.anycrawl.dev Website: https://anycrawl.dev Playground: https://anycrawl.dev/playground
Code helpers, APIs, CLIs, browser automation, testing, and developer operations.
Largest current source with strong distribution and engagement signals.