# Send Crawl4ai to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of walking through the setup by hand.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
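
If you'd rather script the fetch step, here is a minimal Python sketch. It assumes the Yavira download endpoint (the download URL from the links section below) redirects straight to the ZIP; the destination folder name is a hypothetical example.

```python
import io
import urllib.request
import zipfile

URL = "https://openagent3.xyz/downloads/crawl4ai"  # Yavira redirects to the upstream ZIP
DEST = "crawl4ai-skill"  # hypothetical destination folder for your agent

with urllib.request.urlopen(URL) as resp:  # urllib follows the redirect automatically
    data = resp.read()
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    archive.extractall(DEST)
    print("Extracted:", archive.namelist())
```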
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "crawl4ai",
    "name": "Crawl4ai",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/codylrn804/crawl4ai",
    "canonicalUrl": "https://clawhub.ai/codylrn804/crawl4ai",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/crawl4ai",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=crawl4ai",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "SKILL.md",
      "references/api_reference.md",
      "references/error_handling.md",
      "references/examples.md",
      "scripts/extract_from_html.py",
      "scripts/scrape_multiple_pages.py"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "crawl4ai",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-02T04:41:22.627Z",
      "expiresAt": "2026-05-09T04:41:22.627Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=crawl4ai",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=crawl4ai",
        "contentDisposition": "attachment; filename=\"crawl4ai-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "crawl4ai"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/crawl4ai"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/crawl4ai",
    "downloadUrl": "https://openagent3.xyz/downloads/crawl4ai",
    "agentUrl": "https://openagent3.xyz/skills/crawl4ai/agent",
    "manifestUrl": "https://openagent3.xyz/skills/crawl4ai/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/crawl4ai/agent.md"
  }
}
```
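
To consume these fields programmatically, here is a short sketch that fetches the JSON manifest and checks source health before downloading. It assumes the manifest URL from the links block returns the shape shown above.

```python
import json
import urllib.request

MANIFEST_URL = "https://openagent3.xyz/skills/crawl4ai/agent.json"

with urllib.request.urlopen(MANIFEST_URL) as resp:
    manifest = json.load(resp)

health = manifest["install"]["sourceHealth"]
if health["status"] == "healthy":
    # Follow the same redirect the "Download for OpenClaw" button uses.
    print("Download via:", manifest["links"]["downloadUrl"])
else:
    print("Hold off:", health["reason"], "-", health["recommendedAction"])
```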
## Documentation

### Overview

Crawl4ai is an AI-powered web scraping framework designed to extract structured data from websites efficiently. It combines traditional HTML parsing with AI to handle dynamic content, extract text intelligently, and clean and structure data from complex web pages.

### When to Use This Skill

Use when Codex needs to:

- Extract structured data from web pages (products, articles, forms, tables, etc.)
- Scrape websites with dynamic content or complex JavaScript
- Clean and normalize extracted data from various HTML structures
- Work with APIs or web services that return HTML
- Handle CORS limitations by scraping directly
- Process web content at scale with reliability

Trigger phrases:

"Extract data from this website"
"Scrape this page for [specific data]"
"Parse this HTML"
"Get data from [URL]"
"Extract structured information from [website]"
"Scrape [website] for [data type]"
"Web scrape [URL]"

### Basic Usage

```python
from crawl4ai import AsyncWebCrawler, BrowserMode

async def scrape_page(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            browser_mode=BrowserMode.LATEST,
            headless=True
        )
        return result.markdown, result.clean_html
```
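
To invoke the coroutine from a synchronous script, standard asyncio works; the URL here is a placeholder:

```python
import asyncio

# Placeholder URL; substitute the page you want to scrape.
markdown, html = asyncio.run(scrape_page("https://example.com"))
print(markdown[:200])
```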

### Extracting Structured Data

```python
from crawl4ai import AsyncWebCrawler

async def extract_products(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            screenshot=True,
            javascript=True,
            bypass_cache=True
        )
        # Filter the extracted items down to products
        products = []
        for item in result.extracted_content:
            if item['type'] == 'product':
                products.append({
                    'name': item['name'],
                    'price': item['price'],
                    'url': item['url']
                })
        return products
```

### Web Scraping Basics

Scenario: User wants to scrape a website for all article titles.

```python
from crawl4ai import AsyncWebCrawler

async def scrape_articles(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            javascript=True,
            verbose=True
        )
        # Extract article titles from the structured content
        articles = result.extracted_content if result.extracted_content else []
        titles = [item.get('name', item.get('text', '')) for item in articles]
        return titles
```

Trigger: "Scrape this site for article titles" or "Get all titles from [URL]"

### Dynamic Content Handling

Scenario: Website loads data via JavaScript.

```python
from crawl4ai import AsyncWebCrawler

async def scrape_dynamic_site(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            javascript=True,  # Wait for JS execution
            wait_for="body",   # Wait for specific element
            delay=1.5,         # Wait time after load
            headless=True
        )
        return result.markdown
```

Trigger: "Scrape this dynamic website" or "This page needs JavaScript to load data"

### Structured Data Extraction

Scenario: Extract specific fields like prices, descriptions, etc.

```python
from crawl4ai import AsyncWebCrawler

async def extract_product_details(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            screenshot=True,
            js_code="""
                const products = document.querySelectorAll('.product');
                return Array.from(products).map(p => ({
                    name: p.querySelector('.name')?.textContent,
                    price: p.querySelector('.price')?.textContent,
                    url: p.querySelector('a')?.href
                }));
            """
        )
        return result.extracted_content
```

Trigger: "Extract product details from this page" or "Get price and name from [URL]"

### HTML Cleaning and Parsing

Scenario: Clean messy HTML and extract clean text.

```python
from crawl4ai import AsyncWebCrawler

async def clean_and_parse(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            remove_tags=['script', 'style', 'nav', 'footer', 'header'],
            only_main_content=True
        )
        # Return the cleaned HTML with boilerplate tags stripped
        return result.clean_html
```

Trigger: "Clean this HTML" or "Extract main content from this page"

### Custom JavaScript Injection

```python
from crawl4ai import AsyncWebCrawler

async def custom_scrape(url, custom_js):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            js_code=custom_js,
            js_only=True  # Only execute JS, don't download resources
        )
        return result.extracted_content
```

### Session Management

```python
from crawl4ai import AsyncWebCrawler

async def multi_page_scrape(base_url, urls):
    async with AsyncWebCrawler() as crawler:
        results = []
        for url in urls:
            result = await crawler.arun(
                url=url,
                session_id=f"session_{url}",
                bypass_cache=True
            )
            results.append({
                'url': url,
                'content': result.markdown,
                'status': result.success
            })
        return results
```

### Best Practices

- Always check if the site allows scraping: respect robots.txt and terms of service.
- Use appropriate delays: pause between requests to avoid overwhelming servers.
- Handle errors gracefully: implement retry logic and error handling (see the sketch below).
- Be selective with data: extract only what you need instead of dumping entire pages.
- Store data reliably: save extracted data in structured formats (JSON, CSV).
- Clean URLs: handle redirects and malformed URLs.
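
The retry sketch referenced above: a minimal pattern assuming failures surface either as exceptions or as a falsy `result.success`. The `attempts` and `delay` parameters are hypothetical knobs, not part of the crawl4ai API.

```python
import asyncio

from crawl4ai import AsyncWebCrawler

async def scrape_with_retries(url, attempts=3, delay=2.0):
    for attempt in range(1, attempts + 1):
        try:
            async with AsyncWebCrawler() as crawler:
                result = await crawler.arun(url=url)
            if result.success:
                return result
        except Exception as exc:
            print(f"Attempt {attempt} failed: {exc}")
        if attempt < attempts:
            await asyncio.sleep(delay * attempt)  # linear backoff between tries
    return None
```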

### Error Handling

```python
from crawl4ai import AsyncWebCrawler

async def robust_scrape(url):
    try:
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(
                url=url,
                timeout=30000  # 30-second timeout
            )
            if result.success:
                return result.markdown, result.extracted_content
            print(f"Scraping failed: {result.error_message}")
            return None, None
    except Exception as e:
        print(f"Scraping error: {e}")
        return None, None
```

### Output Formats

Crawl4ai supports multiple output formats:

- Markdown: clean, readable text (`result.markdown`)
- Clean HTML: structured, cleaned HTML (`result.clean_html`)
- Extracted content: structured JSON data (`result.extracted_content`)
- Screenshot: visual representation (`result.screenshot`)
- Links: all links found on the page (`result.links`)
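
A small sketch pulling several of these formats from a single crawl; the screenshot encoding varies by version, so it is treated as opaque here:

```python
from crawl4ai import AsyncWebCrawler

async def dump_formats(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, screenshot=True)
    print("markdown chars:", len(result.markdown))
    print("clean html chars:", len(result.clean_html))
    print("links found:", len(result.links))
    # extracted_content is structured data; screenshot format is version-dependent
    return result.extracted_content, result.screenshot
```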

### scripts/

Python scripts for common crawling operations:

- `scrape_single_page.py` - Basic scraping utility
- `scrape_multiple_pages.py` - Batch scraping with pagination
- `extract_from_html.py` - HTML parsing helper
- `clean_html.py` - HTML cleaning utility

### references/

Documentation and examples:

- `api_reference.md` - Complete API documentation
- `examples.md` - Common use cases and patterns
- `error_handling.md` - Troubleshooting guide
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: codylrn804
- Version: 1.0.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-02T04:41:22.627Z
- Expires at: 2026-05-09T04:41:22.627Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/crawl4ai)
- [Send to Agent page](https://openagent3.xyz/skills/crawl4ai/agent)
- [JSON manifest](https://openagent3.xyz/skills/crawl4ai/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/crawl4ai/agent.md)
- [Download page](https://openagent3.xyz/downloads/crawl4ai)