# Send PaddleOCR Document Parsing to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "paddleocr-doc-parsing",
    "name": "PaddleOCR Document Parsing",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/Bobholamovic/paddleocr-doc-parsing",
    "canonicalUrl": "https://clawhub.ai/Bobholamovic/paddleocr-doc-parsing",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/paddleocr-doc-parsing",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=paddleocr-doc-parsing",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "SKILL.md",
      "scripts/smoke_test.py",
      "scripts/requirements.txt",
      "scripts/lib.py",
      "scripts/split_pdf.py",
      "scripts/optimize_file.py"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "paddleocr-doc-parsing",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-06T20:48:29.069Z",
      "expiresAt": "2026-05-13T20:48:29.069Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=paddleocr-doc-parsing",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=paddleocr-doc-parsing",
        "contentDisposition": "attachment; filename=\"paddleocr-doc-parsing-2.0.16.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "paddleocr-doc-parsing"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/paddleocr-doc-parsing"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/paddleocr-doc-parsing",
    "downloadUrl": "https://openagent3.xyz/downloads/paddleocr-doc-parsing",
    "agentUrl": "https://openagent3.xyz/skills/paddleocr-doc-parsing/agent",
    "manifestUrl": "https://openagent3.xyz/skills/paddleocr-doc-parsing/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/paddleocr-doc-parsing/agent.md"
  }
}
```
## Documentation

### When to Use This Skill

Use Document Parsing for:

Documents with tables (invoices, financial reports, spreadsheets)
Documents with mathematical formulas (academic papers, scientific documents)
Documents with charts and diagrams
Multi-column layouts (newspapers, magazines, brochures)
Complex document structures requiring layout analysis
Any document requiring structured understanding

Use Text Recognition instead for:

Simple text-only extraction
Quick OCR tasks where speed is critical
Screenshots or simple images with clear text

### Installation

Install Python dependencies before using this skill. From the skill directory (skills/paddleocr-doc-parsing):

pip install -r scripts/requirements.txt

Optional — for document optimization and split_pdf.py (page extraction):

pip install -r scripts/requirements-optimize.txt

### How to Use This Skill

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

ONLY use PaddleOCR Document Parsing API - Execute the script python scripts/vl_caller.py
NEVER parse documents directly - Do NOT parse documents yourself
NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
IF API fails - Display the error message and STOP immediately
NO fallback methods - Do NOT attempt document parsing any other way

If the script execution fails (API not configured, network error, etc.):

Show the error message to the user
Do NOT offer to help using your vision capabilities
Do NOT ask "Would you like me to try parsing it?"
Simply stop and wait for user to fix the configuration

### Basic Workflow

Execute document parsing:
python scripts/vl_caller.py --file-url "URL provided by user" --pretty

Or for local files:
python scripts/vl_caller.py --file-path "file path" --pretty

Optional: explicitly set file type:
python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0 --pretty


--file-type 0: PDF
--file-type 1: image
If omitted, the service can infer file type from input.

Default behavior: save raw JSON to a temp file:

If --output is omitted, the script saves automatically under the system temp directory
Default path pattern: <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json
If --output is provided, it overrides the default temp-file destination
If --stdout is provided, JSON is printed to stdout and no file is saved
In save mode, the script prints the absolute saved path on stderr: Result saved to: /absolute/path/...
In default/custom save mode, read and parse the saved JSON file before responding
In save mode, always tell the user the saved file path and that full raw JSON is available there
Use --stdout only when you explicitly want to skip file persistence



The output JSON contains COMPLETE content with all document data:

Headers, footers, page numbers
Main text content
Tables with structure
Formulas (with LaTeX)
Figures and charts
Footnotes and references
Seals and stamps
Layout and reading order

Input type note:

Supported file types depend on the model and endpoint configuration.
Always follow the file type constraints documented by your endpoint API.



Extract what the user needs from the output JSON using these fields:

Top-level text
result[n].markdown
result[n].prunedResult

### IMPORTANT: Complete Content Display

CRITICAL: You must display the COMPLETE extracted content to the user based on their needs.

The output JSON contains ALL document content in a structured format
In save mode, the raw provider result can be inspected in the saved JSON file
Display the full content requested by the user, do NOT truncate or summarize
If user asks for "all text", show the entire text field
If user asks for "tables", show ALL tables in the document
If user asks for "main content", filter out headers/footers but show ALL body text

What this means:

DO: Display complete text, all tables, all formulas as requested
DO: Present content using these fields: top-level text, result[n].markdown, and result[n].prunedResult
DON'T: Truncate with "..." unless content is excessively long (>10,000 chars)
DON'T: Summarize or provide excerpts when user asks for full content
DON'T: Say "Here's a preview" when user expects complete output

Example - Correct:

User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:

[Display entire text field or concatenated regions in reading order]

Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)

Example - Incorrect:

User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"

### Understanding the JSON Response

The output JSON uses an envelope wrapping the raw API result:

{
  "ok": true,
  "text": "Full markdown/HTML text extracted from all pages",
  "result": { ... },  // raw provider response
  "error": null
}

Key fields:

text — extracted markdown text from all pages (use this for quick text display)
result - raw provider response object
result[n].prunedResult - structured parsing output for each page (layout/content/confidence and related metadata)
result[n].markdown — full rendered page output in markdown/HTML

Raw result location (default): the temp-file path printed by the script on stderr

### Usage Examples

Example 1: Extract Full Document Text

python scripts/vl_caller.py \\
  --file-url "https://example.com/paper.pdf" \\
  --pretty

Then use:

Top-level text for quick full-text output
result[n].markdown when page-level output is needed

Example 2: Extract Structured Page Data

python scripts/vl_caller.py \\
  --file-path "./financial_report.pdf" \\
  --pretty

Then use:

result[n].prunedResult for structured parsing data (layout/content/confidence)
result[n].markdown for rendered page content

Example 3: Print JSON Without Saving

python scripts/vl_caller.py \\
  --file-url "URL" \\
  --stdout \\
  --pretty

Then return:

Full text when user asks for full document content
result[n].prunedResult and result[n].markdown when user needs complete structured page data

### First-Time Configuration

When API is not configured:

The error will show:

CONFIG_ERROR: PADDLEOCR_DOC_PARSING_API_URL not configured. Get your API at: https://paddleocr.com

Configuration workflow:

Show the exact error message to the user (including the URL).


Guide the user to configure securely:

Recommend configuring through the host application's standard method (e.g., settings file, environment variable UI) rather than pasting credentials in chat.
List the required environment variables:
- PADDLEOCR_DOC_PARSING_API_URL
- PADDLEOCR_ACCESS_TOKEN
- Optional: PADDLEOCR_DOC_PARSING_TIMEOUT





If the user provides credentials in chat anyway (accept any reasonable format), for example:

PADDLEOCR_DOC_PARSING_API_URL=https://xxx.paddleocr.com/layout-parsing, PADDLEOCR_ACCESS_TOKEN=abc123...
Here's my API: https://xxx and token: abc123
Copy-pasted code format
Any other reasonable format
Security note: Warn the user that credentials shared in chat may be stored in conversation history. Recommend setting them through the host application's configuration instead when possible.

Then parse and validate the values:

Extract PADDLEOCR_DOC_PARSING_API_URL (look for URLs with paddleocr.com or similar)
Confirm PADDLEOCR_DOC_PARSING_API_URL is a full endpoint ending with /layout-parsing
Extract PADDLEOCR_ACCESS_TOKEN (long alphanumeric string, usually 40+ chars)



Ask the user to confirm the environment is configured.


Retry only after confirmation:

Once the user confirms the environment variables are available, retry the original parsing task

### Handling Large Files

There is no file size limit for the API. For PDFs, the maximum is 100 pages per request.

Tips for large files:

Use URL for Large Local Files (Recommended)

For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:

python scripts/vl_caller.py --file-url "https://your-server.com/large_file.pdf"

Process Specific Pages (PDF Only)

If you only need certain pages from a large PDF, extract them first:

# Extract pages 1-5
python scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"

# Mixed ranges are supported
python scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"

# Then process the smaller file
python scripts/vl_caller.py --file-path "pages_1_5.pdf"

### Error Handling

Authentication failed (403):

error: Authentication failed

→ Token is invalid, reconfigure with correct credentials

API quota exceeded (429):

error: API quota exceeded

→ Daily API quota exhausted, inform user to wait or upgrade

Unsupported format:

error: Unsupported file format

→ File format not supported, convert to PDF/PNG/JPG

### Important Notes

The script NEVER filters content - It always returns complete data
The AI agent decides what to present - Based on user's specific request
All data is always available - Can be re-interpreted for different needs
No information is lost - Complete document structure preserved

### Reference Documentation

references/output_schema.md - Output format specification

Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_DOC_PARSING_API_URL).

Load these reference documents into context when:

Debugging complex parsing issues
Need to understand output format
Working with provider API details

### Testing the Skill

To verify the skill is working properly:

python scripts/smoke_test.py

This tests configuration and optionally API connectivity.
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: Bobholamovic
- Version: 2.0.7
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-06T20:48:29.069Z
- Expires at: 2026-05-13T20:48:29.069Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/paddleocr-doc-parsing)
- [Send to Agent page](https://openagent3.xyz/skills/paddleocr-doc-parsing/agent)
- [JSON manifest](https://openagent3.xyz/skills/paddleocr-doc-parsing/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/paddleocr-doc-parsing/agent.md)
- [Download page](https://openagent3.xyz/downloads/paddleocr-doc-parsing)