# Send Mineru Pdf to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "mineru-pdf",
    "name": "Mineru Pdf",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/Etoile04/mineru-pdf",
    "canonicalUrl": "https://clawhub.ai/Etoile04/mineru-pdf",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/mineru-pdf",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=mineru-pdf",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "SKILL.md",
      "parse.py",
      "test.sh"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "mineru-pdf",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-05T16:09:12.042Z",
      "expiresAt": "2026-05-12T16:09:12.042Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=mineru-pdf",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=mineru-pdf",
        "contentDisposition": "attachment; filename=\"mineru-pdf-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "mineru-pdf"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/mineru-pdf"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/mineru-pdf",
    "downloadUrl": "https://openagent3.xyz/downloads/mineru-pdf",
    "agentUrl": "https://openagent3.xyz/skills/mineru-pdf/agent",
    "manifestUrl": "https://openagent3.xyz/skills/mineru-pdf/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/mineru-pdf/agent.md"
  }
}
```
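
The `includedAssets` list in the manifest can drive the "confirm the extracted package contains the expected setup assets" check. The following is a minimal sketch, assuming the JSON above has already been loaded into a dictionary; the function name `missing_assets` is hypothetical and not part of the package.

```python
from pathlib import Path


def missing_assets(manifest: dict, extracted_dir: str) -> list:
    """Return asset names listed in the manifest but absent from the folder."""
    expected = manifest["install"]["includedAssets"]
    root = Path(extracted_dir)
    # Only regular files count; a directory with the same name is not an asset.
    return [name for name in expected if not (root / name).is_file()]
```

An empty list means every file named in `includedAssets` (`SKILL.md`, `parse.py`, `test.sh`) was found in the extracted folder.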
## Documentation

### MinerU PDF Parser

Parse PDF documents using MinerU MCP to extract structured content including text, tables, and formulas with MLX acceleration on Apple Silicon.

### Option 1: Install MinerU MCP (for Claude Code)

```bash
claude mcp add --transport stdio --scope user mineru -- \
  uvx --from mcp-mineru python -m mcp_mineru.server
```

This installs and configures MinerU for all Claude projects. Models are downloaded on first use.

### Option 2: Use Direct Tool (preserves files)

The skill includes a direct parsing tool that saves output to a persistent directory:

```bash
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py <pdf_path> <output_dir> [options]
```

Advantages:

- ✅ Files are saved permanently (not auto-deleted)
- ✅ Full control over output location
- ✅ No MCP overhead
- ✅ Works with any Python environment that has MinerU

### Method 1: Using the Direct Tool (Recommended)

```bash
# Parse entire PDF
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output"

# Parse specific pages
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --start-page 0 --end-page 2

# Use Apple Silicon optimization
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --backend vlm-mlx-engine

# Text only (faster)
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --no-table --no-formula
```
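
The invocations above differ only in their flags, so they can be assembled programmatically when several PDFs need the same treatment. This is a rough sketch, not part of the package: `build_command` and `parse_folder` are hypothetical names, the script path is the one shown in this document and will likely differ on your machine, and it assumes the current Python interpreter is the one that has MinerU installed.

```python
import subprocess
import sys
from pathlib import Path

# Path to parse.py as shown in this document; adjust to your install location.
PARSE_SCRIPT = "/Users/lwj04/clawd/skills/mineru-pdf/parse.py"


def build_command(pdf, output_dir, start_page=None, end_page=None,
                  backend=None, tables=True, formulas=True):
    """Assemble the parse.py argument list from the flags documented above."""
    cmd = [sys.executable, PARSE_SCRIPT, str(pdf), str(output_dir)]
    if start_page is not None:
        cmd += ["--start-page", str(start_page)]
    if end_page is not None:
        cmd += ["--end-page", str(end_page)]
    if backend is not None:
        cmd += ["--backend", backend]
    if not tables:
        cmd.append("--no-table")
    if not formulas:
        cmd.append("--no-formula")
    return cmd


def parse_folder(pdf_dir, output_dir, **options):
    """Run parse.py once per PDF in pdf_dir, stopping on the first failure."""
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        subprocess.run(build_command(pdf, output_dir, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.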

### Parse a PDF document

```bash
uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def parse_pdf():
    result = await call_tool(
        name='parse_pdf',
        arguments={
            'file_path': '/path/to/document.pdf',
            'backend': 'pipeline',
            'formula_enable': True,
            'table_enable': True,
            'start_page': 0,
            'end_page': -1  # -1 for all pages
        }
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(parse_pdf())
"
```

### Check system capabilities

```bash
uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def list_backends():
    result = await call_tool(
        name='list_backends',
        arguments={}
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(list_backends())
"
```

### parse_pdf

Required:

- `file_path` - Absolute path to the PDF file

Optional:

- `backend` - Processing backend (default: `pipeline`)
  - `pipeline` - Fast, general-purpose (recommended)
  - `vlm-mlx-engine` - Fastest VLM backend on Apple Silicon (M1/M2/M3/M4)
  - `vlm-transformers` - Slowest but most accurate
- `formula_enable` - Enable formula recognition (default: true)
- `table_enable` - Enable table recognition (default: true)
- `start_page` - Starting page (0-indexed, default: 0)
- `end_page` - Ending page (0-indexed; default: -1 for all pages)
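
The parameter list above can be turned into a small validation step that catches mistakes before a slow parse starts. This is a sketch only: `parse_pdf_arguments` is a hypothetical helper, and the checks reflect the documented defaults and backend names rather than anything the MCP server is known to enforce.

```python
import os

# Backend names as listed in this document.
KNOWN_BACKENDS = ("pipeline", "vlm-mlx-engine", "vlm-transformers")


def parse_pdf_arguments(file_path, backend="pipeline", formula_enable=True,
                        table_enable=True, start_page=0, end_page=-1):
    """Build the `arguments` dict for parse_pdf, rejecting obviously bad input."""
    if not os.path.isabs(file_path):
        raise ValueError("file_path must be an absolute path")
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if start_page < 0:
        raise ValueError("start_page is 0-indexed and cannot be negative")
    if end_page != -1 and end_page < start_page:
        raise ValueError("end_page must be -1 or not less than start_page")
    return {
        "file_path": file_path,
        "backend": backend,
        "formula_enable": formula_enable,
        "table_enable": table_enable,
        "start_page": start_page,
        "end_page": end_page,
    }
```

The returned dictionary has the same keys as the `arguments` value passed to `call_tool` in the examples in this document.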

### list_backends

No parameters required. Returns system information and backend recommendations.

### Extract tables from a specific page range

```bash
uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def parse_pdf():
    result = await call_tool(
        name='parse_pdf',
        arguments={
            'file_path': '/path/to/document.pdf',
            'backend': 'pipeline',
            'table_enable': True,
            'start_page': 5,
            'end_page': 10
        }
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(parse_pdf())
"
```

### Parse with formula recognition only (faster)

```bash
uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def parse_pdf():
    result = await call_tool(
        name='parse_pdf',
        arguments={
            'file_path': '/path/to/document.pdf',
            'backend': 'vlm-mlx-engine',
            'formula_enable': True,
            'table_enable': False  # Disable for speed
        }
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(parse_pdf())
"
```

### Parse single page (fastest for testing)

```bash
uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def parse_pdf():
    result = await call_tool(
        name='parse_pdf',
        arguments={
            'file_path': '/path/to/document.pdf',
            'backend': 'pipeline',
            'formula_enable': False,
            'table_enable': False,
            'start_page': 0,
            'end_page': 0
        }
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(parse_pdf())
"
```

### Performance

On Apple Silicon M4 (16GB RAM):

- `pipeline`: ~32s/page, CPU-only, good quality
- `vlm-mlx-engine`: ~38s/page, Apple Silicon optimized, excellent quality
- `vlm-transformers`: ~148s/page, highest quality, slowest

Note: First run downloads models (can take 5-10 minutes). Models are cached in ~/.cache/uv/ for faster subsequent runs.
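
The per-page figures above give a rough way to budget a run before starting it. The sketch below simply multiplies page count by the quoted seconds per page; it assumes the M4 (16GB) numbers carry over to your machine, which they may not, and it ignores the one-time model download. `estimate_minutes` is a hypothetical helper.

```python
# Approximate seconds per page quoted above for an Apple Silicon M4 (16GB RAM).
SECONDS_PER_PAGE = {
    "pipeline": 32,
    "vlm-mlx-engine": 38,
    "vlm-transformers": 148,
}


def estimate_minutes(pages, backend="pipeline"):
    """Estimate processing time in minutes, excluding first-run model downloads."""
    if pages < 0:
        raise ValueError("pages cannot be negative")
    return pages * SECONDS_PER_PAGE[backend] / 60
```

By this estimate a 10-page document takes a little over 5 minutes with `pipeline` and close to 25 minutes with `vlm-transformers`.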

### Output Format

Returns structured Markdown with:

- Document metadata (file, backend, pages, settings)
- Extracted text with preserved structure
- Tables formatted as Markdown tables
- Formulas converted to LaTeX

### Supported Formats

- PDF documents (.pdf)
- JPEG images (.jpg, .jpeg)
- PNG images (.png)
- Other image formats (WebP, GIF, etc.)

### Module not found error

If you get "No module named 'mcp_mineru'", make sure you installed it:

```bash
claude mcp add --transport stdio --scope user mineru -- \
  uvx --from mcp-mineru python -m mcp_mineru.server
```

### Slow processing on first run

This is normal. MinerU downloads ML models on first use. Subsequent runs will be much faster.

### Timeout errors

Increase timeout for large documents or use smaller page ranges for testing.
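
One way to apply the "smaller page ranges" advice is to split a long document into fixed-size chunks and parse each chunk separately. This sketch assumes `end_page` is inclusive, which is inferred from the single-page example (`start_page` 0, `end_page` 0) rather than stated outright; `page_ranges` is a hypothetical helper.

```python
def page_ranges(total_pages, chunk_size):
    """Split pages 0..total_pages-1 into (start_page, end_page) pairs.

    Both ends are 0-indexed and end_page is treated as inclusive.
    """
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (start, min(start + chunk_size, total_pages) - 1)
        for start in range(0, total_pages, chunk_size)
    ]
```

For a 12-page document in chunks of 5, this yields `(0, 4)`, `(5, 9)`, and `(10, 11)`, each of which can be passed as `start_page` and `end_page`.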

### Notes

- Output is returned as Markdown text
- Tables are preserved in Markdown format
- Mathematical formulas are converted to LaTeX
- Works with scanned documents (OCR built-in)
- Optimized for Apple Silicon (M1/M2/M3/M4) with MLX backend

### Why Files Get Deleted (MCP Method)

The MinerU MCP server uses Python's tempfile.TemporaryDirectory(), which automatically deletes files when the context exits. This is by design to prevent temporary files from accumulating.
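
The cleanup behaviour described above is standard-library behaviour and easy to observe in isolation. The sketch below is an illustration of `tempfile.TemporaryDirectory()` only, not the MCP server's actual code; the file name and function name are made up for the demo.

```python
import tempfile
from pathlib import Path


def demo_temporary_directory():
    """Show that a file written inside the context is gone once it exits."""
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp) / "document.md"
        output.write_text("# parsed content\n", encoding="utf-8")
        existed_inside = output.exists()
    # Leaving the `with` block removes the directory and everything in it.
    existed_after = output.exists()
    return existed_inside, existed_after
```

Anything that must survive has to be copied out, or returned as text, before the context exits.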

### How to Preserve Files

Method A: Use the Direct Tool (Recommended)

The skill provides parse.py which saves files to a persistent directory:

```bash
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  /path/to/input.pdf \
  /path/to/output_dir
```

Advantages:

- ✅ Files are never auto-deleted
- ✅ Full control over output location
- ✅ Can be used in batch processing
- ✅ No MCP connection needed

Generated Structure:

```text
/path/to/output_dir/
├── input.pdf_name/
│   └── auto/          # or vlm/ depending on backend
│       ├── input.pdf_name.md
│       └── images/
│           └── *.jpg
└── input.pdf_name_parsed.md  # Copy at root for easy access
```
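
Given the layout above, the parsed Markdown and extracted images can be collected with a couple of glob patterns. This is a sketch that assumes the directory structure is exactly as shown (a `*_parsed.md` copy at the root and images two levels down under `images/`); both function names are hypothetical.

```python
from pathlib import Path


def find_parsed_markdown(output_dir):
    """Return the root-level *_parsed.md copies, sorted by name."""
    return sorted(Path(output_dir).glob("*_parsed.md"))


def find_images(output_dir):
    """Return extracted .jpg images under <name>/<auto or vlm>/images/."""
    return sorted(Path(output_dir).glob("*/*/images/*.jpg"))
```

The wildcard for the middle directory covers both `auto/` and `vlm/`, so the same call works regardless of backend.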

Method B: Redirect MCP Output

If using the MCP method, capture the output and save it:

```bash
# Capture to file
claude -p "Parse this PDF: /path/to/file.pdf" > /tmp/output.md

# Or use within a script that saves the result
```
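
For the script route, the text returned by the tool can be written to a location of your choosing before anything temporary is cleaned up. The sketch below assumes the result object has the shape used in the examples in this document (a `content` sequence whose items may carry a `text` attribute); `first_text` and `save_result` are hypothetical helpers.

```python
from pathlib import Path


def first_text(result):
    """Return the first non-empty `text` attribute found in result.content."""
    for item in getattr(result, "content", None) or []:
        text = getattr(item, "text", None)
        if text is not None:
            return text
    return None


def save_result(result, destination):
    """Write the first text item of a tool result to a persistent file."""
    text = first_text(result)
    if text is None:
        raise ValueError("tool result contained no text item")
    path = Path(destination)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Inside the `async` functions shown earlier, `save_result(result, "/path/to/output/document.md")` would take the place of the `print` loop.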

### Comparison

| Feature | Direct Tool | MCP Method |
| --- | --- | --- |
| Files persisted | ✅ Yes | ❌ No (auto-deleted) |
| Custom output dir | ✅ Yes | ❌ No (temp only) |
| Claude Code integration | ⚠️ Manual | ✅ Native |
| Speed | ✅ Fast | ⚠️ MCP overhead |
| Offline use | ✅ Yes | ⚠️ Needs Claude Code |

### Recommendation

- Use Direct Tool when you need to keep the files for later use
- Use MCP Method when working within Claude Code and only need the text content
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: Etoile04
- Version: 1.0.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-05T16:09:12.042Z
- Expires at: 2026-05-12T16:09:12.042Z
- Recommended action: Download for OpenClaw
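
Because the health record carries an expiry, an agent may want to confirm it is still current before trusting the "healthy" status. This is a small sketch, assuming the timestamps keep the ISO 8601 form shown above with a trailing `Z`; `parse_timestamp` and `is_expired` are hypothetical helpers.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp ending in 'Z' into a timezone-aware datetime."""
    # Older Python versions do not accept the 'Z' suffix, so spell out UTC.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_expired(expires_at, now=None):
    """Return True once `now` (default: current UTC time) reaches expires_at."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= parse_timestamp(expires_at)
```

If `is_expired("2026-05-12T16:09:12.042Z")` returns `True`, the health check is stale and the download should be re-verified.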
## Links
- [Detail page](https://openagent3.xyz/skills/mineru-pdf)
- [Send to Agent page](https://openagent3.xyz/skills/mineru-pdf/agent)
- [JSON manifest](https://openagent3.xyz/skills/mineru-pdf/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/mineru-pdf/agent.md)
- [Download page](https://openagent3.xyz/downloads/mineru-pdf)