Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Extract PDF content to Markdown using MinerU API. Supports formulas, tables, OCR. Provides both local file and online URL parsing methods.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Extract PDF documents to structured Markdown using the MinerU API. Supports formula recognition, table extraction, and OCR. Note: This is a community skill, not an official MinerU product. You need to obtain your own API key from MinerU.
```
mineru-pdf-extractor/
├── SKILL.md                                   # English documentation
├── SKILL_zh.md                                # Chinese documentation
├── docs/                                      # Documentation
│   ├── Local_File_Parsing_Guide.md            # Local PDF parsing detailed guide (English)
│   ├── Online_URL_Parsing_Guide.md            # Online PDF parsing detailed guide (English)
│   ├── MinerU_本地文档解析完整流程.md           # Local parsing complete guide (Chinese)
│   └── MinerU_在线文档解析完整流程.md           # Online parsing complete guide (Chinese)
└── scripts/                                   # Executable scripts
    ├── local_file_step1_apply_upload_url.sh   # Local parsing Step 1
    ├── local_file_step2_upload_file.sh        # Local parsing Step 2
    ├── local_file_step3_poll_result.sh        # Local parsing Step 3
    ├── local_file_step4_download.sh           # Local parsing Step 4
    ├── online_file_step1_submit_task.sh       # Online parsing Step 1
    └── online_file_step2_poll_result.sh       # Online parsing Step 2
```
Scripts automatically read the MinerU token from environment variables (choose one):

```bash
# Option 1: Set MINERU_TOKEN
export MINERU_TOKEN="your_api_token_here"

# Option 2: Set MINERU_API_KEY
export MINERU_API_KEY="your_api_token_here"
```
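The documented precedence (MINERU_TOKEN first, then MINERU_API_KEY) can be sketched as a small helper. `resolve_token` is a hypothetical name for illustration, not a function from the skill's scripts:

```bash
# Hypothetical sketch of the documented token precedence:
# prefer MINERU_TOKEN, fall back to MINERU_API_KEY, fail if neither is set.
resolve_token() {
  TOKEN="${MINERU_TOKEN:-${MINERU_API_KEY:-}}"
  if [ -z "$TOKEN" ]; then
    echo "Error: set MINERU_TOKEN or MINERU_API_KEY" >&2
    return 1
  fi
  printf '%s\n' "$TOKEN"
}
```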
- `curl` - for HTTP requests (usually pre-installed)
- `unzip` - for extracting results (usually pre-installed)
- `jq` - for enhanced JSON parsing and security (recommended but not required). If it is not installed, the scripts fall back to plain-text parsing. Install with `apt-get install jq` (Debian/Ubuntu) or `brew install jq` (macOS).
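A quick preflight for the tools listed above might look like the following sketch; `require` is a name I chose for illustration, not something shipped with the scripts:

```bash
# require <cmd>...: fail if any listed command is missing from PATH
require() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required tool: $cmd" >&2
      return 1
    fi
  done
}

# Usage for this skill's prerequisites:
#   require curl unzip
#   command -v jq >/dev/null 2>&1 || echo "jq not found; falling back to plain-text parsing"
```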
```bash
# Set API base URL (default is pre-configured)
export MINERU_BASE_URL="https://mineru.net/api/v4"
```

Tip: visit https://mineru.net/apiManage/docs to register and obtain an API key.
For locally stored PDF files. Requires 4 steps.
```bash
cd scripts/

# Step 1: Apply for an upload URL
./local_file_step1_apply_upload_url.sh /path/to/your.pdf
# Output: BATCH_ID=xxx UPLOAD_URL=xxx

# Step 2: Upload the file
./local_file_step2_upload_file.sh "$UPLOAD_URL" /path/to/your.pdf

# Step 3: Poll for results
./local_file_step3_poll_result.sh "$BATCH_ID"
# Output: FULL_ZIP_URL=xxx

# Step 4: Download the results
./local_file_step4_download.sh "$FULL_ZIP_URL" result.zip extracted/
```
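Step 1 prints its results as `KEY=value` lines, so a small helper can capture them into shell variables before steps 2-4. `extract_var` is my own illustrative name, not part of the skill's scripts:

```bash
# extract_var <name>: read KEY=value lines on stdin, print the value for <name>.
# -f2- keeps everything after the first '=', so values containing '=' survive.
extract_var() {
  grep "^$1=" | head -n1 | cut -d= -f2-
}

# Usage (assuming the step-1 script from this repo):
#   out=$(./local_file_step1_apply_upload_url.sh /path/to/your.pdf)
#   BATCH_ID=$(printf '%s\n' "$out" | extract_var BATCH_ID)
#   UPLOAD_URL=$(printf '%s\n' "$out" | extract_var UPLOAD_URL)
```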
**local_file_step1_apply_upload_url.sh** - Apply for an upload URL and batch_id.

```bash
./local_file_step1_apply_upload_url.sh <pdf_file_path> [language] [layout_model]
```

Parameters:
- `language`: `ch` (Chinese), `en` (English), `auto` (auto-detect); default `ch`
- `layout_model`: `doclayout_yolo` (fast), `layoutlmv3` (accurate); default `doclayout_yolo`

Output:
```
BATCH_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
UPLOAD_URL=https://mineru.oss-cn-shanghai.aliyuncs.com/...
```

**local_file_step2_upload_file.sh** - Upload the PDF file to the presigned URL.

```bash
./local_file_step2_upload_file.sh <upload_url> <pdf_file_path>
```

**local_file_step3_poll_result.sh** - Poll extraction results until completion or failure.

```bash
./local_file_step3_poll_result.sh <batch_id> [max_retries] [retry_interval_seconds]
```

Output:
```
FULL_ZIP_URL=https://cdn-mineru.openxlab.org.cn/pdf/.../xxx.zip
```

**local_file_step4_download.sh** - Download the result ZIP and extract it.

```bash
./local_file_step4_download.sh <zip_url> [output_zip_filename] [extract_directory_name]
```

Output structure:
```
extracted/
├── full.md              # Markdown document (main result)
├── images/              # Extracted images
├── content_list.json    # Structured content
└── layout.json          # Layout analysis data
```
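The polling in step 3 (with `max_retries` and `retry_interval_seconds`) follows a generic retry pattern. A sketch of that pattern, with `poll_until` and the `check` hook as illustrative names rather than the script's actual internals:

```bash
# poll_until <max_retries> <interval_seconds> <check_cmd>
# Re-runs check_cmd until it succeeds (exit 0) or retries are exhausted.
poll_until() {
  max=$1; interval=$2; check=$3
  i=0
  while [ "$i" -lt "$max" ]; do
    if "$check"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```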
Complete guide: see docs/Local_File_Parsing_Guide.md
For PDF files already available online (e.g., arXiv, websites). Only two steps - simpler and faster.
```bash
cd scripts/

# Step 1: Submit a parsing task (provide the URL directly)
./online_file_step1_submit_task.sh "https://arxiv.org/pdf/2410.17247.pdf"
# Output: TASK_ID=xxx

# Step 2: Poll for results; download and extract automatically on completion
./online_file_step2_poll_result.sh "$TASK_ID" extracted/
```
**online_file_step1_submit_task.sh** - Submit a parsing task for an online PDF.

```bash
./online_file_step1_submit_task.sh <pdf_url> [language] [layout_model]
```

Parameters:
- `pdf_url`: complete URL of the online PDF (required)
- `language`: `ch` (Chinese), `en` (English), `auto` (auto-detect); default `ch`
- `layout_model`: `doclayout_yolo` (fast), `layoutlmv3` (accurate); default `doclayout_yolo`

Output:
```
TASK_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```

**online_file_step2_poll_result.sh** - Poll extraction results; automatically download and extract on completion.

```bash
./online_file_step2_poll_result.sh <task_id> [output_directory] [max_retries] [retry_interval_seconds]
```

Output structure:
```
extracted/
├── full.md              # Markdown document (main result)
├── images/              # Extracted images
├── content_list.json    # Structured content
└── layout.json          # Layout analysis data
```
Complete guide: see docs/Online_URL_Parsing_Guide.md
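Both routes produce the same output layout, so a shared post-extraction sanity check can be handy. `verify_extraction` is a hypothetical helper based only on the structure documented above:

```bash
# verify_extraction <dir>: confirm the main Markdown result exists and is non-empty,
# then report rough counts of headings and extracted images.
verify_extraction() {
  dir=$1
  if [ ! -s "$dir/full.md" ]; then
    echo "extraction incomplete: $dir/full.md missing or empty" >&2
    return 1
  fi
  echo "headings: $(grep -c '^#' "$dir/full.md" || true)"
  echo "images:   $(ls "$dir/images" 2>/dev/null | wc -l)"
}
```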
| Feature | Local PDF Parsing | Online PDF Parsing |
|---------|-------------------|--------------------|
| Steps | 4 steps | 2 steps |
| Upload required | Yes | No |
| Average time | 30-60 seconds | 10-20 seconds |
| Use case | Local files | Files already online (arXiv, websites, etc.) |
| File size limit | 200MB | Limited by source server |
```bash
for pdf in /path/to/pdfs/*.pdf; do
  echo "Processing: $pdf"

  # Step 1
  result=$(./local_file_step1_apply_upload_url.sh "$pdf" 2>&1)
  batch_id=$(echo "$result" | grep BATCH_ID | cut -d= -f2-)
  upload_url=$(echo "$result" | grep UPLOAD_URL | cut -d= -f2-)

  # Step 2
  ./local_file_step2_upload_file.sh "$upload_url" "$pdf"

  # Step 3
  zip_url=$(./local_file_step3_poll_result.sh "$batch_id" | grep FULL_ZIP_URL | cut -d= -f2-)

  # Step 4
  filename=$(basename "$pdf" .pdf)
  ./local_file_step4_download.sh "$zip_url" "${filename}.zip" "${filename}_extracted"
done
```
```bash
for url in \
  "https://arxiv.org/pdf/2410.17247.pdf" \
  "https://arxiv.org/pdf/2409.12345.pdf"; do
  echo "Processing: $url"

  # Step 1
  result=$(./online_file_step1_submit_task.sh "$url" 2>&1)
  task_id=$(echo "$result" | grep TASK_ID | cut -d= -f2-)

  # Step 2
  filename=$(basename "$url" .pdf)
  ./online_file_step2_poll_result.sh "$task_id" "${filename}_extracted"
done
```
- Token configuration: scripts prioritize `MINERU_TOKEN` and fall back to `MINERU_API_KEY` if it is not set.
- Token security: do not hard-code tokens in scripts; use environment variables.
- URL accessibility: for online parsing, ensure the provided URL is publicly accessible.
- File limits: a single file should not exceed 200MB or 600 pages.
- Network stability: ensure a stable connection when uploading large files.
- Security: this skill includes input validation and sanitization to prevent JSON injection and directory traversal attacks.
- Optional jq: installing `jq` provides enhanced JSON parsing and additional security checks.
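The directory-traversal guard mentioned under Security might look roughly like this sketch; it illustrates the idea only, and `safe_filename` is not taken from the skill's scripts:

```bash
# safe_filename <name>: reject empty names, path separators, parent-directory
# components, and leading dashes (which could be misread as options).
safe_filename() {
  case "$1" in
    ""|*/*|*..*|-*)
      echo "rejected unsafe filename: $1" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$1"
}
```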
| Document | Description |
|----------|-------------|
| docs/Local_File_Parsing_Guide.md | Detailed curl commands and parameters for local PDF parsing |
| docs/Online_URL_Parsing_Guide.md | Detailed curl commands and parameters for online PDF parsing |

External resources:
- MinerU official site: https://mineru.net/
- API documentation: https://mineru.net/apiManage/docs
- GitHub repository: https://github.com/opendatalab/MinerU

Skill version: 1.0.0
Release date: 2026-02-18
Community skill - not affiliated with official MinerU.