Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Extract text and metadata from PDF files using PyMuPDF, supporting large files and outputting results in JSON format.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Extract and read text from PDF files using PyMuPDF.
pip install pymupdf
# Extract text (first 10 pages by default)
python pdf_reader.py "path/to/file.pdf" 10

# Output to JSON file (for reading)
python pdf_reader.py "path/to/file.pdf" 10 --output=extracted.json

# Read specific number of pages
python pdf_reader.py "path/to/file.pdf" 5
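The usage lines above imply a positional path, an optional page count, and an `--output=` option. The source of pdf_reader.py is not shown here, so the following is only a sketch of how such a command line could be parsed with `argparse`. The names `build_parser`, `pdf_path`, and `max_pages` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented usage lines."""
    parser = argparse.ArgumentParser(prog="pdf_reader.py")
    parser.add_argument("pdf_path", help="PDF file to read")
    # Optional positional page count; 10 matches the documented default.
    parser.add_argument("max_pages", nargs="?", type=int, default=10,
                        help="number of pages to read")
    # argparse accepts both --output=FILE and --output FILE.
    parser.add_argument("--output", default=None,
                        help="write the result to this .json file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.pdf_path, args.max_pages, args.output)
```

With this sketch, omitting the page count falls back to 10, and omitting `--output` leaves it as `None`, which a script could treat as "print to standard output".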
- Extracts text from any PDF
- Supports large files
- Outputs JSON for AI reading
- Handles encoding issues
- Shows metadata (title, author, etc.)
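As a rough illustration of the features listed above, here is a minimal sketch of text and metadata extraction with PyMuPDF. It is not the contents of pdf_reader.py, which are not reproduced here. The function names `extract_pdf`, `build_payload`, and `to_json`, as well as the JSON key names, are assumptions.

```python
import json


def build_payload(metadata, pages, total_pages):
    """Assemble a JSON-serializable dict (key names are hypothetical)."""
    return {
        # Drop empty metadata fields such as a missing author.
        "metadata": {k: v for k, v in (metadata or {}).items() if v},
        "total_pages": total_pages,
        "pages_read": len(pages),
        "pages": [
            {"page": i + 1, "text": text} for i, text in enumerate(pages)
        ],
    }


def extract_pdf(path, max_pages=10):
    """Read up to max_pages pages of text plus document metadata."""
    import fitz  # PyMuPDF; imported lazily so build_payload works without it

    doc = fitz.open(path)
    try:
        limit = min(max_pages, doc.page_count)
        # Pages are loaded one at a time and only the first `limit` are read.
        pages = [doc.load_page(i).get_text() for i in range(limit)]
        return build_payload(doc.metadata, pages, doc.page_count)
    finally:
        doc.close()


def to_json(payload):
    # ensure_ascii=False keeps non-ASCII text readable in the output.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Capping the number of pages read is one plausible way to keep very large files manageable; whether the actual script does this in the same way is not stated.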
For safety, the script enforces:
- Input files: must be .pdf files within the current working directory
- Output files: must be .json files within the current working directory
- No path traversal (../) allowed
- Files can only be read or written in the directory where the script runs
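The rules above can be approximated with a small path check built on `pathlib`. This is a sketch under assumptions, not the script's actual validation code: the helper name `resolve_in_cwd` and its `base` parameter are hypothetical, and this version allows subdirectories of the working directory, which the real script may or may not permit.

```python
from pathlib import Path


def resolve_in_cwd(user_path, required_suffix, base=None):
    """Return the resolved path, or raise ValueError if it breaks a rule."""
    base_dir = Path(base).resolve() if base is not None else Path.cwd().resolve()
    candidate = Path(user_path)

    # Reject any explicit parent-directory component.
    if ".." in candidate.parts:
        raise ValueError("path traversal (..) is not allowed")

    # Enforce the expected extension (.pdf for input, .json for output).
    if candidate.suffix.lower() != required_suffix:
        raise ValueError(f"expected a {required_suffix} file")

    # Resolve symlinks and absolute paths, then confirm the result is
    # still under base_dir; relative_to raises ValueError otherwise.
    resolved = (base_dir / candidate).resolve()
    resolved.relative_to(base_dir)
    return resolved
```

A caller might run `resolve_in_cwd(path, ".pdf")` on the input and `resolve_in_cwd(path, ".json")` on the output before opening either file. Checking the resolved path, rather than only the raw string, also catches absolute paths and symlinks that point outside the directory.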
- pdf_reader.py - Main Python script
- SKILL.md - This documentation