Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
CLI tool that scrapes documents from various sources into local PDF files using browser automation.
Install globally with npm:

    npm install -g docs-scraper
Scrape any document URL to PDF:

    docs-scraper scrape https://example.com/document

Returns the local path:

    ~/.docs-scraper/output/1706123456-abc123.pdf
Scrape with daemon (recommended, keeps browser warm):

    docs-scraper scrape <url>

Scrape with named profile (for authenticated sites):

    docs-scraper scrape <url> -p <profile-name>

Scrape with pre-filled data (e.g., email for DocSend):

    docs-scraper scrape <url> -D email=user@example.com

Direct mode (single-shot, no daemon):

    docs-scraper scrape <url> --no-daemon
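When these invocations are driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of the package: `build_scrape_command` is a hypothetical helper name, and it only uses the `-p`, `-D`, and `--no-daemon` flags shown above.

```python
from typing import Dict, List, Optional


def build_scrape_command(
    url: str,
    profile: Optional[str] = None,
    data: Optional[Dict[str, str]] = None,
    use_daemon: bool = True,
) -> List[str]:
    """Build an argv list for `docs-scraper scrape` (hypothetical helper)."""
    argv = ["docs-scraper", "scrape", url]
    if profile:
        # Named profile for authenticated sites.
        argv += ["-p", profile]
    for key, value in (data or {}).items():
        # One -D flag per pre-filled field, e.g. -D email=user@example.com
        argv += ["-D", f"{key}={value}"]
    if not use_daemon:
        # Direct, single-shot mode.
        argv.append("--no-daemon")
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)`. Passing a list rather than a shell string avoids quoting problems with values such as `name=John Doe`.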
When a document requires authentication (login, email verification, passcode):

Initial scrape returns a job ID:

    docs-scraper scrape https://docsend.com/view/xxx
    # Output: Scrape blocked
    # Job ID: abc123

Retry with data:

    docs-scraper update abc123 -D email=user@example.com
    # or with password
    docs-scraper update abc123 -D email=user@example.com -D password=1234
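The blocked-then-retry flow can be automated by reading the job ID out of the scrape output and building the follow-up `update` command. This sketch assumes the output contains a line of the form `Job ID: abc123`, as in the sample above; the real output format may differ, and both function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumption: the job ID appears on its own line as "Job ID: <id>".
JOB_ID_PATTERN = re.compile(r"^Job ID:\s*(\S+)\s*$", re.MULTILINE)


def extract_job_id(output: str) -> Optional[str]:
    """Return the job ID from scrape output, or None if no job line is found."""
    match = JOB_ID_PATTERN.search(output)
    return match.group(1) if match else None


def build_update_command(job_id: str, data: Dict[str, str]) -> List[str]:
    """Build an argv list for `docs-scraper update <job-id> -D key=value ...`."""
    argv = ["docs-scraper", "update", job_id]
    for key, value in data.items():
        argv += ["-D", f"{key}={value}"]
    return argv
```

A caller would run the scrape, pass its captured output to `extract_job_id`, and only build the `update` command when a job ID comes back.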
Profiles store session cookies for authenticated sites.

    docs-scraper profiles list              # List saved profiles
    docs-scraper profiles clear             # Clear all profiles
    docs-scraper scrape <url> -p myprofile  # Use a profile
The daemon keeps browser instances warm for faster scraping.

    docs-scraper daemon status  # Check status
    docs-scraper daemon start   # Start manually
    docs-scraper daemon stop    # Stop daemon

Note: The daemon auto-starts when running scrape commands.
PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.

Manual cleanup:

    docs-scraper cleanup                  # Delete all PDFs
    docs-scraper cleanup --older-than 1h  # Delete PDFs older than 1 hour
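Because files disappear after an hour, it can be useful to check which PDFs are about to be removed and copy them elsewhere first. The sketch below lists PDFs older than a threshold. It assumes age is measured by file modification time, which is a guess about how the tool decides; `find_stale_pdfs` is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_stale_pdfs(
    output_dir: Path,
    max_age_seconds: float = 3600.0,
    now: Optional[float] = None,
) -> List[Path]:
    """Return PDFs in output_dir whose mtime is older than max_age_seconds."""
    current = time.time() if now is None else now
    if not output_dir.is_dir():
        # Nothing has been scraped yet, or the directory was removed.
        return []
    return sorted(
        path
        for path in output_dir.glob("*.pdf")
        if current - path.stat().st_mtime > max_age_seconds
    )
```

For the default location, call it as `find_stale_pdfs(Path.home() / ".docs-scraper" / "output")`.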
List blocked jobs awaiting authentication:

    docs-scraper jobs list
- Direct PDF links: downloads the PDF directly
- Notion pages: exports the Notion page to PDF
- DocSend documents: handles the DocSend viewer
- LLM fallback: uses the Claude API for any other webpage
Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.
DirectPdf

Handles: URLs ending in .pdf

Data fields: None (downloads directly)

Example:

    docs-scraper scrape https://example.com/document.pdf
DocSend

Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)

URL patterns:
- Documents: https://docsend.com/view/{id} or https://docsend.com/v/{id}
- Folders: https://docsend.com/view/s/{id}
- Subdomains: https://{subdomain}.docsend.com/view/{id}

Data fields:

| Field | Type | Description |
|---|---|---|
| email | email | Email address for document access |
| password | password | Passcode/password for protected documents |
| name | text | Your name (required for NDA-gated documents) |

Examples:

    # Pre-fill email for DocSend
    docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com

    # With password protection
    docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123

    # With NDA name requirement
    docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"

    # Retry blocked job
    docs-scraper update abc123 -D email=user@example.com -D password=secret123

Notes:
- DocSend may require any combination of email, password, and name
- Folders are scraped as a table of contents PDF with document links
- The scraper auto-checks NDA checkboxes when name is provided
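Since folders and single documents produce different output (a table-of-contents PDF versus the document itself), a caller may want to tell them apart before scraping. The sketch below classifies a URL using only the patterns listed above. `classify_docsend_url` is a hypothetical helper, and the tool's own matching rules may be looser or stricter.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_docsend_url(url: str) -> Optional[str]:
    """Return "document", "folder", or None for non-DocSend URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Accept docsend.com itself and any subdomain such as org-a.docsend.com.
    if host != "docsend.com" and not host.endswith(".docsend.com"):
        return None
    parts = [segment for segment in parsed.path.split("/") if segment]
    # Folders: /view/s/{id}
    if len(parts) == 3 and parts[0] == "view" and parts[1] == "s":
        return "folder"
    # Documents: /view/{id} or /v/{id}
    if len(parts) == 2 and parts[0] in ("view", "v") and parts[1] != "s":
        return "document"
    return None
```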
Notion

Handles: notion.so/*, *.notion.site/*

Data fields:

| Field | Type | Description |
|---|---|---|
| email | email | Notion account email |
| password | password | Notion account password |

Examples:

    # Public page (no auth needed)
    docs-scraper scrape https://notion.so/Public-Page-abc123

    # Private page with login
    docs-scraper scrape https://notion.so/Private-Page-abc123 \
      -D email=user@example.com -D password=mypassword

    # Custom domain
    docs-scraper scrape https://docs.company.notion.site/Page-abc123

Notes:
- Public Notion pages don't require authentication
- Toggle blocks are automatically expanded before PDF generation
- Uses session profiles to persist login across scrapes
LLM Fallback

Handles: Any URL not matched by other scrapers (automatic fallback)

Data fields: Dynamic, determined by Claude analyzing the page

The LLM scraper uses Claude to analyze the page HTML and detect:
- Login forms (extracts field names dynamically)
- Cookie banners (auto-dismisses)
- Expandable content (auto-expands)
- CAPTCHAs (reports as blocked)
- Paywalls (reports as blocked)

Common dynamic fields:

| Field | Type | Description |
|---|---|---|
| email | email | Login email (if detected) |
| password | password | Login password (if detected) |
| username | text | Username (if login uses username) |

Examples:

    # Generic webpage (no auth)
    docs-scraper scrape https://example.com/article

    # Webpage requiring login
    docs-scraper scrape https://members.example.com/article \
      -D email=user@example.com -D password=secret

    # When blocked, check the job for required fields
    docs-scraper jobs list

    # Then retry with the fields the scraper detected
    docs-scraper update abc123 -D username=myuser -D password=secret

Notes:
- Requires ANTHROPIC_API_KEY environment variable
- Field names are extracted from the page's actual form fields
- Limited to 2 login attempts before failing
- CAPTCHAs require manual intervention
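To predict which scraper will handle a URL, and therefore which -D fields apply, the "Handles" rules can be approximated as a small router. This is a sketch under several assumptions: the order of checks, the case-insensitive .pdf test on the URL path, and treating subdomains of notion.so as Notion are guesses, and `pick_scraper` is a hypothetical name rather than something the CLI exposes.

```python
from urllib.parse import urlparse


def _host_matches(host: str, domain: str) -> bool:
    """True when host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def pick_scraper(url: str) -> str:
    """Guess the scraper for a URL from the documented URL patterns."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path
    # Direct PDF links: path ends in .pdf (query string ignored).
    if path.lower().endswith(".pdf"):
        return "DirectPdf"
    # DocSend: /view/* or /v/* on docsend.com and its subdomains.
    if _host_matches(host, "docsend.com") and path.startswith(("/view/", "/v/")):
        return "DocSend"
    # Notion: notion.so/* and *.notion.site/*
    if _host_matches(host, "notion.so") or host.endswith(".notion.site"):
        return "Notion"
    # Everything else goes to the LLM fallback.
    return "LLM Fallback"
```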
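| Scraper | email | password | name | Other |
|---|---|---|---|---|
| DirectPdf | - | - | - | - |
| DocSend | ✓ | ✓ | ✓ | - |
| Notion | ✓ | ✓ | - | - |
| LLM Fallback | ✓* | ✓* | - | Dynamic* |

*Fields detected dynamically from page analysis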
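The per-scraper fields can be encoded as a lookup to catch mistyped or irrelevant -D fields before a scrape is launched. This is an illustrative sketch: `KNOWN_FIELDS` and `unexpected_fields` are hypothetical names, and whether the CLI itself rejects unknown fields is not stated here.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional

# Fields each scraper accepts; None means fields are detected dynamically.
KNOWN_FIELDS: Dict[str, Optional[FrozenSet[str]]] = {
    "DirectPdf": frozenset(),
    "DocSend": frozenset({"email", "password", "name"}),
    "Notion": frozenset({"email", "password"}),
    "LLM Fallback": None,
}


def unexpected_fields(scraper: str, fields: Iterable[str]) -> List[str]:
    """Return the supplied field names that the scraper is not known to use."""
    allowed = KNOWN_FIELDS[scraper]
    if allowed is None:
        # Dynamic scraper: any field name might be valid.
        return []
    return sorted(set(fields) - allowed)
```

An empty result means every supplied field is one the scraper is documented to accept.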
Only needed for the LLM fallback scraper:

    export ANTHROPIC_API_KEY=your_key

Optional browser settings:

    export BROWSER_HEADLESS=true  # Set false for debugging
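A wrapper script can check for the API key up front instead of discovering the problem mid-scrape. A minimal sketch, with `has_anthropic_key` as a hypothetical helper; it only checks that the variable is set and non-blank, not that the key is valid.

```python
import os
from typing import Mapping, Optional


def has_anthropic_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when ANTHROPIC_API_KEY is set to a non-blank value."""
    source = os.environ if env is None else env
    return bool(source.get("ANTHROPIC_API_KEY", "").strip())
```

The check only matters for URLs that fall through to the LLM fallback scraper; the other scrapers do not need the key.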
Archive a Notion page:

    docs-scraper scrape https://notion.so/My-Page-abc123

Download protected DocSend:

    docs-scraper scrape https://docsend.com/view/xxx
    # If blocked:
    docs-scraper update <job-id> -D email=user@example.com -D password=1234

Batch scraping with profiles:

    docs-scraper scrape https://site.com/doc1 -p mysite
    docs-scraper scrape https://site.com/doc2 -p mysite
- Success: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf)
- Blocked: Job ID + required credential types
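- Timeout: restart the daemon with docs-scraper daemon stop && docs-scraper daemon start
- Auth fails: run docs-scraper jobs list to check pending jobs
- Disk full: run docs-scraper cleanup to delete all stored PDFs, or docs-scraper cleanup --older-than 1h to keep recent ones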