Tencent SkillHub · Productivity

Links to PDFs

Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.

skill · openclaw · clawhub · Free
0 Downloads
0 Stars
0 Installs
0 Score
High Signal

⬇ 0 downloads · ★ 0 stars · Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform: OpenClaw
Install method: Manual import
Extraction: Extract archive
Prerequisites: OpenClaw
Primary doc: SKILL.md

Package facts

Download mode: Yavira redirect
Package format: ZIP package
Source platform: Tencent SkillHub
What's included: SKILL.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.

New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source: Tencent SkillHub
Verification: Indexed source record
Version: 0.0.1

Documentation

ClawHub primary doc: SKILL.md (20 sections)

docs-scraper

CLI tool that scrapes documents from various sources into local PDF files using browser automation.

Installation

npm install -g docs-scraper

Quick start

Scrape any document URL to PDF:

    docs-scraper scrape https://example.com/document

Returns the local path, e.g. ~/.docs-scraper/output/1706123456-abc123.pdf

Basic scraping

Scrape with the daemon (recommended; keeps the browser warm):

    docs-scraper scrape <url>

Scrape with a named profile (for authenticated sites):

    docs-scraper scrape <url> -p <profile-name>

Scrape with pre-filled data (e.g., email for DocSend):

    docs-scraper scrape <url> -D email=user@example.com

Direct mode (single-shot, no daemon):

    docs-scraper scrape <url> --no-daemon
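When the -D fields come from a script, it is convenient to pass them as plain key=value pairs. Below is a minimal POSIX sh sketch that expands each pair into its own -D flag. The function name scrape_with_data and the DOCS_SCRAPER override variable are hypothetical, not part of docs-scraper.

```shell
# Hypothetical helper (not part of docs-scraper): expand key=value pairs into -D flags.
# DOCS_SCRAPER defaults to the real CLI; set DOCS_SCRAPER=echo for a dry run.
scrape_with_data() {
  url=$1
  shift
  # Rebuild the positional parameters as: -D k1=v1 -D k2=v2 ... (order preserved)
  n=$#
  while [ "$n" -gt 0 ]; do
    pair=$1
    shift
    set -- "$@" -D "$pair"
    n=$((n - 1))
  done
  "${DOCS_SCRAPER:-docs-scraper}" scrape "$url" "$@"
}
```

For example, `scrape_with_data https://docsend.com/view/abc123 email=user@example.com 'name=John Doe'` keeps "name=John Doe" as a single argument. Running it once with DOCS_SCRAPER=echo is a cheap way to check the quoting.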

Authentication workflow

When a document requires authentication (login, email verification, passcode):

  1. The initial scrape returns a job ID:

      docs-scraper scrape https://docsend.com/view/xxx
      # Output: Scrape blocked
      # Job ID: abc123

  2. Retry with the required data:

      docs-scraper update abc123 -D email=user@example.com
      # or, with a password:
      docs-scraper update abc123 -D email=user@example.com -D password=1234
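To script this flow, the job ID has to be pulled out of the blocked output. The sketch below assumes the output contains a line of the form "Job ID: <id>", as in the example above. The exact wording, and whether it goes to stdout or stderr, are assumptions. The function name extract_job_id is hypothetical.

```shell
# Hypothetical helper: print the job ID from docs-scraper output, or nothing
# if there is no "Job ID:" line (e.g. the scrape succeeded).
extract_job_id() {
  printf '%s\n' "$1" |
    sed -n 's/^[[:space:]#]*Job ID:[[:space:]]*\([^[:space:]]*\).*$/\1/p' |
    head -n 1
}

# Usage sketch: capture both streams, then retry the blocked job with data.
# out=$(docs-scraper scrape "$url" 2>&1)
# job=$(extract_job_id "$out")
# [ -n "$job" ] && docs-scraper update "$job" -D email=user@example.com
```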

Profile management

Profiles store session cookies for authenticated sites.

    docs-scraper profiles list                # List saved profiles
    docs-scraper profiles clear               # Clear all profiles
    docs-scraper scrape <url> -p myprofile    # Use a profile
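If you scrape several sites, one profile per host keeps their session cookies apart. The helper below implements a hypothetical naming convention, not a docs-scraper feature. It derives a profile name from a URL's host and replaces dots with hyphens, on the assumption that such names are accepted as profile names.

```shell
# Hypothetical helper (not part of docs-scraper): derive a profile name from a URL.
# Assumes names made of letters, digits, and hyphens are valid profile names.
profile_for_url() {
  host=${1#*://}      # drop the scheme, e.g. "https://"
  host=${host%%/*}    # drop the path
  host=${host%%:*}    # drop a port, if any
  printf '%s\n' "$host" | tr '.' '-'
}

# Usage sketch: reuse one session for every document on the same host
# docs-scraper scrape "$url" -p "$(profile_for_url "$url")"
```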

Daemon management

The daemon keeps browser instances warm for faster scraping.

    docs-scraper daemon status    # Check status
    docs-scraper daemon start     # Start manually
    docs-scraper daemon stop      # Stop the daemon

Note: the daemon auto-starts when you run scrape commands.

Cleanup

PDFs are stored in ~/.docs-scraper/output/. The daemon automatically deletes files older than 1 hour. Manual cleanup:

    docs-scraper cleanup                     # Delete all PDFs
    docs-scraper cleanup --older-than 1h     # Delete PDFs older than 1 hour
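Before a manual cleanup, you may want to preview which files an age-based cleanup would remove. The sketch below is read-only and uses find rather than docs-scraper. The function name list_stale_pdfs is hypothetical, and -mmin is supported by GNU and BSD find but is not part of POSIX.

```shell
# Hypothetical read-only preview: list PDFs older than N minutes (default 60).
list_stale_pdfs() {
  dir=${1:-"$HOME/.docs-scraper/output"}
  minutes=${2:-60}
  # -mmin +N matches files last modified more than N minutes ago.
  find "$dir" -type f -name '*.pdf' -mmin +"$minutes"
}
```

Running `list_stale_pdfs` with no arguments previews roughly what `docs-scraper cleanup --older-than 1h` would delete. The tool's own age calculation may differ slightly.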

Job management

    docs-scraper jobs list    # List blocked jobs awaiting auth

Supported sources

  • Direct PDF links: downloads the PDF directly
  • Notion pages: exports the Notion page to PDF
  • DocSend documents: handles the DocSend viewer
  • LLM fallback: uses the Claude API for any other webpage

Scraper Reference

Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.

DirectPdfScraper

Handles: URLs ending in .pdf
Data fields: none (downloads directly)
Example:

    docs-scraper scrape https://example.com/document.pdf

DocsendScraper

Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)

URL patterns:
  • Documents: https://docsend.com/view/{id} or https://docsend.com/v/{id}
  • Folders: https://docsend.com/view/s/{id}
  • Subdomains: https://{subdomain}.docsend.com/view/{id}

Data fields:

  Field      Type       Description
  email      email      Email address for document access
  password   password   Passcode/password for protected documents
  name       text       Your name (required for NDA-gated documents)

Examples:

    # Pre-fill email for DocSend
    docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com
    # With password protection
    docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123
    # With NDA name requirement
    docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"
    # Retry a blocked job
    docs-scraper update abc123 -D email=user@example.com -D password=secret123

Notes:
  • DocSend may require any combination of email, password, and name.
  • Folders are scraped as a table-of-contents PDF with document links.
  • The scraper auto-checks NDA checkboxes when name is provided.
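Folders produce a table-of-contents PDF rather than the documents themselves, so it can help to check a link's kind before scraping. This sketch applies the URL patterns listed above with a shell case statement. The function docsend_kind is hypothetical and only approximates the tool's real matching. Folder links on subdomains are assumed to follow the same /view/s/ form.

```shell
# Hypothetical helper: classify a DocSend URL using the documented patterns.
docsend_kind() {
  case $1 in
    https://docsend.com/view/s/*|https://*.docsend.com/view/s/*)
      echo folder ;;      # scraped as a table-of-contents PDF
    https://docsend.com/view/*|https://docsend.com/v/*)
      echo document ;;
    https://*.docsend.com/view/*|https://*.docsend.com/v/*)
      echo document ;;    # subdomain document links
    *)
      echo unknown ;;
  esac
}
```

The folder arm must come first, because https://docsend.com/view/s/{id} would otherwise also match the document pattern.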

NotionScraper

Handles: notion.so/*, *.notion.site/*

Data fields:

  Field      Type       Description
  email      email      Notion account email
  password   password   Notion account password

Examples:

    # Public page (no auth needed)
    docs-scraper scrape https://notion.so/Public-Page-abc123
    # Private page with login
    docs-scraper scrape https://notion.so/Private-Page-abc123 \
      -D email=user@example.com -D password=mypassword
    # Custom domain
    docs-scraper scrape https://docs.company.notion.site/Page-abc123

Notes:
  • Public Notion pages don't require authentication.
  • Toggle blocks are automatically expanded before PDF generation.
  • Session profiles persist the login across scrapes.

LlmFallbackScraper

Handles: any URL not matched by the other scrapers (automatic fallback)
Data fields: dynamic, determined by Claude analyzing the page

The LLM scraper uses Claude to analyze the page HTML and detect:
  • Login forms (extracts field names dynamically)
  • Cookie banners (auto-dismisses)
  • Expandable content (auto-expands)
  • CAPTCHAs (reports as blocked)
  • Paywalls (reports as blocked)

Common dynamic fields:

  Field      Type       Description
  email      email      Login email (if detected)
  password   password   Login password (if detected)
  username   text       Username (if the login uses a username)

Examples:

    # Generic webpage (no auth)
    docs-scraper scrape https://example.com/article
    # Webpage requiring login
    docs-scraper scrape https://members.example.com/article \
      -D email=user@example.com -D password=secret
    # When blocked, check the job for required fields
    docs-scraper jobs list
    # Then retry with the fields the scraper detected
    docs-scraper update abc123 -D username=myuser -D password=secret

Notes:
  • Requires the ANTHROPIC_API_KEY environment variable.
  • Field names are extracted from the page's actual form fields.
  • Limited to 2 login attempts before failing.
  • CAPTCHAs require manual intervention.
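The routing across the four scrapers can be approximated with a single case statement. This is a hypothetical sketch built from the "Handles:" lines in this reference, not the tool's actual code. The real matching order and edge cases (query strings after .pdf, www.notion.so, and so on) may differ.

```shell
# Hypothetical approximation of docs-scraper's URL routing (not its actual code).
detect_scraper() {
  case $1 in
    *.pdf)
      echo DirectPdfScraper ;;
    *://docsend.com/view/*|*://docsend.com/v/*|*://*.docsend.com/view/*|*://*.docsend.com/v/*)
      echo DocsendScraper ;;
    *://notion.so/*|*://*.notion.site/*)
      echo NotionScraper ;;
    *)
      echo LlmFallbackScraper ;;  # anything else falls back to the LLM scraper
  esac
}
```

A URL that this sketch routes to LlmFallbackScraper is one where the ANTHROPIC_API_KEY requirement would apply.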

Data field summary

  Scraper        email   password   name   Other
  DirectPdf      -       -          -      -
  DocSend        ✓       ✓          ✓      -
  Notion         ✓       ✓          -      -
  LLM Fallback   ✓*      ✓*         -      Dynamic*

  * Fields detected dynamically from page analysis
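The table can double as a pre-flight check before passing -D fields. This hypothetical helper takes a scraper name plus key=value pairs and prints every key the table does not list for that scraper. LLM fallback fields are detected dynamically, so it accepts anything there. Nothing here calls docs-scraper itself.

```shell
# Hypothetical pre-flight check based on the data field summary table:
# print each -D key that the table does not list for the given scraper.
unsupported_fields() {
  scraper=$1
  shift
  case $scraper in
    DirectPdfScraper) allowed="" ;;
    DocsendScraper) allowed="email password name" ;;
    NotionScraper) allowed="email password" ;;
    *) return 0 ;;  # LLM fallback: fields are detected dynamically
  esac
  for pair in "$@"; do
    key=${pair%%=*}
    case " $allowed " in
      *" $key "*) ;;              # listed for this scraper
      *) printf '%s\n' "$key" ;;  # not listed
    esac
  done
}
```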

Environment setup (optional)

Only needed for the LLM fallback scraper:

    export ANTHROPIC_API_KEY=your_key

Optional browser settings:

    export BROWSER_HEADLESS=true    # Set to false for debugging
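Scripts that may hit the LLM fallback can fail early when the key is missing, instead of discovering it mid-batch. The guard below is hypothetical. It only checks that the variable is non-empty, not that the key is valid.

```shell
# Hypothetical guard: fail fast if the LLM fallback scraper would be unusable.
check_llm_env() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set; the LLM fallback scraper will not work" >&2
    return 1
  fi
}
```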

Common patterns

Archive a Notion page:

    docs-scraper scrape https://notion.so/My-Page-abc123

Download a protected DocSend document:

    docs-scraper scrape https://docsend.com/view/xxx
    # If blocked:
    docs-scraper update <job-id> -D email=user@example.com -D password=1234

Batch scraping with profiles:

    docs-scraper scrape https://site.com/doc1 -p mysite
    docs-scraper scrape https://site.com/doc2 -p mysite
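The batch pattern generalizes to a loop over any number of URLs sharing one profile. This is a sketch: the function batch_scrape and the DOCS_SCRAPER override are hypothetical. It also assumes the CLI exits non-zero when a scrape fails or is blocked, which this document does not state.

```shell
# Hypothetical batch wrapper: scrape every URL with one shared profile.
# Set DOCS_SCRAPER=echo for a dry run that only prints the commands.
batch_scrape() {
  profile=$1
  shift
  rc=0
  for url in "$@"; do
    # On success the CLI prints the local PDF path; keep going if one URL fails.
    if ! "${DOCS_SCRAPER:-docs-scraper}" scrape "$url" -p "$profile"; then
      echo "failed: $url" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `batch_scrape mysite https://site.com/doc1 https://site.com/doc2` prints one PDF path per successful scrape. It returns 1 if any URL failed.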

Output

Success: local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf)
Blocked: job ID plus the required credential types
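On success, a script can confirm that the returned path really is a PDF before using it. The hypothetical helper below checks the "%PDF-" signature that PDF files begin with. It also expands a literal leading "~/", in case the CLI prints paths the way the documented example output shows them.

```shell
# Hypothetical helper: check that a path printed by docs-scraper is a real PDF.
is_pdf_file() {
  path=$1
  # Expand a literal leading "~/" (a shell variable is not tilde-expanded).
  case $path in
    "~/"*) path=$HOME/${path#"~/"} ;;
  esac
  [ -f "$path" ] || return 1
  # PDF files begin with the signature "%PDF-".
  [ "$(head -c 5 "$path")" = "%PDF-" ]
}

# Usage sketch:
# pdf=$(docs-scraper scrape "$url") && is_pdf_file "$pdf" && echo "saved: $pdf"
```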

Troubleshooting

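  • Timeout: restart the daemon with docs-scraper daemon stop && docs-scraper daemon start
  • Auth fails: run docs-scraper jobs list to check pending jobs
  • Disk full: run docs-scraper cleanup to delete all PDFs, or docs-scraper cleanup --older-than 1h to remove only old ones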
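The timeout remedy can be automated by retrying once after a daemon restart. This is a sketch under assumptions. The function scrape_with_restart and the DOCS_SCRAPER override are hypothetical, and it assumes the CLI exits non-zero on failure. It restarts on any failure, not only on timeouts.

```shell
# Hypothetical wrapper: if a scrape fails, restart the daemon and retry once.
# Assumes the CLI exits non-zero on failure; DOCS_SCRAPER overrides the binary.
scrape_with_restart() {
  cli=${DOCS_SCRAPER:-docs-scraper}
  "$cli" scrape "$1" && return 0
  echo "scrape failed; restarting the daemon and retrying once" >&2
  "$cli" daemon stop || true    # the daemon may already be stopped
  "$cli" daemon start || return 1
  "$cli" scrape "$1"
}
```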

Category context

Workflow acceleration for inboxes, docs, calendars, planning, and execution loops.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
1 doc
  • SKILL.md Primary doc