{
  "schemaVersion": "1.0",
  "item": {
    "slug": "links-to-pdfs",
    "name": "Links to PDFs",
    "source": "tencent",
    "type": "skill",
    "category": "效率提升",
    "sourceUrl": "https://clawhub.ai/chrisling-dev/links-to-pdfs",
    "canonicalUrl": "https://clawhub.ai/chrisling-dev/links-to-pdfs",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/links-to-pdfs",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=links-to-pdfs",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-23T16:43:11.935Z",
      "expiresAt": "2026-04-30T16:43:11.935Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=4claw-imageboard",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=4claw-imageboard",
        "contentDisposition": "attachment; filename=\"4claw-imageboard-1.0.1.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/links-to-pdfs"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/links-to-pdfs",
    "agentPageUrl": "https://openagent3.xyz/skills/links-to-pdfs/agent",
    "manifestUrl": "https://openagent3.xyz/skills/links-to-pdfs/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/links-to-pdfs/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "docs-scraper",
        "body": "CLI tool that scrapes documents from various sources into local PDF files using browser automation."
      },
      {
        "title": "Installation",
        "body": "npm install -g docs-scraper"
      },
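      {
        "title": "Verifying the install",
        "body": "Not part of the upstream docs: a quick sanity check after installing. The --help flag is an assumption based on common Node.js CLI conventions; if it is not supported, any scrape command from the sections below works as a smoke test.\n\n# Confirm the binary is on your PATH\nwhich docs-scraper\n\n# Print usage (assumes a standard help flag)\ndocs-scraper --help"
      },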
      {
        "title": "Quick start",
        "body": "Scrape any document URL to PDF:\n\ndocs-scraper scrape https://example.com/document\n\nReturns local path: ~/.docs-scraper/output/1706123456-abc123.pdf"
      },
      {
        "title": "Basic scraping",
        "body": "Scrape with daemon (recommended, keeps browser warm):\n\ndocs-scraper scrape <url>\n\nScrape with named profile (for authenticated sites):\n\ndocs-scraper scrape <url> -p <profile-name>\n\nScrape with pre-filled data (e.g., email for DocSend):\n\ndocs-scraper scrape <url> -D email=user@example.com\n\nDirect mode (single-shot, no daemon):\n\ndocs-scraper scrape <url> --no-daemon"
      },
      {
        "title": "Authentication workflow",
        "body": "When a document requires authentication (login, email verification, passcode):\n\nInitial scrape returns a job ID:\ndocs-scraper scrape https://docsend.com/view/xxx\n# Output: Scrape blocked\n#         Job ID: abc123\n\n\n\nRetry with data:\ndocs-scraper update abc123 -D email=user@example.com\n# or with password\ndocs-scraper update abc123 -D email=user@example.com -D password=1234"
      },
      {
        "title": "Profile management",
        "body": "Profiles store session cookies for authenticated sites.\n\ndocs-scraper profiles list     # List saved profiles\ndocs-scraper profiles clear    # Clear all profiles\ndocs-scraper scrape <url> -p myprofile  # Use a profile"
      },
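      {
        "title": "Profile workflow example",
        "body": "A hedged sketch combining the -p and -D flags documented above. The profile name and URLs are placeholders, and the assumption that a named profile is created on first use is inferred from the session-persistence notes rather than stated explicitly upstream.\n\n# First scrape of an authenticated site: pass credentials and name a profile\ndocs-scraper scrape https://members.example.com/doc1 -p mysite -D email=user@example.com -D password=secret\n\n# Later scrapes reuse the saved session cookies, so no -D data should be needed\ndocs-scraper scrape https://members.example.com/doc2 -p mysite"
      },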
      {
        "title": "Daemon management",
        "body": "The daemon keeps browser instances warm for faster scraping.\n\ndocs-scraper daemon status     # Check status\ndocs-scraper daemon start      # Start manually\ndocs-scraper daemon stop       # Stop daemon\n\nNote: Daemon auto-starts when running scrape commands."
      },
      {
        "title": "Cleanup",
        "body": "PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.\n\nManual cleanup:\n\ndocs-scraper cleanup                    # Delete all PDFs\ndocs-scraper cleanup --older-than 1h    # Delete PDFs older than 1 hour"
      },
      {
        "title": "Job management",
        "body": "docs-scraper jobs list         # List blocked jobs awaiting auth"
      },
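      {
        "title": "Blocked-job loop",
        "body": "A short example tying job management to the authentication workflow above; the job ID and credential fields are placeholders.\n\n# See which scrapes are blocked and what they need\ndocs-scraper jobs list\n\n# Retry a blocked job with the credentials it asked for\ndocs-scraper update <job-id> -D email=user@example.com"
      },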
      {
        "title": "Supported sources",
        "body": "Direct PDF links - Downloads PDF directly\nNotion pages - Exports Notion page to PDF\nDocSend documents - Handles DocSend viewer\nLLM fallback - Uses Claude API for any other webpage"
      },
      {
        "title": "Scraper Reference",
        "body": "Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type."
      },
      {
        "title": "DirectPdfScraper",
        "body": "Handles: URLs ending in .pdf\n\nData fields: None (downloads directly)\n\nExample:\n\ndocs-scraper scrape https://example.com/document.pdf"
      },
      {
        "title": "DocsendScraper",
        "body": "Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)\n\nURL patterns:\n\nDocuments: https://docsend.com/view/{id} or https://docsend.com/v/{id}\nFolders: https://docsend.com/view/s/{id}\nSubdomains: https://{subdomain}.docsend.com/view/{id}\n\nData fields:\n\nFieldTypeDescriptionemailemailEmail address for document accesspasswordpasswordPasscode/password for protected documentsnametextYour name (required for NDA-gated documents)\n\nExamples:\n\n# Pre-fill email for DocSend\ndocs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com\n\n# With password protection\ndocs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123\n\n# With NDA name requirement\ndocs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name=\"John Doe\"\n\n# Retry blocked job\ndocs-scraper update abc123 -D email=user@example.com -D password=secret123\n\nNotes:\n\nDocSend may require any combination of email, password, and name\nFolders are scraped as a table of contents PDF with document links\nThe scraper auto-checks NDA checkboxes when name is provided"
      },
      {
        "title": "NotionScraper",
        "body": "Handles: notion.so/*, *.notion.site/*\n\nData fields:\n\nFieldTypeDescriptionemailemailNotion account emailpasswordpasswordNotion account password\n\nExamples:\n\n# Public page (no auth needed)\ndocs-scraper scrape https://notion.so/Public-Page-abc123\n\n# Private page with login\ndocs-scraper scrape https://notion.so/Private-Page-abc123 \\\n  -D email=user@example.com -D password=mypassword\n\n# Custom domain\ndocs-scraper scrape https://docs.company.notion.site/Page-abc123\n\nNotes:\n\nPublic Notion pages don't require authentication\nToggle blocks are automatically expanded before PDF generation\nUses session profiles to persist login across scrapes"
      },
      {
        "title": "LlmFallbackScraper",
        "body": "Handles: Any URL not matched by other scrapers (automatic fallback)\n\nData fields: Dynamic - determined by Claude analyzing the page\n\nThe LLM scraper uses Claude to analyze the page HTML and detect:\n\nLogin forms (extracts field names dynamically)\nCookie banners (auto-dismisses)\nExpandable content (auto-expands)\nCAPTCHAs (reports as blocked)\nPaywalls (reports as blocked)\n\nCommon dynamic fields:\n\nFieldTypeDescriptionemailemailLogin email (if detected)passwordpasswordLogin password (if detected)usernametextUsername (if login uses username)\n\nExamples:\n\n# Generic webpage (no auth)\ndocs-scraper scrape https://example.com/article\n\n# Webpage requiring login\ndocs-scraper scrape https://members.example.com/article \\\n  -D email=user@example.com -D password=secret\n\n# When blocked, check the job for required fields\ndocs-scraper jobs list\n# Then retry with the fields the scraper detected\ndocs-scraper update abc123 -D username=myuser -D password=secret\n\nNotes:\n\nRequires ANTHROPIC_API_KEY environment variable\nField names are extracted from the page's actual form fields\nLimited to 2 login attempts before failing\nCAPTCHAs require manual intervention"
      },
      {
        "title": "Data field summary",
        "body": "ScraperemailpasswordnameOtherDirectPdf----DocSend✓✓✓-Notion✓✓--LLM Fallback✓*✓*-Dynamic*\n\n*Fields detected dynamically from page analysis"
      },
      {
        "title": "Environment setup (optional)",
        "body": "Only needed for LLM fallback scraper:\n\nexport ANTHROPIC_API_KEY=your_key\n\nOptional browser settings:\n\nexport BROWSER_HEADLESS=true   # Set false for debugging"
      },
      {
        "title": "Common patterns",
        "body": "Archive a Notion page:\n\ndocs-scraper scrape https://notion.so/My-Page-abc123\n\nDownload protected DocSend:\n\ndocs-scraper scrape https://docsend.com/view/xxx\n# If blocked:\ndocs-scraper update <job-id> -D email=user@example.com -D password=1234\n\nBatch scraping with profiles:\n\ndocs-scraper scrape https://site.com/doc1 -p mysite\ndocs-scraper scrape https://site.com/doc2 -p mysite"
      },
      {
        "title": "Output",
        "body": "Success: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf)\nBlocked: Job ID + required credential types"
      },
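      {
        "title": "Scripting around the output",
        "body": "A hedged shell sketch for wrapping a scrape in a script. It assumes the success path and the Scrape blocked message are printed to stdout; exit codes are not documented upstream, so treat this as a starting point rather than the tool's contract.\n\nOUT=$(docs-scraper scrape https://example.com/document)\nif echo \"$OUT\" | grep -q \"Scrape blocked\"; then\n  docs-scraper jobs list   # find the job ID, then: docs-scraper update <job-id> -D ...\nelse\n  echo \"Saved PDF: $OUT\"\nfi"
      },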
      {
        "title": "Troubleshooting",
        "body": "Timeout: docs-scraper daemon stop && docs-scraper daemon start\nAuth fails: docs-scraper jobs list to check pending jobs\nDisk full: docs-scraper cleanup to remove old PDFs"
      }
    ],
    "body": "docs-scraper\n\nCLI tool that scrapes documents from various sources into local PDF files using browser automation.\n\nInstallation\nnpm install -g docs-scraper\n\nQuick start\n\nScrape any document URL to PDF:\n\ndocs-scraper scrape https://example.com/document\n\n\nReturns local path: ~/.docs-scraper/output/1706123456-abc123.pdf\n\nBasic scraping\n\nScrape with daemon (recommended, keeps browser warm):\n\ndocs-scraper scrape <url>\n\n\nScrape with named profile (for authenticated sites):\n\ndocs-scraper scrape <url> -p <profile-name>\n\n\nScrape with pre-filled data (e.g., email for DocSend):\n\ndocs-scraper scrape <url> -D email=user@example.com\n\n\nDirect mode (single-shot, no daemon):\n\ndocs-scraper scrape <url> --no-daemon\n\nAuthentication workflow\n\nWhen a document requires authentication (login, email verification, passcode):\n\nInitial scrape returns a job ID:\n\ndocs-scraper scrape https://docsend.com/view/xxx\n# Output: Scrape blocked\n#         Job ID: abc123\n\n\nRetry with data:\n\ndocs-scraper update abc123 -D email=user@example.com\n# or with password\ndocs-scraper update abc123 -D email=user@example.com -D password=1234\n\nProfile management\n\nProfiles store session cookies for authenticated sites.\n\ndocs-scraper profiles list     # List saved profiles\ndocs-scraper profiles clear    # Clear all profiles\ndocs-scraper scrape <url> -p myprofile  # Use a profile\n\nDaemon management\n\nThe daemon keeps browser instances warm for faster scraping.\n\ndocs-scraper daemon status     # Check status\ndocs-scraper daemon start      # Start manually\ndocs-scraper daemon stop       # Stop daemon\n\n\nNote: Daemon auto-starts when running scrape commands.\n\nCleanup\n\nPDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.\n\nManual cleanup:\n\ndocs-scraper cleanup                    # Delete all PDFs\ndocs-scraper cleanup --older-than 1h    # Delete PDFs older than 1 hour\n\nJob management\ndocs-scraper jobs list         # List blocked jobs awaiting auth\n\nSupported sources\nDirect PDF links - Downloads PDF directly\nNotion pages - Exports Notion page to PDF\nDocSend documents - Handles DocSend viewer\nLLM fallback - Uses Claude API for any other webpage\nScraper Reference\n\nEach scraper accepts specific -D data fields. 
Use the appropriate fields based on the URL type.\n\nDirectPdfScraper\n\nHandles: URLs ending in .pdf\n\nData fields: None (downloads directly)\n\nExample:\n\ndocs-scraper scrape https://example.com/document.pdf\n\nDocsendScraper\n\nHandles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)\n\nURL patterns:\n\nDocuments: https://docsend.com/view/{id} or https://docsend.com/v/{id}\nFolders: https://docsend.com/view/s/{id}\nSubdomains: https://{subdomain}.docsend.com/view/{id}\n\nData fields:\n\nField\tType\tDescription\nemail\temail\tEmail address for document access\npassword\tpassword\tPasscode/password for protected documents\nname\ttext\tYour name (required for NDA-gated documents)\n\nExamples:\n\n# Pre-fill email for DocSend\ndocs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com\n\n# With password protection\ndocs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123\n\n# With NDA name requirement\ndocs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name=\"John Doe\"\n\n# Retry blocked job\ndocs-scraper update abc123 -D email=user@example.com -D password=secret123\n\n\nNotes:\n\nDocSend may require any combination of email, password, and name\nFolders are scraped as a table of contents PDF with document links\nThe scraper auto-checks NDA checkboxes when name is provided\nNotionScraper\n\nHandles: notion.so/*, *.notion.site/*\n\nData fields:\n\nField\tType\tDescription\nemail\temail\tNotion account email\npassword\tpassword\tNotion account password\n\nExamples:\n\n# Public page (no auth needed)\ndocs-scraper scrape https://notion.so/Public-Page-abc123\n\n# Private page with login\ndocs-scraper scrape https://notion.so/Private-Page-abc123 \\\n  -D email=user@example.com -D password=mypassword\n\n# Custom domain\ndocs-scraper scrape https://docs.company.notion.site/Page-abc123\n\n\nNotes:\n\nPublic Notion pages don't require authentication\nToggle blocks are automatically expanded before PDF generation\nUses session profiles to persist login across scrapes\nLlmFallbackScraper\n\nHandles: Any URL not matched by other scrapers (automatic fallback)\n\nData fields: Dynamic - determined by Claude analyzing the page\n\nThe LLM scraper uses Claude to analyze the page HTML and detect:\n\nLogin forms (extracts field names dynamically)\nCookie banners (auto-dismisses)\nExpandable content (auto-expands)\nCAPTCHAs (reports as blocked)\nPaywalls (reports as blocked)\n\nCommon dynamic fields:\n\nField\tType\tDescription\nemail\temail\tLogin email (if detected)\npassword\tpassword\tLogin password (if detected)\nusername\ttext\tUsername (if login uses username)\n\nExamples:\n\n# Generic webpage (no auth)\ndocs-scraper scrape https://example.com/article\n\n# Webpage requiring login\ndocs-scraper scrape https://members.example.com/article \\\n  -D email=user@example.com -D password=secret\n\n# When blocked, check the job for required fields\ndocs-scraper jobs list\n# Then retry with the fields the scraper detected\ndocs-scraper update abc123 -D username=myuser -D password=secret\n\n\nNotes:\n\nRequires ANTHROPIC_API_KEY environment variable\nField names are extracted from the page's actual form fields\nLimited to 2 login attempts before failing\nCAPTCHAs require manual intervention\nData field summary\nScraper\temail\tpassword\tname\tOther\nDirectPdf\t-\t-\t-\t-\nDocSend\t✓\t✓\t✓\t-\nNotion\t✓\t✓\t-\t-\nLLM Fallback\t✓*\t✓*\t-\tDynamic*\n\n*Fields detected dynamically from page 
analysis\n\nEnvironment setup (optional)\n\nOnly needed for LLM fallback scraper:\n\nexport ANTHROPIC_API_KEY=your_key\n\n\nOptional browser settings:\n\nexport BROWSER_HEADLESS=true   # Set false for debugging\n\nCommon patterns\n\nArchive a Notion page:\n\ndocs-scraper scrape https://notion.so/My-Page-abc123\n\n\nDownload protected DocSend:\n\ndocs-scraper scrape https://docsend.com/view/xxx\n# If blocked:\ndocs-scraper update <job-id> -D email=user@example.com -D password=1234\n\n\nBatch scraping with profiles:\n\ndocs-scraper scrape https://site.com/doc1 -p mysite\ndocs-scraper scrape https://site.com/doc2 -p mysite\n\nOutput\n\nSuccess: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf) Blocked: Job ID + required credential types\n\nTroubleshooting\nTimeout: docs-scraper daemon stop && docs-scraper daemon start\nAuth fails: docs-scraper jobs list to check pending jobs\nDisk full: docs-scraper cleanup to remove old PDFs"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/chrisling-dev/links-to-pdfs",
    "publisherUrl": "https://clawhub.ai/chrisling-dev/links-to-pdfs",
    "owner": "chrisling-dev",
    "version": "0.0.1",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/links-to-pdfs",
    "downloadUrl": "https://openagent3.xyz/downloads/links-to-pdfs",
    "agentUrl": "https://openagent3.xyz/skills/links-to-pdfs/agent",
    "manifestUrl": "https://openagent3.xyz/skills/links-to-pdfs/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/links-to-pdfs/agent.md"
  }
}