{
  "schemaVersion": "1.0",
  "item": {
    "slug": "wecom-doc-fetcher",
    "name": "WeChat Work Doc Fetcher",
    "source": "tencent",
    "type": "skill",
    "category": "效率提升",
    "sourceUrl": "https://clawhub.ai/mouzhi/wecom-doc-fetcher",
    "canonicalUrl": "https://clawhub.ai/mouzhi/wecom-doc-fetcher",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/wecom-doc-fetcher",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=wecom-doc-fetcher",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "README.md",
      "SKILL.md",
      "wx_doc_fetch.py"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/wecom-doc-fetcher"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/wecom-doc-fetcher",
    "agentPageUrl": "https://openagent3.xyz/skills/wecom-doc-fetcher/agent",
    "manifestUrl": "https://openagent3.xyz/skills/wecom-doc-fetcher/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/wecom-doc-fetcher/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "wecom-doc-fetcher",
        "body": "Use this skill when the user wants to save any page from the WeChat Work (企业微信) developer documentation site (developer.work.weixin.qq.com/document/path/*) as a clean Markdown file in their Obsidian vault."
      },
      {
        "title": "Files in this skill",
        "body": "wecom-doc-fetcher/\n├── SKILL.md          # this file\n└── wx_doc_fetch.py   # the fetch & convert script"
      },
      {
        "title": "Setup (one-time)",
        "body": "Run these once before using the skill:\n\npip install requests playwright\nplaywright install chromium\n\nplaywright install chromium downloads a ~150 MB headless Chromium binary. This is required for automatic doc_id detection.\n\nPython 3.8+ is required."
      },
      {
        "title": "Usage",
        "body": "Place wx_doc_fetch.py anywhere convenient (e.g. your vault's scripts folder), then run:\n\n# Basic: auto-detect doc_id, print to stdout\npython wx_doc_fetch.py <URL>\n\n# Save to file\npython wx_doc_fetch.py <URL> output.md\n\n# Skip Playwright, supply doc_id manually\npython wx_doc_fetch.py <URL> output.md --doc-id <integer>\n\n# Override cookies at runtime\npython wx_doc_fetch.py <URL> output.md --cookies \"wwapidoc.sid=xxx; ...\""
      },
      {
        "title": "Example",
        "body": "python wx_doc_fetch.py https://developer.work.weixin.qq.com/document/path/94677 发送消息.md\n# [info] path_id=94677  doc_id=31152\n# [done] 已写入：发送消息.md"
      },
      {
        "title": "How It Works",
        "body": "The WeChat Work docs site is a Vue SPA — the visible content is not in the initial HTML. It is loaded at runtime via a private POST API:\n\nPOST https://developer.work.weixin.qq.com/docFetch/fetchCnt?lang=zh_CN&ajax=1&f=json\nBody: doc_id=<integer>   (application/x-www-form-urlencoded)\n\nThe response includes data.content_md — the page content as a Markdown string. The script fetches this field, cleans it, and writes the result."
      },
      {
        "title": "Why not WebFetch / defuddle?",
        "body": "The page renders client-side. WebFetch and defuddle only see the pre-JS HTML skeleton — no content. Scraping innerText via browser tools works but produces a very large accessibility tree with poor formatting. The content_md API field is the cleanest, most token-efficient source."
      },
      {
        "title": "URL path ID ≠ doc_id",
        "body": "The number in the browser URL (e.g. 94677) is a routing slug — not the doc_id the API needs. The actual doc_id (e.g. 31152) is determined at runtime by loading the page with Playwright and intercepting the fetchCnt XHR request."
      },
      {
        "title": "Manual doc_id Fallback",
        "body": "If Playwright is unavailable or times out:\n\nOpen the target URL in Chrome\nDevTools → Network tab → filter by fetchCnt\nClick the request → Payload tab\nRead the doc_id value\nPass it with --doc-id:\n\npython wx_doc_fetch.py https://developer.work.weixin.qq.com/document/path/94677 发送消息.md --doc-id 31152"
      },
      {
        "title": "Cookie Configuration",
        "body": "The fetchCnt API requires an authenticated session. Playwright's headless browser obtains session cookies automatically when loading the page — no manual cookie setup needed for normal use.\n\nIf you see errCode: -30001 in the output, the session is rejected. Fix:\n\nOpen the site in Chrome while logged in\nDevTools → Network → any fetchCnt request → Copy as cURL\nFind the -b '...' cookie string in the copied command\nEither paste it into COOKIES_RAW at the top of wx_doc_fetch.py, or pass it via --cookies \"...\"\n\nKey cookies and their lifetimes:\n\nCookiePurposeLifetimewwapidoc.sidSession identifier~24 hourswwapidoc.token_wtJWT auth token~30 minutes"
      },
      {
        "title": "API Reference",
        "body": "ItemDetailEndpointPOST /docFetch/fetchCnt?lang=zh_CN&ajax=1&f=json&random=<rand>Bodydoc_id=<integer> (form-urlencoded)AuthSession cookiesKey response fielddata.content_mdOther response fieldsdata.content_html, data.content_html_v2, data.content_txt, data.title, data.time"
      },
      {
        "title": "content_md Cleaning Rules",
        "body": "The content_md field is mostly valid CommonMark but has site-specific issues. The clean_md() function in wx_doc_fetch.py handles all of them:\n\n#ProblemRaw exampleAfter cleaning1[TOC] marker at top[TOC]\\n# 概述# 概述2Heading missing space after ###接口定义## 接口定义3Internal numeric anchor links[接收事件](#12977)接收事件3Anchors with sub-path[开启API](#31106/如何开启API)开启API4HTML line breaks inside table cells说明</br>补充说明 补充5<b> bold tags<b>注意</b>**注意**6<code> inline tags<code>open_kfid</code>`open_kfid`7<font> color tags<font color=\"red\">警告</font>警告8!!#rrggbb text!! site-specific highlight!!#ff0000 重要!!重要9Leading spaces before table rows··| 参数 || 参数 |10No blank line before table (Obsidian won't render)文字\\n| col |文字\\n\\n| col |11Excess blank lines3+ \\n in a row2 \\n max"
      },
      {
        "title": "Rule 10 — critical regex note",
        "body": "The blank-line-before-table rule must match on lines that don't start with |, not just on the trailing character of the previous line:\n\n# CORRECT — matches on start of line, avoids breaking table rows apart\nre.sub(r\"^([^|\\n][^\\n]*)\\n(\\|)\", r\"\\1\\n\\n\\2\", content, flags=re.MULTILINE)\n\n# WRONG — table rows end with \"| \" (trailing space), so last char is space,\n#          causing blank lines to be inserted between every table row\nre.sub(r\"([^\\n])\\n(\\|)\", r\"\\1\\n\\n\\2\", content)"
      }
    ],
    "body": "wecom-doc-fetcher\n\nUse this skill when the user wants to save any page from the WeChat Work (企业微信) developer documentation site (developer.work.weixin.qq.com/document/path/*) as a clean Markdown file in their Obsidian vault.\n\nFiles in this skill\nwecom-doc-fetcher/\n├── SKILL.md          # this file\n└── wx_doc_fetch.py   # the fetch & convert script\n\nSetup (one-time)\n\nRun these once before using the skill:\n\npip install requests playwright\nplaywright install chromium\n\n\nplaywright install chromium downloads a ~150 MB headless Chromium binary. This is required for automatic doc_id detection.\n\nPython 3.8+ is required.\n\nUsage\n\nPlace wx_doc_fetch.py anywhere convenient (e.g. your vault's scripts folder), then run:\n\n# Basic: auto-detect doc_id, print to stdout\npython wx_doc_fetch.py <URL>\n\n# Save to file\npython wx_doc_fetch.py <URL> output.md\n\n# Skip Playwright, supply doc_id manually\npython wx_doc_fetch.py <URL> output.md --doc-id <integer>\n\n# Override cookies at runtime\npython wx_doc_fetch.py <URL> output.md --cookies \"wwapidoc.sid=xxx; ...\"\n\nExample\npython wx_doc_fetch.py https://developer.work.weixin.qq.com/document/path/94677 发送消息.md\n# [info] path_id=94677  doc_id=31152\n# [done] 已写入：发送消息.md\n\nHow It Works\n\nThe WeChat Work docs site is a Vue SPA — the visible content is not in the initial HTML. It is loaded at runtime via a private POST API:\n\nPOST https://developer.work.weixin.qq.com/docFetch/fetchCnt?lang=zh_CN&ajax=1&f=json\nBody: doc_id=<integer>   (application/x-www-form-urlencoded)\n\n\nThe response includes data.content_md — the page content as a Markdown string. The script fetches this field, cleans it, and writes the result.\n\nWhy not WebFetch / defuddle?\n\nThe page renders client-side. WebFetch and defuddle only see the pre-JS HTML skeleton — no content. Scraping innerText via browser tools works but produces a very large accessibility tree with poor formatting. The content_md API field is the cleanest, most token-efficient source.\n\nURL path ID ≠ doc_id\n\nThe number in the browser URL (e.g. 94677) is a routing slug — not the doc_id the API needs. The actual doc_id (e.g. 31152) is determined at runtime by loading the page with Playwright and intercepting the fetchCnt XHR request.\n\nManual doc_id Fallback\n\nIf Playwright is unavailable or times out:\n\nOpen the target URL in Chrome\nDevTools → Network tab → filter by fetchCnt\nClick the request → Payload tab\nRead the doc_id value\nPass it with --doc-id:\npython wx_doc_fetch.py https://developer.work.weixin.qq.com/document/path/94677 发送消息.md --doc-id 31152\n\nCookie Configuration\n\nThe fetchCnt API requires an authenticated session. Playwright's headless browser obtains session cookies automatically when loading the page — no manual cookie setup needed for normal use.\n\nIf you see errCode: -30001 in the output, the session is rejected. Fix:\n\nOpen the site in Chrome while logged in\nDevTools → Network → any fetchCnt request → Copy as cURL\nFind the -b '...' cookie string in the copied command\nEither paste it into COOKIES_RAW at the top of wx_doc_fetch.py, or pass it via --cookies \"...\"\n\nKey cookies and their lifetimes:\n\nCookie\tPurpose\tLifetime\nwwapidoc.sid\tSession identifier\t~24 hours\nwwapidoc.token_wt\tJWT auth token\t~30 minutes\nAPI Reference\nItem\tDetail\nEndpoint\tPOST /docFetch/fetchCnt?lang=zh_CN&ajax=1&f=json&random=<rand>\nBody\tdoc_id=<integer> (form-urlencoded)\nAuth\tSession cookies\nKey response field\tdata.content_md\nOther response fields\tdata.content_html, data.content_html_v2, data.content_txt, data.title, data.time\ncontent_md Cleaning Rules\n\nThe content_md field is mostly valid CommonMark but has site-specific issues. The clean_md() function in wx_doc_fetch.py handles all of them:\n\n#\tProblem\tRaw example\tAfter cleaning\n1\t[TOC] marker at top\t[TOC]\\n# 概述\t# 概述\n2\tHeading missing space after #\t##接口定义\t## 接口定义\n3\tInternal numeric anchor links\t[接收事件](#12977)\t接收事件\n3\tAnchors with sub-path\t[开启API](#31106/如何开启API)\t开启API\n4\tHTML line breaks inside table cells\t说明</br>补充\t说明 补充\n5\t<b> bold tags\t<b>注意</b>\t**注意**\n6\t<code> inline tags\t<code>open_kfid</code>\t`open_kfid`\n7\t<font> color tags\t<font color=\"red\">警告</font>\t警告\n8\t!!#rrggbb text!! site-specific highlight\t!!#ff0000 重要!!\t重要\n9\tLeading spaces before table rows\t··| 参数 |\t| 参数 |\n10\tNo blank line before table (Obsidian won't render)\t文字\\n| col |\t文字\\n\\n| col |\n11\tExcess blank lines\t3+ \\n in a row\t2 \\n max\nRule 10 — critical regex note\n\nThe blank-line-before-table rule must match on lines that don't start with |, not just on the trailing character of the previous line:\n\n# CORRECT — matches on start of line, avoids breaking table rows apart\nre.sub(r\"^([^|\\n][^\\n]*)\\n(\\|)\", r\"\\1\\n\\n\\2\", content, flags=re.MULTILINE)\n\n# WRONG — table rows end with \"| \" (trailing space), so last char is space,\n#          causing blank lines to be inserted between every table row\nre.sub(r\"([^\\n])\\n(\\|)\", r\"\\1\\n\\n\\2\", content)"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/mouzhi/wecom-doc-fetcher",
    "publisherUrl": "https://clawhub.ai/mouzhi/wecom-doc-fetcher",
    "owner": "mouzhi",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/wecom-doc-fetcher",
    "downloadUrl": "https://openagent3.xyz/downloads/wecom-doc-fetcher",
    "agentUrl": "https://openagent3.xyz/skills/wecom-doc-fetcher/agent",
    "manifestUrl": "https://openagent3.xyz/skills/wecom-doc-fetcher/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/wecom-doc-fetcher/agent.md"
  }
}