{
  "schemaVersion": "1.0",
  "item": {
    "slug": "knowledge-base-collector",
    "name": "Knowledge Base Collector",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/reed1898/knowledge-base-collector",
    "canonicalUrl": "https://clawhub.ai/reed1898/knowledge-base-collector",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/knowledge-base-collector",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=knowledge-base-collector",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md",
      "scripts/ingest_image.py",
      "scripts/ingest_url.py",
      "scripts/search_kb.py",
      "scripts/tagger.py",
      "scripts/wechat_backlog.py"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/knowledge-base-collector"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/knowledge-base-collector",
    "agentPageUrl": "https://openagent3.xyz/skills/knowledge-base-collector/agent",
    "manifestUrl": "https://openagent3.xyz/skills/knowledge-base-collector/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/knowledge-base-collector/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Summary",
        "body": "Ingest: web URLs, X/Twitter links, WeChat Official Account links (mp.weixin.qq.com), and screenshots\nStore: writes to a shared KB folder with per-item content.md + meta.json and a global index.jsonl\nOrganize: tag-first classification with richer tags (e.g. #agent, #coding-agent, #claude-code, #mcp, #rag, #prompt-injection, #security, #pricing, #database)\nWeChat: cloud fetch may be blocked; when a macOS node (e.g. Reed-Mac) is online, prefer node-side fetch to improve success rate; otherwise create a placeholder entry\nSearch: designed to support Telegram Q&A / search flows on top of the index and content\n\n把用户发来的链接/截图沉淀到共享知识库（KB），并做标签化整理。"
      },
      {
        "title": "默认 KB 位置",
        "body": "KB Root（可改）：/home/ubuntu/.openclaw/kb\n索引：kb/20_Inbox/urls/index.jsonl\n每条内容目录：kb/20_Inbox/urls/<YYYY-MM>/<item>/content.md + meta.json\n\n目标：先入库不丢，再迭代“摘要/标签/检索”。"
      },
      {
        "title": "1) 普通网页 / X(Twitter) / 公众号 URL 入库",
        "body": "运行脚本：\n\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/ingest_url.py \"<URL>\" --tags \"#optional\" --note \"context\"\n\n行为：\n\n自动识别来源（web/x/wechat）\n优先用 r.jina.ai 抽取正文（无需登录）\n公众号遇到风控会写占位条目：status=blocked_verification + tag #needs-manual\n对同一 URL 做 key 去重（已存在则跳过）\n\nWeChat 更高成功率（推荐路径）\n\n当云端抓取命中“环境异常/验证”时：\n\n如果有已连接的 macOS 节点（例如 Reed-Mac）且该节点能访问该文章，可用 nodes.run 在节点上执行抓取（requests+bs4），然后写入 KB。\n注意：这条路径依赖节点在线与网络环境；无法承诺 100%。"
      },
      {
        "title": "2) 截图/图片入库（含 OCR 文本）",
        "body": "脚本：\n\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/ingest_image.py /path/to/image.jpg \\\n  --text-file /path/to/ocr.txt \\\n  --title \"...\" --tags \"#ai #product\" --note \"...\"\n\n说明：\n\ningest_image.py 负责“落盘+索引”。OCR 可用：\n\n本机 tesseract（若安装了 tesseract-ocr + chi_sim）\n或用多模态 LLM 抽取文字后写入 --text-file"
      },
      {
        "title": "Telegram 里直接问（检索）",
        "body": "推荐先用脚本（本机/服务器）：\n\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --q \"claude code\" --limit 10\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --tags \"#claude-code #coding-agent\" --limit 20\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --source wechat --since 7d --q \"Elys\""
      },
      {
        "title": "公众号待补抓队列（占位条目）",
        "body": "python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/wechat_backlog.py --limit 30"
      },
      {
        "title": "周报/主题报告候选清单（给 LLM 写总结用）",
        "body": "python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/weekly_digest.py --days 7 --limit 30"
      },
      {
        "title": "重要注意事项（安全/隐私）",
        "body": "截图/网页可能包含 token/验证码/密钥：入库前应做脱敏（替换为 REDACTED）。\n公众号抓取受风控影响：建议允许“占位入库”，后续再补全。"
      }
    ],
    "body": "Summary\nIngest: web URLs, X/Twitter links, WeChat Official Account links (mp.weixin.qq.com), and screenshots\nStore: writes to a shared KB folder with per-item content.md + meta.json and a global index.jsonl\nOrganize: tag-first classification with richer tags (e.g. #agent, #coding-agent, #claude-code, #mcp, #rag, #prompt-injection, #security, #pricing, #database)\nWeChat: cloud fetch may be blocked; when a macOS node (e.g. Reed-Mac) is online, prefer node-side fetch to improve success rate; otherwise create a placeholder entry\nSearch: designed to support Telegram Q&A / search flows on top of the index and content\n\n把用户发来的链接/截图沉淀到共享知识库（KB），并做标签化整理。\n\n默认 KB 位置\nKB Root（可改）：/home/ubuntu/.openclaw/kb\n索引：kb/20_Inbox/urls/index.jsonl\n每条内容目录：kb/20_Inbox/urls/<YYYY-MM>/<item>/content.md + meta.json\n\n目标：先入库不丢，再迭代“摘要/标签/检索”。\n\n你要做的事（按输入类型）\n1) 普通网页 / X(Twitter) / 公众号 URL 入库\n\n运行脚本：\n\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/ingest_url.py \"<URL>\" --tags \"#optional\" --note \"context\"\n\n\n行为：\n\n自动识别来源（web/x/wechat）\n优先用 r.jina.ai 抽取正文（无需登录）\n公众号遇到风控会写占位条目：status=blocked_verification + tag #needs-manual\n对同一 URL 做 key 去重（已存在则跳过）\nWeChat 更高成功率（推荐路径）\n\n当云端抓取命中“环境异常/验证”时：\n\n如果有已连接的 macOS 节点（例如 Reed-Mac）且该节点能访问该文章，可用 nodes.run 在节点上执行抓取（requests+bs4），然后写入 KB。\n注意：这条路径依赖节点在线与网络环境；无法承诺 100%。\n2) 截图/图片入库（含 OCR 文本）\n\n脚本：\n\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/ingest_image.py /path/to/image.jpg \\\n  --text-file /path/to/ocr.txt \\\n  --title \"...\" --tags \"#ai #product\" --note \"...\"\n\n\n说明：\n\ningest_image.py 负责“落盘+索引”。OCR 可用：\n本机 tesseract（若安装了 tesseract-ocr + chi_sim）\n或用多模态 LLM 抽取文字后写入 --text-file\nTelegram 里直接问（检索）\n\n推荐先用脚本（本机/服务器）：\n\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --q \"claude code\" --limit 10\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --tags \"#claude-code #coding-agent\" --limit 20\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --source wechat --since 7d --q \"Elys\"\n\n公众号待补抓队列（占位条目）\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/wechat_backlog.py --limit 30\n\n周报/主题报告候选清单（给 LLM 写总结用）\npython3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/weekly_digest.py --days 7 --limit 30\n\n重要注意事项（安全/隐私）\n截图/网页可能包含 token/验证码/密钥：入库前应做脱敏（替换为 REDACTED）。\n公众号抓取受风控影响：建议允许“占位入库”，后续再补全。"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/reed1898/knowledge-base-collector",
    "publisherUrl": "https://clawhub.ai/reed1898/knowledge-base-collector",
    "owner": "reed1898",
    "version": "0.1.3",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/knowledge-base-collector",
    "downloadUrl": "https://openagent3.xyz/downloads/knowledge-base-collector",
    "agentUrl": "https://openagent3.xyz/skills/knowledge-base-collector/agent",
    "manifestUrl": "https://openagent3.xyz/skills/knowledge-base-collector/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/knowledge-base-collector/agent.md"
  }
}