{
  "schemaVersion": "1.0",
  "item": {
    "slug": "tech-news-digest",
    "name": "Tech News Digest",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/dinstein/tech-news-digest",
    "canonicalUrl": "https://clawhub.ai/dinstein/tech-news-digest",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/tech-news-digest",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=tech-news-digest",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "CHANGELOG.md",
      "CONTRIBUTING.md",
      "README.md",
      "README_CN.md",
      "SKILL.md",
      "config/defaults/sources.json"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/tech-news-digest"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/tech-news-digest",
    "agentPageUrl": "https://openagent3.xyz/skills/tech-news-digest/agent",
    "manifestUrl": "https://openagent3.xyz/skills/tech-news-digest/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/tech-news-digest/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Tech News Digest",
        "body": "Automated tech news digest system with unified data source model, quality scoring pipeline, and template-based output generation."
      },
      {
        "title": "Quick Start",
        "body": "Configuration Setup: Default configs are in config/defaults/. Copy to workspace for customization:\nmkdir -p workspace/config\ncp config/defaults/sources.json workspace/config/tech-news-digest-sources.json\ncp config/defaults/topics.json workspace/config/tech-news-digest-topics.json\n\n\n\nEnvironment Variables:\n\nTWITTERAPI_IO_KEY - twitterapi.io API key (optional, preferred)\nX_BEARER_TOKEN - Twitter/X official API bearer token (optional, fallback)\nTAVILY_API_KEY - Tavily Search API key, alternative to Brave (optional)\nWEB_SEARCH_BACKEND - Web search backend: auto|brave|tavily (optional, default: auto)\nBRAVE_API_KEYS - Brave Search API keys, comma-separated for rotation (optional)\nBRAVE_API_KEY - Single Brave key fallback (optional)\nGITHUB_TOKEN - GitHub personal access token (optional, improves rate limits)\n\n\n\nGenerate Digest:\n# Unified pipeline (recommended) — runs all 6 sources in parallel + merge\npython3 scripts/run-pipeline.py \\\n  --defaults config/defaults \\\n  --config workspace/config \\\n  --hours 48 --freshness pd \\\n  --archive-dir workspace/archive/tech-news-digest/ \\\n  --output /tmp/td-merged.json --verbose --force\n\n\n\nUse Templates: Apply Discord, email, or PDF templates to merged output"
      },
      {
        "title": "sources.json - Unified Data Sources",
        "body": "{\n  \"sources\": [\n    {\n      \"id\": \"openai-rss\",\n      \"type\": \"rss\",\n      \"name\": \"OpenAI Blog\",\n      \"url\": \"https://openai.com/blog/rss.xml\",\n      \"enabled\": true,\n      \"priority\": true,\n      \"topics\": [\"llm\", \"ai-agent\"],\n      \"note\": \"Official OpenAI updates\"\n    },\n    {\n      \"id\": \"sama-twitter\",\n      \"type\": \"twitter\", \n      \"name\": \"Sam Altman\",\n      \"handle\": \"sama\",\n      \"enabled\": true,\n      \"priority\": true,\n      \"topics\": [\"llm\", \"frontier-tech\"],\n      \"note\": \"OpenAI CEO\"\n    }\n  ]\n}"
      },
      {
        "title": "topics.json - Enhanced Topic Definitions",
        "body": "{\n  \"topics\": [\n    {\n      \"id\": \"llm\",\n      \"emoji\": \"🧠\",\n      \"label\": \"LLM / Large Models\",\n      \"description\": \"Large Language Models, foundation models, breakthroughs\",\n      \"search\": {\n        \"queries\": [\"LLM latest news\", \"large language model breakthroughs\"],\n        \"must_include\": [\"LLM\", \"large language model\", \"foundation model\"],\n        \"exclude\": [\"tutorial\", \"beginner guide\"]\n      },\n      \"display\": {\n        \"max_items\": 8,\n        \"style\": \"detailed\"\n      }\n    }\n  ]\n}"
      },
      {
        "title": "run-pipeline.py - Unified Pipeline (Recommended)",
        "body": "python3 scripts/run-pipeline.py \\\n  --defaults config/defaults [--config CONFIG_DIR] \\\n  --hours 48 --freshness pd \\\n  --archive-dir workspace/archive/tech-news-digest/ \\\n  --output /tmp/td-merged.json --verbose --force\n\nFeatures: Runs all 6 fetch steps in parallel, then merges + deduplicates + scores\nOutput: Final merged JSON ready for report generation (~30s total)\nMetadata: Saves per-step timing and counts to *.meta.json\nGitHub Auth: Auto-generates GitHub App token if $GITHUB_TOKEN not set\nFallback: If this fails, run individual scripts below"
      },
      {
        "title": "Individual Scripts (Fallback)",
        "body": "fetch-rss.py - RSS Feed Fetcher\n\npython3 scripts/fetch-rss.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--verbose]\n\nParallel fetching (10 workers), retry with backoff, feedparser + regex fallback\nTimeout: 30s per feed, ETag/Last-Modified caching\n\nfetch-twitter.py - Twitter/X KOL Monitor\n\npython3 scripts/fetch-twitter.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--backend auto|official|twitterapiio]\n\nBackend auto-detection: uses twitterapi.io if TWITTERAPI_IO_KEY set, else official X API v2 if X_BEARER_TOKEN set\nRate limit handling, engagement metrics, retry with backoff\n\nfetch-web.py - Web Search Engine\n\npython3 scripts/fetch-web.py [--defaults DIR] [--config DIR] [--freshness pd] [--output FILE]\n\nAuto-detects Brave API rate limit: paid plans → parallel queries, free → sequential\nWithout API: generates search interface for agents\n\nfetch-github.py - GitHub Releases Monitor\n\npython3 scripts/fetch-github.py [--defaults DIR] [--config DIR] [--hours 168] [--output FILE]\n\nParallel fetching (10 workers), 30s timeout\nAuth priority: $GITHUB_TOKEN → GitHub App auto-generate → gh CLI → unauthenticated (60 req/hr)\n\nfetch-github.py --trending - GitHub Trending Repos\n\npython3 scripts/fetch-github.py --trending [--hours 48] [--output FILE] [--verbose]\n\nSearches GitHub API for trending repos across 4 topics (LLM, AI Agent, Crypto, Frontier Tech)\nQuality scoring: base 5 + daily_stars_est / 10, max 15\n\nfetch-reddit.py - Reddit Posts Fetcher\n\npython3 scripts/fetch-reddit.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE]\n\nParallel fetching (4 workers), public JSON API (no auth required)\n13 subreddits with score filtering\n\nenrich-articles.py - Article Full-Text Enrichment\n\npython3 scripts/enrich-articles.py --input merged.json --output enriched.json [--min-score 10] [--max-articles 15] [--verbose]\n\nFetches full article text for high-scoring articles\nCloudflare Markdown for Agents (preferred) → HTML extraction (fallback) → Skip (paywalled/social)\nBlog domain whitelist with lower score threshold (≥3)\nParallel fetching (5 workers, 10s timeout)\n\nmerge-sources.py - Quality Scoring & Deduplication\n\npython3 scripts/merge-sources.py --rss FILE --twitter FILE --web FILE --github FILE --reddit FILE\n\nQuality scoring, title similarity dedup (85%), previous digest penalty\nOutput: topic-grouped articles sorted by score\n\nvalidate-config.py - Configuration Validator\n\npython3 scripts/validate-config.py [--defaults DIR] [--config DIR] [--verbose]\n\nJSON schema validation, topic reference checks, duplicate ID detection\n\ngenerate-pdf.py - PDF Report Generator\n\npython3 scripts/generate-pdf.py --input report.md --output digest.pdf [--verbose]\n\nConverts markdown digest to styled A4 PDF with Chinese typography (Noto Sans CJK SC)\nEmoji icons, page headers/footers, blue accent theme. 
Requires weasyprint.\n\nsanitize-html.py - Safe HTML Email Converter\n\npython3 scripts/sanitize-html.py --input report.md --output email.html [--verbose]\n\nConverts markdown to XSS-safe HTML email with inline CSS\nURL whitelist (http/https only), HTML-escaped text content\n\nsource-health.py - Source Health Monitor\n\npython3 scripts/source-health.py --rss FILE --twitter FILE --github FILE --reddit FILE --web FILE [--verbose]\n\nTracks per-source success/failure history over 7 days\nReports unhealthy sources (>50% failure rate)\n\nsummarize-merged.py - Merged Data Summary\n\npython3 scripts/summarize-merged.py --input merged.json [--top N] [--topic TOPIC]\n\nHuman-readable summary of merged data for LLM consumption\nShows top articles per topic with scores and metrics"
      },
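      {
        "title": "Example: Manual Fallback Chain (Sketch)",
        "body": "If run-pipeline.py fails, the documentation says to run the individual scripts instead. A minimal Python sketch of that fallback chain, assuming only the CLI flags documented above; the output paths and the driver itself are illustrative, not part of the skill:\n\nimport subprocess, sys\n\n# Documented per-script flags; merge-sources.py expects one file per type\nsteps = {\n    \"rss\":     [\"scripts/fetch-rss.py\",     \"--hours\", \"48\"],\n    \"twitter\": [\"scripts/fetch-twitter.py\", \"--hours\", \"48\"],\n    \"web\":     [\"scripts/fetch-web.py\",     \"--freshness\", \"pd\"],\n    \"github\":  [\"scripts/fetch-github.py\",  \"--hours\", \"168\"],\n    \"reddit\":  [\"scripts/fetch-reddit.py\",  \"--hours\", \"48\"],\n}\noutputs = {}\nfor name, argv in steps.items():\n    out = f\"/tmp/td-{name}.json\"  # illustrative path\n    cmd = [\"python3\", argv[0], *argv[1:], \"--defaults\", \"config/defaults\",\n           \"--config\", \"workspace/config\", \"--output\", out]\n    if subprocess.run(cmd).returncode != 0:\n        print(f\"warning: {name} fetch failed, continuing\", file=sys.stderr)\n    outputs[name] = out\n\n# Merge whatever was fetched, using the documented merge-sources.py CLI\nmerge = [\"python3\", \"scripts/merge-sources.py\"]\nfor name, path in outputs.items():\n    merge += [f\"--{name}\", path]\nsubprocess.run(merge, check=True)"
      },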
      {
        "title": "Workspace Configuration Override",
        "body": "Place custom configs in workspace/config/ to override defaults:\n\nSources: Append new sources, disable defaults with \"enabled\": false\nTopics: Override topic definitions, search queries, display settings\nMerge Logic:\n\nSources with same id → user version takes precedence\nSources with new id → appended to defaults\nTopics with same id → user version completely replaces default"
      },
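      {
        "title": "Example: Config Merge Logic (Sketch)",
        "body": "A minimal sketch of the merge rules above, assuming each file carries the top-level key shown in the config examples (sources or topics). The field-level merge for same-id sources is one plausible reading: the example override below disables a default source without restating its url, which only works if unspecified fields fall back to the default. Names are illustrative:\n\nimport json\n\ndef merge_configs(defaults_path, override_path, key):\n    # key is \"sources\" or \"topics\"; both are lists of objects with an \"id\"\n    with open(defaults_path) as f:\n        items = {it[\"id\"]: it for it in json.load(f)[key]}\n    try:\n        with open(override_path) as f:\n            overrides = json.load(f)[key]\n    except FileNotFoundError:\n        return list(items.values())\n    for it in overrides:\n        if key == \"topics\" or it[\"id\"] not in items:\n            # same-id topics fully replace; new ids are appended\n            items[it[\"id\"]] = it\n        else:\n            # same-id sources: user fields take precedence over defaults\n            items[it[\"id\"]] = {**items[it[\"id\"]], **it}\n    return list(items.values())"
      },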
      {
        "title": "Example Workspace Override",
        "body": "// workspace/config/tech-news-digest-sources.json\n{\n  \"sources\": [\n    {\n      \"id\": \"simonwillison-rss\",\n      \"enabled\": false,\n      \"note\": \"Disabled: too noisy for my use case\"\n    },\n    {\n      \"id\": \"my-custom-blog\", \n      \"type\": \"rss\",\n      \"name\": \"My Custom Tech Blog\",\n      \"url\": \"https://myblog.com/rss\",\n      \"enabled\": true,\n      \"priority\": true,\n      \"topics\": [\"frontier-tech\"]\n    }\n  ]\n}"
      },
      {
        "title": "Discord Template (references/templates/discord.md)",
        "body": "Bullet list format with link suppression (<link>)\nMobile-optimized, emoji headers\n2000 character limit awareness"
      },
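      {
        "title": "Example: Discord Chunking (Sketch)",
        "body": "A hedged sketch of how a posting step might honor the link suppression and 2000-character constraints above; the splitting strategy is illustrative, not the template's actual algorithm, and it assumes individual lines stay under the limit:\n\nimport re\n\nLIMIT = 2000  # Discord's per-message ceiling\n\ndef suppress_links(text):\n    # Wrap bare http(s) URLs in <...> so Discord skips embeds\n    return re.sub(r\"(?<![<(])\\bhttps?://\\S+\",\n                  lambda m: f\"<{m.group(0)}>\", text)\n\ndef chunk(text):\n    # Split on line boundaries so no message exceeds the limit\n    buf = \"\"\n    for line in text.splitlines(keepends=True):\n        if len(buf) + len(line) > LIMIT:\n            yield buf\n            buf = \"\"\n        buf += line\n    if buf:\n        yield buf"
      },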
      {
        "title": "Email Template (references/templates/email.md)",
        "body": "Rich metadata, technical stats, archive links\nExecutive summary, top articles section\nHTML-compatible formatting"
      },
      {
        "title": "PDF Template (references/templates/pdf.md)",
        "body": "A4 layout with Noto Sans CJK SC font for Chinese support\nEmoji icons, page headers/footers with page numbers\nGenerated via scripts/generate-pdf.py (requires weasyprint)"
      },
      {
        "title": "Default Sources (151 total)",
        "body": "RSS Feeds (62): AI labs, tech blogs, crypto news, Chinese tech media\nTwitter/X KOLs (48): AI researchers, crypto leaders, tech executives\nGitHub Repos (28): Major open-source projects (LangChain, vLLM, DeepSeek, Llama, etc.)\nReddit (13): r/MachineLearning, r/LocalLLaMA, r/CryptoCurrency, r/ChatGPT, r/OpenAI, etc.\nWeb Search (4 topics): LLM, AI Agent, Crypto, Frontier Tech\n\nAll sources pre-configured with appropriate topic tags and priority levels."
      },
      {
        "title": "Dependencies",
        "body": "pip install -r requirements.txt\n\nOptional but Recommended:\n\nfeedparser>=6.0.0 - Better RSS parsing (fallback to regex if unavailable)\njsonschema>=4.0.0 - Configuration validation\n\nAll scripts work with Python 3.8+ standard library only."
      },
      {
        "title": "Health Checks",
        "body": "# Validate configuration\npython3 scripts/validate-config.py --verbose\n\n# Test RSS feeds\npython3 scripts/fetch-rss.py --hours 1 --verbose\n\n# Check Twitter API\npython3 scripts/fetch-twitter.py --hours 1 --verbose"
      },
      {
        "title": "Archive Management",
        "body": "Digests automatically archived to <workspace>/archive/tech-news-digest/\nPrevious digest titles used for duplicate detection\nOld archives cleaned automatically (90+ days)"
      },
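      {
        "title": "Example: Archive Pruning (Sketch)",
        "body": "A minimal sketch of the 90-day cleanup policy described above, assuming archived digests are plain files under the documented archive directory; the function name is illustrative:\n\nimport time\nfrom pathlib import Path\n\nARCHIVE_DIR = Path(\"workspace/archive/tech-news-digest\")\nMAX_AGE_DAYS = 90\n\ndef prune_archive():\n    cutoff = time.time() - MAX_AGE_DAYS * 86400\n    for entry in ARCHIVE_DIR.glob(\"*\"):\n        # Remove files whose last modification is older than the cutoff\n        if entry.is_file() and entry.stat().st_mtime < cutoff:\n            entry.unlink()"
      },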
      {
        "title": "Error Handling",
        "body": "Network Failures: Retry with exponential backoff\nRate Limits: Automatic retry with appropriate delays\nInvalid Content: Graceful degradation, detailed logging\nConfiguration Errors: Schema validation with helpful messages"
      },
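      {
        "title": "Example: Retry with Exponential Backoff (Sketch)",
        "body": "The fetchers are described as retrying network failures with exponential backoff. A generic standard-library sketch of that pattern; the attempt count and delays are illustrative:\n\nimport time\nimport urllib.error\nimport urllib.request\n\ndef fetch_with_backoff(url, attempts=3, base_delay=1.0):\n    for attempt in range(attempts):\n        try:\n            with urllib.request.urlopen(url, timeout=30) as resp:\n                return resp.read()\n        except urllib.error.URLError:\n            if attempt == attempts - 1:\n                raise\n            # Sleep 1s, 2s, 4s, ... between successive retries\n            time.sleep(base_delay * (2 ** attempt))"
      },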
      {
        "title": "API Keys & Environment",
        "body": "Set in ~/.zshenv or similar:\n\n# Twitter (at least one required for Twitter source)\nexport TWITTERAPI_IO_KEY=\"your_key\"        # twitterapi.io key (preferred)\nexport X_BEARER_TOKEN=\"your_bearer_token\"  # Official X API v2 (fallback)\nexport TWITTER_API_BACKEND=\"auto\"          # auto|twitterapiio|official (default: auto)\n\n# Web Search (optional, enables web search layer)\nexport WEB_SEARCH_BACKEND=\"auto\"          # auto|brave|tavily (default: auto)\nexport TAVILY_API_KEY=\"tvly-xxx\"           # Tavily Search API (free 1000/mo)\n\n# Brave Search (alternative)\nexport BRAVE_API_KEYS=\"key1,key2,key3\"     # Multiple keys, comma-separated rotation\nexport BRAVE_API_KEY=\"key1\"                # Single key fallback\nexport BRAVE_PLAN=\"free\"                   # Override rate limit detection: free|pro\n\n# GitHub (optional, improves rate limits)\nexport GITHUB_TOKEN=\"ghp_xxx\"              # PAT (simplest)\nexport GH_APP_ID=\"12345\"                   # Or use GitHub App for auto-token\nexport GH_APP_INSTALL_ID=\"67890\"\nexport GH_APP_KEY_FILE=\"/path/to/key.pem\"\n\nTwitter: TWITTERAPI_IO_KEY preferred ($3-5/mo); X_BEARER_TOKEN as fallback; auto mode tries twitterapiio first\nWeb Search: Tavily (preferred in auto mode) or Brave; optional, fallback to agent web_search if unavailable\nGitHub: Auto-generates token from GitHub App if PAT not set; unauthenticated fallback (60 req/hr)\nReddit: No API key needed (uses public JSON API)"
      },
      {
        "title": "OpenClaw Cron (Recommended)",
        "body": "The cron prompt should NOT hardcode the pipeline steps. Instead, reference references/digest-prompt.md and only pass configuration parameters. This ensures the pipeline logic stays in the skill repo and is consistent across all installations.\n\nDaily Digest Cron Prompt\n\nRead <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a daily digest.\n\nReplace placeholders with:\n- MODE = daily\n- TIME_WINDOW = past 1-2 days\n- FRESHNESS = pd\n- RSS_HOURS = 48\n- ITEMS_PER_SECTION = 3-5\n- ENRICH = true\n- BLOG_PICKS_COUNT = 3\n- EXTRA_SECTIONS = (none)\n- SUBJECT = Daily Tech Digest - YYYY-MM-DD\n- WORKSPACE = <your workspace path>\n- SKILL_DIR = <your skill install path>\n- DISCORD_CHANNEL_ID = <your channel id>\n- EMAIL = (optional)\n- LANGUAGE = English\n- TEMPLATE = discord\n\nFollow every step in the prompt template strictly. Do not skip any steps.\n\nWeekly Digest Cron Prompt\n\nRead <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a weekly digest.\n\nReplace placeholders with:\n- MODE = weekly\n- TIME_WINDOW = past 7 days\n- FRESHNESS = pw\n- RSS_HOURS = 168\n- ITEMS_PER_SECTION = 10-15\n- ENRICH = true\n- BLOG_PICKS_COUNT = 3-5\n- EXTRA_SECTIONS = 📊 Weekly Trend Summary (2-3 sentences summarizing macro trends)\n- SUBJECT = Weekly Tech Digest - YYYY-MM-DD\n- WORKSPACE = <your workspace path>\n- SKILL_DIR = <your skill install path>\n- DISCORD_CHANNEL_ID = <your channel id>\n- EMAIL = (optional)\n- LANGUAGE = English\n- TEMPLATE = discord\n\nFollow every step in the prompt template strictly. Do not skip any steps.\n\nWhy This Pattern?\n\nSingle source of truth: Pipeline logic lives in digest-prompt.md, not scattered across cron configs\nPortable: Same skill on different OpenClaw instances, just change paths and channel IDs\nMaintainable: Update the skill → all cron jobs pick up changes automatically\nAnti-pattern: Do NOT copy pipeline steps into the cron prompt — it will drift out of sync\n\nMulti-Channel Delivery Limitation\n\nOpenClaw enforces cross-provider isolation: a single session can only send messages to one provider (e.g., Discord OR Telegram, not both). If you need to deliver digests to multiple platforms, create separate cron jobs for each provider:\n\n# Job 1: Discord + Email\n- DISCORD_CHANNEL_ID = <your-discord-channel-id>\n- EMAIL = user@example.com\n- TEMPLATE = discord\n\n# Job 2: Telegram DM\n- DISCORD_CHANNEL_ID = (none)\n- EMAIL = (none)\n- TEMPLATE = telegram\n\nReplace DISCORD_CHANNEL_ID delivery with the target platform's delivery in the second job's prompt.\n\nThis is a security feature, not a bug — it prevents accidental cross-context data leakage."
      },
      {
        "title": "Execution Model",
        "body": "This skill uses a prompt template pattern: the agent reads digest-prompt.md and follows its instructions. This is the standard OpenClaw skill execution model — the agent interprets structured instructions from skill-provided files. All instructions are shipped with the skill bundle and can be audited before installation."
      },
      {
        "title": "Network Access",
        "body": "The Python scripts make outbound requests to:\n\nRSS feed URLs (configured in tech-news-digest-sources.json)\nTwitter/X API (api.x.com or api.twitterapi.io)\nBrave Search API (api.search.brave.com)\nTavily Search API (api.tavily.com)\nGitHub API (api.github.com)\nReddit JSON API (reddit.com)\n\nNo data is sent to any other endpoints. All API keys are read from environment variables declared in the skill metadata."
      },
      {
        "title": "Shell Safety",
        "body": "Email delivery uses send-email.py which constructs proper MIME multipart messages with HTML body + optional PDF attachment. Subject formats are hardcoded (Daily Tech Digest - YYYY-MM-DD). PDF generation uses generate-pdf.py via weasyprint. The prompt template explicitly prohibits interpolating untrusted content (article titles, tweet text, etc.) into shell arguments. Email addresses and subjects must be static placeholder values only."
      },
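      {
        "title": "Example: MIME Message Construction (Sketch)",
        "body": "send-email.py is described as building MIME multipart messages programmatically, with an HTML body, an optional PDF attachment, and no shell interpolation. A hedged standard-library sketch of that approach; the function name and argument set are illustrative, and the SMTP handoff is omitted:\n\nfrom email.message import EmailMessage\nfrom pathlib import Path\n\ndef build_message(sender, recipient, date_str, html_body, pdf_path=None):\n    msg = EmailMessage()\n    # Subject is a static template, never built from fetched content\n    msg[\"Subject\"] = f\"Daily Tech Digest - {date_str}\"\n    msg[\"From\"] = sender\n    msg[\"To\"] = recipient\n    msg.set_content(\"This digest is best viewed as HTML.\")\n    msg.add_alternative(html_body, subtype=\"html\")\n    if pdf_path:\n        msg.add_attachment(Path(pdf_path).read_bytes(),\n                           maintype=\"application\", subtype=\"pdf\",\n                           filename=Path(pdf_path).name)\n    return msg"
      },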
      {
        "title": "File Access",
        "body": "Scripts read from config/ and write to workspace/archive/. No files outside the workspace are accessed."
      },
      {
        "title": "Common Issues",
        "body": "RSS feeds failing: Check network connectivity, use --verbose for details\nTwitter rate limits: Reduce sources or increase interval\nConfiguration errors: Run validate-config.py for specific issues\nNo articles found: Check time window (--hours) and source enablement"
      },
      {
        "title": "Debug Mode",
        "body": "All scripts support --verbose flag for detailed logging and troubleshooting."
      },
      {
        "title": "Performance Tuning",
        "body": "Parallel Workers: Adjust MAX_WORKERS in scripts for your system\nTimeout Settings: Increase TIMEOUT for slow networks\nArticle Limits: Adjust MAX_ARTICLES_PER_FEED based on needs"
      },
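      {
        "title": "Example: Worker Pool Tuning (Sketch)",
        "body": "A generic sketch of the pattern the MAX_WORKERS and TIMEOUT constants control (for example, 10 workers and a 30s per-feed timeout in fetch-rss.py); fetch_one is a placeholder for the real fetcher:\n\nfrom concurrent.futures import ThreadPoolExecutor, as_completed\n\nMAX_WORKERS = 10  # tune down on constrained systems\nTIMEOUT = 30      # seconds per feed; raise on slow networks\n\ndef fetch_all(urls, fetch_one):\n    results, errors = [], []\n    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:\n        futures = {pool.submit(fetch_one, u, TIMEOUT): u for u in urls}\n        for fut in as_completed(futures):\n            try:\n                results.append(fut.result())\n            except Exception as exc:\n                # One slow or broken feed never blocks the others\n                errors.append((futures[fut], exc))\n    return results, errors"
      },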
      {
        "title": "Shell Execution",
        "body": "The digest prompt instructs agents to run Python scripts via shell commands. All script paths and arguments are skill-defined constants — no user input is interpolated into commands. Two scripts use subprocess:\n\nrun-pipeline.py orchestrates child fetch scripts (all within scripts/ directory)\nfetch-github.py has two subprocess calls:\n\nopenssl dgst -sha256 -sign for JWT signing (only if GH_APP_* env vars are set — signs a self-constructed JWT payload, no user content involved)\ngh auth token CLI fallback (only if gh is installed — reads from gh's own credential store)\n\nNo user-supplied or fetched content is ever interpolated into subprocess arguments. Email delivery uses send-email.py which builds MIME messages programmatically — no shell interpolation. PDF generation uses generate-pdf.py via weasyprint. Email subjects are static format strings only — never constructed from fetched data."
      },
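      {
        "title": "Example: JWT Signing via openssl (Sketch)",
        "body": "fetch-github.py is described as signing a self-constructed JWT with openssl dgst -sha256 -sign when the GH_APP_* variables are set. A hedged sketch of that RS256 flow; only the mechanism comes from the docs, and the claim values and function names are illustrative:\n\nimport base64\nimport json\nimport subprocess\nimport time\n\ndef b64url(data):\n    return base64.urlsafe_b64encode(data).rstrip(b\"=\").decode()\n\ndef github_app_jwt(app_id, key_file):\n    now = int(time.time())\n    header = b64url(json.dumps({\"alg\": \"RS256\", \"typ\": \"JWT\"}).encode())\n    payload = b64url(json.dumps(\n        {\"iat\": now - 60, \"exp\": now + 540, \"iss\": app_id}).encode())\n    signing_input = f\"{header}.{payload}\".encode()\n    # Fixed argv, no shell: the self-constructed payload travels via stdin\n    sig = subprocess.run(\n        [\"openssl\", \"dgst\", \"-sha256\", \"-sign\", key_file],\n        input=signing_input, capture_output=True, check=True).stdout\n    return f\"{header}.{payload}.{b64url(sig)}\""
      },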
      {
        "title": "Credential & File Access",
        "body": "Scripts do not directly read ~/.config/, ~/.ssh/, or any credential files. All API tokens are read from environment variables declared in the skill metadata. The GitHub auth cascade is:\n\n$GITHUB_TOKEN env var (you control what to provide)\nGitHub App token generation (only if you set GH_APP_ID, GH_APP_INSTALL_ID, and GH_APP_KEY_FILE — uses inline JWT signing via openssl CLI, no external scripts involved)\ngh auth token CLI (delegates to gh's own secure credential store)\nUnauthenticated (60 req/hr, safe fallback)\n\nIf you prefer no automatic credential discovery, simply set $GITHUB_TOKEN and the script will use it directly without attempting steps 2-3."
      },
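      {
        "title": "Example: GitHub Auth Cascade (Sketch)",
        "body": "A minimal sketch of the four-step cascade above; generate_app_token stands in for the JWT flow sketched earlier, and the function names are illustrative:\n\nimport os\nimport shutil\nimport subprocess\n\ndef resolve_github_token():\n    # 1. An explicit token always wins; later steps are never attempted\n    token = os.environ.get(\"GITHUB_TOKEN\")\n    if token:\n        return token\n    # 2. GitHub App auto-generation, only if all three vars are set\n    if all(os.environ.get(v) for v in\n           (\"GH_APP_ID\", \"GH_APP_INSTALL_ID\", \"GH_APP_KEY_FILE\")):\n        return generate_app_token()  # placeholder for the JWT flow\n    # 3. gh CLI's own credential store, only if gh is installed\n    if shutil.which(\"gh\"):\n        out = subprocess.run([\"gh\", \"auth\", \"token\"],\n                             capture_output=True, text=True)\n        if out.returncode == 0 and out.stdout.strip():\n            return out.stdout.strip()\n    # 4. Unauthenticated fallback (60 req/hr)\n    return None"
      },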
      {
        "title": "Dependency Installation",
        "body": "This skill does not install any packages. requirements.txt lists optional dependencies (feedparser, jsonschema) for reference only. All scripts work with Python 3.8+ standard library. Users should install optional deps in a virtualenv if desired — the skill never runs pip install."
      },
      {
        "title": "Input Sanitization",
        "body": "URL resolution rejects non-HTTP(S) schemes (javascript:, data:, etc.)\nRSS fallback parsing uses simple, non-backtracking regex patterns (no ReDoS risk)\nAll fetched content is treated as untrusted data for display only"
      },
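      {
        "title": "Example: URL Scheme Check (Sketch)",
        "body": "A minimal sketch of the scheme rejection above; the helper name is illustrative:\n\nfrom urllib.parse import urlparse\n\ndef safe_url(url):\n    # Accept only http(s); rejects javascript:, data:, file:, etc.\n    return urlparse(url.strip()).scheme in (\"http\", \"https\")\n\nassert safe_url(\"https://openai.com/blog/rss.xml\")\nassert not safe_url(\"javascript:alert(1)\")"
      },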
      {
        "title": "Network Access",
        "body": "Scripts make outbound HTTP requests to configured RSS feeds, Twitter API, GitHub API, Reddit JSON API, Brave Search API, and Tavily Search API. No inbound connections or listeners are created."
      }
    ],
    "body": "Tech News Digest\n\nAutomated tech news digest system with unified data source model, quality scoring pipeline, and template-based output generation.\n\nQuick Start\n\nConfiguration Setup: Default configs are in config/defaults/. Copy to workspace for customization:\n\nmkdir -p workspace/config\ncp config/defaults/sources.json workspace/config/tech-news-digest-sources.json\ncp config/defaults/topics.json workspace/config/tech-news-digest-topics.json\n\n\nEnvironment Variables:\n\nTWITTERAPI_IO_KEY - twitterapi.io API key (optional, preferred)\nX_BEARER_TOKEN - Twitter/X official API bearer token (optional, fallback)\nTAVILY_API_KEY - Tavily Search API key, alternative to Brave (optional)\nWEB_SEARCH_BACKEND - Web search backend: auto|brave|tavily (optional, default: auto)\nBRAVE_API_KEYS - Brave Search API keys, comma-separated for rotation (optional)\nBRAVE_API_KEY - Single Brave key fallback (optional)\nGITHUB_TOKEN - GitHub personal access token (optional, improves rate limits)\n\nGenerate Digest:\n\n# Unified pipeline (recommended) — runs all 6 sources in parallel + merge\npython3 scripts/run-pipeline.py \\\n  --defaults config/defaults \\\n  --config workspace/config \\\n  --hours 48 --freshness pd \\\n  --archive-dir workspace/archive/tech-news-digest/ \\\n  --output /tmp/td-merged.json --verbose --force\n\n\nUse Templates: Apply Discord, email, or PDF templates to merged output\n\nConfiguration Files\nsources.json - Unified Data Sources\n{\n  \"sources\": [\n    {\n      \"id\": \"openai-rss\",\n      \"type\": \"rss\",\n      \"name\": \"OpenAI Blog\",\n      \"url\": \"https://openai.com/blog/rss.xml\",\n      \"enabled\": true,\n      \"priority\": true,\n      \"topics\": [\"llm\", \"ai-agent\"],\n      \"note\": \"Official OpenAI updates\"\n    },\n    {\n      \"id\": \"sama-twitter\",\n      \"type\": \"twitter\", \n      \"name\": \"Sam Altman\",\n      \"handle\": \"sama\",\n      \"enabled\": true,\n      \"priority\": true,\n      \"topics\": [\"llm\", \"frontier-tech\"],\n      \"note\": \"OpenAI CEO\"\n    }\n  ]\n}\n\ntopics.json - Enhanced Topic Definitions\n{\n  \"topics\": [\n    {\n      \"id\": \"llm\",\n      \"emoji\": \"🧠\",\n      \"label\": \"LLM / Large Models\",\n      \"description\": \"Large Language Models, foundation models, breakthroughs\",\n      \"search\": {\n        \"queries\": [\"LLM latest news\", \"large language model breakthroughs\"],\n        \"must_include\": [\"LLM\", \"large language model\", \"foundation model\"],\n        \"exclude\": [\"tutorial\", \"beginner guide\"]\n      },\n      \"display\": {\n        \"max_items\": 8,\n        \"style\": \"detailed\"\n      }\n    }\n  ]\n}\n\nScripts Pipeline\nrun-pipeline.py - Unified Pipeline (Recommended)\npython3 scripts/run-pipeline.py \\\n  --defaults config/defaults [--config CONFIG_DIR] \\\n  --hours 48 --freshness pd \\\n  --archive-dir workspace/archive/tech-news-digest/ \\\n  --output /tmp/td-merged.json --verbose --force\n\nFeatures: Runs all 6 fetch steps in parallel, then merges + deduplicates + scores\nOutput: Final merged JSON ready for report generation (~30s total)\nMetadata: Saves per-step timing and counts to *.meta.json\nGitHub Auth: Auto-generates GitHub App token if $GITHUB_TOKEN not set\nFallback: If this fails, run individual scripts below\nIndividual Scripts (Fallback)\nfetch-rss.py - RSS Feed Fetcher\npython3 scripts/fetch-rss.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--verbose]\n\nParallel fetching (10 workers), retry with 
backoff, feedparser + regex fallback\nTimeout: 30s per feed, ETag/Last-Modified caching\nfetch-twitter.py - Twitter/X KOL Monitor\npython3 scripts/fetch-twitter.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--backend auto|official|twitterapiio]\n\nBackend auto-detection: uses twitterapi.io if TWITTERAPI_IO_KEY set, else official X API v2 if X_BEARER_TOKEN set\nRate limit handling, engagement metrics, retry with backoff\nfetch-web.py - Web Search Engine\npython3 scripts/fetch-web.py [--defaults DIR] [--config DIR] [--freshness pd] [--output FILE]\n\nAuto-detects Brave API rate limit: paid plans → parallel queries, free → sequential\nWithout API: generates search interface for agents\nfetch-github.py - GitHub Releases Monitor\npython3 scripts/fetch-github.py [--defaults DIR] [--config DIR] [--hours 168] [--output FILE]\n\nParallel fetching (10 workers), 30s timeout\nAuth priority: $GITHUB_TOKEN → GitHub App auto-generate → gh CLI → unauthenticated (60 req/hr)\nfetch-github.py --trending - GitHub Trending Repos\npython3 scripts/fetch-github.py --trending [--hours 48] [--output FILE] [--verbose]\n\nSearches GitHub API for trending repos across 4 topics (LLM, AI Agent, Crypto, Frontier Tech)\nQuality scoring: base 5 + daily_stars_est / 10, max 15\nfetch-reddit.py - Reddit Posts Fetcher\npython3 scripts/fetch-reddit.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE]\n\nParallel fetching (4 workers), public JSON API (no auth required)\n13 subreddits with score filtering\nenrich-articles.py - Article Full-Text Enrichment\npython3 scripts/enrich-articles.py --input merged.json --output enriched.json [--min-score 10] [--max-articles 15] [--verbose]\n\nFetches full article text for high-scoring articles\nCloudflare Markdown for Agents (preferred) → HTML extraction (fallback) → Skip (paywalled/social)\nBlog domain whitelist with lower score threshold (≥3)\nParallel fetching (5 workers, 10s timeout)\nmerge-sources.py - Quality Scoring & Deduplication\npython3 scripts/merge-sources.py --rss FILE --twitter FILE --web FILE --github FILE --reddit FILE\n\nQuality scoring, title similarity dedup (85%), previous digest penalty\nOutput: topic-grouped articles sorted by score\nvalidate-config.py - Configuration Validator\npython3 scripts/validate-config.py [--defaults DIR] [--config DIR] [--verbose]\n\nJSON schema validation, topic reference checks, duplicate ID detection\ngenerate-pdf.py - PDF Report Generator\npython3 scripts/generate-pdf.py --input report.md --output digest.pdf [--verbose]\n\nConverts markdown digest to styled A4 PDF with Chinese typography (Noto Sans CJK SC)\nEmoji icons, page headers/footers, blue accent theme. 
Requires weasyprint.\nsanitize-html.py - Safe HTML Email Converter\npython3 scripts/sanitize-html.py --input report.md --output email.html [--verbose]\n\nConverts markdown to XSS-safe HTML email with inline CSS\nURL whitelist (http/https only), HTML-escaped text content\nsource-health.py - Source Health Monitor\npython3 scripts/source-health.py --rss FILE --twitter FILE --github FILE --reddit FILE --web FILE [--verbose]\n\nTracks per-source success/failure history over 7 days\nReports unhealthy sources (>50% failure rate)\nsummarize-merged.py - Merged Data Summary\npython3 scripts/summarize-merged.py --input merged.json [--top N] [--topic TOPIC]\n\nHuman-readable summary of merged data for LLM consumption\nShows top articles per topic with scores and metrics\nUser Customization\nWorkspace Configuration Override\n\nPlace custom configs in workspace/config/ to override defaults:\n\nSources: Append new sources, disable defaults with \"enabled\": false\nTopics: Override topic definitions, search queries, display settings\nMerge Logic:\nSources with same id → user version takes precedence\nSources with new id → appended to defaults\nTopics with same id → user version completely replaces default\nExample Workspace Override\n// workspace/config/tech-news-digest-sources.json\n{\n  \"sources\": [\n    {\n      \"id\": \"simonwillison-rss\",\n      \"enabled\": false,\n      \"note\": \"Disabled: too noisy for my use case\"\n    },\n    {\n      \"id\": \"my-custom-blog\", \n      \"type\": \"rss\",\n      \"name\": \"My Custom Tech Blog\",\n      \"url\": \"https://myblog.com/rss\",\n      \"enabled\": true,\n      \"priority\": true,\n      \"topics\": [\"frontier-tech\"]\n    }\n  ]\n}\n\nTemplates & Output\nDiscord Template (references/templates/discord.md)\nBullet list format with link suppression (<link>)\nMobile-optimized, emoji headers\n2000 character limit awareness\nEmail Template (references/templates/email.md)\nRich metadata, technical stats, archive links\nExecutive summary, top articles section\nHTML-compatible formatting\nPDF Template (references/templates/pdf.md)\nA4 layout with Noto Sans CJK SC font for Chinese support\nEmoji icons, page headers/footers with page numbers\nGenerated via scripts/generate-pdf.py (requires weasyprint)\nDefault Sources (151 total)\nRSS Feeds (62): AI labs, tech blogs, crypto news, Chinese tech media\nTwitter/X KOLs (48): AI researchers, crypto leaders, tech executives\nGitHub Repos (28): Major open-source projects (LangChain, vLLM, DeepSeek, Llama, etc.)\nReddit (13): r/MachineLearning, r/LocalLLaMA, r/CryptoCurrency, r/ChatGPT, r/OpenAI, etc.\nWeb Search (4 topics): LLM, AI Agent, Crypto, Frontier Tech\n\nAll sources pre-configured with appropriate topic tags and priority levels.\n\nDependencies\npip install -r requirements.txt\n\n\nOptional but Recommended:\n\nfeedparser>=6.0.0 - Better RSS parsing (fallback to regex if unavailable)\njsonschema>=4.0.0 - Configuration validation\n\nAll scripts work with Python 3.8+ standard library only.\n\nMonitoring & Operations\nHealth Checks\n# Validate configuration\npython3 scripts/validate-config.py --verbose\n\n# Test RSS feeds\npython3 scripts/fetch-rss.py --hours 1 --verbose\n\n# Check Twitter API\npython3 scripts/fetch-twitter.py --hours 1 --verbose\n\nArchive Management\nDigests automatically archived to <workspace>/archive/tech-news-digest/\nPrevious digest titles used for duplicate detection\nOld archives cleaned automatically (90+ days)\nError Handling\nNetwork Failures: Retry with exponential 
backoff\nRate Limits: Automatic retry with appropriate delays\nInvalid Content: Graceful degradation, detailed logging\nConfiguration Errors: Schema validation with helpful messages\nAPI Keys & Environment\n\nSet in ~/.zshenv or similar:\n\n# Twitter (at least one required for Twitter source)\nexport TWITTERAPI_IO_KEY=\"your_key\"        # twitterapi.io key (preferred)\nexport X_BEARER_TOKEN=\"your_bearer_token\"  # Official X API v2 (fallback)\nexport TWITTER_API_BACKEND=\"auto\"          # auto|twitterapiio|official (default: auto)\n\n# Web Search (optional, enables web search layer)\nexport WEB_SEARCH_BACKEND=\"auto\"          # auto|brave|tavily (default: auto)\nexport TAVILY_API_KEY=\"tvly-xxx\"           # Tavily Search API (free 1000/mo)\n\n# Brave Search (alternative)\nexport BRAVE_API_KEYS=\"key1,key2,key3\"     # Multiple keys, comma-separated rotation\nexport BRAVE_API_KEY=\"key1\"                # Single key fallback\nexport BRAVE_PLAN=\"free\"                   # Override rate limit detection: free|pro\n\n# GitHub (optional, improves rate limits)\nexport GITHUB_TOKEN=\"ghp_xxx\"              # PAT (simplest)\nexport GH_APP_ID=\"12345\"                   # Or use GitHub App for auto-token\nexport GH_APP_INSTALL_ID=\"67890\"\nexport GH_APP_KEY_FILE=\"/path/to/key.pem\"\n\nTwitter: TWITTERAPI_IO_KEY preferred ($3-5/mo); X_BEARER_TOKEN as fallback; auto mode tries twitterapiio first\nWeb Search: Tavily (preferred in auto mode) or Brave; optional, fallback to agent web_search if unavailable\nGitHub: Auto-generates token from GitHub App if PAT not set; unauthenticated fallback (60 req/hr)\nReddit: No API key needed (uses public JSON API)\nCron / Scheduled Task Integration\nOpenClaw Cron (Recommended)\n\nThe cron prompt should NOT hardcode the pipeline steps. Instead, reference references/digest-prompt.md and only pass configuration parameters. This ensures the pipeline logic stays in the skill repo and is consistent across all installations.\n\nDaily Digest Cron Prompt\nRead <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a daily digest.\n\nReplace placeholders with:\n- MODE = daily\n- TIME_WINDOW = past 1-2 days\n- FRESHNESS = pd\n- RSS_HOURS = 48\n- ITEMS_PER_SECTION = 3-5\n- ENRICH = true\n- BLOG_PICKS_COUNT = 3\n- EXTRA_SECTIONS = (none)\n- SUBJECT = Daily Tech Digest - YYYY-MM-DD\n- WORKSPACE = <your workspace path>\n- SKILL_DIR = <your skill install path>\n- DISCORD_CHANNEL_ID = <your channel id>\n- EMAIL = (optional)\n- LANGUAGE = English\n- TEMPLATE = discord\n\nFollow every step in the prompt template strictly. Do not skip any steps.\n\nWeekly Digest Cron Prompt\nRead <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a weekly digest.\n\nReplace placeholders with:\n- MODE = weekly\n- TIME_WINDOW = past 7 days\n- FRESHNESS = pw\n- RSS_HOURS = 168\n- ITEMS_PER_SECTION = 10-15\n- ENRICH = true\n- BLOG_PICKS_COUNT = 3-5\n- EXTRA_SECTIONS = 📊 Weekly Trend Summary (2-3 sentences summarizing macro trends)\n- SUBJECT = Weekly Tech Digest - YYYY-MM-DD\n- WORKSPACE = <your workspace path>\n- SKILL_DIR = <your skill install path>\n- DISCORD_CHANNEL_ID = <your channel id>\n- EMAIL = (optional)\n- LANGUAGE = English\n- TEMPLATE = discord\n\nFollow every step in the prompt template strictly. 
Do not skip any steps.\n\nWhy This Pattern?\nSingle source of truth: Pipeline logic lives in digest-prompt.md, not scattered across cron configs\nPortable: Same skill on different OpenClaw instances, just change paths and channel IDs\nMaintainable: Update the skill → all cron jobs pick up changes automatically\nAnti-pattern: Do NOT copy pipeline steps into the cron prompt — it will drift out of sync\nMulti-Channel Delivery Limitation\n\nOpenClaw enforces cross-provider isolation: a single session can only send messages to one provider (e.g., Discord OR Telegram, not both). If you need to deliver digests to multiple platforms, create separate cron jobs for each provider:\n\n# Job 1: Discord + Email\n- DISCORD_CHANNEL_ID = <your-discord-channel-id>\n- EMAIL = user@example.com\n- TEMPLATE = discord\n\n# Job 2: Telegram DM\n- DISCORD_CHANNEL_ID = (none)\n- EMAIL = (none)\n- TEMPLATE = telegram\n\n\nReplace DISCORD_CHANNEL_ID delivery with the target platform's delivery in the second job's prompt.\n\nThis is a security feature, not a bug — it prevents accidental cross-context data leakage.\n\nSecurity Notes\nExecution Model\n\nThis skill uses a prompt template pattern: the agent reads digest-prompt.md and follows its instructions. This is the standard OpenClaw skill execution model — the agent interprets structured instructions from skill-provided files. All instructions are shipped with the skill bundle and can be audited before installation.\n\nNetwork Access\n\nThe Python scripts make outbound requests to:\n\nRSS feed URLs (configured in tech-news-digest-sources.json)\nTwitter/X API (api.x.com or api.twitterapi.io)\nBrave Search API (api.search.brave.com)\nTavily Search API (api.tavily.com)\nGitHub API (api.github.com)\nReddit JSON API (reddit.com)\n\nNo data is sent to any other endpoints. All API keys are read from environment variables declared in the skill metadata.\n\nShell Safety\n\nEmail delivery uses send-email.py which constructs proper MIME multipart messages with HTML body + optional PDF attachment. Subject formats are hardcoded (Daily Tech Digest - YYYY-MM-DD). PDF generation uses generate-pdf.py via weasyprint. The prompt template explicitly prohibits interpolating untrusted content (article titles, tweet text, etc.) into shell arguments. Email addresses and subjects must be static placeholder values only.\n\nFile Access\n\nScripts read from config/ and write to workspace/archive/. No files outside the workspace are accessed.\n\nSupport & Troubleshooting\nCommon Issues\nRSS feeds failing: Check network connectivity, use --verbose for details\nTwitter rate limits: Reduce sources or increase interval\nConfiguration errors: Run validate-config.py for specific issues\nNo articles found: Check time window (--hours) and source enablement\nDebug Mode\n\nAll scripts support --verbose flag for detailed logging and troubleshooting.\n\nPerformance Tuning\nParallel Workers: Adjust MAX_WORKERS in scripts for your system\nTimeout Settings: Increase TIMEOUT for slow networks\nArticle Limits: Adjust MAX_ARTICLES_PER_FEED based on needs\nSecurity Considerations\nShell Execution\n\nThe digest prompt instructs agents to run Python scripts via shell commands. All script paths and arguments are skill-defined constants — no user input is interpolated into commands. 
Two scripts use subprocess:\n\nrun-pipeline.py orchestrates child fetch scripts (all within scripts/ directory)\nfetch-github.py has two subprocess calls:\nopenssl dgst -sha256 -sign for JWT signing (only if GH_APP_* env vars are set — signs a self-constructed JWT payload, no user content involved)\ngh auth token CLI fallback (only if gh is installed — reads from gh's own credential store)\n\nNo user-supplied or fetched content is ever interpolated into subprocess arguments. Email delivery uses send-email.py which builds MIME messages programmatically — no shell interpolation. PDF generation uses generate-pdf.py via weasyprint. Email subjects are static format strings only — never constructed from fetched data.\n\nCredential & File Access\n\nScripts do not directly read ~/.config/, ~/.ssh/, or any credential files. All API tokens are read from environment variables declared in the skill metadata. The GitHub auth cascade is:\n\n$GITHUB_TOKEN env var (you control what to provide)\nGitHub App token generation (only if you set GH_APP_ID, GH_APP_INSTALL_ID, and GH_APP_KEY_FILE — uses inline JWT signing via openssl CLI, no external scripts involved)\ngh auth token CLI (delegates to gh's own secure credential store)\nUnauthenticated (60 req/hr, safe fallback)\n\nIf you prefer no automatic credential discovery, simply set $GITHUB_TOKEN and the script will use it directly without attempting steps 2-3.\n\nDependency Installation\n\nThis skill does not install any packages. requirements.txt lists optional dependencies (feedparser, jsonschema) for reference only. All scripts work with Python 3.8+ standard library. Users should install optional deps in a virtualenv if desired — the skill never runs pip install.\n\nInput Sanitization\nURL resolution rejects non-HTTP(S) schemes (javascript:, data:, etc.)\nRSS fallback parsing uses simple, non-backtracking regex patterns (no ReDoS risk)\nAll fetched content is treated as untrusted data for display only\nNetwork Access\n\nScripts make outbound HTTP requests to configured RSS feeds, Twitter API, GitHub API, Reddit JSON API, Brave Search API, and Tavily Search API. No inbound connections or listeners are created."
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/dinstein/tech-news-digest",
    "publisherUrl": "https://clawhub.ai/dinstein/tech-news-digest",
    "owner": "dinstein",
    "version": "3.15.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/tech-news-digest",
    "downloadUrl": "https://openagent3.xyz/downloads/tech-news-digest",
    "agentUrl": "https://openagent3.xyz/skills/tech-news-digest/agent",
    "manifestUrl": "https://openagent3.xyz/skills/tech-news-digest/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/tech-news-digest/agent.md"
  }
}