Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Cross-reference GitHub PRs and issues to find duplicates and missing links. Spawns parallel Sonnet subagents to semantically analyze the last N PRs and issues.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
You find hidden connections between PRs and issues that humans miss at scale. The core loop is: fetch → analyze in parallel → cluster → verify → report → act. Before doing anything, read references/principles.md. Those rules override everything in this file when there's a conflict.
Repos accumulate duplicate PRs and orphaned issue→PR links over time. Manual cross-referencing doesn't scale past a few dozen items. This skill uses parallel Sonnet subagents to analyze up to 1000 PRs and 1000 issues simultaneously, finding two kinds of links:

- Duplicate PRs: PRs that address the same bug or feature (even with different approaches or wording)
- Issue→PR links: open issues that already have a PR solving them but no explicit "fixes #N" reference

Results are grouped into thematic clusters, scored by actionability, and presented with available actions (comment, close, label), not just as a flat list of pairs.
The user provides these at invocation time (ask if not given):

| Parameter | Default | Description |
| --- | --- | --- |
| repo | (ask) | GitHub owner/repo to analyze |
| pr_count | 1000 | How many recent PRs to scan |
| issue_count | 1000 | How many recent issues to scan |
| pr_state | all | PR state filter: open, closed, all |
| issue_state | open | Issue state filter: open, closed, all |
| batch_size | 50 | PRs per subagent batch |
| confidence_threshold | medium | Minimum confidence to include in report: low, medium, high |
| mode | plan | plan = report only (default, always start here). execute = act on findings. |

Default mode is plan (dry-run). The skill always starts by generating the report. The user must explicitly choose to execute actions after reviewing the findings. This matters because actions can't be undone.
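A kickoff message might look like the following; the repo name and counts are made up for illustration:

```
Cross-reference the last 200 PRs and 200 open issues in acme/widgets.
Confidence threshold: medium. Mode: plan (report only).
```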
Fetch PR and issue metadata from the GitHub API. This phase is deterministic and uses the shell script; no AI needed.

```
scripts/fetch-data.sh <owner/repo> <workspace_dir> [pr_count] [issue_count] [pr_state] [issue_state]
```

This produces:

- workspace/prs.json: full PR metadata
- workspace/issues.json: full issue metadata (PRs filtered out)
- workspace/existing-refs.json: pre-extracted explicit cross-references
- workspace/pr-index.txt: compact one-line-per-PR index
- workspace/issue-index.txt: compact one-line-per-issue index

The existing references map captures what's already linked (via "fixes #N", "closes #N", etc.) so subagents can focus on what's missing.
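For example, a scoped-down first run could look like this; the repo name is illustrative, and the argument order follows the usage line above:

```bash
# fetch 100 PRs (any state) and 100 open issues into the workspace
scripts/fetch-data.sh acme/widgets cross-ref-workspace 100 100 all open

# spot-check the compact index before spawning any subagents
head -3 cross-ref-workspace/pr-index.txt
```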
After all subagents return:

1. Collect all JSON results into a single array
2. Deduplicate duplicate_pr entries (A→B and B→A are the same link; see the jq sketch after this phase)
3. Merge confidence: if two subagents found the same link, take the higher confidence and merge both evidence strings
4. Filter by confidence_threshold
5. Build clusters: group related findings into thematic clusters (see below)
6. Score clusters by actionability (see below)
7. Sort clusters by score (highest first)

Save to workspace/results-unverified.json.

Clustering Algorithm

Instead of reporting isolated pairs, group connected findings into clusters. Two findings belong to the same cluster if they share any PR or issue number. Example: if you find PR#100 → PR#101 (duplicate) and PR#100 → Issue#50 (link), these form a single cluster: "Cluster: Issue#50 + PR#100 + PR#101".

Cluster structure:

```json
{
  "cluster_id": 1,
  "theme": "Onboard token mismatch: OPENCLAW_GATEWAY_TOKEN ignored",
  "items": ["PR#22662", "PR#22658", "Issue#22638"],
  "findings": [ ...individual findings in this cluster... ],
  "score": 8.5,
  "cluster_status": "actionable|needs_review|manual_review_required",
  "suggested_actions": [ ...see Phase 4b... ]
}
```

The theme is a one-line summary that describes what this cluster is about: the shared root cause or feature area. Generate it from the root_cause fields of the cluster's findings.

Actionability Scoring

Each cluster gets a score based on these signals (clamp the result to 0-10):

| Signal | Points | Why it matters |
| --- | --- | --- |
| All items open | +3 | Can still be acted on |
| At least one high-confidence finding | +2 | Strong evidence |
| Multiple findings in cluster | +1 | More connections = more value |
| Issue has >5 reactions/comments | +1 | High community interest |
| PR is not draft | +1 | Ready for review |
| Cluster has a clear canonical PR | +1 | Easy to pick a winner |
| Any manual_review_required | -2 | Needs human judgment |
| All items closed | -3 | Low urgency |

Clusters scoring 7+ are actionable (green in report). Clusters scoring 4-6 need review (yellow). Clusters scoring 0-3 are low priority (gray).
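A minimal jq sketch of the step-2 dedup, assuming each duplicate_pr finding carries pr_a, pr_b, and confidence fields; those names are illustrative, so adapt them to the actual finding schema:

```bash
# Merge all batch files, canonicalize A->B / B->A into a sorted pair,
# and keep the highest-confidence copy of each pair.
# Evidence-string merging (step 3) is deliberately omitted here.
jq -s '
  add
  | map(select(.type == "duplicate_pr"))
  | map(. + {pair: ([.pr_a, .pr_b] | sort)})
  | group_by(.pair)
  | map(max_by({"low": 0, "medium": 1, "high": 2}[.confidence]))
' cross-ref-workspace/batches/batch-*-results.json
```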
For each cluster, suggest appropriate actions based on confidence and item states.

For duplicate PRs (high confidence, both open):
- 💬 Comment: link the PRs so authors can coordinate
- 🏷️ Label: add duplicate label to the weaker PR
- ❌ Close: close the weaker PR as duplicate (only if very clear)

For duplicate PRs (one open, one closed):
- 💬 Comment: note the connection for context (lower priority)

For issue→PR links (high confidence):
- 💬 Comment on issue: note that a PR addresses this
- 🏷️ Label issue: add has-pr or similar

For manual_review_required items:
- ⚠️ Flag for human: present in a separate section, no automated action

Action rules:
- Never suggest closing without high confidence + verification
- Never suggest labeling without at least medium confidence
- Always suggest commenting as the minimum action (it's the safest)
- For clusters with mixed confidence, suggest the action matching the lowest-confidence finding (conservative)
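The conservative-action rule condenses to a few lines of shell. This is a hypothetical helper, not part of the skill's scripts, and the action names are illustrative:

```bash
# map a finding's confidence to the safest default action;
# close is never automatic, it needs verification plus human sign-off
pick_action() {
  case "$1" in
    high|medium) echo "comment,label" ;;  # labeling needs at least medium
    *)           echo "comment" ;;        # commenting is always the minimum
  esac
}

pick_action low   # -> comment
```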
After presenting the report, ask the user how they want to proceed. Read references/commenting-strategy.md for rate-limiting details.

Present action choices per cluster. For each actionable cluster, let the user pick:
- Comment only: just link the items
- Comment + label: link and add labels
- Comment + close: link and close duplicates (high confidence only)
- Skip: do nothing for this cluster
- Manual: I'll handle this one myself

Then present the timing strategy. Read references/commenting-strategy.md for the full tier definitions, rate calculations, and daily budget math. Present the user with the strategy table from that file, populated with the actual counts from the report. If total actions exceed the daily budget, show the multi-day plan as described in commenting-strategy.md.

Always offer Dry Run (report only, no actions) as the default choice. Also offer Skip: save the report but don't act at all.
If the user chooses to act, build workspace/approved-comments.json and execute with rate limiting via the shell script.

approved-comments.json schema (array of objects):

```json
[
  {
    "target_number": 1234,
    "type": "issue_link|duplicate_pr",
    "body": "The full comment text to post",
    "cluster_id": 1,
    "finding_index": 0
  }
]
```

- target_number: the issue or PR number to comment on (used by post-comments.sh)
- type: finding type, used for logging only
- body: the complete comment text
- cluster_id and finding_index: traceability back to the report

```
scripts/post-comments.sh <owner/repo> <workspace_dir> [jitter_min] [jitter_max] [daily_max]
```

For label and close actions, execute them inline (not via the script) since they don't need the same rate limiting as comments:

```bash
# Label (works for both issues and PRs; GitHub treats PRs as issues for labels)
gh issue edit {number} --add-label duplicate --repo {owner/repo}

# Close PR as duplicate (use heredoc for safe body passing)
gh pr close {number} --comment "$(cat <<'EOF'
Closing in favor of #{canonical_pr_number} by @{canonical_author}, which covers the same change ({root_cause_sentence}).

Thanks for the contribution, @{closed_pr_author}; your work helped confirm this was worth fixing.

_If this closure is wrong, reopen and let me know._
EOF
)" --repo {owner/repo}
```

Always execute in this order within a cluster:
1. Post comments first (so the context exists before close/label)
2. Add labels
3. Close (only after comment is posted)

Comment style: comments should feel like they're from a helpful maintainer, not a bot. Vary the opener and closer for each comment to avoid sounding repetitive. Always mention the PR author by name.

Comment templates (vary the opener each time).

Openers (rotate through these, never use the same one twice in a row):
- "Heads up: this might be related."
- "Worth a look:"
- "Noticed a possible connection here."
- "This could be relevant to what you're working on."

For issue→PR links (comment on the issue):

```
{opener} PR #{pr_number} by @{author} ({pr_title}) appears to address this issue. {root_cause_sentence}

_If this doesn't look right, let me know and I'll correct the link._
```

For duplicate PRs (comment on the newer PR):

```
{opener} PR #{other_pr_number} by @{other_author} ({other_pr_title}) seems to address the same problem. {root_cause_sentence}

Both approaches have merit; might be worth coordinating.

_If these aren't actually related, let me know and I'll correct this._
```

Every comment includes a correction path because wrong links erode trust. Save progress to workspace/comment-progress.json for resume support.
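As a concrete instance of the schema, a one-entry approved-comments.json could be generated like this; the numbers, author, and comment text are made up, and real bodies come from the templates above:

```bash
# hypothetical example entry matching the schema above
jq -n '[{
  target_number: 4321,   # issue to comment on (made-up number)
  type: "issue_link",
  body: "Worth a look: PR #4285 by @alice (Fix token refresh race) appears to address this issue.",
  cluster_id: 1,
  finding_index: 0
}]' > workspace/approved-comments.json
```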
- API rate limit hit: pause, show remaining reset time, save progress.
- Subagent returns invalid JSON: log the error, skip that batch, warn the user. Don't retry; the batch results are lost but other batches continue.
- PR/issue not found (deleted): skip silently, note in report.
- Network error during commenting: save progress immediately, offer resume.
- Subagent returns empty results: normal; not every batch has links.
- Close/label fails: log the error, continue with remaining actions. Never retry a close; the user should investigate manually.
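One way to detect the rate-limit case up front, sketched with gh's built-in API helper; the 50-request threshold is an arbitrary safety margin:

```bash
# pause until reset when the core REST quota is nearly exhausted
remaining=$(gh api rate_limit --jq '.resources.core.remaining')
reset_at=$(gh api rate_limit --jq '.resources.core.reset')
if [ "$remaining" -lt 50 ]; then
  wait=$(( reset_at - $(date +%s) ))
  echo "Rate limit low ($remaining calls left); sleeping ${wait}s until reset."
  sleep "$wait"
fi
```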
```
cross-ref-workspace/
├── prs.json                   # Raw PR metadata
├── issues.json                # Raw issue metadata
├── pr-index.txt               # Compact PR index (one line per PR)
├── issue-index.txt            # Compact issue index (one line per issue)
├── existing-refs.json         # Pre-extracted explicit references
├── batches/
│   ├── batch-01-results.json  # Subagent results per batch
│   ├── batch-02-results.json
│   └── ...
├── results-unverified.json    # Raw merged findings (before verification)
├── results.json               # Verified findings with clusters
├── report.md                  # Human-readable report
├── approved-comments.json     # Comments approved for posting
├── comment-progress.json      # Commenting progress tracker
└── pending-comments.json      # Links not yet commented (if day limit hit)
```
If a previous run exists in the workspace:
- Phases 1-3: skip if results.json exists and user confirms
- Phase 4: skip if report.md exists and user confirms
- Phases 5-6: resume from comment-progress.json if commenting was interrupted

Ask: "Found a previous run with {N} results. Resume commenting or start fresh?"
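A sketch of that check, assuming results.json holds a flat array of findings; the file's actual schema may nest findings under clusters, so adjust the jq path accordingly:

```bash
ws=cross-ref-workspace
if [ -f "$ws/results.json" ]; then
  n=$(jq 'length' "$ws/results.json")   # finding count, assuming a flat array
  echo "Found a previous run with $n results. Resume commenting or start fresh?"
fi
```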
- Start with a smaller count (100 PRs, 100 issues) to validate before scaling
- Always review the report in plan mode before executing actions
- The compact index approach keeps memory usage manageable; don't fetch full PR bodies (500-char truncation is intentional)
- For very active repos (>10K PRs), increase batch_size to reduce subagent count
- Token costs: ~20 subagent calls for 1000 PRs at batch_size=50, each with ~120KB context. Plan accordingly (the arithmetic is spelled out below).
- The gh CLI token needs repo scope (private) or public_repo (public), plus issues:write for posting comments.
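The subagent-count arithmetic from the token-cost bullet, spelled out, plus a quick way to confirm your token's scopes:

```bash
# subagent calls = ceil(pr_count / batch_size); 1000 / 50 -> 20 batches
pr_count=1000; batch_size=50
echo $(( (pr_count + batch_size - 1) / batch_size ))   # prints 20

gh auth status   # shows the active account and granted token scopes
```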