Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Orchestrates end-to-end arXiv paper retrieval, processing, and batch reporting with language control and parallel or serial paper handling modes.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Run the full pipeline by composing three sub-skills.
- arxiv-search-collector
- arxiv-paper-processor
- arxiv-batch-reporter
Parameters:
- language: language parameter, set manually and used by all stages. Defaults to English when omitted.
- paper_processing_mode: subagent_parallel or serial.
- max_parallel_papers: defaults to 5 when paper_processing_mode=subagent_parallel.
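A minimal sketch of how the three parameters above could be resolved with their documented defaults. The helper `resolve_params` and the `RunParams` type are hypothetical and not part of the skill; treating serial mode as a concurrency limit of 1 is an interpretation of "one paper at a time".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical helper: the skill documents these defaults but ships no such function.
VALID_MODES = ("subagent_parallel", "serial")


@dataclass(frozen=True)
class RunParams:
    language: str
    paper_processing_mode: str
    max_parallel_papers: int  # effective concurrency limit for Stage B


def resolve_params(
    language: Optional[str] = None,
    paper_processing_mode: str = "subagent_parallel",
    max_parallel_papers: Optional[int] = None,
) -> RunParams:
    """Fill in the documented defaults and reject unknown modes."""
    if paper_processing_mode not in VALID_MODES:
        raise ValueError(f"unknown paper_processing_mode: {paper_processing_mode!r}")
    # English is the documented default when no language is given.
    lang = language or "English"
    if paper_processing_mode == "serial":
        # Serial mode processes exactly one paper at a time.
        limit = 1
    else:
        # Parallel mode defaults to 5 concurrent papers.
        limit = 5 if max_parallel_papers is None else max_parallel_papers
        if limit < 1:
            raise ValueError("max_parallel_papers must be at least 1")
    return RunParams(lang, paper_processing_mode, limit)
```

For example, `resolve_params(language="Chinese")` keeps the parallel default of 5 papers while switching every stage to Chinese output.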
Stage A (arxiv-search-collector):
- Initialize one run with arxiv-search-collector/scripts/init_collection_run.py.
- The model generates multiple focused queries from the original topic and writes a minimal query_plan.json (label + query only).
- Run arxiv-search-collector/scripts/fetch_queries_batch.py with the plan file (recommended). As an optional fallback, call arxiv-search-collector/scripts/fetch_query_metadata.py manually to fetch queries one by one.
- The model reads each indexed query list and decides which indexes to keep.
- Merge the selected items with arxiv-search-collector/scripts/merge_selected_papers.py.
- If relevance or coverage is still poor, iterate Stage A: generate another query plan with new labels, fetch again, and re-merge with --incremental and an updated selection JSON. Set weak labels to an empty keep list ([]) to drop them explicitly.
Notes:
- Pass --language <LANG> to the collector scripts so that all markdown files generated in Stage A follow the selected language.
- Fetch queries serially in Stage A with conservative controls (for example --min-interval-sec 5 and --retry-max 4).
- The default collector settings already include retries with backoff and run-local throttle state (<run_dir>/.runtime/arxiv_api_state.json), so manual tuning is usually unnecessary.
- Prefer cache reuse (omit --force) unless query parameters changed or a data refresh is required.
Output: one run directory containing one metadata subdirectory per paper.
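The Stage A selection loop can be sketched as follows. This is an illustration under stated assumptions, not the collector's code: the list-of-objects shape for query_plan.json, the label-to-keep-indexes shape of the selection JSON, 0-based indexes, and the helpers `write_query_plan` and `merge_selected` are all hypothetical. The sketch shows why an empty keep list ([]) drops a label during an incremental re-merge.

```python
import json
from pathlib import Path


def write_query_plan(path, queries):
    """Write a minimal plan: one {"label", "query"} object per query.

    The list-of-objects shape is an assumption; check the collector's docs
    for the exact schema fetch_queries_batch.py expects.
    """
    plan = [{"label": label, "query": query} for label, query in queries]
    Path(path).write_text(json.dumps(plan, indent=2, ensure_ascii=False), encoding="utf-8")
    return plan


def merge_selected(results, selection, previous_selection=None):
    """Hypothetical merge: results maps label -> list of paper dicts with an
    "arxiv_id" key; selection maps label -> keep indexes (0-based here).

    For an incremental re-merge, new keep lists override the previous ones
    label by label, so a label set to [] stops contributing papers.
    """
    effective = {**(previous_selection or {}), **selection}
    merged = {}
    for label, keep in effective.items():
        items = results.get(label, [])
        for idx in keep:
            paper = items[idx]
            # Deduplicate by arXiv ID; the first label that kept a paper wins.
            merged.setdefault(paper["arxiv_id"], paper)
    return list(merged.values()), effective
```

Keying the merge on the arXiv ID means a paper returned by two overlapping queries is stored once, which matches the goal of one metadata subdirectory per paper.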
Stage B (arxiv-paper-processor):
For each paper directory, invoke the arxiv-paper-processor sub-skill once and let it produce <paper_dir>/summary.md.
Recommended pre-step for many papers: run one batch artifact download before per-paper reading:
    python3 arxiv-paper-processor/scripts/download_papers_batch.py \
      --run-dir /path/to/run \
      --artifact source_then_pdf \
      --max-workers 3 \
      --min-interval-sec 5 \
      --language <LANG>
Per-paper execution steps (inside arxiv-paper-processor):
- If <paper_dir>/summary.md already exists and is complete, skip this paper.
- If a usable source (source/source_extract/*.tex) or PDF (source/paper.pdf) already exists, skip the download.
- If artifacts are missing, download the source with arxiv-paper-processor/scripts/download_arxiv_source.py.
- If the source is unusable, download the PDF with arxiv-paper-processor/scripts/download_arxiv_pdf.py.
- The model reads the content and manually writes <paper_dir>/summary.md following the reference format, in the selected language.
Parallel strategy for many papers:
- Default: paper_processing_mode=subagent_parallel with max_parallel_papers=5.
- Optional: paper_processing_mode=serial to process one paper at a time.
- In parallel mode, run multiple arxiv-paper-processor instances in batches; the number of concurrent papers must not exceed max_parallel_papers. Wait for one batch to finish before starting the next.
- In serial mode, run exactly one arxiv-paper-processor instance at a time.
- Each subagent worker should own exactly one paper directory to avoid file conflicts.
- Do not use scripts to compose summary text automatically; the scripts are download-only tools.
Output: every paper directory contains summary.md.
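A minimal sketch of the Stage B control flow: the per-paper skip/download decision and batch-wise scheduling capped at max_parallel_papers. The helpers `next_action`, `batches`, and `process_all` are hypothetical, "complete" is approximated as a non-empty summary.md, and threads stand in for subagents; the real work (downloading, reading, writing the summary) would happen inside the `worker` callable.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def next_action(paper_dir):
    """Decide what a worker should do with one paper directory.

    "Complete" is approximated as a non-empty summary.md; the real check is up
    to the processor skill.
    """
    d = Path(paper_dir)
    summary = d / "summary.md"
    if summary.is_file() and summary.stat().st_size > 0:
        return "skip"
    extract = d / "source" / "source_extract"
    has_tex = extract.is_dir() and any(extract.glob("*.tex"))
    has_pdf = (d / "source" / "paper.pdf").is_file()
    return "summarize" if (has_tex or has_pdf) else "download"


def batches(items, size):
    """Split items into consecutive chunks of at most `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_all(paper_dirs, worker, max_parallel_papers):
    """Run `worker` on each unfinished paper, at most max_parallel_papers at once.

    Each call owns exactly one directory. Leaving the `with` block waits for
    the whole batch, so the next batch only starts after the previous one ends.
    max_parallel_papers=1 gives the serial mode.
    """
    pending = [p for p in paper_dirs if next_action(p) != "skip"]
    results = {}
    for batch in batches(pending, max_parallel_papers):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            for paper, out in zip(batch, pool.map(worker, batch)):
                results[paper] = out
    return results
```

Strict batches are simpler to reason about than a rolling pool, at the cost of idle slots while the slowest paper in a batch finishes; that trade-off matches the documented "wait for one batch to finish" rule.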
Stage C (arxiv-batch-reporter):
- Run arxiv-batch-reporter/scripts/collect_summaries_bundle.py --language <LANG>.
- The model reads summaries_bundle.md and writes collection_report_template.md in the run directory.
- In the template, each paper leaf entry must include one standalone placeholder line: {{ARXIV_BRIEF:<arxiv_id>}}.
- Run arxiv-batch-reporter/scripts/render_collection_report.py to generate the final collection_report.md.
- Do not manually paraphrase per-paper conclusion lines in the final report; they must come from section 10 of each paper's summary.md via script injection.
- If the language is not English (for example Chinese), all intermediate markdown files and final reports should follow that language.
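To make the placeholder mechanism concrete, here is a sketch of how standalone {{ARXIV_BRIEF:<arxiv_id>}} lines could be replaced with section 10 of each summary.md. It is not render_collection_report.py itself: the helpers `section_10` and `render_report`, the heading style "## 10. ...", and the <run_dir>/<arxiv_id>/summary.md lookup path are assumptions drawn from the output layout.

```python
import re
from pathlib import Path

# Assumed formats: a standalone placeholder line, and a summary.md whose
# section 10 starts with a markdown heading such as "## 10. Conclusion".
PLACEHOLDER = re.compile(r"^\{\{ARXIV_BRIEF:([^}\s]+)\}\}$")
SECTION_10 = re.compile(r"^#{1,6}\s*10\b")


def section_10(summary_text):
    """Return the body of section 10, up to the next heading."""
    body, inside = [], False
    for line in summary_text.splitlines():
        if line.lstrip().startswith("#"):
            if inside:
                break  # the next heading ends section 10
            inside = bool(SECTION_10.match(line.strip()))
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()


def render_report(template_text, run_dir):
    """Replace each placeholder line with section 10 of <run_dir>/<arxiv_id>/summary.md."""
    rendered = []
    for line in template_text.splitlines():
        match = PLACEHOLDER.match(line.strip())
        if match:
            summary = Path(run_dir) / match.group(1) / "summary.md"
            rendered.append(section_10(summary.read_text(encoding="utf-8")))
        else:
            rendered.append(line)  # everything else passes through untouched
    return "\n".join(rendered) + "\n"
```

Because only whole placeholder lines are matched, a placeholder embedded mid-sentence is left as-is, which is one reason the template must put each placeholder on its own line.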
This orchestrator is suitable for cron/scheduled execution in OpenClaw:
- Frequency examples: daily, weekly, monthly.
- For rolling windows, use a lookback window (1d, 7d, 30d) when initializing runs.
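As an illustration of what a rolling lookback means, the sketch below converts a value such as "7d" into a UTC time window. The pairing of daily/weekly/monthly with 1d/7d/30d and the helper `lookback_window` are hypothetical; how init_collection_run.py actually accepts or interprets the lookback is not shown here.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pairing of the documented frequencies with lookback values.
LOOKBACK_FOR = {"daily": "1d", "weekly": "7d", "monthly": "30d"}


def lookback_window(lookback, now=None):
    """Turn a lookback such as "7d" into a (start, end) UTC window."""
    match = re.fullmatch(r"(\d+)d", lookback.strip())
    if not match:
        raise ValueError(f"unsupported lookback: {lookback!r}")
    end = now or datetime.now(timezone.utc)
    return end - timedelta(days=int(match.group(1))), end
```

Matching the lookback to the schedule frequency keeps consecutive runs contiguous: a daily job with 1d covers each day once, without gaps or large overlaps.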
Output layout:
<output-root>/<topic>-<timestamp>-<range>/
- task_meta.json, task_meta.md
- query_results/, query_selection/
- <arxiv_id>/: metadata.md, downloaded source/PDF, summary.md
- summaries_bundle.md
- collection_report_template.md
- final rendered collection report (e.g. collection_report.md)
Use references/workflow-checklist.md as the execution checklist.
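A small sketch for working with the layout above: building a run directory name and listing paper directories that still lack summary.md before Stage C starts. Both helpers (`run_dir_name`, `papers_missing_summary`) are hypothetical; the topic slug and timestamp format are guesses, and paper directories are identified here by the presence of metadata.md.

```python
import re
from datetime import datetime
from pathlib import Path


def run_dir_name(topic, timestamp, time_range):
    """Build <topic>-<timestamp>-<range>; slug and timestamp formats are guesses."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"{slug}-{timestamp.strftime('%Y%m%d-%H%M%S')}-{time_range}"


def papers_missing_summary(run_dir):
    """List paper directories (those with metadata.md) that lack summary.md."""
    return sorted(
        p.name
        for p in Path(run_dir).iterdir()
        if p.is_dir() and (p / "metadata.md").is_file() and not (p / "summary.md").is_file()
    )
```

An empty list from `papers_missing_summary` corresponds to the Stage B exit condition that every paper directory contains summary.md.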
This is the top-level orchestration skill. Before using it, install and enable these three sub-skills:
- arxiv-search-collector
- arxiv-paper-processor
- arxiv-batch-reporter
Execution order inside this orchestrator:
- arxiv-search-collector (Stage A)
- arxiv-paper-processor (Stage B)
- arxiv-batch-reporter (Stage C)