Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Search, download, convert, organize, and audit academic literature collections. Use when asked to find papers, build a literature library, add papers to refe...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Manage academic literature collections: search → download → convert → organize → verify.
Dependencies:
- pdftotext (poppler-utils): PDF text extraction
- curl: downloading
- python3: JSON processing in the audit
- file: PDF validation (shipped in its own package named file on most distributions, not in coreutils)
- uvx markitdown[pdf] (optional): fallback PDF→MD converter. Plain uvx markitdown does NOT work for PDFs; you must use uvx markitdown[pdf].
Typical commands:

    # Download a single paper by DOI
    bash scripts/download.sh "10.1038/s41592-024-02200-1" output_dir/

    # Convert PDF to markdown
    bash scripts/convert.sh paper.pdf output.md

    # Verify a single PDF+MD pair
    bash scripts/verify.sh paper.pdf paper.md

    # Full audit of a references/ folder
    bash scripts/audit.sh /path/to/references/
Use web_fetch on Google Scholar:

    https://scholar.google.com/scholar?q=QUERY&as_ylo=YEAR

Extract title, authors, year, journal, DOI, and PDF links. For each result, identify the best open-access PDF source (see Download Strategy).
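A minimal sketch of building that search URL with proper escaping. The helper name scholar_url is hypothetical, and the sketch assumes as_ylo acts as the earliest publication year, which is how the URL template above uses it.

```python
from urllib.parse import urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_url(query: str, year_from: int) -> str:
    """Return a Google Scholar search URL for `query`, starting at `year_from`."""
    # urlencode percent-encodes the values and joins the parameters with "&".
    return f"{SCHOLAR_BASE}?{urlencode({'q': query, 'as_ylo': year_from})}"


print(scholar_url("spike sorting benchmark", 2020))
```

Encoding the query this way keeps spaces, quotes, and ampersands in a search phrase from breaking the URL handed to web_fetch.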
Run scripts/download.sh <DOI_or_URL> <output_dir/> per paper. The script tries sources in order:
1. Direct publisher PDF (Nature, eLife, Frontiers, PNAS, bioRxiv, arXiv)
2. EuropePMC (PMC_ID → PDF)
3. bioRxiv/arXiv preprint
4. Sci-Hub: https://sci-hub.box/<DOI> (use when the publisher is paywalled)

    # Sci-Hub download example:
    curl -L "https://sci-hub.box/10.1038/nature12345" -o paper.pdf

⚠️ Legal note: Sci-Hub may violate publisher terms of service or copyright law in some jurisdictions. Use only if you understand and accept the legal implications in your context.

If all sources fail (including Sci-Hub), flag the paper as a permanent paywall, give the user the DOI, and ask for a manual download.
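The ordered fallback can be sketched as follows. This is not the logic of scripts/download.sh, whose internals are not shown here; the fetcher functions are hypothetical stubs standing in for real network calls, and the only hard fact relied on is that PDF files start with the bytes %PDF-.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a DOI and returns the downloaded bytes, or None on failure.
Fetcher = Callable[[str], Optional[bytes]]


def looks_like_pdf(data: Optional[bytes]) -> bool:
    """Real PDF files begin with the magic bytes %PDF-."""
    return data is not None and data.startswith(b"%PDF-")


def fetch_first_available(
    doi: str, sources: Iterable[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each source in order; return (source_name, pdf_bytes) or (None, None)."""
    for name, fetch in sources:
        try:
            data = fetch(doi)
        except Exception:
            data = None  # a failing source just means "try the next one"
        if looks_like_pdf(data):
            return name, data
    return None, None  # caller should flag the DOI as a permanent paywall


# Stub fetchers (hypothetical) that imitate a paywall page and a real PDF.
def publisher(doi: str) -> Optional[bytes]:
    return b"<html>Please log in</html>"


def europe_pmc(doi: str) -> Optional[bytes]:
    return b"%PDF-1.7 fake body"


source_name, pdf = fetch_first_available(
    "10.1038/nature12345", [("publisher", publisher), ("europepmc", europe_pmc)]
)
print(source_name)
```

Checking the magic bytes matters because paywalled publishers often answer with an HTML login page and a successful status code, which would otherwise be saved as a broken .pdf file.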
Run scripts/convert.sh <input.pdf> <output.md>. It uses pdftotext (reliable), with uvx markitdown[pdf] as the fallback.

    # Correct markitdown command for PDFs:
    uvx markitdown[pdf] input.pdf > output.md

    # ⚠️ The following will NOT work for PDFs (missing [pdf] extra):
    # uvx markitdown input.pdf

Prefer uvx markitdown[pdf] over pdftotext when full fidelity (tables, figure captions) matters.
Standard folder structure:

    references/
    ├── README.md          # Human index (summaries per category)
    ├── index.json         # Machine index (structured metadata)
    ├── RESOURCES.md       # Code repos + datasets
    ├── resources.json     # Structured version
    ├── <category-1>/
    │   ├── papers/        # PDFs
    │   └── markdown/      # Converted text
    └── <category-N>/
        ├── papers/
        └── markdown/

Categories are user-defined. Use a number prefix for sort order (e.g., 01-theoretical-frameworks/).

index.json schema per paper

    {
      "id": "short_id",
      "title": "Full title",
      "authors": ["Author1", "Author2"],
      "year": 2024,
      "journal": "Journal Name",
      "doi": "10.xxxx/...",
      "category": "category_name",
      "subcategory": "optional",
      "pdf_path": "category/papers/filename.pdf",
      "markdown_path": "category/markdown/filename.md",
      "tags": ["tag1", "tag2"],
      "one_line_summary": "English one-liner",
      "key_concepts": ["concept1"],
      "relevance_to_project": "English description"
    }

README.md pattern

Per category section, per paper: title, authors, year, journal, DOI, and a short summary in the user's language.
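A minimal sketch of checking one index.json entry against the schema above. The function name validate_entry and the expected Python types are assumptions read off the example values; subcategory is treated as optional because the schema labels it so.

```python
from typing import Dict, List

# Field names come from the schema above; the types are inferred from its example values.
REQUIRED_FIELDS: Dict[str, type] = {
    "id": str,
    "title": str,
    "authors": list,
    "year": int,
    "journal": str,
    "doi": str,
    "category": str,
    "pdf_path": str,
    "markdown_path": str,
    "tags": list,
    "one_line_summary": str,
    "key_concepts": list,
    "relevance_to_project": str,
}


def validate_entry(entry: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the entry passes."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected):
            actual = type(entry[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# An entry with a null id and no DOI, as a corrupted batch write might leave it.
broken = {"id": None, "title": "Full title", "authors": ["Author1"], "year": 2024}
for problem in validate_entry(broken):
    print(problem)
```

Returning every problem at once, instead of stopping at the first, makes it easier to repair a whole index in a single pass.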
Downloaded files are often named using DOI format rather than AuthorYear:

    10-1038_ncomms3018.md          # DOI: 10.1038/ncomms3018
    10-1016_j-neuron-2015-03-034.md

When markdown_path entries in index.json become stale (e.g., after folder reorganization), maintain a separate mapping file:

    // temp/paper_md_mapping.json
    {
      "author2024_keyword": "references/new-downloads/10-1038_s41592-024-02200-1.md",
      ...
    }

To build this mapping, cross-reference each paper's DOI in index.json against the actual files on disk. Use find + Python to automate.

index.json Known Pitfalls
- id: null corruption: If many entries have id=null and share the same pdf_path, the index was likely corrupted during a batch write. Rebuild from the actual files on disk.
- DOI errors: Verify that DOIs resolve correctly; typos in DOI fields are common (e.g., wrong suffix digits). Always cross-check with the publisher page.
- Dead markdown_path: After restructuring folders, markdown_path in index.json often points to old locations. Use the mapping file above as the source of truth.
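One way to automate the cross-reference, as a sketch. The naming rule in doi_to_stem (slash becomes underscore, dot becomes hyphen) is inferred from the example filenames above and is not confirmed by the scripts; both function names are hypothetical, and pathlib's rglob stands in for find.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def doi_to_stem(doi: str) -> str:
    """Guess the DOI-style filename stem: '/' becomes '_' and '.' becomes '-'."""
    return doi.replace("/", "_").replace(".", "-")


def build_md_mapping(entries: Iterable[dict], references_dir: Path) -> Dict[str, str]:
    """Map each paper id to the markdown file on disk that matches its DOI."""
    # Index every markdown file under references/ by its stem (name without .md).
    on_disk = {p.stem: p for p in references_dir.rglob("*.md")}
    mapping = {}
    for entry in entries:
        found = on_disk.get(doi_to_stem(entry["doi"]))
        if found is not None:
            # Store the path relative to the folder that contains references/.
            mapping[entry["id"]] = found.relative_to(references_dir.parent).as_posix()
    return mapping


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    refs = Path(tmp) / "references"
    (refs / "new-downloads").mkdir(parents=True)
    (refs / "new-downloads" / "10-1038_s41592-024-02200-1.md").write_text("# paper\n")
    entries = [{"id": "author2024_keyword", "doi": "10.1038/s41592-024-02200-1"}]
    print(build_md_mapping(entries, refs))
```

Entries whose DOI has no matching file are left out of the mapping, so comparing its keys with the ids in index.json shows which papers still need attention.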
Run scripts/audit.sh <references_dir/> for full verification:
- Every PDF is valid (file -b = PDF)
- Every PDF title matches its filename (pdftotext | head)
- Every PDF has a matching markdown file (and vice versa)
- index.json is valid and complete, its paths exist, and it has no duplicate IDs
- README.md stats match actual counts
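Two of these checks, PDF/markdown pairing and duplicate IDs, can be sketched in a few lines. This is an illustration and not the contents of scripts/audit.sh; the function names are hypothetical, and pairing is assumed to mean that the PDF and markdown files share a filename stem.

```python
import tempfile
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Tuple


def unpaired_files(category_dir: Path) -> Tuple[List[str], List[str]]:
    """Return (PDF stems without markdown, markdown stems without a PDF)."""
    pdfs = {p.stem for p in (category_dir / "papers").glob("*.pdf")}
    mds = {p.stem for p in (category_dir / "markdown").glob("*.md")}
    return sorted(pdfs - mds), sorted(mds - pdfs)


def duplicate_ids(entries: Iterable[dict]) -> list:
    """Return every id that occurs more than once, in first-seen order."""
    counts = Counter(entry.get("id") for entry in entries)
    return [paper_id for paper_id, n in counts.items() if n > 1]


# Demo against a throwaway category folder.
with tempfile.TemporaryDirectory() as tmp:
    category = Path(tmp) / "01-theoretical-frameworks"
    (category / "papers").mkdir(parents=True)
    (category / "markdown").mkdir()
    for name in ("a.pdf", "b.pdf"):
        (category / "papers" / name).write_bytes(b"%PDF-1.7")
    for name in ("a.md", "c.md"):
        (category / "markdown" / name).write_text("text")
    print(unpaired_files(category))

print(duplicate_ids([{"id": "x"}, {"id": "y"}, {"id": "x"}]))
```

Because a missing id counts as None, duplicate_ids also surfaces the id: null corruption case when several entries lack an id.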
For tool/method papers, find GitHub repos and public datasets. Store in RESOURCES.md + resources.json.
For large batches, parallelize:
- Download: 1 sub-agent per batch of ~5-8 papers
- Organize: 1 sub-agent to build indexes
- Verify: 1 independent sub-agent (never the same as the organizer)

Always use a separate sub-agent for verification (QC should not self-grade).
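Splitting a DOI list into download batches can be sketched like this. The helper name make_batches is hypothetical, and the default of 8 is simply the upper end of the range suggested above.

```python
from typing import List, Sequence


def make_batches(dois: Sequence[str], batch_size: int = 8) -> List[List[str]]:
    """Split `dois` into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(dois[i:i + batch_size]) for i in range(0, len(dois), batch_size)]


# Placeholder identifiers, not real DOIs.
papers = [f"paper-{n}" for n in range(20)]
for batch in make_batches(papers):
    print(len(batch), batch[0])
```

Each resulting batch would go to its own download sub-agent; the last batch is allowed to be smaller than the rest.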
- One batch at a time: do not spawn multiple note-writing batches simultaneously; LLM rate limits will cause silent failures.
- Set a cron monitor whenever spawning long-running agents: agents can fail silently without triggering auto-announce, and cron catches this.

Cron monitor pattern:
1. Spawn agent(s)
2. Immediately set a cron job (every 10-15 min, isolated agentTurn)
   - Check if expected output files exist
   - Re-spawn failed agents
   - When all complete: announce + delete cron
3. After the task finishes, confirm the cron was removed
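The file check inside step 2 could look like the sketch below. It covers only the existence test; re-spawning agents and managing the cron job depend on the host platform and are not shown. The function name missing_outputs is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_outputs(expected: Iterable[Path]) -> List[Path]:
    """Return the expected output files that do not exist yet."""
    return [path for path in expected if not Path(path).is_file()]


# Demo: one agent has written its note, the other has not.
with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "batch-1.md"
    pending = Path(tmp) / "batch-2.md"
    done.write_text("notes\n")
    still_missing = missing_outputs([done, pending])
    print([p.name for p in still_missing])
```

An empty result means every agent has produced its file, which is the signal to announce completion and delete the cron job.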
To add papers to an existing collection:
1. Download + convert new papers into the correct category folder
2. Append entries to index.json
3. Update README.md stats
4. Run the audit to verify consistency
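Step 2 can be sketched as an append that refuses duplicate IDs. This assumes the top level of index.json is a JSON array of paper objects, which the schema does not state; the function name append_entries is hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List


def append_entries(index_path: Path, new_entries: Iterable[dict]) -> List[str]:
    """Append entries whose id is not already present; return the ids that were added."""
    entries = json.loads(index_path.read_text(encoding="utf-8"))
    known = {entry["id"] for entry in entries}
    added = []
    for entry in new_entries:
        if entry["id"] in known:
            continue  # skip duplicates so the audit's unique-ID check keeps passing
        entries.append(entry)
        known.add(entry["id"])
        added.append(entry["id"])
    text = json.dumps(entries, indent=2, ensure_ascii=False) + "\n"
    index_path.write_text(text, encoding="utf-8")
    return added


# Demo against a throwaway index file.
with tempfile.TemporaryDirectory() as tmp:
    index = Path(tmp) / "index.json"
    index.write_text('[{"id": "smith2020_maps"}]', encoding="utf-8")
    print(append_entries(index, [{"id": "smith2020_maps"}, {"id": "lee2024_atlas"}]))
```

Running the audit afterwards remains the real consistency check; this helper only prevents the most common mistake of adding the same paper twice.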