Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Local arXiv paper manager with semantic search. Crawls arXiv categories, downloads PDFs, chunks content, and indexes with FAISS + Ollama embeddings. No cloud...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
- 100% local: crawls arXiv's free API, embeds with Ollama (nomic-embed-text), indexes in FAISS + SQLite. No cloud cost.
- Semantic search on paper content: FAISS indexes PDF chunks (not just abstracts), so you find papers by what they contain.
- arXiv category-based: tracks official arXiv categories (155 available, 8 groups). No free-text queries.
- Auto-cleanup: configurable expiry deletes old papers, PDFs, and chunks.
python3 scripts/install.py

Works on macOS and Linux. Installs Python deps (faiss-cpu, pdfplumber, tiktoken, arxiv, numpy), pulls nomic-embed-text via Ollama, and creates the data directories and DB.
- Ollama: must be installed and running (ollama serve)
- Python 3.10+
# 1. Add arXiv categories to track
akb categories add cs.AI cs.CV cs.LG

# 2. Browse all available categories
akb categories browse

# 3. Ingest recent papers (last 7 days)
akb ingest

# 4. Check stats
akb stats
akb categories list              # Show enabled categories
akb categories browse            # Browse all 155 arXiv categories
akb categories browse robotics   # Filter by keyword
akb categories add cs.AI cs.RO   # Enable categories
akb categories delete cs.AI      # Disable a category

Categories are official arXiv codes (e.g. cs.AI, eess.IV, q-fin.ST). The full taxonomy is built in.
akb ingest             # Crawl, download PDFs, chunk, embed
akb ingest --days 14   # Look back 14 days
akb ingest --dry-run   # Preview only
akb ingest --no-pdf    # Index abstracts only (faster)

Pipeline: arXiv API → PDF download → text extraction (pdfplumber) → chunking (tiktoken, 500 tokens, 50 overlap) → embedding (Ollama nomic-embed-text) → FAISS + SQLite.
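The chunking step of the pipeline can be pictured as a sliding window over token IDs. The following is a minimal sketch, not the skill's actual implementation: the function name `chunk_tokens` is hypothetical, and it takes an already-tokenized list so that it runs without any tokenizer installed.

```python
from typing import List


def chunk_tokens(tokens: List[int], size: int = 500, overlap: int = 50) -> List[List[int]]:
    """Split a token list into windows of `size` tokens sharing `overlap` tokens.

    Hypothetical helper: illustrates the "500 tokens, 50 overlap" setting only.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")

    step = size - overlap  # each new window starts this many tokens later
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the current window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    demo = chunk_tokens(list(range(1200)))
    print([len(c) for c in demo])
```

With tiktoken installed, the token list could come from something like `tiktoken.get_encoding("cl100k_base").encode(text)`; which encoding the skill uses is not stated here, so that choice is an assumption.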
akb paper 2401.12345 # Show title, abstract, categories, PDF status
akb stats # Papers, chunks, categories, DB size
akb expire                # Delete papers older than 90 days (default)
akb expire --days 30      # Override: delete papers older than 30 days
akb expire --days 30 -y   # Skip confirmation
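As a rough illustration of what expiry involves on the database side, the sketch below deletes papers older than a cutoff, along with their chunks. It is written under several assumptions: the function name `expire_papers` is hypothetical, age is measured from `created_at` stored as an ISO-8601 UTC string, and removal of PDF files and FAISS vectors is left out entirely.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def expire_papers(conn: sqlite3.Connection, days: int = 90,
                  now: Optional[datetime] = None) -> int:
    """Delete papers (and their chunks) created more than `days` days ago.

    Assumption: created_at holds ISO-8601 UTC strings, so string comparison
    orders rows chronologically. Returns the number of papers removed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()

    old_ids = [row[0] for row in
               conn.execute("SELECT id FROM papers WHERE created_at < ?", (cutoff,))]
    for paper_id in old_ids:
        conn.execute("DELETE FROM chunks WHERE paper_id = ?", (paper_id,))
        conn.execute("DELETE FROM papers WHERE id = ?", (paper_id,))
    conn.commit()
    return len(old_ids)
```

A real cleanup would also need to remove the matching PDF files and rebuild or prune the FAISS index, since the vectors live outside SQLite.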
No config file needed. Defaults:

| Setting | Default | Override |
| --- | --- | --- |
| Data directory | ~/workspace/arxivkb | ARXIVKB_DATA_DIR env or --data-dir |
| Ollama endpoint | http://localhost:11434 | none (hardcoded) |
| Embedding model | nomic-embed-text (768d) | none (hardcoded) |
| Chunk size | 500 tokens, 50 overlap | none |
| Expiry | 90 days | --days flag |
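One way the data directory override could be resolved is sketched below. The precedence shown (the `--data-dir` flag first, then `ARXIVKB_DATA_DIR`, then the default) is an assumption, since the order is not spelled out here, and `resolve_data_dir` is a hypothetical name.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

DEFAULT_DATA_DIR = "~/workspace/arxivkb"


def resolve_data_dir(argv: Optional[Sequence[str]] = None,
                     env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the data directory: --data-dir flag, then env var, then default.

    Assumed precedence; the real CLI may order these differently.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--data-dir")
    # parse_known_args ignores subcommands and other flags we do not model here
    args, _unknown = parser.parse_known_args(argv)

    raw = args.data_dir or env.get("ARXIVKB_DATA_DIR") or DEFAULT_DATA_DIR
    return Path(raw).expanduser()


if __name__ == "__main__":
    print(resolve_data_dir())
```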
~/workspace/arxivkb/
├── arxivkb.db            # SQLite: papers, chunks, translations, categories
├── pdfs/                 # Downloaded PDF files ({arxiv_id}.pdf)
└── faiss/
    └── arxivkb.faiss     # FAISS IndexFlatIP (chunk embeddings)
papers: id, arxiv_id, title, abstract, categories, published, status, created_at
chunks: id, paper_id, section, chunk_index, text, faiss_id, created_at
translations: paper_id, language, abstract, created_at (PK: paper_id+language)
categories: code, description, group_name, enabled, added_at (155 entries)
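The tables listed above could be expressed in SQLite roughly as follows. Only the table names, column names, and the composite key on translations come from the listing; the column types, the other primary keys, and the `init_db` helper are assumptions made for illustration.

```python
import sqlite3

# Column names follow the listing; types and most keys are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id INTEGER PRIMARY KEY,
    arxiv_id TEXT, title TEXT, abstract TEXT, categories TEXT,
    published TEXT, status TEXT, created_at TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    paper_id INTEGER, section TEXT, chunk_index INTEGER,
    text TEXT, faiss_id INTEGER, created_at TEXT
);
CREATE TABLE IF NOT EXISTS translations (
    paper_id INTEGER, language TEXT, abstract TEXT, created_at TEXT,
    PRIMARY KEY (paper_id, language)
);
CREATE TABLE IF NOT EXISTS categories (
    code TEXT PRIMARY KEY,
    description TEXT, group_name TEXT, enabled INTEGER, added_at TEXT
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the four tables if they do not exist and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The `faiss_id` column is what ties a row in chunks to a vector position in the FAISS index, which is how a search hit can be traced back to its paper.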
When this skill is installed, the agent recognizes /akb as a shortcut:

| Command | Action |
| --- | --- |
| /akb list | Show enabled categories |
| /akb add cs.AI cs.RO | Enable categories for crawling |
| /akb remove cs.AI | Disable a category |
| /akb browse | Browse all 155 arXiv categories |
| /akb browse robotics | Filter categories by keyword |
| /akb stats | Show paper/chunk/category counts |
| /akb help | Show available commands |

The agent runs these via the akb CLI internally.
A companion PWA dashboard is available. It provides:
- Semantic search across paper content
- Paper detail with abstract translation (on-demand via LLM)
- Inline PDF viewing
- Category browser
- Stats (papers, chunks, categories)
scripts/
├── cli.py              # CLI: categories, ingest, paper, stats, expire
├── db.py               # SQLite schema + CRUD
├── arxiv_crawler.py    # arXiv API search + PDF download
├── arxiv_taxonomy.py   # Full arXiv category taxonomy (155 categories)
├── pdf_processor.py    # PDF text extraction + tiktoken chunking
├── embed.py            # Ollama nomic-embed-text (768d, normalized)
├── faiss_index.py      # FAISS IndexFlatIP manager
├── search.py           # Semantic search: query → FAISS → group by paper
└── install.py          # One-command installer
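The search flow noted for search.py (query → FAISS → group by paper) can be illustrated without FAISS installed. The sketch below swaps in a plain-Python exhaustive inner-product scan, which is the same kind of exact search a FAISS IndexFlatIP performs; on normalized vectors the inner product equals cosine similarity. The function names and the choice to score each paper by its best-matching chunk are assumptions, not the skill's confirmed behavior.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query: Sequence[float],
           chunk_vectors: Sequence[Sequence[float]],
           chunk_to_paper: Sequence[str],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank papers by the best inner-product score among their top-k chunks.

    chunk_vectors[i] is assumed to be normalized already; chunk_to_paper[i]
    names the paper that chunk i belongs to (standing in for a faiss_id lookup).
    """
    q = normalize(query)
    scored = sorted(
        ((sum(a * b for a, b in zip(q, vec)), idx)
         for idx, vec in enumerate(chunk_vectors)),
        reverse=True,
    )[:top_k]

    best: Dict[str, float] = {}
    for score, idx in scored:
        paper = chunk_to_paper[idx]
        if paper not in best or score > best[paper]:
            best[paper] = score  # keep only the strongest chunk per paper
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Grouping by paper keeps a single result per paper even when several of its chunks match, which is why chunk-level indexing can still produce a paper-level result list.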