arxivkb

Local arXiv paper manager with semantic search. Crawls arXiv categories, downloads PDFs, chunks content, and indexes with FAISS + Ollama embeddings. No cloud...

Install for OpenClaw

Quick setup

Download the package from Yavira.
Extract the archive and review SKILL.md first.
Import or place the package into your OpenClaw setup.

Requirements

Target platform: OpenClaw
Install method: Manual import
Extraction: Extract archive
Prerequisites: OpenClaw
Primary doc: SKILL.md

Package facts

Download mode: Yavira redirect
Package format: ZIP package
Source platform: Tencent SkillHub
What's included: README.md, SKILL.md, scripts/arxiv_crawler.py, scripts/arxiv_taxonomy.py, scripts/cli.py, scripts/db.py

Validation

Use the Yavira download entry.
Review SKILL.md after the package is downloaded.
Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

Download the package from Yavira.
Extract it into a folder your agent can access.
Paste one of the prompts below and point your agent at the extracted folder.

New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source: Tencent SkillHub
Verification: Indexed source record
Version: 1.0.1

Provenance

Publisher: camopel

Documentation

Primary doc: SKILL.md (15 sections)

Why This Skill?

🏠 100% local: crawls arXiv's free API, embeds with Ollama (nomic-embed-text), and indexes in FAISS + SQLite. No cloud cost.
🔍 Semantic search on paper content: FAISS indexes PDF chunks (not just abstracts), so you find papers by what they contain.
📂 arXiv category-based: tracks official arXiv categories (155 available, 8 groups). Papers are selected by category, not by free-text query.
🧹 Auto-cleanup: configurable expiry deletes old papers, PDFs, and chunks.

Install

python3 scripts/install.py

Works on macOS and Linux. Installs Python deps (faiss-cpu, pdfplumber, tiktoken, arxiv, numpy), pulls nomic-embed-text via Ollama, and creates the data directories and DB.

Prerequisites

Ollama: must be installed and running (ollama serve)
Python 3.10+

Quick Start

# 1. Add arXiv categories to track
akb categories add cs.AI cs.CV cs.LG

# 2. Browse all available categories
akb categories browse

# 3. Ingest recent papers (last 7 days)
akb ingest

# 4. Check stats
akb stats

Ingestion

akb ingest             # Crawl, download PDFs, chunk, embed
akb ingest --days 14   # Look back 14 days
akb ingest --dry-run   # Preview only
akb ingest --no-pdf    # Index abstracts only (faster)

Pipeline: arXiv API → PDF download → text extraction (pdfplumber) → chunking (tiktoken, 500 tokens, 50 overlap) → embedding (Ollama nomic-embed-text) → FAISS + SQLite.
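The chunking step in the pipeline above (500-token windows with a 50-token overlap) can be sketched as a sliding window over token ids. This is a minimal illustration, not the package's pdf_processor.py: the function name chunk_tokens is hypothetical, and plain integers stand in for the token ids a tokenizer such as tiktoken would produce.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], size: int = 500, overlap: int = 50) -> List[List[int]]:
    """Split a token sequence into windows of `size` tokens sharing `overlap` tokens.

    Hypothetical helper for illustration; the defaults mirror the documented
    500-token chunks with 50-token overlap.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each window starts `step` tokens after the previous one
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reached the end of the sequence
    return chunks


# Stand-in token ids; a real pipeline would tokenize the extracted PDF text first.
tokens = list(range(1200))
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

The overlap means the last 50 tokens of one chunk reappear at the start of the next, so a sentence that straddles a chunk boundary is still fully contained in at least one chunk.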

Paper Details

akb paper 2401.12345 # Show title, abstract, categories, PDF status

Statistics

akb stats # Papers, chunks, categories, DB size

Expiry & Cleanup

akb expire                # Delete papers older than 90 days (default)
akb expire --days 30      # Override: delete papers older than 30 days
akb expire --days 30 -y   # Skip confirmation
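As a rough sketch of how an age-based expiry could select its targets, the snippet below computes a cutoff date and queries a papers table for older rows. Several things here are assumptions rather than documented behavior: that age is measured from the published column (it could equally be created_at), that timestamps are stored as ISO 8601 UTC strings, and the helper name expired_arxiv_ids. The arXiv IDs are made up, and the sketch only selects papers; it does not delete PDFs or chunks.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def expired_arxiv_ids(conn: sqlite3.Connection, days: int = 90,
                      now: Optional[datetime] = None) -> List[str]:
    """Return arXiv IDs of papers published more than `days` days before `now`.

    Assumes `published` holds ISO 8601 UTC strings in one consistent format,
    so string comparison matches chronological order.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()
    rows = conn.execute(
        "SELECT arxiv_id FROM papers WHERE published < ? ORDER BY arxiv_id",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


# Demo against an in-memory table with two made-up papers.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (arxiv_id TEXT, published TEXT)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [
        ("2401.00001", (now - timedelta(days=120)).isoformat()),  # 120 days old
        ("2405.00002", (now - timedelta(days=10)).isoformat()),   # 10 days old
    ],
)
old_default = expired_arxiv_ids(conn, now=now)       # 90-day default
old_30 = expired_arxiv_ids(conn, days=30, now=now)   # like --days 30
```

With the default of 90 days and with 30 days, only the 120-day-old paper is selected; the 10-day-old paper survives both.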

Configuration

No config file is needed. Defaults:

| Setting | Default | Override |
|---|---|---|
| Data directory | ~/workspace/arxivkb | ARXIVKB_DATA_DIR env or --data-dir |
| Ollama endpoint | http://localhost:11434 | none (hardcoded) |
| Embedding model | nomic-embed-text (768d) | none (hardcoded) |
| Chunk size | 500 tokens, 50 overlap | none |
| Expiry | 90 days | --days flag |
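One plausible way to resolve the data directory from the two documented overrides is sketched below. The precedence shown (the --data-dir flag first, then the ARXIVKB_DATA_DIR environment variable, then the default) is an assumption, as is the helper name resolve_data_dir; the package's cli.py may order these differently.

```python
import argparse
from pathlib import Path
from typing import Mapping, Sequence

DEFAULT_DATA_DIR = "~/workspace/arxivkb"  # documented default


def resolve_data_dir(argv: Sequence[str], env: Mapping[str, str]) -> Path:
    """Pick the data directory: flag, then env var, then default (assumed order)."""
    parser = argparse.ArgumentParser(prog="akb", add_help=False)
    parser.add_argument("--data-dir")
    # parse_known_args ignores subcommands and flags this sketch does not model.
    args, _unknown = parser.parse_known_args(list(argv))
    raw = args.data_dir or env.get("ARXIVKB_DATA_DIR") or DEFAULT_DATA_DIR
    return Path(raw).expanduser()


from_flag = resolve_data_dir(["stats", "--data-dir", "/tmp/kb-flag"],
                             {"ARXIVKB_DATA_DIR": "/tmp/kb-env"})
from_env = resolve_data_dir(["stats"], {"ARXIVKB_DATA_DIR": "/tmp/kb-env"})
from_default = resolve_data_dir(["stats"], {})
```

In real use, the environment mapping would be os.environ and argv would be sys.argv[1:]; passing them in explicitly keeps the function easy to test.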

Data Layout

~/workspace/arxivkb/
├── arxivkb.db          # SQLite: papers, chunks, translations, categories
├── pdfs/               # Downloaded PDF files ({arxiv_id}.pdf)
└── faiss/
    └── arxivkb.faiss   # FAISS IndexFlatIP (chunk embeddings)
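IndexFlatIP performs an exhaustive inner-product search. When the stored vectors and the query are normalized to unit length, the inner product equals cosine similarity, which is why normalized embeddings pair naturally with this index type. The sketch below reproduces that idea in pure Python on toy 2-dimensional vectors; it is a stand-in for illustration only and does not call FAISS, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]],
          k: int = 2) -> List[Tuple[int, float]]:
    """Brute-force search: score every stored vector, return the k best (id, score)."""
    q = normalize(query)
    scored = [(inner_product(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy "chunk embeddings", normalized before storage.
stored = [normalize(v) for v in ([1.0, 0.0], [1.0, 1.0], [0.0, 1.0])]
hits = top_k([2.0, 0.1], stored, k=2)
```

Because every vector has unit length, each score lies between -1 and 1 and can be read directly as a cosine similarity.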

DB Schema

papers: id, arxiv_id, title, abstract, categories, published, status, created_at
chunks: id, paper_id, section, chunk_index, text, faiss_id, created_at
translations: paper_id, language, abstract, created_at (PK: paper_id + language)
categories: code, description, group_name, enabled, added_at (155 entries)
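The column lists above could map to SQLite DDL along the following lines. Only the table names, column names, and the composite key on translations come from the listing; the column types, constraints, and foreign-key references are assumptions added for illustration and may differ from the package's db.py.

```python
import sqlite3

# Column names follow the listing; types and constraints are assumed.
SCHEMA = """
CREATE TABLE papers (
    id INTEGER PRIMARY KEY,
    arxiv_id TEXT UNIQUE NOT NULL,
    title TEXT, abstract TEXT, categories TEXT,
    published TEXT, status TEXT, created_at TEXT
);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    paper_id INTEGER NOT NULL REFERENCES papers(id),
    section TEXT, chunk_index INTEGER, text TEXT,
    faiss_id INTEGER, created_at TEXT
);
CREATE TABLE translations (
    paper_id INTEGER NOT NULL REFERENCES papers(id),
    language TEXT NOT NULL,
    abstract TEXT, created_at TEXT,
    PRIMARY KEY (paper_id, language)
);
CREATE TABLE categories (
    code TEXT PRIMARY KEY,
    description TEXT, group_name TEXT,
    enabled INTEGER DEFAULT 0, added_at TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Read the column names back to confirm they match the listing.
tables = ("papers", "chunks", "translations", "categories")
columns = {
    t: [row[1] for row in conn.execute(f"PRAGMA table_info({t})")]
    for t in tables
}
```

The faiss_id column in chunks is what ties a row in SQLite to a vector position in the FAISS index, so search results can be mapped back to their text and paper.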

💬 Chat Commands (OpenClaw Agent)

When this skill is installed, the agent recognizes /akb as a shortcut:

| Command | Action |
|---|---|
| /akb list | Show enabled categories |
| /akb add cs.AI cs.RO | Enable categories for crawling |
| /akb remove cs.AI | Disable a category |
| /akb browse | Browse all 155 arXiv categories |
| /akb browse robotics | Filter categories by keyword |
| /akb stats | Show paper/chunk/category counts |
| /akb help | Show available commands |

The agent runs these via the akb CLI internally.

📱 PrivateApp Dashboard

A companion PWA dashboard is available. It provides:

Semantic search across paper content
Paper detail with abstract translation (on-demand via LLM)
Inline PDF viewing
Category browser
Stats (papers, chunks, categories)

Architecture

scripts/
├── cli.py              # CLI: categories, ingest, paper, stats, expire
├── db.py               # SQLite schema + CRUD
├── arxiv_crawler.py    # arXiv API search + PDF download
├── arxiv_taxonomy.py   # Full arXiv category taxonomy (155 categories)
├── pdf_processor.py    # PDF text extraction + tiktoken chunking
├── embed.py            # Ollama nomic-embed-text (768d, normalized)
├── faiss_index.py      # FAISS IndexFlatIP manager
├── search.py           # Semantic search: query → FAISS → group by paper
└── install.py          # One-command installer
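The "group by paper" step listed for search.py can be pictured as follows: the index returns chunk-level hits, and those are collapsed into one result per paper. In this sketch the aggregation rule (keep each paper's best-scoring chunk) is an assumption, as are the function name group_hits_by_paper and the sample ids and scores; the real module may rank papers differently.

```python
from typing import Dict, Iterable, List, Tuple


def group_hits_by_paper(hits: Iterable[Tuple[int, float]],
                        chunk_to_paper: Dict[int, int]) -> List[Tuple[int, float]]:
    """Collapse (faiss_id, score) chunk hits into (paper_id, best_score) pairs.

    `chunk_to_paper` maps a chunk's faiss_id to its paper_id, as the chunks
    table would. Hits with no known chunk are skipped.
    """
    best: Dict[int, float] = {}
    for faiss_id, score in hits:
        paper_id = chunk_to_paper.get(faiss_id)
        if paper_id is None:
            continue  # stale vector with no matching chunk row
        if paper_id not in best or score > best[paper_id]:
            best[paper_id] = score
    # Highest-scoring papers first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Made-up chunk hits: two chunks from paper 1, one from paper 2, one orphan.
hits = [(10, 0.91), (11, 0.88), (20, 0.90), (99, 0.95)]
chunk_to_paper = {10: 1, 11: 1, 20: 2}
ranked = group_hits_by_paper(hits, chunk_to_paper)
```

Grouping this way keeps a paper with several matching chunks from crowding out other papers in the result list.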

Category context

Code helpers, APIs, CLIs, browser automation, testing, and developer operations.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package

4 Scripts, 2 Docs

SKILL.md Primary doc
README.md Docs
scripts/arxiv_crawler.py Scripts
scripts/arxiv_taxonomy.py Scripts
scripts/cli.py Scripts
scripts/db.py Scripts