{
  "schemaVersion": "1.0",
  "item": {
    "slug": "arxivkb",
    "name": "arxivkb",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/camopel/arxivkb",
    "canonicalUrl": "https://clawhub.ai/camopel/arxivkb",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/arxivkb",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=arxivkb",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "README.md",
      "SKILL.md",
      "scripts/arxiv_crawler.py",
      "scripts/arxiv_taxonomy.py",
      "scripts/cli.py",
      "scripts/db.py"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-07T17:22:31.273Z",
      "expiresAt": "2026-05-14T17:22:31.273Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=afrexai-annual-report",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=afrexai-annual-report",
        "contentDisposition": "attachment; filename=\"afrexai-annual-report-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/arxivkb"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/arxivkb",
    "agentPageUrl": "https://openagent3.xyz/skills/arxivkb/agent",
    "manifestUrl": "https://openagent3.xyz/skills/arxivkb/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/arxivkb/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Why This Skill?",
        "body": "🏠 100% local — crawls arXiv's free API, embeds with Ollama (nomic-embed-text), indexes in FAISS + SQLite. No cloud cost.\n\n🔍 Semantic search on paper content — FAISS indexes PDF chunks (not just abstracts), so you find papers by what they contain.\n\n📂 arXiv category-based — tracks official arXiv categories (155 available, 8 groups). No free-text queries.\n\n🧹 Auto-cleanup — configurable expiry deletes old papers, PDFs, and chunks."
      },
      {
        "title": "Install",
        "body": "python3 scripts/install.py\n\nWorks on macOS and Linux. Installs Python deps (faiss-cpu, pdfplumber, tiktoken, arxiv, numpy), pulls nomic-embed-text via Ollama, creates data directories and DB."
      },
      {
        "title": "Prerequisites",
        "body": "Ollama — must be installed and running (ollama serve)\nPython 3.10+"
      },
      {
        "title": "Quick Start",
        "body": "# 1. Add arXiv categories to track\nakb categories add cs.AI cs.CV cs.LG\n\n# 2. Browse all available categories\nakb categories browse\n\n# 3. Ingest recent papers (last 7 days)\nakb ingest\n\n# 4. Check stats\nakb stats"
      },
      {
        "title": "Categories",
        "body": "akb categories list                    # Show enabled categories\nakb categories browse                  # Browse all 155 arXiv categories\nakb categories browse robotics         # Filter by keyword\nakb categories add cs.AI cs.RO         # Enable categories\nakb categories delete cs.AI            # Disable a category\n\nCategories are official arXiv codes (e.g. cs.AI, eess.IV, q-fin.ST). The full taxonomy is built in."
      },
      {
        "title": "Ingestion",
        "body": "akb ingest                    # Crawl, download PDFs, chunk, embed\nakb ingest --days 14          # Look back 14 days\nakb ingest --dry-run          # Preview only\nakb ingest --no-pdf           # Index abstracts only (faster)\n\nPipeline: arXiv API → PDF download → text extraction (pdfplumber) → chunking (tiktoken, 500 tokens, 50 overlap) → embedding (Ollama nomic-embed-text) → FAISS + SQLite."
      },
      {
        "title": "Paper Details",
        "body": "akb paper 2401.12345    # Show title, abstract, categories, PDF status"
      },
      {
        "title": "Statistics",
        "body": "akb stats   # Papers, chunks, categories, DB size"
      },
      {
        "title": "Expiry & Cleanup",
        "body": "akb expire               # Delete papers older than 90 days (default)\nakb expire --days 30     # Override: delete papers older than 30 days\nakb expire --days 30 -y  # Skip confirmation"
      },
      {
        "title": "Configuration",
        "body": "No config file needed. Defaults:\n\nSettingDefaultOverrideData directory~/workspace/arxivkbARXIVKB_DATA_DIR env or --data-dirOllama endpointhttp://localhost:11434— (hardcoded)Embedding modelnomic-embed-text (768d)— (hardcoded)Chunk size500 tokens, 50 overlap—Expiry90 days--days flag"
      },
      {
        "title": "Data Layout",
        "body": "~/workspace/arxivkb/\n├── arxivkb.db           # SQLite: papers, chunks, translations, categories\n├── pdfs/                  # Downloaded PDF files ({arxiv_id}.pdf)\n└── faiss/\n    └── arxivkb.faiss    # FAISS IndexFlatIP (chunk embeddings)"
      },
      {
        "title": "DB Schema",
        "body": "papers: id, arxiv_id, title, abstract, categories, published, status, created_at\nchunks: id, paper_id, section, chunk_index, text, faiss_id, created_at\ntranslations: paper_id, language, abstract, created_at (PK: paper_id+language)\ncategories: code, description, group_name, enabled, added_at (155 entries)"
      },
      {
        "title": "💬 Chat Commands (OpenClaw Agent)",
        "body": "When this skill is installed, the agent recognizes /akb as a shortcut:\n\nCommandAction/akb listShow enabled categories/akb add cs.AI cs.ROEnable categories for crawling/akb remove cs.AIDisable a category/akb browseBrowse all 155 arXiv categories/akb browse roboticsFilter categories by keyword/akb statsShow paper/chunk/category counts/akb helpShow available commands\n\nThe agent runs these via the akb CLI internally."
      },
      {
        "title": "📱 PrivateApp Dashboard",
        "body": "A companion PWA dashboard is available. Provides:\n\nSemantic search across paper content\nPaper detail with abstract translation (on-demand via LLM)\nInline PDF viewing\nCategory browser\nStats (papers, chunks, categories)"
      },
      {
        "title": "Architecture",
        "body": "scripts/\n├── cli.py             # CLI — categories, ingest, paper, stats, expire\n├── db.py              # SQLite schema + CRUD\n├── arxiv_crawler.py   # arXiv API search + PDF download\n├── arxiv_taxonomy.py  # Full arXiv category taxonomy (155 categories)\n├── pdf_processor.py   # PDF text extraction + tiktoken chunking\n├── embed.py           # Ollama nomic-embed-text (768d, normalized)\n├── faiss_index.py     # FAISS IndexFlatIP manager\n├── search.py          # Semantic search: query → FAISS → group by paper\n└── install.py         # One-command installer"
      }
    ],
    "body": "ArXivKB — Science Knowledge Base\nWhy This Skill?\n\n🏠 100% local — crawls arXiv's free API, embeds with Ollama (nomic-embed-text), indexes in FAISS + SQLite. No cloud cost.\n\n🔍 Semantic search on paper content — FAISS indexes PDF chunks (not just abstracts), so you find papers by what they contain.\n\n📂 arXiv category-based — tracks official arXiv categories (155 available, 8 groups). No free-text queries.\n\n🧹 Auto-cleanup — configurable expiry deletes old papers, PDFs, and chunks.\n\nInstall\npython3 scripts/install.py\n\n\nWorks on macOS and Linux. Installs Python deps (faiss-cpu, pdfplumber, tiktoken, arxiv, numpy), pulls nomic-embed-text via Ollama, creates data directories and DB.\n\nPrerequisites\nOllama — must be installed and running (ollama serve)\nPython 3.10+\nQuick Start\n# 1. Add arXiv categories to track\nakb categories add cs.AI cs.CV cs.LG\n\n# 2. Browse all available categories\nakb categories browse\n\n# 3. Ingest recent papers (last 7 days)\nakb ingest\n\n# 4. Check stats\nakb stats\n\nCategories\nakb categories list                    # Show enabled categories\nakb categories browse                  # Browse all 155 arXiv categories\nakb categories browse robotics         # Filter by keyword\nakb categories add cs.AI cs.RO         # Enable categories\nakb categories delete cs.AI            # Disable a category\n\n\nCategories are official arXiv codes (e.g. cs.AI, eess.IV, q-fin.ST). The full taxonomy is built in.\n\nIngestion\nakb ingest                    # Crawl, download PDFs, chunk, embed\nakb ingest --days 14          # Look back 14 days\nakb ingest --dry-run          # Preview only\nakb ingest --no-pdf           # Index abstracts only (faster)\n\n\nPipeline: arXiv API → PDF download → text extraction (pdfplumber) → chunking (tiktoken, 500 tokens, 50 overlap) → embedding (Ollama nomic-embed-text) → FAISS + SQLite.\n\nPaper Details\nakb paper 2401.12345    # Show title, abstract, categories, PDF status\n\nStatistics\nakb stats   # Papers, chunks, categories, DB size\n\nExpiry & Cleanup\nakb expire               # Delete papers older than 90 days (default)\nakb expire --days 30     # Override: delete papers older than 30 days\nakb expire --days 30 -y  # Skip confirmation\n\nConfiguration\n\nNo config file needed. Defaults:\n\nSetting\tDefault\tOverride\nData directory\t~/workspace/arxivkb\tARXIVKB_DATA_DIR env or --data-dir\nOllama endpoint\thttp://localhost:11434\t— (hardcoded)\nEmbedding model\tnomic-embed-text (768d)\t— (hardcoded)\nChunk size\t500 tokens, 50 overlap\t—\nExpiry\t90 days\t--days flag\nData Layout\n~/workspace/arxivkb/\n├── arxivkb.db           # SQLite: papers, chunks, translations, categories\n├── pdfs/                  # Downloaded PDF files ({arxiv_id}.pdf)\n└── faiss/\n    └── arxivkb.faiss    # FAISS IndexFlatIP (chunk embeddings)\n\nDB Schema\npapers: id, arxiv_id, title, abstract, categories, published, status, created_at\nchunks: id, paper_id, section, chunk_index, text, faiss_id, created_at\ntranslations: paper_id, language, abstract, created_at (PK: paper_id+language)\ncategories: code, description, group_name, enabled, added_at (155 entries)\n💬 Chat Commands (OpenClaw Agent)\n\nWhen this skill is installed, the agent recognizes /akb as a shortcut:\n\nCommand\tAction\n/akb list\tShow enabled categories\n/akb add cs.AI cs.RO\tEnable categories for crawling\n/akb remove cs.AI\tDisable a category\n/akb browse\tBrowse all 155 arXiv categories\n/akb browse robotics\tFilter categories by keyword\n/akb stats\tShow paper/chunk/category counts\n/akb help\tShow available commands\n\nThe agent runs these via the akb CLI internally.\n\n📱 PrivateApp Dashboard\n\nA companion PWA dashboard is available. Provides:\n\nSemantic search across paper content\nPaper detail with abstract translation (on-demand via LLM)\nInline PDF viewing\nCategory browser\nStats (papers, chunks, categories)\nArchitecture\nscripts/\n├── cli.py             # CLI — categories, ingest, paper, stats, expire\n├── db.py              # SQLite schema + CRUD\n├── arxiv_crawler.py   # arXiv API search + PDF download\n├── arxiv_taxonomy.py  # Full arXiv category taxonomy (155 categories)\n├── pdf_processor.py   # PDF text extraction + tiktoken chunking\n├── embed.py           # Ollama nomic-embed-text (768d, normalized)\n├── faiss_index.py     # FAISS IndexFlatIP manager\n├── search.py          # Semantic search: query → FAISS → group by paper\n└── install.py         # One-command installer"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/camopel/arxivkb",
    "publisherUrl": "https://clawhub.ai/camopel/arxivkb",
    "owner": "camopel",
    "version": "1.0.1",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/arxivkb",
    "downloadUrl": "https://openagent3.xyz/downloads/arxivkb",
    "agentUrl": "https://openagent3.xyz/skills/arxivkb/agent",
    "manifestUrl": "https://openagent3.xyz/skills/arxivkb/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/arxivkb/agent.md"
  }
}