Tencent SkillHub · AI

Sci Data Extractor

AI-powered tool for extracting structured data from scientific literature PDFs

skill openclaw clawhub Free
0 Downloads
0 Stars
0 Installs
0 Score
High Signal

Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform: OpenClaw
Install method: Manual import
Extraction: Extract archive
Prerequisites: OpenClaw
Primary doc: SKILL.md

Package facts

Download mode: Yavira redirect
Package format: ZIP package
Source platform: Tencent SkillHub
What's included: README.md, README_ZH.md, SKILL.md, USAGE.md, batch_extract.py, examples/README.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source: Tencent SkillHub
Verification: Indexed source record
Version: 0.1.0

Documentation

ClawHub primary doc: SKILL.md (15 sections)

PDF Content Extraction

  • Extract text from PDFs using Mathpix OCR or PyMuPDF
  • Formula and table recognition
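The skill's own PDF code is not reproduced on this page. As a minimal sketch, assuming the PyMuPDF package (imported as fitz), page text can be pulled out and lightly cleaned of line-wrap hyphenation as follows. extract_pdf_text and join_hyphenated are hypothetical names, and the Mathpix path is not shown.

```python
from __future__ import annotations

import re


def join_hyphenated(text: str) -> str:
    """Heuristically undo PDF line wrapping in extracted text."""
    # 'kin-\netics' -> 'kinetics': a lowercase continuation suggests a split word.
    text = re.sub(r"(\w)-\n([a-z])", r"\1\2", text)
    # 'Michaelis-\nMenten' -> 'Michaelis-Menten': otherwise keep the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1-\2", text)
    # Single newlines inside a paragraph become spaces; blank lines are kept.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text


def extract_pdf_text(path: str) -> str:
    """Extract plain text page by page with PyMuPDF (pip install pymupdf)."""
    import fitz  # PyMuPDF; imported here so join_hyphenated works without it

    with fitz.open(path) as doc:
        pages = [page.get_text() for page in doc]
    return join_hyphenated("\n\n".join(pages))
```

The hyphen rule is a heuristic: a genuine compound such as pH-dependent that happens to break at the hyphen will be merged, which is one more reason to check extracted values against the PDF.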

Data Extraction

  • Use LLMs (Claude, GPT-4o, or compatible APIs) to extract structured data from literature
  • Automatically identify field types and data structures
  • Support custom extraction rules and prompts
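How extractor.py turns a model response into records is not documented here. One plausible approach, assuming the model is asked to answer with a single Markdown pipe table, is to parse that table and guess each cell's type. parse_markdown_table and coerce_cell are hypothetical helpers, not part of the skill, and the set of "missing value" markers is an assumption.

```python
from __future__ import annotations

import re

# Plain decimal or scientific notation, e.g. 0.45, -3, 1.2e-3
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")
_MISSING = {"", "-", "N/A", "NA", "n.d."}


def coerce_cell(cell: str) -> float | str | None:
    """Guess a cell's type: None for blanks, float for numbers, else the text."""
    cell = cell.strip()
    if cell in _MISSING:
        return None
    if _NUMBER.fullmatch(cell):
        return float(cell)
    return cell


def _split_row(line: str) -> list[str]:
    """Split '| a | b |' into ['a', 'b']."""
    return [c.strip() for c in line.strip().strip("|").split("|")]


def parse_markdown_table(text: str) -> list[dict]:
    """Turn the pipe table in a model response into one dict per data row.

    Assumes the response contains a single table; surrounding prose is ignored.
    """
    rows = [ln for ln in text.splitlines() if ln.strip().startswith("|")]
    if not rows:
        return []
    header = _split_row(rows[0])
    records = []
    for line in rows[1:]:
        if set(line.strip()) <= set("|-: "):
            continue  # the |---|:---:| separator row
        records.append({h: coerce_cell(c) for h, c in zip(header, _split_row(line))})
    return records
```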

Output Formats

  • Markdown tables
  • CSV files
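Both output formats can be produced from the same list of records. This illustrative sketch uses only the standard library; to_markdown and to_csv are hypothetical helpers, and the column order is assumed to come from the chosen template's field list.

```python
from __future__ import annotations

import csv
import io


def to_markdown(records: list[dict], fields: list[str]) -> str:
    """Render records as a Markdown pipe table in the given column order."""
    def cell(value) -> str:
        # Missing values become empty cells; '|' is escaped so it cannot split a cell.
        return "" if value is None else str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(fields) + " |",
        "|" + "|".join("---" for _ in fields) + "|",
    ]
    for rec in records:
        lines.append("| " + " | ".join(cell(rec.get(f)) for f in fields) + " |")
    return "\n".join(lines) + "\n"


def to_csv(records: list[dict], fields: list[str]) -> str:
    """Serialise records as CSV with a header row; the csv module handles quoting."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys and None both become empty fields
    return buf.getvalue()
```

Letting csv.DictWriter do the quoting matters for free-text fields such as Key_Findings, which often contain commas.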

Prerequisites

  • Python 3.8+
  • pip package manager

Setup Steps

  1. Install Python dependencies (choose one method):

     Method 1: uv (recommended, fastest)

       # Install uv
       curl -LsSf https://astral.sh/uv/install.sh | sh
       # Create a virtual environment and install dependencies
       cd /path/to/sci-data-extractor
       uv venv
       source .venv/bin/activate   # Linux/macOS
       # or, on Windows: .venv\Scripts\activate
       uv pip install -r requirements.txt

     Method 2: conda (best for scientific/research users)

       cd /path/to/sci-data-extractor
       conda create -n sci-data-extractor python=3.11 -y
       conda activate sci-data-extractor
       pip install -r requirements.txt

     Method 3: pip directly (built in, no extra installation)

       cd /path/to/sci-data-extractor
       pip install -r requirements.txt

  2. Configure API credentials:

       # Copy the example configuration
       cp .env.example .env
       # Edit .env and add your API key
       # (get one from https://console.anthropic.com/)
       EXTRACTOR_API_KEY=your-api-key-here
       EXTRACTOR_BASE_URL=https://api.anthropic.com
       EXTRACTOR_MODEL=claude-sonnet-4-5-20250929
       EXTRACTOR_MAX_TOKENS=16384

  3. Optional: configure Mathpix OCR for high-precision OCR:

       # Get credentials from https://api.mathpix.com/
       MATHPIX_APP_ID=your-mathpix-app-id
       MATHPIX_APP_KEY=your-mathpix-app-key
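The .env file is a plain KEY=VALUE list. python-dotenv's load_dotenv() is the usual way to load such a file, and by default it does not override variables already set in the environment; whether extractor.py uses it is not stated here. This minimal standard-library stand-in follows the same precedence. load_dotenv_text is a hypothetical name, and it deliberately ignores features such as variable expansion and multi-line values.

```python
from __future__ import annotations

import os


def load_dotenv_text(text: str, environ: dict | None = None,
                     override: bool = False) -> dict:
    """Minimal .env parser: KEY=VALUE lines, '#' comments, optional quotes.

    Variables already present in `environ` (os.environ by default) win
    unless override=True. Returns everything parsed from the file.
    """
    env = os.environ if environ is None else environ
    loaded = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment, or not an assignment
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value
        if override or key not in env:
            env[key] = value
    return loaded
```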

Verify Installation

python extractor.py --help
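Running --help only shows that the script starts. A slightly deeper preflight, sketched here as a hypothetical preflight() function, checks the Python version from the prerequisites, confirms EXTRACTOR_API_KEY is set to something other than the placeholder from the setup steps, and flags a half-configured Mathpix setup. Treating a lone Mathpix ID or key as an error is an assumption.

```python
from __future__ import annotations

import os
import sys

PLACEHOLDER_KEY = "your-api-key-here"  # value shown in the setup steps


def preflight(environ=None, version_info=None) -> list[str]:
    """Return a list of problems; an empty list means ready to extract."""
    env = os.environ if environ is None else environ
    version = sys.version_info if version_info is None else version_info
    problems = []
    if tuple(version[:2]) < (3, 8):
        problems.append("Python 3.8+ is required")
    key = env.get("EXTRACTOR_API_KEY", "")
    if not key or key == PLACEHOLDER_KEY:
        problems.append("EXTRACTOR_API_KEY is missing or still the placeholder")
    # Mathpix is optional, but it needs both credentials to work.
    if bool(env.get("MATHPIX_APP_ID")) != bool(env.get("MATHPIX_APP_KEY")):
        problems.append("set both MATHPIX_APP_ID and MATHPIX_APP_KEY, or neither")
    return problems
```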

Get API Keys

  • Anthropic Claude: https://console.anthropic.com/
  • OpenAI: https://platform.openai.com/api-keys
  • Mathpix OCR: https://api.mathpix.com/

How to Use

When users request data extraction:
  1. Understand requirements: ask what type of data to extract.
  2. Choose a method: use a preset template (enzyme/experiment/review) or a custom extraction prompt.
  3. Execute the extraction, e.g.: python extractor.py input.pdf --template enzyme -o output.md
  4. Verify results: display the extracted data and ask whether adjustments are needed.
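When an agent drives this workflow programmatically, the two methods map onto different flags. This sketch uses only the flags shown on this page and assumes a template and a custom prompt are alternatives; the documentation presents them that way but does not say whether extractor.py accepts both together. build_command and as_shell are hypothetical helpers.

```python
from __future__ import annotations

import shlex

TEMPLATES = ("enzyme", "experiment", "review")


def build_command(pdf: str, output: str, template: str | None = None,
                  prompt: str | None = None, fmt: str | None = None) -> list[str]:
    """Build an extractor.py argument list from the user's chosen method."""
    if (template is None) == (prompt is None):
        raise ValueError("choose exactly one of a preset template or a custom prompt")
    cmd = ["python", "extractor.py", pdf]
    if template is not None:
        if template not in TEMPLATES:
            raise ValueError(f"unknown template {template!r}; expected one of {TEMPLATES}")
        cmd += ["--template", template]
    else:
        cmd += ["-p", prompt]
    cmd += ["-o", output]
    if fmt is not None:
        cmd += ["--format", fmt]
    return cmd


def as_shell(cmd: list[str]) -> str:
    """Render the argument list as a copy-pasteable, correctly quoted shell command."""
    return shlex.join(cmd)
```

The list form can be passed straight to subprocess.run(cmd, check=True), which avoids shell quoting problems with free-text prompts; as_shell() is only for showing the user what will run.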

Enzyme Kinetics Data (enzyme)

Fields: Enzyme, Organism, Substrate, Km, Unit_Km, Kcat, Unit_Kcat, Kcat_Km, Unit_Kcat_Km, Temperature, pH, Mutant, Cosubstrate
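Because kcat/Km is derived from the other two constants, an extracted enzyme row can be sanity-checked by recomputing it, which supports the advice to validate extracted data. This sketch assumes unit strings such as mM and s^-1 and normalises everything to M and s before comparing; the unit tables, the 5% tolerance (to absorb rounding in the paper), and the function name are assumptions to adapt to real extractions.

```python
from __future__ import annotations

import math

# Hypothetical unit tables; extend them to match the strings your extractions use.
KM_TO_M = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "µM": 1e-6, "μM": 1e-6, "nM": 1e-9}
KCAT_TO_PER_S = {"s^-1": 1.0, "1/s": 1.0, "min^-1": 1 / 60}
KCAT_KM_TO_PER_M_S = {"M^-1 s^-1": 1.0, "mM^-1 s^-1": 1e3,
                      "uM^-1 s^-1": 1e6, "µM^-1 s^-1": 1e6}


def kcat_km_consistent(row: dict, rel_tol: float = 0.05) -> bool | None:
    """Check that the reported Kcat_Km agrees with Kcat / Km.

    Returns None when a value or unit is missing or unrecognised, so the row
    can be flagged for manual review instead of silently passing.
    """
    try:
        km = float(row["Km"]) * KM_TO_M[row["Unit_Km"]]
        kcat = float(row["Kcat"]) * KCAT_TO_PER_S[row["Unit_Kcat"]]
        reported = float(row["Kcat_Km"]) * KCAT_KM_TO_PER_M_S[row["Unit_Kcat_Km"]]
    except (KeyError, TypeError, ValueError):
        return None
    if km == 0:
        return None
    # Both sides are now in M^-1 s^-1.
    return math.isclose(kcat / km, reported, rel_tol=rel_tol)
```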

Experimental Results Data (experiment)

Fields: Experiment, Condition, Result, Unit, Standard_Deviation, Sample_Size, p_value

Literature Review Data (review)

Fields: Author, Year, Journal, Title, DOI, Key_Findings, Methodology
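The three field lists can be kept as data and used to check that an extraction returned the expected columns. TEMPLATE_FIELDS mirrors the lists above; it is a hypothetical structure, not extractor.py's internal template format.

```python
from __future__ import annotations

# Field lists copied from the template descriptions; the dict itself is illustrative.
TEMPLATE_FIELDS = {
    "enzyme": ["Enzyme", "Organism", "Substrate", "Km", "Unit_Km", "Kcat",
               "Unit_Kcat", "Kcat_Km", "Unit_Kcat_Km", "Temperature", "pH",
               "Mutant", "Cosubstrate"],
    "experiment": ["Experiment", "Condition", "Result", "Unit",
                   "Standard_Deviation", "Sample_Size", "p_value"],
    "review": ["Author", "Year", "Journal", "Title", "DOI", "Key_Findings",
               "Methodology"],
}


def check_columns(template: str, columns: list[str]) -> dict[str, list[str]]:
    """Compare extracted column names with a template's expected fields."""
    expected = TEMPLATE_FIELDS[template]
    return {
        "missing": [f for f in expected if f not in columns],
        "unexpected": [c for c in columns if c not in expected],
    }
```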

Configuration Requirements

Set the following environment variables, either in the shell or in the .env file:
  • EXTRACTOR_API_KEY: LLM API key
  • EXTRACTOR_BASE_URL: API endpoint
  • EXTRACTOR_MODEL: model name (default: claude-sonnet-4-5-20250929)
  • EXTRACTOR_TEMPERATURE: sampling temperature (default: 0.1)
  • EXTRACTOR_MAX_TOKENS: maximum output tokens (default: 16384)
  • MATHPIX_APP_ID: Mathpix OCR app ID (optional)
  • MATHPIX_APP_KEY: Mathpix OCR app key (optional)
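A sketch of resolving these variables with the documented defaults, assuming values arrive as strings from the shell or an already-loaded .env. EXTRACTOR_BASE_URL has no documented default, so it is left as None here; ExtractorConfig and load_config are hypothetical names.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ExtractorConfig:
    api_key: str
    base_url: Optional[str]  # no documented default, so None if unset
    model: str = "claude-sonnet-4-5-20250929"
    temperature: float = 0.1
    max_tokens: int = 16384
    mathpix_app_id: Optional[str] = None
    mathpix_app_key: Optional[str] = None


def load_config(env: Optional[Mapping[str, str]] = None) -> ExtractorConfig:
    """Read the EXTRACTOR_* / MATHPIX_* variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("EXTRACTOR_API_KEY")
    if not api_key:
        raise RuntimeError("EXTRACTOR_API_KEY is not set (environment or .env)")
    return ExtractorConfig(
        api_key=api_key,
        base_url=env.get("EXTRACTOR_BASE_URL"),
        model=env.get("EXTRACTOR_MODEL", ExtractorConfig.model),
        # Environment values are strings, so convert numeric settings explicitly.
        temperature=float(env.get("EXTRACTOR_TEMPERATURE", ExtractorConfig.temperature)),
        max_tokens=int(env.get("EXTRACTOR_MAX_TOKENS", ExtractorConfig.max_tokens)),
        mathpix_app_id=env.get("MATHPIX_APP_ID"),
        mathpix_app_key=env.get("MATHPIX_APP_KEY"),
    )
```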

Best Practices

  • Verify the API key configuration before extraction.
  • Recommend that users validate extracted data for accuracy.
  • Long documents may require segmented processing.
  • Remind users to cite the original literature.
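The documentation does not say how segmented processing is done. One common approach, sketched here under that assumption, splits the text on blank lines into segments under a character budget (a rough proxy for tokens) and repeats the last paragraph of each segment at the start of the next, so content straddling a boundary is seen whole at least once. split_into_segments is a hypothetical name and the 12,000-character default is arbitrary.

```python
from __future__ import annotations


def split_into_segments(text: str, max_chars: int = 12000, overlap: int = 1) -> list[str]:
    """Split text on blank lines into segments of at most max_chars characters.

    The last `overlap` paragraphs of a segment are repeated at the start of the
    next one when they fit. A single paragraph longer than max_chars still
    becomes its own (oversized) segment.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    segments: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            segments.append("\n\n".join(current))
            # Carry the tail of the finished segment into the next one.
            current = current[-overlap:] if overlap > 0 else []
            if current and len("\n\n".join(current + [para])) > max_chars:
                current = []  # the carried overlap would push this segment over budget
        current.append(para)
    if current:
        segments.append("\n\n".join(current))
    return segments
```

Rows extracted from overlapping segments can appear twice, so de-duplicate after merging the per-segment results.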

Usage Examples

  • Enzyme kinetics extraction:
      python extractor.py paper.pdf --template enzyme -o results.md
  • Custom extraction:
      python extractor.py paper.pdf -p "Extract all protein structures with PDB IDs" -o custom.md
  • CSV output:
      python extractor.py paper.pdf --template enzyme -o results.csv --format csv
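batch_extract.py ships with the package, but its interface is not described on this page. As a stand-in, this hypothetical batch_extract() runs the documented single-file command once per PDF in a folder, assuming it is called from the skill directory so that extractor.py resolves. The one-output-per-paper naming and the dry_run flag are assumptions, and only --format csv appears in the examples.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def batch_extract(pdf_dir: str, out_dir: str, template: str = "enzyme",
                  fmt: str = "csv", dry_run: bool = False) -> list[list[str]]:
    """Run the single-file extractor.py command once per PDF in pdf_dir.

    Returns the commands so a dry run can be reviewed before spending API calls.
    """
    out = Path(out_dir)
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        target = out / f"{pdf.stem}.{fmt}"  # one output file per paper
        cmd = [sys.executable, "extractor.py", str(pdf), "--template", template,
               "-o", str(target), "--format", fmt]
        commands.append(cmd)
        if dry_run:
            continue
        out.mkdir(parents=True, exist_ok=True)
        # check=False so one failing paper does not abort the whole batch.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            print(f"extraction failed for {pdf.name}", file=sys.stderr)
    return commands
```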

Notes

  • This tool is for academic research use only.
  • Always validate AI-extracted results.
  • Respect copyright when using extracted data.
  • Cite original sources appropriately.

Package contents

Included in package
5 docs, 1 script
  • SKILL.md Primary doc
  • examples/README.md Docs
  • README_ZH.md Docs
  • README.md Docs
  • USAGE.md Docs
  • batch_extract.py Scripts