# Send Sci Data Extractor to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
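
The extract step can also be scripted. The sketch below uses Python's standard `zipfile` module to unpack the archive and then reports any file from the manifest's `includedAssets` list that is missing. The archive name `sci-data-extractor-0.1.0.zip` comes from the download's `Content-Disposition` header. The `extract_package` helper and the assumption that files sit at the archive root or inside one top-level folder are hypothetical.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Expected files, copied from the "includedAssets" field of the manifest.
EXPECTED_ASSETS = [
    "README.md",
    "README_ZH.md",
    "SKILL.md",
    "USAGE.md",
    "batch_extract.py",
    "examples/README.md",
]


def extract_package(archive: Path, target: Path) -> list[str]:
    """Unpack the ZIP into `target` and return expected assets that are missing.

    Hypothetical helper: assumes assets sit either at the archive root or
    inside a single top-level folder.
    """
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    missing = []
    for name in EXPECTED_ASSETS:
        # Accept a match at the root or one directory level down.
        if not (target / name).exists() and not any(target.glob(f"*/{name}")):
            missing.append(name)
    return missing
```

Calling `extract_package(Path("sci-data-extractor-0.1.0.zip"), Path("sci-data-extractor"))` returns an empty list when every listed asset was found.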
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "sci-data-extractor",
    "name": "Sci Data Extractor",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/JackKuo666/sci-data-extractor",
    "canonicalUrl": "https://clawhub.ai/JackKuo666/sci-data-extractor",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/sci-data-extractor",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=sci-data-extractor",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "README.md",
      "README_ZH.md",
      "SKILL.md",
      "USAGE.md",
      "batch_extract.py",
      "examples/README.md"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "sci-data-extractor",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-03T14:13:33.514Z",
      "expiresAt": "2026-05-10T14:13:33.514Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=sci-data-extractor",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=sci-data-extractor",
        "contentDisposition": "attachment; filename=\"sci-data-extractor-0.1.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "sci-data-extractor"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/sci-data-extractor"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/sci-data-extractor",
    "downloadUrl": "https://openagent3.xyz/downloads/sci-data-extractor",
    "agentUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent",
    "manifestUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent.md"
  }
}
```
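
An agent reading these fields can decide whether the cached health record is still trustworthy before downloading. This minimal sketch assumes the manifest has already been fetched from `manifestUrl` and parsed into a dictionary; it treats the record as usable only while `status` is `"healthy"`, `recommendedAction` is `"download"`, and the current time is before `expiresAt`. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse ISO 8601 timestamps such as 2026-05-03T14:13:33.514Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def download_is_recommended(manifest: dict, now: datetime) -> bool:
    """Hypothetical check: healthy, download recommended, and not yet expired."""
    health = manifest["install"]["sourceHealth"]
    if health["status"] != "healthy" or health["recommendedAction"] != "download":
        return False
    return now < parse_timestamp(health["expiresAt"])
```

For example, `download_is_recommended(manifest, datetime.now(timezone.utc))` returns `False` once the record passes its `expiresAt` time, which signals that the health probe should be refreshed.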
## Documentation

### PDF Content Extraction

- Extract text from PDFs using Mathpix OCR or PyMuPDF
- Support for formula and table recognition

### Data Extraction

- Use LLMs (Claude, GPT-4o, or compatible APIs) to extract structured data from literature
- Automatically identify field types and data structures
- Support custom extraction rules and prompts

### Output Formats

- Markdown tables
- CSV files
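
To illustrate the two output formats, here is a sketch that renders the same extracted rows as a Markdown table and as CSV using only the standard library. The `to_markdown` and `to_csv` helpers are hypothetical, and the real tool's column order and layout may differ.

```python
from __future__ import annotations

import csv
import io


def to_markdown(rows: list[dict], fields: list[str]) -> str:
    """Render rows as a Markdown table (hypothetical helper)."""
    header = "| " + " | ".join(fields) + " |"
    divider = "| " + " | ".join("---" for _ in fields) + " |"
    body = [
        # Escape pipes so a cell value cannot break the table layout.
        "| " + " | ".join(str(row.get(f, "")).replace("|", "\\|") for f in fields) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


def to_csv(rows: list[dict], fields: list[str]) -> str:
    """Render rows as CSV text with a header line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # Missing keys are written as empty cells.
    return buffer.getvalue()
```

Using `csv.DictWriter` rather than joining strings by hand means values containing commas or quotes are quoted correctly.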

### Prerequisites

- Python 3.8+
- pip package manager
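
Before installing, you can confirm that the interpreter meets the Python 3.8+ prerequisite. This is a minimal sketch; the `meets_minimum` helper is hypothetical, and `python --version` gives the same information by hand.

```python
import sys

MINIMUM = (3, 8)  # From the prerequisites above


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM) -> bool:
    """Compare the (major, minor) part of a version tuple against the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = sys.version.split()[0]
    if not meets_minimum():
        sys.exit(f"Python {MINIMUM[0]}.{MINIMUM[1]}+ required, found {current}")
    print("Python version OK:", current)
```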

### Setup Steps

Install the Python dependencies using one of the following methods.

Method 1: uv (recommended, fastest)

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a virtual environment and install dependencies
cd /path/to/sci-data-extractor
uv venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows
uv pip install -r requirements.txt
```

Method 2: conda (best for scientific and research users)

```bash
cd /path/to/sci-data-extractor
conda create -n sci-data-extractor python=3.11 -y
conda activate sci-data-extractor
pip install -r requirements.txt
```

Method 3: pip directly (built in, no extra installation)

```bash
cd /path/to/sci-data-extractor
pip install -r requirements.txt
```

Configure the API credentials by copying the example configuration:

```bash
cp .env.example .env
```

Then edit `.env` and add your API key (get one from https://console.anthropic.com/):

```bash
EXTRACTOR_API_KEY=your-api-key-here
EXTRACTOR_BASE_URL=https://api.anthropic.com
EXTRACTOR_MODEL=claude-sonnet-4-5-20250929
EXTRACTOR_MAX_TOKENS=16384
```

Optional: for high-precision OCR, add your Mathpix credentials to `.env` (get them from https://api.mathpix.com/):

```bash
MATHPIX_APP_ID=your-mathpix-app-id
MATHPIX_APP_KEY=your-mathpix-app-key
```
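
In line with verifying the API key before extraction, this sketch parses a plain `KEY=value` `.env` file and flags a missing or placeholder API key as well as a half-configured Mathpix pair. It handles only the simple format shown above (no quoting or variable expansion); how the tool itself loads `.env` is not documented here, and `parse_env` and `check_env` are hypothetical helpers.

```python
from __future__ import annotations

# Placeholder values copied from the example configuration above.
PLACEHOLDERS = {"your-api-key-here", "your-mathpix-app-id", "your-mathpix-app-key"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the basics look set."""
    problems = []
    api_key = values.get("EXTRACTOR_API_KEY", "")
    if not api_key or api_key in PLACEHOLDERS:
        problems.append("EXTRACTOR_API_KEY is missing or still the placeholder")
    mathpix = [values.get("MATHPIX_APP_ID", ""), values.get("MATHPIX_APP_KEY", "")]
    configured = [bool(v) and v not in PLACEHOLDERS for v in mathpix]
    # Mathpix is optional, but a half-filled pair is almost certainly a mistake.
    if any(configured) and not all(configured):
        problems.append("Set both MATHPIX_APP_ID and MATHPIX_APP_KEY, or neither")
    return problems
```

For example, `check_env(parse_env(pathlib.Path(".env").read_text()))` returns an empty list once a real key has replaced the placeholder.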

### Verify Installation

```bash
python extractor.py --help
```

### Get API Keys

- Anthropic Claude: https://console.anthropic.com/
- OpenAI: https://platform.openai.com/api-keys
- Mathpix OCR: https://api.mathpix.com/

### How to Use

When a user requests data extraction:

1. Understand the requirements: ask what type of data to extract.
2. Choose a method:
   - Use a preset template (`enzyme`, `experiment`, or `review`)
   - Use a custom extraction prompt
3. Run the extraction:
   ```bash
   python extractor.py input.pdf --template enzyme -o output.md
   ```
4. Verify the results: display the extracted data and ask whether any adjustments are needed.
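
The choose-then-run steps can be wrapped in a small helper that assembles the `extractor.py` command from either a preset template or a custom prompt. It uses only the flags shown in this document (`--template`, `-p`, `-o`, `--format csv`). Treating `--template` and `-p` as mutually exclusive is an assumption, and `build_command` is a hypothetical name.

```python
from __future__ import annotations

import sys

TEMPLATES = {"enzyme", "experiment", "review"}


def build_command(pdf: str, output: str, template: str | None = None,
                  prompt: str | None = None) -> list[str]:
    """Assemble an extractor.py invocation from a template or a custom prompt."""
    # Assumption: exactly one of template or prompt should be given.
    if (template is None) == (prompt is None):
        raise ValueError("Pass exactly one of template or prompt")
    # sys.executable keeps the call inside the active virtual environment.
    cmd = [sys.executable, "extractor.py", pdf]
    if template is not None:
        if template not in TEMPLATES:
            raise ValueError(f"Unknown template: {template}")
        cmd += ["--template", template]
    else:
        cmd += ["-p", prompt]
    cmd += ["-o", output]
    # The CSV example passes --format csv alongside a .csv output path.
    if output.endswith(".csv"):
        cmd += ["--format", "csv"]
    return cmd
```

From the package folder, `subprocess.run(build_command("paper.pdf", "results.md", template="enzyme"), check=True)` runs the same extraction as the command above.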

### Enzyme Kinetics Data (enzyme)

Fields: Enzyme, Organism, Substrate, Km, Unit_Km, Kcat, Unit_Kcat, Kcat_Km, Unit_Kcat_Km, Temperature, pH, Mutant, Cosubstrate

### Experimental Results Data (experiment)

Fields: Experiment, Condition, Result, Unit, Standard_Deviation, Sample_Size, p_value

### Literature Review Data (review)

Fields: Author, Year, Journal, Title, DOI, Key_Findings, Methodology
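
For checking output against a preset, the field lists above can be kept in a small lookup table. This sketch compares the header row of a CSV result with a template's fields and reports what is missing or unexpected. It assumes the CSV header uses exactly these field names, which this page does not guarantee; `compare_header` is a hypothetical helper.

```python
from __future__ import annotations

import csv
import io

# Field lists copied from the three preset templates above.
TEMPLATE_FIELDS = {
    "enzyme": ["Enzyme", "Organism", "Substrate", "Km", "Unit_Km", "Kcat",
               "Unit_Kcat", "Kcat_Km", "Unit_Kcat_Km", "Temperature", "pH",
               "Mutant", "Cosubstrate"],
    "experiment": ["Experiment", "Condition", "Result", "Unit",
                   "Standard_Deviation", "Sample_Size", "p_value"],
    "review": ["Author", "Year", "Journal", "Title", "DOI", "Key_Findings",
               "Methodology"],
}


def compare_header(csv_text: str, template: str) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) column names relative to the template."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    expected = TEMPLATE_FIELDS[template]
    missing = [f for f in expected if f not in header]
    unexpected = [h for h in header if h not in expected]
    return missing, unexpected
```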

### Configuration Requirements

Set the following environment variables, or put them in the `.env` file:

- `EXTRACTOR_API_KEY`: LLM API key
- `EXTRACTOR_BASE_URL`: API endpoint
- `EXTRACTOR_MODEL`: model name (default: `claude-sonnet-4-5-20250929`)
- `EXTRACTOR_TEMPERATURE`: sampling temperature (default: `0.1`)
- `EXTRACTOR_MAX_TOKENS`: maximum output tokens (default: `16384`)
- `MATHPIX_APP_ID`: Mathpix OCR app ID (optional)
- `MATHPIX_APP_KEY`: Mathpix OCR app key (optional)
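
The documented defaults can be captured in a loader that reads each setting from the environment and falls back to the default listed above. This is a sketch, not the tool's own configuration code: no default is documented for `EXTRACTOR_BASE_URL`, so it stays `None` when unset, and `ExtractorConfig` and `load_config` are hypothetical names.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class ExtractorConfig:
    api_key: str
    base_url: str | None
    model: str
    temperature: float
    max_tokens: int
    mathpix_app_id: str | None
    mathpix_app_key: str | None


def load_config(env: Mapping[str, str] = os.environ) -> ExtractorConfig:
    """Build a config from environment variables, applying the documented defaults."""
    api_key = env.get("EXTRACTOR_API_KEY")
    if not api_key:
        raise RuntimeError("EXTRACTOR_API_KEY is not set")
    return ExtractorConfig(
        api_key=api_key,
        base_url=env.get("EXTRACTOR_BASE_URL"),  # No documented default.
        model=env.get("EXTRACTOR_MODEL", "claude-sonnet-4-5-20250929"),
        temperature=float(env.get("EXTRACTOR_TEMPERATURE", "0.1")),
        max_tokens=int(env.get("EXTRACTOR_MAX_TOKENS", "16384")),
        mathpix_app_id=env.get("MATHPIX_APP_ID"),
        mathpix_app_key=env.get("MATHPIX_APP_KEY"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.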

### Best Practices

- Verify the API key configuration before extraction
- Recommend that users validate extracted data for accuracy
- Long documents may require segmented processing
- Remind users to cite the original literature
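
One way to do segmented processing is to split the extracted text into overlapping chunks that each fit a size budget, run extraction on each chunk, and merge the rows afterwards. The character limit, the overlap, and the `split_text` helper are illustrative assumptions rather than values from the tool; a token-based budget tied to the model's context window would be more precise.

```python
from __future__ import annotations


def split_text(text: str, max_chars: int = 20000, overlap: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    Consecutive chunks share `overlap` characters, so text near a boundary
    appears in both neighbouring chunks.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to cut at the last blank line inside the window; the
            # search starts past start + overlap so each step makes progress.
            cut = text.rfind("\n\n", start + overlap + 1, end)
            if cut != -1:
                end = cut
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap
    return chunks
```

Because chunks overlap, the same row can be extracted twice; deduplicating the merged rows is part of this approach.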

### Usage Examples

Enzyme kinetics extraction:

```bash
python extractor.py paper.pdf --template enzyme -o results.md
```

Custom extraction:

```bash
python extractor.py paper.pdf -p "Extract all protein structures with PDB IDs" -o custom.md
```

CSV output:

```bash
python extractor.py paper.pdf --template enzyme -o results.csv --format csv
```
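
For a folder of papers, the CSV example can be repeated per file. The package ships `batch_extract.py`, but its interface is not documented on this page, so this sketch calls `extractor.py` directly with only the flags shown above. The folder layout and the `plan_batch` and `run_batch` helpers are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def plan_batch(pdf_dir: Path, out_dir: Path, template: str = "enzyme") -> list[list[str]]:
    """Build one extractor.py command per PDF, writing <name>.csv into out_dir."""
    commands = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        output = out_dir / f"{pdf.stem}.csv"
        commands.append([sys.executable, "extractor.py", str(pdf),
                         "--template", template, "-o", str(output),
                         "--format", "csv"])
    return commands


def run_batch(commands: list[list[str]]) -> list[str]:
    """Run each command in turn and return the inputs whose run failed."""
    failed = []
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[2])  # Position of the PDF path in the command.
    return failed
```

Create the output folder first, then run `failed = run_batch(plan_batch(Path("papers"), Path("results")))` from the package folder; separating planning from execution lets an agent show the commands before running them.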

### Notes

- This tool is for academic research use only
- Always validate AI-extracted results
- Respect copyright when using extracted data
- Cite original sources appropriately
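
Because AI-extracted values must be checked, a few automatic sanity checks can narrow down which rows need a closer look. This sketch reads an enzyme-template CSV and flags rows where `Km` or `Kcat` is not a positive number or `pH` falls outside 0 to 14. These rules and the `flag_enzyme_rows` helper are illustrative assumptions, not checks the tool performs, and they do not replace comparing values against the source paper.

```python
from __future__ import annotations

import csv
import io


def _positive_number(value: str) -> bool:
    try:
        return float(value) > 0
    except ValueError:
        return False


def flag_enzyme_rows(csv_text: str) -> list[tuple[int, str]]:
    """Return (row_number, reason) pairs for rows that need manual review.

    Row numbers count data rows from 1, excluding the header.
    Empty cells are treated as "not reported" and are not flagged.
    """
    flags = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        for field in ("Km", "Kcat"):
            value = (row.get(field) or "").strip()
            if value and not _positive_number(value):
                flags.append((i, f"{field} is not a positive number: {value!r}"))
        ph = (row.get("pH") or "").strip()
        if ph:
            try:
                if not 0 <= float(ph) <= 14:
                    flags.append((i, f"pH out of range: {ph}"))
            except ValueError:
                flags.append((i, f"pH is not numeric: {ph!r}"))
    return flags
```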
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: JackKuo666
- Version: 0.1.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-03T14:13:33.514Z
- Expires at: 2026-05-10T14:13:33.514Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/sci-data-extractor)
- [Send to Agent page](https://openagent3.xyz/skills/sci-data-extractor/agent)
- [JSON manifest](https://openagent3.xyz/skills/sci-data-extractor/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/sci-data-extractor/agent.md)
- [Download page](https://openagent3.xyz/downloads/sci-data-extractor)