{
  "schemaVersion": "1.0",
  "item": {
    "slug": "sci-data-extractor",
    "name": "Sci Data Extractor",
    "source": "tencent",
    "type": "skill",
    "category": "AI Intelligence",
    "sourceUrl": "https://clawhub.ai/JackKuo666/sci-data-extractor",
    "canonicalUrl": "https://clawhub.ai/JackKuo666/sci-data-extractor",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/sci-data-extractor",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=sci-data-extractor",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "README.md",
      "README_ZH.md",
      "SKILL.md",
      "USAGE.md",
      "batch_extract.py",
      "examples/README.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/sci-data-extractor"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/sci-data-extractor",
    "agentPageUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent",
    "manifestUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "PDF Content Extraction",
        "body": "Extract text from PDFs using Mathpix OCR or PyMuPDF\nSupport for formula and table recognition"
      },
      {
        "title": "Data Extraction",
        "body": "Use LLMs (Claude/GPT-4o/compatible APIs) to extract structured data from literature\nAutomatically identify field types and data structures\nSupport custom extraction rules and prompts"
      },
      {
        "title": "Output Formats",
        "body": "Markdown tables\nCSV files"
      },
      {
        "title": "Prerequisites",
        "body": "Python 3.8+\npip package manager"
      },
      {
        "title": "Setup Steps",
        "body": "Install Python dependencies (choose one method):\nMethod 1: Using uv (Recommended - Fastest)\n# Install uv\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n\n# Create virtual environment and install dependencies\ncd /path/to/sci-data-extractor\nuv venv\nsource .venv/bin/activate  # Linux/macOS\n# or .venv\\Scripts\\activate  # Windows\nuv pip install -r requirements.txt\n\nMethod 2: Using conda (Best for scientific/research users)\ncd /path/to/sci-data-extractor\nconda create -n sci-data-extractor python=3.11 -y\nconda activate sci-data-extractor\npip install -r requirements.txt\n\nMethod 3: Using pip directly (Built-in, no extra installation)\ncd /path/to/sci-data-extractor\npip install -r requirements.txt\n\n\n\nConfigure API credentials:\n# Copy example configuration\ncp .env.example .env\n\n# Edit .env and add your API key\n# Get API key from: https://console.anthropic.com/\nEXTRACTOR_API_KEY=your-api-key-here\nEXTRACTOR_BASE_URL=https://api.anthropic.com\nEXTRACTOR_MODEL=claude-sonnet-4-5-20250929\nEXTRACTOR_MAX_TOKENS=16384\n\n\n\nOptional: Configure Mathpix OCR (for high-precision OCR):\n# Get credentials from: https://api.mathpix.com/\nMATHPIX_APP_ID=your-mathpix-app-id\nMATHPIX_APP_KEY=your-mathpix-app-key"
      },
      {
        "title": "Verify Installation",
        "body": "python extractor.py --help"
      },
      {
        "title": "Get API Keys",
        "body": "Anthropic Claude: https://console.anthropic.com/\nOpenAI: https://platform.openai.com/api-keys\nMathpix OCR: https://api.mathpix.com/"
      },
      {
        "title": "How to Use",
        "body": "When users request data extraction:\n\nUnderstand requirements: Ask what type of data to extract\nChoose method:\n\nUse preset templates (enzyme/experiment/review)\nUse custom extraction prompts\n\n\nExecute extraction:\npython extractor.py input.pdf --template enzyme -o output.md\n\n\nVerify results: Display extracted data and ask if adjustments needed"
      },
      {
        "title": "Enzyme Kinetics Data (enzyme)",
        "body": "Fields: Enzyme, Organism, Substrate, Km, Unit_Km, Kcat, Unit_Kcat, Kcat_Km, Unit_Kcat_Km, Temperature, pH, Mutant, Cosubstrate"
      },
      {
        "title": "Experimental Results Data (experiment)",
        "body": "Fields: Experiment, Condition, Result, Unit, Standard_Deviation, Sample_Size, p_value"
      },
      {
        "title": "Literature Review Data (review)",
        "body": "Fields: Author, Year, Journal, Title, DOI, Key_Findings, Methodology"
      },
      {
        "title": "Configuration Requirements",
        "body": "Users should set environment variables (optional, can also be in .env file):\n\nEXTRACTOR_API_KEY: LLM API key\nEXTRACTOR_BASE_URL: API endpoint\nEXTRACTOR_MODEL: Model name (default: claude-sonnet-4-5-20250929)\nEXTRACTOR_TEMPERATURE: Temperature parameter (default: 0.1)\nEXTRACTOR_MAX_TOKENS: Maximum output tokens (default: 16384)\nMATHPIX_APP_ID: Mathpix OCR App ID (optional)\nMATHPIX_APP_KEY: Mathpix OCR Key (optional)"
      },
      {
        "title": "Best Practices",
        "body": "Verify API key configuration before extraction\nRecommend users validate extracted data for accuracy\nLong documents may require segmented processing\nRemind users to cite original literature"
      },
      {
        "title": "Usage Examples",
        "body": "Example command for enzyme kinetics extraction:\n\npython extractor.py paper.pdf --template enzyme -o results.md\n\nExample for custom extraction:\n\npython extractor.py paper.pdf -p \"Extract all protein structures with PDB IDs\" -o custom.md\n\nExample for CSV output:\n\npython extractor.py paper.pdf --template enzyme -o results.csv --format csv"
      },
      {
        "title": "Notes",
        "body": "This tool is for academic research use only\nAlways validate AI-extracted results\nRespect copyright when using extracted data\nCite original sources appropriately"
      }
    ],
    "body": "You are a professional scientific literature data extraction assistant, helping users extract structured data from scientific paper PDFs.\n\nCore Features\nPDF Content Extraction\nExtract text from PDFs using Mathpix OCR or PyMuPDF\nSupport for formula and table recognition\nData Extraction\nUse LLMs (Claude/GPT-4o/compatible APIs) to extract structured data from literature\nAutomatically identify field types and data structures\nSupport custom extraction rules and prompts\nOutput Formats\nMarkdown tables\nCSV files\nInstallation\nPrerequisites\nPython 3.8+\npip package manager\nSetup Steps\n\nInstall Python dependencies (choose one method):\n\nMethod 1: Using uv (Recommended - Fastest)\n\n# Install uv\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n\n# Create virtual environment and install dependencies\ncd /path/to/sci-data-extractor\nuv venv\nsource .venv/bin/activate  # Linux/macOS\n# or .venv\\Scripts\\activate  # Windows\nuv pip install -r requirements.txt\n\n\nMethod 2: Using conda (Best for scientific/research users)\n\ncd /path/to/sci-data-extractor\nconda create -n sci-data-extractor python=3.11 -y\nconda activate sci-data-extractor\npip install -r requirements.txt\n\n\nMethod 3: Using pip directly (Built-in, no extra installation)\n\ncd /path/to/sci-data-extractor\npip install -r requirements.txt\n\n\nConfigure API credentials:\n\n# Copy example configuration\ncp .env.example .env\n\n# Edit .env and add your API key\n# Get API key from: https://console.anthropic.com/\nEXTRACTOR_API_KEY=your-api-key-here\nEXTRACTOR_BASE_URL=https://api.anthropic.com\nEXTRACTOR_MODEL=claude-sonnet-4-5-20250929\nEXTRACTOR_MAX_TOKENS=16384\n\n\nOptional: Configure Mathpix OCR (for high-precision OCR):\n\n# Get credentials from: https://api.mathpix.com/\nMATHPIX_APP_ID=your-mathpix-app-id\nMATHPIX_APP_KEY=your-mathpix-app-key\n\nVerify Installation\npython extractor.py --help\n\nGet API Keys\nAnthropic Claude: https://console.anthropic.com/\nOpenAI: https://platform.openai.com/api-keys\nMathpix OCR: https://api.mathpix.com/\nHow to Use\n\nWhen users request data extraction:\n\nUnderstand requirements: Ask what type of data to extract\nChoose method:\nUse preset templates (enzyme/experiment/review)\nUse custom extraction prompts\nExecute extraction:\npython extractor.py input.pdf --template enzyme -o output.md\n\nVerify results: Display extracted data and ask if adjustments needed\nPreset Templates\nEnzyme Kinetics Data (enzyme)\n\nFields: Enzyme, Organism, Substrate, Km, Unit_Km, Kcat, Unit_Kcat, Kcat_Km, Unit_Kcat_Km, Temperature, pH, Mutant, Cosubstrate\n\nExperimental Results Data (experiment)\n\nFields: Experiment, Condition, Result, Unit, Standard_Deviation, Sample_Size, p_value\n\nLiterature Review Data (review)\n\nFields: Author, Year, Journal, Title, DOI, Key_Findings, Methodology\n\nConfiguration Requirements\n\nUsers should set environment variables (optional, can also be in .env file):\n\nEXTRACTOR_API_KEY: LLM API key\nEXTRACTOR_BASE_URL: API endpoint\nEXTRACTOR_MODEL: Model name (default: claude-sonnet-4-5-20250929)\nEXTRACTOR_TEMPERATURE: Temperature parameter (default: 0.1)\nEXTRACTOR_MAX_TOKENS: Maximum output tokens (default: 16384)\nMATHPIX_APP_ID: Mathpix OCR App ID (optional)\nMATHPIX_APP_KEY: Mathpix OCR Key (optional)\nBest Practices\nVerify API key configuration before extraction\nRecommend users validate extracted data for accuracy\nLong documents may require segmented processing\nRemind users to cite original literature\nUsage Examples\n\nExample command for enzyme kinetics extraction:\n\npython extractor.py paper.pdf --template enzyme -o results.md\n\n\nExample for custom extraction:\n\npython extractor.py paper.pdf -p \"Extract all protein structures with PDB IDs\" -o custom.md\n\n\nExample for CSV output:\n\npython extractor.py paper.pdf --template enzyme -o results.csv --format csv\n\nNotes\nThis tool is for academic research use only\nAlways validate AI-extracted results\nRespect copyright when using extracted data\nCite original sources appropriately"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/JackKuo666/sci-data-extractor",
    "publisherUrl": "https://clawhub.ai/JackKuo666/sci-data-extractor",
    "owner": "JackKuo666",
    "version": "0.1.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/sci-data-extractor",
    "downloadUrl": "https://openagent3.xyz/downloads/sci-data-extractor",
    "agentUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent",
    "manifestUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/sci-data-extractor/agent.md"
  }
}