{
  "schemaVersion": "1.0",
  "item": {
    "slug": "openocr-skill",
    "name": "openocr-skill",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/Topdu/openocr-skill",
    "canonicalUrl": "https://clawhub.ai/Topdu/openocr-skill",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/openocr-skill",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=openocr-skill",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/openocr-skill"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/openocr-skill",
    "agentPageUrl": "https://openagent3.xyz/skills/openocr-skill/agent",
    "manifestUrl": "https://openagent3.xyz/skills/openocr-skill/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/openocr-skill/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Overview",
        "body": "This skill enables intelligent text extraction, document parsing, and universal recognition using OpenOCR - an accurate and efficient general OCR system. It provides a unified interface for text detection, text recognition, end-to-end OCR, VLM-based universal recognition (text/formulas/tables), and document parsing with layout analysis. Supports Chinese, English, and more."
      },
      {
        "title": "How to Use",
        "body": "Provide the image, scanned document, or PDF\nOptionally specify the task type (det/rec/ocr/unirec/doc)\nI'll extract text, formulas, tables, or full document structure\n\nExample prompts:\n\n\"Extract all text from this image\"\n\"Detect text regions in this photo\"\n\"Recognize the formula in this screenshot\"\n\"Parse this PDF document with layout analysis\"\n\"Convert this scanned page to Markdown\""
      },
      {
        "title": "OpenOCR Fundamentals",
        "body": "from openocr import OpenOCR\n\n# Initialize with a specific task\nengine = OpenOCR(task='ocr')\n\n# Run OCR on an image (callable interface)\nresults, time_dicts = engine(image_path='image.jpg')\n\n# Results contain detected boxes with recognized text\nfor result in results:\n    for line in result:\n        box = line[0]       # Bounding box coordinates\n        text = line[1][0]   # Recognized text\n        conf = line[1][1]   # Confidence score\n        print(f\"{text} ({conf:.2f})\")"
      },
      {
        "title": "Supported Tasks",
        "body": "# Available task types\ntasks = {\n    'det':    'Text Detection - detect text regions with bounding boxes',\n    'rec':    'Text Recognition - recognize text from cropped images',\n    'ocr':    'End-to-End OCR - detection + recognition pipeline',\n    'unirec': 'Universal Recognition - VLM-based text/formula/table recognition (0.1B params)',\n    'doc':    'Document Parsing - layout analysis + universal recognition (0.1B params)',\n}\n\n# Task selection via parameter\ndet_engine = OpenOCR(task='det')\nrec_engine = OpenOCR(task='rec')\nocr_engine = OpenOCR(task='ocr')\nunirec_engine = OpenOCR(task='unirec')\ndoc_engine = OpenOCR(task='doc')"
      },
      {
        "title": "Configuration Options",
        "body": "from openocr import OpenOCR\n\n# === Text Detection ===\ndetector = OpenOCR(\n    task='det',\n    backend='onnx',                          # 'onnx' (default) or 'torch'\n    onnx_det_model_path=None,                # Custom detection model (auto-downloads if None)\n    use_gpu='auto',                          # 'auto', 'true', or 'false'\n)\n\n# === Text Recognition ===\nrecognizer = OpenOCR(\n    task='rec',\n    mode='mobile',                           # 'mobile' (fast) or 'server' (accurate)\n    backend='onnx',                          # 'onnx' (default) or 'torch'\n    onnx_rec_model_path=None,                # Custom recognition model\n    use_gpu='auto',\n)\n\n# === End-to-End OCR ===\nocr = OpenOCR(\n    task='ocr',\n    mode='mobile',                           # 'mobile' or 'server'\n    backend='onnx',                          # 'onnx' or 'torch'\n    onnx_det_model_path=None,                # Custom detection model\n    onnx_rec_model_path=None,                # Custom recognition model\n    drop_score=0.5,                          # Confidence threshold for filtering\n    det_box_type='quad',                     # 'quad' or 'poly' (for curved text)\n    use_gpu='auto',\n)\n\n# === Universal Recognition (UniRec) ===\nunirec = OpenOCR(\n    task='unirec',\n    unirec_encoder_path=None,                # Custom encoder ONNX model\n    unirec_decoder_path=None,                # Custom decoder ONNX model\n    tokenizer_mapping_path=None,             # Custom tokenizer mapping JSON\n    max_length=2048,                         # Max generation length\n    auto_download=True,                      # Auto-download missing models\n    use_gpu='auto',\n)\n\n# === Document Parsing (OpenDoc) ===\ndoc = OpenOCR(\n    task='doc',\n    layout_model_path=None,                  # Custom layout detection model (PP-DocLayoutV2)\n    unirec_encoder_path=None,                # Custom UniRec encoder\n    unirec_decoder_path=None,                # Custom UniRec decoder\n    tokenizer_mapping_path=None,             # Custom tokenizer mapping\n    layout_threshold=0.5,                    # Layout detection threshold\n    use_layout_detection=True,               # Enable layout analysis\n    max_parallel_blocks=4,                   # Max parallel VLM blocks\n    auto_download=True,                      # Auto-download missing models\n    use_gpu='auto',\n)"
      },
      {
        "title": "Task-Specific Usage",
        "body": "Text Detection\n\nfrom openocr import OpenOCR\n\ndetector = OpenOCR(task='det', backend='onnx')\n\n# Detect text regions\nresults = detector(image_path='image.jpg')\n\nboxes = results[0]['boxes']      # np.ndarray of bounding boxes\nelapse = results[0]['elapse']    # Processing time in seconds\n\nprint(f\"Found {len(boxes)} text regions in {elapse:.3f}s\")\nfor box in boxes:\n    print(f\"  Box: {box.tolist()}\")\n\nText Recognition\n\nfrom openocr import OpenOCR\n\n# Mobile mode (fast, ONNX)\nrecognizer = OpenOCR(task='rec', mode='mobile', backend='onnx')\n\n# Server mode (accurate, requires torch)\n# recognizer = OpenOCR(task='rec', mode='server', backend='torch')\n\nresults = recognizer(image_path='word.jpg', batch_num=1)\n\ntext = results[0]['text']        # Recognized text string\nscore = results[0]['score']      # Confidence score\nelapse = results[0]['elapse']    # Processing time\n\nprint(f\"Text: {text}, Score: {score:.3f}, Time: {elapse:.3f}s\")\n\nEnd-to-End OCR\n\nfrom openocr import OpenOCR\n\nocr = OpenOCR(task='ocr', mode='mobile', backend='onnx')\n\n# Run OCR with visualization\nresults, time_dicts = ocr(\n    image_path='image.jpg',\n    save_dir='./output',\n    is_visualize=True,\n    rec_batch_num=6,\n)\n\n# Process results\nfor result in results:\n    for line in result:\n        box, (text, confidence) = line[0], line[1]\n        print(f\"{text} ({confidence:.2f})\")\n\nUniversal Recognition (UniRec)\n\nfrom openocr import OpenOCR\n\nunirec = OpenOCR(task='unirec')\n\n# Image input\nresult_text, generated_ids = unirec(image_path='formula.jpg', max_length=2048)\nprint(f\"Result: {result_text}\")\n\n# PDF input (returns list of tuples, one per page)\nresults = unirec(image_path='document.pdf', max_length=2048)\nfor page_text, page_ids in results:\n    print(f\"Page: {page_text[:100]}...\")\n\nDocument Parsing (OpenDoc)\n\nfrom openocr import OpenOCR\n\ndoc = OpenOCR(task='doc', use_layout_detection=True)\n\n# Parse a document image\nresult = doc(image_path='document.jpg')\n\n# Save outputs in multiple formats\ndoc.save_to_markdown(result, './output')\ndoc.save_to_json(result, './output')\ndoc.save_visualization(result, './output')\n\n# Parse a PDF (returns list of dicts, one per page)\nresults = doc(image_path='document.pdf')\nfor page_result in results:\n    doc.save_to_markdown(page_result, './output')"
      },
      {
        "title": "Command-Line Interface",
        "body": "# Text Detection\nopenocr --task det --input_path image.jpg --is_vis\n\n# Text Recognition\nopenocr --task rec --input_path word.jpg --mode server --backend torch\n\n# End-to-End OCR\nopenocr --task ocr --input_path image.jpg --is_vis --output_path ./results\n\n# Universal Recognition\nopenocr --task unirec --input_path formula.jpg --max_length 2048\n\n# Document Parsing\nopenocr --task doc --input_path document.pdf \\\n    --use_layout_detection --save_vis --save_json --save_markdown\n\n# Launch Gradio Demos\nopenocr --task launch_openocr_demo --share --server_port 7860\nopenocr --task launch_unirec_demo --share --server_port 7861\nopenocr --task launch_opendoc_demo --share --server_port 7862"
      },
      {
        "title": "Processing Different Sources",
        "body": "Image Files\n\nfrom openocr import OpenOCR\n\nocr = OpenOCR(task='ocr')\n\n# Single image\nresults, _ = ocr(image_path='image.jpg')\n\n# Directory of images\nresults, _ = ocr(image_path='./images/', save_dir='./output', is_visualize=True)\n\nPDF Files\n\nfrom openocr import OpenOCR\n\n# UniRec handles PDFs natively\nunirec = OpenOCR(task='unirec')\nresults = unirec(image_path='document.pdf', max_length=2048)\n\n# OpenDoc handles PDFs natively with layout analysis\ndoc = OpenOCR(task='doc', use_layout_detection=True)\nresults = doc(image_path='document.pdf')\n\n# Save each page\nfor page_result in results:\n    doc.save_to_markdown(page_result, './output')\n    doc.save_to_json(page_result, './output')\n\nNumpy Array Input\n\nimport cv2\nfrom openocr import OpenOCR\n\nocr = OpenOCR(task='ocr')\n\n# Read image as numpy array\nimg = cv2.imread('image.jpg')\n\n# Pass numpy array directly\nresults, _ = ocr(img_numpy=img)"
      },
      {
        "title": "Result Formats",
        "body": "# Detection result format\ndet_result = [{'boxes': np.ndarray, 'elapse': float}]\n\n# Recognition result format\nrec_result = [{'text': str, 'score': float, 'elapse': float}]\n\n# OCR result format (detection + recognition)\nocr_result = (results_list, time_dicts)\n# results_list: [[[box, (text, confidence)], ...], ...]\n\n# UniRec result format\n# Image: (text: str, generated_ids: list)\n# PDF:   [(text: str, generated_ids: list), ...]  # one per page\n\n# Doc result format\n# Image: dict with layout blocks and recognized content\n# PDF:   [dict, ...]  # one per page"
      },
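      {
        "title": "Handling Result Shapes",
        "body": "A small helper sketch built only from the shapes listed above; the function name is ours, not part of the openocr API.\n\nfrom openocr import OpenOCR\n\ndef unirec_pages(path, max_length=2048):\n    \"\"\"Normalize UniRec output to a list of page texts for images and PDFs.\"\"\"\n    unirec = OpenOCR(task='unirec')\n    result = unirec(image_path=path, max_length=max_length)\n    if isinstance(result, list):\n        # PDF: [(text, generated_ids), ...] one per page\n        return [text for text, _ids in result]\n    text, _ids = result  # Image: (text, generated_ids)\n    return [text]\n\nfor page in unirec_pages('document.pdf'):\n    print(page[:80])"
      },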
      {
        "title": "Best Practices",
        "body": "Choose the Right Task: Use ocr for general text, unirec for formulas/tables, doc for full documents\nUse Mobile Mode for Speed: mode='mobile' is much faster; use mode='server' only when accuracy is critical\nUse ONNX Backend: Default ONNX backend works on CPU without extra dependencies\nSet Appropriate Thresholds: Adjust drop_score (OCR) and layout_threshold (Doc) for your use case\nEnable Layout Detection: For documents with mixed content (text + formulas + tables), always enable use_layout_detection\nBatch Processing: Use rec_batch_num to control recognition batch size for throughput optimization\nGPU Acceleration: Install onnxruntime-gpu or PyTorch with CUDA for significant speedup"
      },
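      {
        "title": "Tuning Sketch",
        "body": "A minimal sketch pulling the practices above into one place. Every parameter name comes from the Configuration Options section; the specific values are illustrative assumptions, not recommendations.\n\nfrom openocr import OpenOCR\n\n# Throughput-leaning OCR: mobile mode, stricter confidence filtering\nocr = OpenOCR(\n    task='ocr',\n    mode='mobile',       # much faster than 'server'\n    backend='onnx',      # CPU-friendly default\n    drop_score=0.7,      # tighter than the 0.5 default to drop noisy lines\n    use_gpu='auto',\n)\nresults, _ = ocr(image_path='./images/', rec_batch_num=16)  # larger batch for throughput\n\n# Accuracy-leaning recognition: server mode needs the torch backend\n# rec = OpenOCR(task='rec', mode='server', backend='torch')\n\n# Mixed-content documents: keep layout detection on, tune its threshold\ndoc = OpenOCR(task='doc', use_layout_detection=True, layout_threshold=0.4)"
      },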
      {
        "title": "Full Document Processing Pipeline",
        "body": "from openocr import OpenOCR\nimport os\n\ndef process_documents(input_dir, output_dir):\n    \"\"\"Process all documents in a directory.\"\"\"\n    doc = OpenOCR(task='doc', use_layout_detection=True)\n\n    os.makedirs(output_dir, exist_ok=True)\n\n    for filename in os.listdir(input_dir):\n        if filename.lower().endswith(('.jpg', '.png', '.pdf', '.bmp')):\n            filepath = os.path.join(input_dir, filename)\n            print(f\"Processing: {filename}\")\n\n            result = doc(image_path=filepath)\n\n            # Handle PDF (list) vs image (dict)\n            if isinstance(result, list):\n                for page_result in result:\n                    doc.save_to_markdown(page_result, output_dir)\n                    doc.save_to_json(page_result, output_dir)\n            else:\n                doc.save_to_markdown(result, output_dir)\n                doc.save_to_json(result, output_dir)\n\n    print(f\"All results saved to {output_dir}\")\n\nprocess_documents('./docs', './output')"
      },
      {
        "title": "OCR with Custom Post-Processing",
        "body": "from openocr import OpenOCR\nimport re\n\ndef extract_structured_text(image_path, drop_score=0.5):\n    \"\"\"Extract and structure text from an image.\"\"\"\n    ocr = OpenOCR(task='ocr', drop_score=drop_score)\n    results, _ = ocr(image_path=image_path)\n\n    lines = []\n    for result in results:\n        for line in result:\n            box = line[0]\n            text = line[1][0]\n            confidence = line[1][1]\n\n            # Calculate bounding box center\n            y_center = sum(p[1] for p in box) / 4\n\n            lines.append({\n                'text': text,\n                'confidence': confidence,\n                'y_center': y_center,\n                'box': box,\n            })\n\n    # Sort by vertical position (top to bottom)\n    lines.sort(key=lambda x: x['y_center'])\n\n    return lines\n\nresult = extract_structured_text('page.jpg')\nfor line in result:\n    print(f\"{line['text']} ({line['confidence']:.2f})\")"
      },
      {
        "title": "Formula Recognition",
        "body": "from openocr import OpenOCR\n\ndef recognize_formula(image_path):\n    \"\"\"Recognize mathematical formula from image.\"\"\"\n    unirec = OpenOCR(task='unirec')\n    text, ids = unirec(image_path=image_path, max_length=2048)\n\n    # UniRec outputs LaTeX for formulas\n    print(f\"LaTeX: {text}\")\n    return text\n\nlatex = recognize_formula('formula.png')\n# Output: \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}"
      },
      {
        "title": "Table Extraction",
        "body": "from openocr import OpenOCR\n\ndef extract_table(image_path):\n    \"\"\"Extract table content from image.\"\"\"\n    unirec = OpenOCR(task='unirec')\n    text, ids = unirec(image_path=image_path, max_length=2048)\n\n    # UniRec outputs LaTeX table format\n    print(f\"Table: {text}\")\n    return text\n\ntable_latex = extract_table('table.png')"
      },
      {
        "title": "Example 1: Batch OCR with Progress",
        "body": "from openocr import OpenOCR\nimport os\n\ndef batch_ocr(image_dir, output_dir='./ocr_results'):\n    \"\"\"OCR all images in a directory.\"\"\"\n    ocr = OpenOCR(task='ocr', mode='mobile')\n\n    os.makedirs(output_dir, exist_ok=True)\n\n    image_files = [\n        f for f in os.listdir(image_dir)\n        if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp', '.tiff'))\n    ]\n\n    all_results = {}\n    for i, filename in enumerate(image_files):\n        filepath = os.path.join(image_dir, filename)\n        print(f\"[{i+1}/{len(image_files)}] Processing: {filename}\")\n\n        results, time_dicts = ocr(\n            image_path=filepath,\n            save_dir=output_dir,\n            is_visualize=True,\n        )\n\n        texts = []\n        for result in results:\n            for line in result:\n                texts.append(line[1][0])\n\n        all_results[filename] = texts\n        print(f\"  Found {len(texts)} text lines\")\n\n    # Save all text\n    with open(os.path.join(output_dir, 'all_text.txt'), 'w') as f:\n        for filename, texts in all_results.items():\n            f.write(f\"--- {filename} ---\\n\")\n            f.write('\\n'.join(texts))\n            f.write('\\n\\n')\n\n    return all_results\n\nresults = batch_ocr('./images')"
      },
      {
        "title": "Example 2: Document to Markdown Converter",
        "body": "from openocr import OpenOCR\nimport os\n\ndef doc_to_markdown(input_path, output_dir='./markdown_output'):\n    \"\"\"Convert document images or PDFs to Markdown.\"\"\"\n    doc = OpenOCR(\n        task='doc',\n        use_layout_detection=True,\n        use_chart_recognition=True,\n    )\n\n    os.makedirs(output_dir, exist_ok=True)\n\n    result = doc(image_path=input_path)\n\n    if isinstance(result, list):\n        # PDF: multiple pages\n        for page_result in result:\n            doc.save_to_markdown(page_result, output_dir)\n        print(f\"Converted {len(result)} pages to Markdown\")\n    else:\n        # Single image\n        doc.save_to_markdown(result, output_dir)\n        print(\"Converted image to Markdown\")\n\n    print(f\"Output saved to: {output_dir}\")\n\n# Convert a scanned PDF\ndoc_to_markdown('paper.pdf')\n\n# Convert a document image\ndoc_to_markdown('page.jpg')"
      },
      {
        "title": "Example 3: Multi-Task Comparison",
        "body": "from openocr import OpenOCR\n\ndef compare_tasks(image_path):\n    \"\"\"Compare results from different OpenOCR tasks.\"\"\"\n\n    # 1. Detection only\n    det = OpenOCR(task='det')\n    det_result = det(image_path=image_path)\n    num_boxes = len(det_result[0]['boxes'])\n    print(f\"Detection: Found {num_boxes} text regions\")\n\n    # 2. End-to-End OCR\n    ocr = OpenOCR(task='ocr')\n    ocr_results, _ = ocr(image_path=image_path)\n    ocr_texts = [line[1][0] for result in ocr_results for line in result]\n    print(f\"OCR: Extracted {len(ocr_texts)} text lines\")\n    for t in ocr_texts[:5]:\n        print(f\"  - {t}\")\n\n    # 3. Universal Recognition\n    unirec = OpenOCR(task='unirec')\n    text, _ = unirec(image_path=image_path)\n    print(f\"UniRec: {text[:200]}...\")\n\n    return {\n        'det_boxes': num_boxes,\n        'ocr_texts': ocr_texts,\n        'unirec_text': text,\n    }\n\ncompare_tasks('document.jpg')"
      },
      {
        "title": "Example 4: Gradio Demo Launch",
        "body": "from openocr import launch_openocr_demo, launch_unirec_demo, launch_opendoc_demo\n\n# Launch OCR demo\nlaunch_openocr_demo(share=True, server_port=7860, server_name='0.0.0.0')\n\n# Launch UniRec demo\nlaunch_unirec_demo(share=True, server_port=7861)\n\n# Launch OpenDoc demo\nlaunch_opendoc_demo(share=True, server_port=7862)"
      },
      {
        "title": "Limitations",
        "body": "Text recognition accuracy depends on image quality\nVery small or heavily rotated text may reduce accuracy\nserver mode requires PyTorch and is slower than mobile mode\nUniRec and Doc tasks use 0.1B parameter VLM, larger models may yield better results\nPDF processing converts pages to images internally, very large PDFs may use significant memory\nComplex handwritten text accuracy varies\nGPU recommended for best performance, especially for Doc and UniRec tasks"
      },
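      {
        "title": "Processing Large PDFs in Chunks",
        "body": "One way to bound memory for the large-PDF limitation above: split the file into small chunks before parsing. This sketch uses pypdf for the split, which is our choice here and not an OpenOCR dependency.\n\nfrom pypdf import PdfReader, PdfWriter\nfrom openocr import OpenOCR\n\ndef parse_pdf_in_chunks(path, chunk_size=10, output_dir='./output'):\n    \"\"\"Parse a large PDF a few pages at a time to cap memory use.\"\"\"\n    doc = OpenOCR(task='doc', use_layout_detection=True)\n    reader = PdfReader(path)\n    for start in range(0, len(reader.pages), chunk_size):\n        writer = PdfWriter()\n        for page in reader.pages[start:start + chunk_size]:\n            writer.add_page(page)\n        chunk_path = f'chunk_{start}.pdf'\n        with open(chunk_path, 'wb') as f:\n            writer.write(f)\n        # OpenDoc returns one dict per page for PDFs\n        for page_result in doc(image_path=chunk_path):\n            doc.save_to_markdown(page_result, output_dir)\n\nparse_pdf_in_chunks('big_document.pdf')"
      },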
      {
        "title": "Installation",
        "body": "# Basic installation (CPU, ONNX backend)\npip install openocr-python\n\n# GPU-accelerated ONNX inference\npip install openocr-python[onnx-gpu]\n\n# PyTorch backend (for server mode)\npip install openocr-python[pytorch]\n\n# Gradio demos\npip install openocr-python[gradio]\n\n# All optional dependencies\npip install openocr-python[all]\n\n# From source\ngit clone https://github.com/Topdu/OpenOCR.git\ncd OpenOCR\npython build_package.py\npip install ./build/dist/openocr_python-*.whl"
      },
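      {
        "title": "Verifying the Installation",
        "body": "A quick smoke test, assuming a clean install; sample.jpg is a placeholder for any image on disk.\n\n# Confirm the package imports\npython -c \"from openocr import OpenOCR; print('openocr OK')\"\n\n# First real run also triggers the model auto-download\nopenocr --task ocr --input_path sample.jpg"
      },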
      {
        "title": "Resources",
        "body": "OpenOCR GitHub\nPyPI Package\nUniRec Paper\nOpenDoc Documentation\nModel Zoo & Configs"
      }
    ]
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/Topdu/openocr-skill",
    "publisherUrl": "https://clawhub.ai/Topdu/openocr-skill",
    "owner": "Topdu",
    "version": "0.1.6",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/openocr-skill",
    "downloadUrl": "https://openagent3.xyz/downloads/openocr-skill",
    "agentUrl": "https://openagent3.xyz/skills/openocr-skill/agent",
    "manifestUrl": "https://openagent3.xyz/skills/openocr-skill/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/openocr-skill/agent.md"
  }
}