Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
AI-native OCR platform that turns documents into high-accuracy data in minutes. Using multi-model consensus, DeepRead achieves 97%+ accuracy and flags only uncertain fields for Human-in-the-Loop (HIL) review—reducing manual work from 100% to 5-10%. Zero prompt engineering required.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
DeepRead is an AI-native OCR platform that turns documents into high-accuracy data in minutes. Using multi-model consensus, DeepRead achieves 97%+ accuracy and flags only uncertain fields for Human-in-the-Loop (HIL) review—reducing manual work from 100% to 5-10%. Zero prompt engineering required.
DeepRead is a production-grade document processing API that returns high-accuracy structured data in minutes and flags uncertain fields, so manual review is limited to the flagged exceptions.

Core Features:
- Text Extraction: Convert PDFs and images to clean markdown
- Structured Data: Extract JSON fields with confidence scores
- HIL Interface: Built-in Human-in-the-Loop review; uncertain fields are flagged (hil_flag) so only exceptions need manual review
- Multi-Pass Processing: Multiple validation passes for maximum accuracy
- Multi-Model Consensus: Cross-validation between models for reliability
- Free Tier: 2,000 pages/month (no credit card required)
Sign up and create an API key:

    # Visit the dashboard
    https://www.deepread.tech/dashboard
    # Or use this direct link
    https://www.deepread.tech/dashboard/?utm_source=clawdhub

Save your API key:

    export DEEPREAD_API_KEY="sk_live_your_key_here"
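For scripts that call the API from Python rather than curl, the key can be read from the same environment variable. The following is a minimal sketch: the helper name `deepread_headers` is hypothetical, and only the `X-API-Key` header and the `DEEPREAD_API_KEY` variable come from this document.

```python
import os


def deepread_headers(env=None):
    """Return request headers built from the DEEPREAD_API_KEY variable.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("DEEPREAD_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DEEPREAD_API_KEY is not set; export it before calling the API")
    return {"X-API-Key": api_key}
```

Failing early on a missing key gives a clearer error than an authentication failure returned by the API.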
Add to your clawdbot.config.json5:

    {
      skills: {
        entries: {
          "deepread": {
            enabled: true
            // API key is read from DEEPREAD_API_KEY environment variable
            // Do NOT hardcode your API key here
          }
        }
      }
    }
Option A: With Webhook (Recommended)

    # Upload PDF with webhook notification
    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@document.pdf" \
      -F "webhook_url=https://your-app.com/webhooks/deepread"

    # Returns immediately
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "queued"
    }

    # Your webhook receives results when processing completes (2-5 minutes)

Option B: Poll for Results

    # Upload PDF without webhook
    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@document.pdf"

    # Returns immediately
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "queued"
    }

    # Poll until completed
    curl https://api.deepread.tech/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
      -H "X-API-Key: $DEEPREAD_API_KEY"
Extract text as clean markdown:

    # With webhook (recommended)
    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@invoice.pdf" \
      -F "webhook_url=https://your-app.com/webhook"

    # OR poll for completion
    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@invoice.pdf"

    # Then poll
    curl https://api.deepread.tech/v1/jobs/JOB_ID \
      -H "X-API-Key: $DEEPREAD_API_KEY"

Response when completed:

    {
      "id": "550e8400-...",
      "status": "completed",
      "result": {
        "text": "# INVOICE\n\n**Vendor:** Acme Corp\n**Total:** $1,250.00..."
      }
    }
Extract specific fields with confidence scoring:

    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@invoice.pdf" \
      -F 'schema={
        "type": "object",
        "properties": {
          "vendor": {
            "type": "string",
            "description": "Vendor company name"
          },
          "total": {
            "type": "number",
            "description": "Total invoice amount"
          },
          "invoice_date": {
            "type": "string",
            "description": "Invoice date in MM/DD/YYYY format"
          }
        }
      }'

Response includes confidence flags:

    {
      "status": "completed",
      "result": {
        "text": "# INVOICE\n\n**Vendor:** Acme Corp...",
        "data": {
          "vendor": {
            "value": "Acme Corp",
            "hil_flag": false,
            "found_on_page": 1
          },
          "total": {
            "value": 1250.00,
            "hil_flag": false,
            "found_on_page": 1
          },
          "invoice_date": {
            "value": "2024-10-??",
            "hil_flag": true,
            "reason": "Date partially obscured",
            "found_on_page": 1
          }
        },
        "metadata": {
          "fields_requiring_review": 1,
          "total_fields": 3,
          "review_percentage": 33.3
        }
      }
    }
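The review metadata in the response can be recomputed on the client from the per-field flags, which is useful as a sanity check or when only `data` is stored. This is a sketch under the assumption that every entry in `data` is an object carrying `hil_flag`, as in the example response; `review_summary` is a hypothetical helper, not part of the API.

```python
def review_summary(data):
    """Count flagged fields in a `result.data` mapping and compute the review share."""
    total = len(data)
    flagged = [name for name, field in data.items() if field.get("hil_flag")]
    percentage = round(100 * len(flagged) / total, 1) if total else 0.0
    return {
        "fields_requiring_review": len(flagged),
        "total_fields": total,
        "review_percentage": percentage,
        "flagged_fields": flagged,
    }


# Field data copied from the example response in this document.
sample = {
    "vendor": {"value": "Acme Corp", "hil_flag": False, "found_on_page": 1},
    "total": {"value": 1250.00, "hil_flag": False, "found_on_page": 1},
    "invoice_date": {
        "value": "2024-10-??",
        "hil_flag": True,
        "reason": "Date partially obscured",
        "found_on_page": 1,
    },
}
summary = review_summary(sample)
```

For the sample, `summary` reports 1 of 3 fields flagged and a review percentage of 33.3, matching the `metadata` block in the example.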
Extract arrays and nested objects:

    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@invoice.pdf" \
      -F 'schema={
        "type": "object",
        "properties": {
          "vendor": {"type": "string"},
          "total": {"type": "number"},
          "line_items": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "description": {"type": "string"},
                "quantity": {"type": "number"},
                "price": {"type": "number"}
              }
            }
          }
        }
      }'
Get per-page OCR results with quality flags:

    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@contract.pdf" \
      -F "include_pages=true"

Response:

    {
      "result": {
        "text": "Combined text from all pages...",
        "pages": [
          {
            "page_number": 1,
            "text": "# Contract Agreement\n\n...",
            "hil_flag": false
          },
          {
            "page_number": 2,
            "text": "Terms and C??diti??s...",
            "hil_flag": true,
            "reason": "Multiple unrecognized characters"
          }
        ],
        "metadata": {
          "pages_requiring_review": 1,
          "total_pages": 2
        }
      }
    }
- Invoice Processing: Extract vendor, totals, line items
- Receipt OCR: Parse merchant, items, totals
- Contract Analysis: Extract parties, dates, terms
- Form Digitization: Convert paper forms to structured data
- Document Workflows: Any process requiring OCR + data extraction
- Quality-Critical Apps: When you need to know which extractions are uncertain
Limitations to plan around:
- Real-time Processing: Jobs take 2-5 minutes and run asynchronously, so DeepRead is not suited to real-time use
- Batches over 2,000 pages/month: Exceed the free tier; upgrade to PRO or SCALE
PDF → Convert → Rotate Correction → OCR → Multi-Model Validation → Extract → Done

The pipeline automatically handles:
- Document rotation and orientation correction
- Multi-pass validation for accuracy
- Cross-model consensus for reliability
- Field-level confidence scoring
DeepRead includes a built-in Human-in-the-Loop (HIL) review system. The AI compares extracted text to the original image and sets hil_flag on each field:
- hil_flag: false = Clear, confident extraction → Auto-process
- hil_flag: true = Uncertain extraction → Routed to human review

How HIL works:
- Fields extracted with high confidence are auto-approved
- Uncertain fields are flagged with hil_flag: true and a reason
- Only flagged fields need human review (typically 5-10% of total fields)
- Review flagged fields in DeepRead Preview (preview.deepread.tech), a dedicated HIL review interface where reviewers can see the original document side-by-side with extracted data, correct flagged fields, and approve results
- Or integrate with your own review queue using the hil_flag data in the API response

AI flags extractions when:
- Text is handwritten, blurry, or low quality
- Multiple possible interpretations exist
- Characters are partially visible or unclear
- Field not found in document

This is multimodal AI determination, not rule-based.
Create reusable, optimized schemas for specific document types:

    # List your blueprints
    curl https://api.deepread.tech/v1/blueprints \
      -H "X-API-Key: $DEEPREAD_API_KEY"

    # Use blueprint instead of inline schema
    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@invoice.pdf" \
      -F "blueprint_id=660e8400-e29b-41d4-a716-446655440001"

Benefits:
- 20-30% accuracy improvement over baseline schemas
- Reusable across similar documents
- Versioned with rollback support

How to create blueprints:

    # Create a blueprint from training data
    curl -X POST https://api.deepread.tech/v1/optimize \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "utility_invoice",
        "description": "Optimized for utility invoices",
        "document_type": "invoice",
        "initial_schema": {
          "type": "object",
          "properties": {
            "vendor": {"type": "string", "description": "Vendor name"},
            "total": {"type": "number", "description": "Total amount"}
          }
        },
        "training_documents": ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
        "ground_truth_data": [
          {"vendor": "Acme Power", "total": 125.50},
          {"vendor": "City Electric", "total": 89.25}
        ],
        "target_accuracy": 95.0,
        "max_iterations": 5
      }'

    # Returns: {"job_id": "...", "blueprint_id": "...", "status": "pending"}

    # Check optimization status
    curl https://api.deepread.tech/v1/blueprints/jobs/JOB_ID \
      -H "X-API-Key: $DEEPREAD_API_KEY"

    # Use blueprint (once completed)
    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@invoice.pdf" \
      -F "blueprint_id=BLUEPRINT_ID"
Get notified when processing completes instead of polling:

    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@invoice.pdf" \
      -F "webhook_url=https://your-app.com/webhooks/deepread"

Your webhook receives this payload when processing completes:

    {
      "job_id": "550e8400-...",
      "status": "completed",
      "created_at": "2025-01-27T10:00:00Z",
      "completed_at": "2025-01-27T10:02:30Z",
      "result": {
        "text": "...",
        "data": {...}
      },
      "preview_url": "https://preview.deepread.tech/abc1234"
    }

Benefits:
- No polling required
- Instant notification when done
- Lower latency
- Better for production workflows
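On the receiving side, the webhook body can be reduced to a routing decision before it reaches the rest of the application. The sketch below is framework-agnostic and takes the raw request body. It assumes the payload fields shown in this document; the function name and the `action` values are hypothetical, and how non-completed jobs are reported to webhooks is not documented here.

```python
import json


def handle_deepread_webhook(body):
    """Parse a webhook body and decide what the application should do next."""
    payload = json.loads(body)
    job_id = payload.get("job_id")
    status = payload.get("status")
    if status != "completed":
        # Only the "completed" payload is documented; anything else is
        # handed back for logging or retry logic.
        return {"job_id": job_id, "action": "investigate", "status": status}
    data = (payload.get("result") or {}).get("data") or {}
    flagged = sorted(name for name, field in data.items() if field.get("hil_flag"))
    return {
        "job_id": job_id,
        "action": "review" if flagged else "auto_process",
        "flagged_fields": flagged,
        "preview_url": payload.get("preview_url"),
    }
```

A web framework handler would call this with the request body and return a 2xx response quickly, leaving slower work to a background task.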
DeepRead Preview (preview.deepread.tech) is the built-in Human-in-the-Loop review interface. Reviewers can view the original document alongside extracted data, correct flagged fields, and approve results. Preview URLs can also be shared without authentication:

    # Request preview URL
    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@document.pdf" \
      -F "include_images=true"

    # Get preview URL in response
    {
      "result": {
        "text": "...",
        "data": {...}
      },
      "preview_url": "https://preview.deepread.tech/Xy9aB12"
    }

Public Preview Endpoint:

    # No authentication required
    curl https://api.deepread.tech/v1/preview/Xy9aB12
Free tier:
- 2,000 pages/month
- 10 requests/minute
- Full feature access (OCR + structured extraction + blueprints)
- PRO: 50,000 pages/month, 100 requests/minute, $99/month
- SCALE: Custom volume pricing (contact sales)

Upgrade: https://www.deepread.tech/dashboard/billing?utm_source=clawdhub
Every response includes quota information:

    X-RateLimit-Limit: 2000
    X-RateLimit-Remaining: 1847
    X-RateLimit-Used: 153
    X-RateLimit-Reset: 1730419200
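These headers can be parsed on the client to decide when to slow down or stop submitting. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds; the example value is consistent with that reading (it corresponds to 2024-11-01 00:00 UTC), but the document does not state the unit. `Quota` and `parse_quota` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    limit: int
    remaining: int
    used: int
    reset_epoch: int  # assumed to be Unix epoch seconds

    def seconds_until_reset(self, now):
        """Seconds from `now` (epoch seconds) until the quota resets, never negative."""
        return max(0, self.reset_epoch - int(now))


def parse_quota(headers):
    """Build a Quota from the X-RateLimit-* response headers (case-insensitive)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return Quota(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        used=int(lowered["x-ratelimit-used"]),
        reset_epoch=int(lowered["x-ratelimit-reset"]),
    )
```

A caller might check `quota.remaining` before queueing a large batch and pass `time.time()` to `seconds_until_reset` to schedule the remainder.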
✅ Recommended: Webhook notifications

    curl -X POST https://api.deepread.tech/v1/process \
      -H "X-API-Key: $DEEPREAD_API_KEY" \
      -F "file=@document.pdf" \
      -F "webhook_url=https://your-app.com/webhook"

Only use polling if:
- Testing/development
- Cannot expose a webhook endpoint
- Need synchronous response
✅ Good: Descriptive field descriptions

    {
      "vendor": {
        "type": "string",
        "description": "Vendor company name. Usually in header or top-left of invoice."
      }
    }

❌ Bad: No description

    {
      "vendor": {"type": "string"}
    }
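A small lint step can catch undescribed fields before a schema is submitted. This is a sketch of one way to do it; `fields_missing_description` is a hypothetical helper that walks `properties`, nested objects, and array `items` as they appear in the schemas in this document.

```python
def fields_missing_description(schema, path=""):
    """Return dotted paths of schema properties that have no description."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not spec.get("description"):
            missing.append(field_path)
        if spec.get("type") == "object":
            # Recurse into nested objects.
            missing.extend(fields_missing_description(spec, field_path))
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            # Recurse into the item schema of arrays; "[]" marks the array level.
            missing.extend(fields_missing_description(spec["items"], field_path + "[]"))
    return missing
```

An empty list means every field, including nested ones, carries a description.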
Only if you can't use webhooks, poll every 5-10 seconds:

    import time

    import requests


    def wait_for_result(job_id, api_key, poll_interval=5):
        """Poll the job endpoint until the job completes or fails."""
        url = f"https://api.deepread.tech/v1/jobs/{job_id}"
        while True:
            response = requests.get(url, headers={"X-API-Key": api_key}, timeout=30)
            response.raise_for_status()  # surface HTTP errors instead of parsing them
            result = response.json()
            if result["status"] == "completed":
                return result["result"]
            if result["status"] == "failed":
                raise RuntimeError(f"Job failed: {result.get('error')}")
            time.sleep(poll_interval)
Separate confident fields from uncertain ones:

    def process_extraction(data):
        """Split extracted fields into auto-approved values and review items."""
        confident = {}
        needs_review = []
        for field, field_data in data.items():
            if field_data["hil_flag"]:
                needs_review.append({
                    "field": field,
                    "value": field_data["value"],
                    "reason": field_data.get("reason"),
                })
            else:
                confident[field] = field_data["value"]

        # save_to_database and send_to_review_queue stand in for your own code.
        # Auto-process confident fields
        save_to_database(confident)
        # Send uncertain fields to review queue
        if needs_review:
            send_to_review_queue(needs_review)
    {"detail": "Monthly page quota exceeded"}

Solution: Upgrade to PRO or wait until next billing cycle.
    {"detail": "Schema must be valid JSON Schema"}

Solution: Ensure schema is valid JSON and includes type and properties.
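The two conditions named in that solution can be checked locally before uploading. The sketch below is a client-side pre-check only, with a hypothetical `check_schema` helper; it does not reproduce whatever validation the API performs.

```python
import json


def check_schema(schema_text):
    """Return a list of problems found in a schema string; empty means it passed."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(schema, dict):
        return ["top level must be a JSON object"]
    problems = []
    if "type" not in schema:
        problems.append('missing "type"')
    if not isinstance(schema.get("properties"), dict):
        problems.append('missing "properties" object')
    return problems
```

Running this on the string passed to the `schema` form field catches quoting mistakes, which are easy to introduce when a schema is embedded in a shell command.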
    {"detail": "File size exceeds 50MB limit"}

Solution: Compress PDF or split into smaller files.
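A preflight check on the client avoids uploading a file that would be rejected. This sketch combines the 50MB limit with the supported formats listed in this document (PDF, JPG, JPEG, PNG). Treating "50MB" as 50 × 1024 × 1024 bytes is an assumption, and `preflight` is a hypothetical helper.

```python
import os

MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" is interpreted as 50 MiB
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}


def preflight(path):
    """Return reasons the file would likely be rejected; empty if none were found."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

The check looks only at the file extension and size, so a corrupted or password-protected PDF would still pass it and fail during processing.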
    {"status": "failed", "error": "PDF could not be processed"}

Common causes:
- Corrupted PDF file
- Password-protected PDF
- Unsupported PDF version
- Image quality too low for OCR
    {
      "type": "object",
      "properties": {
        "invoice_number": {
          "type": "string",
          "description": "Unique invoice ID"
        },
        "invoice_date": {
          "type": "string",
          "description": "Invoice date in MM/DD/YYYY format"
        },
        "vendor": {
          "type": "string",
          "description": "Vendor company name"
        },
        "total": {
          "type": "number",
          "description": "Total amount due including tax"
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "quantity": {"type": "number"},
              "price": {"type": "number"}
            }
          }
        }
      }
    }
    {
      "type": "object",
      "properties": {
        "merchant": {
          "type": "string",
          "description": "Store or merchant name"
        },
        "date": {
          "type": "string",
          "description": "Transaction date"
        },
        "total": {
          "type": "number",
          "description": "Total amount paid"
        },
        "items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "price": {"type": "number"}
            }
          }
        }
      }
    }
    {
      "type": "object",
      "properties": {
        "parties": {
          "type": "array",
          "items": {"type": "string"},
          "description": "Names of all parties in the contract"
        },
        "effective_date": {
          "type": "string",
          "description": "Contract start date"
        },
        "term_length": {
          "type": "string",
          "description": "Duration of contract"
        },
        "termination_clause": {
          "type": "string",
          "description": "Conditions for termination"
        }
      }
    }
- GitHub: https://github.com/deepread-tech
- Issues: https://github.com/deepread-tech/deep-read-service/issues
- Email: hello@deepread.tech
- Processing Time: 2-5 minutes (async, not real-time)
- Async Workflow: Use webhooks (recommended) or polling
- Rate Limits: 10 req/min on free tier
- File Size Limit: 50MB per file
- Supported Formats: PDF, JPG, JPEG, PNG

Ready to start? Get your free API key at https://www.deepread.tech/dashboard/?utm_source=clawdhub