Requirements

- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Tag and annotate images using Apple's Vision framework (macOS only). Detects faces, bodies, hands, text (OCR), barcodes, objects, scene labels, and saliency regions.
Hand the extracted package to your coding agent with a concrete install brief instead of working through the steps manually. For a fresh install:

> I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

For an upgrade:

> I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
macOS-native image analysis using Apple's Vision framework. All processing is local: no cloud APIs, no API keys needed.
- macOS 12+ (Monterey or later)
- Xcode Command Line Tools
- Python 3 with Pillow
```bash
# Install Xcode CLI tools if needed
xcode-select --install

# Install Pillow
pip3 install Pillow

# Compile the Swift binary
cd scripts/
swiftc -O -o image_tagger image_tagger.swift
```
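After compiling, a quick smoke test confirms the binary runs and emits JSON. A minimal sketch in Python; `test.jpg` is a placeholder path, not part of the package, so point it at any image on disk:

```python
import json
import subprocess

# Smoke test: run the freshly built binary against any image on disk.
# "test.jpg" is a placeholder path.
result = subprocess.run(
    ["./scripts/image_tagger", "test.jpg"],
    capture_output=True, text=True,
)
if result.returncode != 0:
    raise SystemExit(f"image_tagger failed: {result.stderr.strip()}")

# The output should contain a JSON object; find where it starts.
start = result.stdout.find("{")
if start == -1:
    raise SystemExit("no JSON found in output")
tags = json.loads(result.stdout[start:])
print("Detected feature keys:", sorted(tags))
```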
```bash
./scripts/image_tagger /path/to/photo.jpg
```

Output includes:
- faces: bounding boxes, roll/yaw/pitch, landmarks (eyes, nose, mouth)
- bodies: 18 skeleton joints with confidence scores
- hands: 21 joints per hand (left/right)
- text: OCR results with bounding boxes
- labels: scene classification (desk, outdoor, clothing, etc.)
- barcodes: QR codes, UPC, etc.
- saliency: attention and objectness regions
```bash
python3 scripts/annotate_image.py photo.jpg output.jpg
```

Draws colored boxes:
- Green: faces
- Orange: body skeleton
- Magenta: hands
- Cyan: text regions
- Yellow: rectangles/objects
- Scene labels at bottom
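To annotate a whole folder, you can loop over the same CLI. A hedged sketch, assuming annotate_image.py keeps the two-argument input/output interface shown above; `photos/` and `annotated/` are illustrative directory names:

```python
import pathlib
import subprocess

# Illustrative directories; substitute your own paths.
src = pathlib.Path("photos")
dst = pathlib.Path("annotated")
dst.mkdir(exist_ok=True)

for image in sorted(src.glob("*.jpg")):
    out = dst / image.name
    # Same two-argument interface as the single-image example above.
    subprocess.run(
        ["python3", "scripts/annotate_image.py", str(image), str(out)],
        check=True,
    )
    print(f"annotated {image} -> {out}")
```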
```python
import subprocess, json

def analyze(path):
    r = subprocess.run(['./scripts/image_tagger', path],
                       capture_output=True, text=True)
    # Skip any leading log output; the JSON object starts at the first '{'
    return json.loads(r.stdout[r.stdout.find('{'):])

tags = analyze('photo.jpg')
print(tags['labels'])  # [{'label': 'desk', 'confidence': 0.85}, ...]
print(tags['faces'])   # [{'bbox': {...}, 'confidence': 0.99, 'yaw': 5.2}]
```
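Since Pillow is already a prerequisite, the face boxes can be turned into crops. A sketch building on the `analyze()` helper above; it assumes the bbox values are normalized to image dimensions with a bottom-left origin (Vision's usual convention), so verify against your own output before relying on it:

```python
from PIL import Image

def crop_faces(path, tags):
    """Crop each detected face. Assumes normalized, bottom-left-origin
    bounding boxes (Vision's usual convention); verify on real output."""
    img = Image.open(path)
    w, h = img.size
    crops = []
    for face in tags.get("faces", []):
        b = face["bbox"]
        left = int(b["x"] * w)
        right = int((b["x"] + b["width"]) * w)
        # Flip the y axis: normalized bottom-left origin -> pixel top-left.
        top = int((1 - b["y"] - b["height"]) * h)
        bottom = int((1 - b["y"]) * h)
        crops.append(img.crop((left, top, right, bottom)))
    return crops

for i, face in enumerate(crop_faces("photo.jpg", analyze("photo.jpg"))):
    face.save(f"face_{i}.jpg")
```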
{ "dimensions": {"width": 1920, "height": 1080}, "faces": [{"bbox": {"x": 0.3, "y": 0.4, "width": 0.15, "height": 0.2}, "confidence": 0.99, "roll": -2, "yaw": 5}], "bodies": [{"joints": {"head_joint": {"x": 0.5, "y": 0.7, "confidence": 0.9}, "left_shoulder": {...}}, "confidence": 1}], "hands": [{"chirality": "left", "joints": {"VNHLKWRI": {"x": 0.4, "y": 0.3, "confidence": 0.85}}}], "text": [{"text": "HELLO", "confidence": 0.95, "bbox": {...}}], "labels": [{"label": "outdoor", "confidence": 0.88}, {"label": "sky", "confidence": 0.75}], "saliency": {"attentionBased": [{"x": 0.2, "y": 0.1, "width": 0.6, "height": 0.8}]} }
| Feature | Details |
| --- | --- |
| Faces | Bounding box, confidence, roll/yaw/pitch angles, 76-point landmarks |
| Bodies | 18 joints: head, neck, shoulders, elbows, wrists, hips, knees, ankles |
| Hands | 21 joints per hand, left/right chirality |
| Text (OCR) | Recognized text with confidence and bounding boxes |
| Labels | 1000+ scene/object categories (clothing, furniture, outdoor, etc.) |
| Barcodes | QR, UPC, EAN, Code128, PDF417, Aztec, DataMatrix |
| Saliency | Attention-based and objectness-based regions |
- Photo tagging: auto-tag photos with detected objects/scenes (see the sketch after this list)
- Posture monitoring: track face/body position for ergonomics
- Document scanning: extract text from images
- Security: detect people in camera feeds
- Accessibility: describe image contents
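As a concrete example of the photo-tagging use case, a sketch that indexes a folder of photos by their top scene labels. It reuses the `analyze()` helper defined earlier; the `photos/` directory and `photo_index.json` output name are illustrative:

```python
import json
import pathlib

# Build a simple label index for a folder of photos.
index = {}
for image in sorted(pathlib.Path("photos").glob("*.jpg")):
    tags = analyze(str(image))
    # Keep up to five labels per photo as its tags.
    index[image.name] = [item["label"] for item in tags.get("labels", [])[:5]]

pathlib.Path("photo_index.json").write_text(json.dumps(index, indent=2))
print(f"indexed {len(index)} photos")
```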