Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
> I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

> I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.
Contents:
- Quick Start
- Core Expertise
- Tech Stack
- Workflow 1: Object Detection Pipeline
- Workflow 2: Model Optimization and Deployment
- Workflow 3: Custom Dataset Preparation
- Architecture Selection Guide
- Reference Documentation
- Common Commands
```bash
# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment
```
This skill provides guidance on:
- Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
- Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
- Semantic Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
- Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
- Video Analysis: Object tracking (ByteTrack, SORT), action recognition
- 3D Vision: Depth estimation, point cloud processing, NeRF
- Production Deployment: ONNX, TensorRT, OpenVINO, CoreML
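For a feel of the basic API surface, here is a minimal single-image detection sketch using the Ultralytics YOLO Python API; the weights file and image path are placeholders.

```python
# Minimal detection sketch with Ultralytics YOLO (pip install ultralytics).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # downloads pretrained COCO weights on first use
results = model("image.jpg")        # run inference on a single image
for r in results:
    for box in r.boxes:
        cls_id = int(box.cls)                    # predicted class index
        conf = float(box.conf)                   # confidence score
        x1, y1, x2, y2 = box.xyxy[0].tolist()    # corner coordinates in pixels
        print(f"{model.names[cls_id]} {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```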
| Category | Technologies |
|---|---|
| Frameworks | PyTorch, torchvision, timm |
| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
| Segmentation | segment-anything, mmsegmentation |
| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
| Image Processing | OpenCV, Pillow, albumentations |
| Annotation | CVAT, Label Studio, Roboflow |
| Experiment Tracking | MLflow, Weights & Biases |
| Serving | Triton Inference Server, TorchServe |
Use this workflow when building an object detection system from scratch.
Choose architecture based on requirements:

| Requirement | Recommended Architecture | Why |
|---|---|---|
| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
| High accuracy | Faster R-CNN, DINO | Two-stage, better localization |
| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
Convert annotations to the required format:

```bash
# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
  --annotations data/labels/ \
  --format coco \
  --split 0.8 0.1 0.1 \
  --output data/coco/

# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
```
Generate training configuration:

```bash
# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
  --task detection \
  --arch yolov8m \
  --epochs 100 \
  --batch 16 \
  --imgsz 640 \
  --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
  --task detection \
  --arch faster_rcnn_R_50_FPN \
  --framework detectron2 \
  --output configs/
```
```bash
# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml
```
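If you prefer the Python API over the `yolo` CLI, the same Ultralytics run looks roughly like this sketch; paths and hyperparameters mirror the commands above.

```python
# Ultralytics training via the Python API instead of the CLI.
from ultralytics import YOLO

model = YOLO("yolov8m.pt")                      # start from pretrained weights
model.train(data="data.yaml", epochs=100, imgsz=640, batch=16)
metrics = model.val()                           # evaluate on the val split from data.yaml
print(metrics.box.map50, metrics.box.map)       # mAP@50 and mAP@50:95
```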
Key metrics to analyze:

| Metric | Target | Description |
|---|---|---|
| mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 |
| mAP@50:95 | >0.5 | COCO primary metric |
| Precision | >0.8 | Low false positives |
| Recall | >0.8 | Low missed detections |
| Inference time | <33ms | For 30 FPS real-time |
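These metrics can be computed directly with pycocotools' COCOeval. A minimal sketch, assuming ground truth at data/coco/val.json and detections exported to a COCO-format predictions.json (both paths are assumptions):

```python
# Evaluate COCO-format detections against ground truth.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("data/coco/val.json")            # ground-truth annotations
coco_dt = coco_gt.loadRes("predictions.json")   # model detections
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                           # prints mAP@50:95, mAP@50, recall, etc.
```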
Use this workflow when preparing a trained model for production deployment.
| Deployment Target | Optimization Path |
|---|---|
| NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 |
| NVIDIA GPU (edge) | PyTorch → TensorRT INT8 |
| Intel CPU | PyTorch → ONNX → OpenVINO |
| Apple Silicon | PyTorch → CoreML |
| Generic CPU | PyTorch → ONNX Runtime |
| Mobile | PyTorch → TFLite or ONNX Mobile |
```bash
# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
  --export onnx \
  --input-size 640 640 \
  --dynamic-batch \
  --simplify \
  --output model.onnx

# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"
```
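Under the hood, an export like this typically boils down to a torch.onnx.export call. A minimal sketch of that step, with a torchvision ResNet standing in for your trained detector; the opset version and tensor names are assumptions, not what the script necessarily uses:

```python
# Sketch of a plain ONNX export with a dynamic batch dimension.
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()           # stand-in for the trained model
dummy = torch.randn(1, 3, 640, 640)             # NCHW input matching --input-size
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["images"], output_names=["preds"],
    dynamic_axes={"images": {0: "batch"}, "preds": {0: "batch"}},  # --dynamic-batch
    opset_version=17,
)
```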
For INT8 quantization with calibration:

```bash
# Generate calibration dataset
python scripts/inference_optimizer.py model.onnx \
  --quantize int8 \
  --calibration-data data/calibration/ \
  --calibration-samples 500 \
  --output model_int8.onnx
```

Quantization impact analysis:

| Precision | Size | Speed | Accuracy Drop |
|---|---|---|---|
| FP32 | 100% | 1x | 0% |
| FP16 | 50% | 1.5-2x | <0.5% |
| INT8 | 25% | 2-4x | 1-3% |
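To see what calibration-based INT8 quantization involves without the helper script, here is a sketch using ONNX Runtime's quantize_static; the calibration folder layout, preprocessing, and input name are assumptions chosen to match the flags above:

```python
# Post-training static INT8 quantization with ONNX Runtime.
import os
import numpy as np
from PIL import Image
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class FolderReader(CalibrationDataReader):
    """Feeds preprocessed calibration images to the quantizer one at a time."""
    def __init__(self, folder, input_name="images", size=640, limit=500):
        self.samples = []
        for name in sorted(os.listdir(folder))[:limit]:
            img = Image.open(os.path.join(folder, name)).convert("RGB").resize((size, size))
            arr = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)[None] / 255.0
            self.samples.append({input_name: arr})   # NCHW, normalized to [0, 1]
        self.it = iter(self.samples)

    def get_next(self):
        return next(self.it, None)                   # None signals end of calibration

quantize_static(
    "model.onnx", "model_int8.onnx",
    calibration_data_reader=FolderReader("data/calibration/"),
    weight_type=QuantType.QInt8,
)
```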
```bash
# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/

# CoreML (Apple)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"
```
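Before shipping, it is worth sanity-checking the exported model and measuring latency. A small ONNX Runtime sketch; the provider list is an assumption and ONNX Runtime falls back to CPU when no GPU is available:

```python
# Quick latency check of an exported ONNX model.
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 640, 640).astype(np.float32)

sess.run(None, {name: x})                       # warm-up run
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {name: x})
print(f"mean latency: {(time.perf_counter() - t0) / 100 * 1000:.1f} ms")
```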
Use this workflow when preparing a computer vision dataset for training.
```bash
# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
  --clean \
  --remove-corrupted \
  --remove-duplicates \
  --output data/cleaned/
```
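For reference, corruption and exact-duplicate checks of this kind commonly reduce to Pillow's verify() plus a byte-level hash, as in this sketch; the glob pattern and paths are illustrative, not necessarily what the script does:

```python
# Flag corrupted files and exact byte-level duplicates.
import hashlib
from pathlib import Path
from PIL import Image

seen, corrupted, duplicates = {}, [], []
for path in Path("data/raw/").rglob("*.jpg"):
    try:
        with Image.open(path) as img:
            img.verify()                        # raises on truncated/corrupt files
    except Exception:
        corrupted.append(path)
        continue
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    if digest in seen:
        duplicates.append(path)                 # identical bytes already seen
    else:
        seen[digest] = path
print(f"{len(corrupted)} corrupted, {len(duplicates)} duplicates")
```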
```bash
# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
  --annotations data/annotations/ \
  --input-format voc \
  --output-format coco \
  --output data/coco/
```

Supported format conversions:

| From | To |
|---|---|
| Pascal VOC XML | COCO JSON |
| YOLO TXT | COCO JSON |
| COCO JSON | YOLO TXT |
| LabelMe JSON | COCO JSON |
| CVAT XML | COCO JSON |
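The core of a VOC-to-COCO conversion fits in a short script. A simplified sketch (single output JSON, no segmentation fields; the real converter presumably handles more edge cases):

```python
# Convert Pascal VOC XML annotations to a COCO-style JSON.
import json
import xml.etree.ElementTree as ET
from pathlib import Path

images, annotations, categories = [], [], {}
ann_id = 0
for img_id, xml_path in enumerate(sorted(Path("data/annotations/").glob("*.xml"))):
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    images.append({"id": img_id, "file_name": root.findtext("filename"),
                   "width": int(size.findtext("width")),
                   "height": int(size.findtext("height"))})
    for obj in root.iter("object"):
        name = obj.findtext("name")
        categories.setdefault(name, len(categories))      # assign ids in order seen
        b = obj.find("bndbox")
        x1, y1 = float(b.findtext("xmin")), float(b.findtext("ymin"))
        w = float(b.findtext("xmax")) - x1
        h = float(b.findtext("ymax")) - y1
        annotations.append({"id": ann_id, "image_id": img_id,
                            "bbox": [x1, y1, w, h],        # COCO uses [x, y, w, h]
                            "category_id": categories[name],
                            "area": w * h, "iscrowd": 0})
        ann_id += 1

coco = {"images": images, "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in categories.items()]}
Path("data/coco").mkdir(parents=True, exist_ok=True)
Path("data/coco/train.json").write_text(json.dumps(coco))
```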
```bash
# Generate augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
  --augment \
  --aug-config configs/augmentation.yaml \
  --output data/augmented/
```

Recommended augmentations for detection:

```yaml
# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }   # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }
  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }
  advanced:
    - mosaic: { p: 0.5 }          # YOLO-style mosaic
    - mixup: { p: 0.1 }           # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
```
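Mapped onto code, the geometric and color entries above correspond to an albumentations pipeline like the following sketch. Mosaic and mixup are batch-level augmentations handled by the training framework rather than albumentations, so they are omitted here:

```python
# Detection augmentation pipeline with bbox-aware transforms.
import albumentations as A

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.1),                  # only if orientation invariant
        A.Rotate(limit=15, p=0.3),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
        A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, p=0.3),
        A.Blur(blur_limit=3, p=0.1),
    ],
    # keeps bounding boxes consistent through geometric transforms
    bbox_params=A.BboxParams(format="coco", label_fields=["labels"]),
)
# usage: augmented = transform(image=image, bboxes=bboxes, labels=labels)
```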
```bash
python scripts/dataset_pipeline_builder.py data/augmented/ \
  --split 0.8 0.1 0.1 \
  --stratify \
  --seed 42 \
  --output data/final/
```

Split strategy guidelines:

| Dataset Size | Train | Val | Test |
|---|---|---|---|
| <1,000 images | 70% | 15% | 15% |
| 1,000-10,000 | 80% | 10% | 10% |
| >10,000 | 90% | 5% | 5% |
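One plausible way to implement a stratified split for detection data is to stratify on each image's most frequent class, since detection images are multi-label and exact stratification is ill-defined. A rough scikit-learn sketch with the 80/10/10 ratios from the command above; this is an assumption about the approach, not the script's actual logic:

```python
# Approximate stratified 80/10/10 split keyed on each image's dominant class.
import json
from collections import Counter
from sklearn.model_selection import train_test_split

coco = json.load(open("data/coco/train.json"))
labels_per_image = {}
for ann in coco["annotations"]:
    labels_per_image.setdefault(ann["image_id"], []).append(ann["category_id"])
ids = list(labels_per_image)
# dominant class per image; rare classes with <2 images will break stratification
strata = [Counter(labels_per_image[i]).most_common(1)[0][0] for i in ids]

train_ids, rest_ids, _, rest_s = train_test_split(
    ids, strata, test_size=0.2, stratify=strata, random_state=42)
val_ids, test_ids = train_test_split(
    rest_ids, test_size=0.5, stratify=rest_s, random_state=42)
```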
```bash
# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
  --generate-config yolo \
  --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
  --generate-config detectron2 \
  --output detectron2_config.py
```
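The YOLO config produced here is a small data.yaml. A sketch of its expected shape, written out from Python for consistency; the class names are placeholders for your own dataset:

```python
# Write an Ultralytics-style data.yaml (pip install pyyaml).
import yaml

config = {
    "path": "data/final",                       # dataset root
    "train": "images/train",
    "val": "images/val",
    "test": "images/test",
    "names": {0: "person", 1: "car"},           # hypothetical class list
}
with open("data.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```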
| Architecture | Speed | Accuracy | Best For |
|---|---|---|---|
| YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time |
| YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy |
| YOLOv8m | 4.2ms | 50.2 mAP | General purpose |
| YOLOv8l | 6.8ms | 52.9 mAP | High accuracy |
| YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy |
| RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS |
| Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality |
| DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |
| Architecture | Type | Speed | Best For |
|---|---|---|---|
| YOLOv8-seg | Instance | 4.5ms | Real-time instance seg |
| Mask R-CNN | Instance | 67ms | High-quality masks |
| SAM | Promptable | 50ms | Zero-shot segmentation |
| DeepLabV3+ | Semantic | 25ms | Scene parsing |
| SegFormer | Semantic | 15ms | Efficient semantic seg |
| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) |
|---|---|---|
| Training data needed | 1K-10K images | 10K-100K+ images |
| Training time | Fast | Slow (needs more epochs) |
| Inference speed | Faster | Slower |
| Small objects | Good with FPN | Needs multi-scale |
| Global context | Limited | Excellent |
| Positional encoding | Implicit | Explicit |
→ See references/reference-docs-and-commands.md for details
| Metric | Real-time | High Accuracy | Edge |
|---|---|---|---|
| FPS | >30 | >10 | >15 |
| mAP@50 | >0.6 | >0.8 | >0.5 |
| Latency P99 | <50ms | <150ms | <100ms |
| GPU Memory | <4GB | <8GB | <2GB |
| Model Size | <50MB | <200MB | <20MB |
- Architecture Guide: references/computer_vision_architectures.md
- Optimization Guide: references/object_detection_optimization.md
- Deployment Guide: references/production_vision_systems.md
- Scripts: scripts/ directory for automation tools