← All skills
Tencent SkillHub · AI

Senior Computer Vision

Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.

skill · openclawclawhub · Free
0 Downloads
0 Stars
0 Installs
0 Score
High Signal


⬇ 0 downloads · ★ 0 stars · Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILL.md, references/computer_vision_architectures.md, references/object_detection_optimization.md, references/production_vision_systems.md, references/reference-docs-and-commands.md, scripts/dataset_pipeline_builder.py

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
2.1.1

Documentation

Primary doc: SKILL.md (32 sections)

Senior Computer Vision Engineer

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.

Table of Contents

  • Quick Start
  • Core Expertise
  • Tech Stack
  • Workflow 1: Object Detection Pipeline
  • Workflow 2: Model Optimization and Deployment
  • Workflow 3: Custom Dataset Preparation
  • Architecture Selection Guide
  • Reference Documentation
  • Common Commands

Quick Start

```bash
# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment
```

Core Expertise

This skill provides guidance on:

  • Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
  • Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
  • Semantic Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
  • Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
  • Video Analysis: Object tracking (ByteTrack, SORT), action recognition
  • 3D Vision: Depth estimation, point cloud processing, NeRF
  • Production Deployment: ONNX, TensorRT, OpenVINO, CoreML

Tech Stack

| Category | Technologies |
|---|---|
| Frameworks | PyTorch, torchvision, timm |
| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
| Segmentation | segment-anything, mmsegmentation |
| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
| Image Processing | OpenCV, Pillow, albumentations |
| Annotation | CVAT, Label Studio, Roboflow |
| Experiment Tracking | MLflow, Weights & Biases |
| Serving | Triton Inference Server, TorchServe |

Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.

Step 1: Define Detection Requirements

Analyze the detection task requirements against this checklist:

  • Target objects: [list specific classes to detect]
  • Real-time requirement: [yes/no, target FPS]
  • Accuracy priority: [speed vs accuracy trade-off]
  • Deployment target: [cloud GPU, edge device, mobile]
  • Dataset size: [number of images, annotations per class]
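The checklist above can be captured as a small data structure so that later steps (architecture selection, latency budgeting) can branch on it programmatically. A minimal sketch; the class and field names are illustrative, not part of the package:

```python
from dataclasses import dataclass

@dataclass
class DetectionRequirements:
    """Answers from the requirements-analysis checklist (illustrative)."""
    target_classes: list   # specific classes to detect
    realtime: bool         # hard real-time constraint?
    target_fps: int        # desired frames per second
    deployment: str        # "cloud-gpu", "edge", or "mobile"
    dataset_images: int    # rough dataset size

reqs = DetectionRequirements(
    target_classes=["car", "person", "bicycle"],
    realtime=True,
    target_fps=30,
    deployment="edge",
    dataset_images=5000,
)

# A 30 FPS real-time target leaves roughly 33 ms per frame end to end.
frame_budget_ms = 1000 / reqs.target_fps
print(f"Per-frame latency budget: {frame_budget_ms:.1f} ms")
```

Recording the requirements up front makes the speed/accuracy trade-offs in the next step explicit rather than implicit.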

Step 2: Select Detection Architecture

Choose an architecture based on requirements:

| Requirement | Recommended Architecture | Why |
|---|---|---|
| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
| High accuracy | Faster R-CNN, DINO | Two-stage, better localization |
| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
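The selection table above can be encoded as a simple lookup. A sketch under the assumption that the requirements reduce to three flags; the precedence order (edge first, then real-time, then small objects) is one reasonable reading of the table, not a rule from the package:

```python
def pick_architecture(realtime: bool, edge: bool, small_objects: bool) -> str:
    """Map requirement flags from Step 1 to a recommended model family.

    Edge constraints dominate (smallest model wins), then real-time,
    then small-object handling; otherwise favor accuracy.
    """
    if edge:
        return "yolov8n"               # lightweight, edge deployment
    if realtime and small_objects:
        return "yolov8 + SAHI"         # sliced inference for small objects
    if realtime:
        return "yolov8 / rt-detr"      # single-stage, >30 FPS
    if small_objects:
        return "faster_rcnn + FPN"     # multi-scale feature pyramid
    return "faster_rcnn / DINO"        # two-stage, best localization

print(pick_architecture(realtime=True, edge=False, small_objects=False))
# -> yolov8 / rt-detr
```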

Step 3: Prepare Dataset

Convert annotations to the required format:

```bash
# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
    --annotations data/labels/ \
    --format coco \
    --split 0.8 0.1 0.1 \
    --output data/coco/

# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
```

Step 4: Configure Training

Generate the training configuration:

```bash
# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch yolov8m \
    --epochs 100 \
    --batch 16 \
    --imgsz 640 \
    --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch faster_rcnn_R_50_FPN \
    --framework detectron2 \
    --output configs/
```

Step 5: Train and Validate

```bash
# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml
```

Step 6: Evaluate Results

Key metrics to analyze:

| Metric | Target | Description |
|---|---|---|
| mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 |
| mAP@50:95 | >0.5 | COCO primary metric |
| Precision | >0.8 | Low false positives |
| Recall | >0.8 | Low missed detections |
| Inference time | <33ms | For 30 FPS real-time |
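mAP@50 counts a prediction as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5, so IoU is the primitive behind every metric above. It is simple enough to sketch in plain Python (boxes given as [x1, y1, x2, y2]):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2]."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [0, 0, 10, 10]))   # identical boxes -> 1.0
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))   # half-shifted -> ~0.333
```

In practice, pycocotools computes the full mAP@50:95 sweep; this shows only the matching criterion underneath it.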

Workflow 2: Model Optimization and Deployment

Use this workflow when preparing a trained model for production deployment.

Step 1: Benchmark Baseline Performance

```bash
# Measure current model performance
python scripts/inference_optimizer.py model.pt \
    --benchmark \
    --input-size 640 640 \
    --batch-sizes 1 4 8 16 \
    --warmup 10 \
    --iterations 100
```

Expected output:

```
Baseline Performance (PyTorch FP32):
  Batch 1: 45.2ms (22.1 FPS)
  Batch 4: 89.4ms (44.7 FPS)
  Batch 8: 165.3ms (48.4 FPS)
  Memory: 2.1 GB
  Parameters: 25.9M
```

Step 2: Select Optimization Strategy

| Deployment Target | Optimization Path |
|---|---|
| NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 |
| NVIDIA GPU (edge) | PyTorch → TensorRT INT8 |
| Intel CPU | PyTorch → ONNX → OpenVINO |
| Apple Silicon | PyTorch → CoreML |
| Generic CPU | PyTorch → ONNX Runtime |
| Mobile | PyTorch → TFLite or ONNX Mobile |

Step 3: Export to ONNX

```bash
# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
    --export onnx \
    --input-size 640 640 \
    --dynamic-batch \
    --simplify \
    --output model.onnx

# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"
```
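The exported model expects a fixed spatial input (here 640×640). YOLO-style pipelines handle arbitrary frames by letterboxing: scale the image to fit while preserving aspect ratio, then pad the remainder. The geometry can be sketched without any imaging library; the function name is illustrative:

```python
def letterbox_params(src_w, src_h, dst=640):
    """Scale and padding to fit (src_w, src_h) into a dst x dst square
    while preserving aspect ratio (YOLO-style letterboxing)."""
    scale = min(dst / src_w, dst / src_h)      # fit the longer side
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) // 2                 # left/right padding
    pad_y = (dst - new_h) // 2                 # top/bottom padding
    return scale, new_w, new_h, pad_x, pad_y

scale, w, h, px, py = letterbox_params(1920, 1080)
print(scale, w, h, px, py)   # 1080p scales to 640x360 with 140px top/bottom pad
```

The same scale and padding values must be inverted when mapping predicted boxes back to the original frame.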

Step 4: Apply Quantization (Optional)

For INT8 quantization with calibration:

```bash
# Generate calibration dataset
python scripts/inference_optimizer.py model.onnx \
    --quantize int8 \
    --calibration-data data/calibration/ \
    --calibration-samples 500 \
    --output model_int8.onnx
```

Quantization impact analysis:

| Precision | Size | Speed | Accuracy Drop |
|---|---|---|---|
| FP32 | 100% | 1x | 0% |
| FP16 | 50% | 1.5-2x | <0.5% |
| INT8 | 25% | 2-4x | 1-3% |
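The size and accuracy numbers in the table follow from how INT8 represents values: each float is mapped to an 8-bit integer via a scale factor, and the rounding step is where accuracy is lost. A minimal symmetric quantize/dequantize round trip illustrates the error bound; this is a sketch, not the calibration-based scheme the script applies:

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: map floats to [-127, 127] using a
    single scale derived from the maximum absolute value."""
    scale = max(abs(v) for v in values) / 127
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.51, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding bounds the per-value error by half the quantization step.
errors = [abs(a - b) for a, b in zip(weights, restored)]
print(max(errors) <= scale / 2)   # True
```

Calibration (the `--calibration-data` step above) refines the scale per tensor using real activation statistics, which is why it recovers most of the accuracy that naive quantization loses.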

Step 5: Convert to Target Runtime

```bash
# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/

# CoreML (Apple)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"
```

Step 6: Benchmark Optimized Model

```bash
python scripts/inference_optimizer.py model.engine \
    --benchmark \
    --runtime tensorrt \
    --compare model.pt
```

Expected speedup:

```
Optimization Results:
  Original (PyTorch FP32): 45.2ms
  Optimized (TensorRT FP16): 12.8ms
  Speedup: 3.5x
  Accuracy change: -0.3% mAP
```

Workflow 3: Custom Dataset Preparation

Use this workflow when preparing a computer vision dataset for training.

Step 1: Audit Raw Data

```bash
# Analyze image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
    --analyze \
    --output analysis/
```

The analysis report includes:

```
Dataset Analysis:
  Total images: 5,234
  Image sizes: 640x480 to 4096x3072 (variable)
  Formats: JPEG (4,891), PNG (343)
  Corrupted: 12 files
  Duplicates: 45 pairs

Annotation Analysis:
  Format detected: Pascal VOC XML
  Total annotations: 28,456
  Classes: 5 (car, person, bicycle, dog, cat)
  Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
  Empty images: 234
```

Step 2: Clean and Validate

```bash
# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
    --clean \
    --remove-corrupted \
    --remove-duplicates \
    --output data/cleaned/
```

Step 3: Convert Annotation Format

```bash
# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
    --annotations data/annotations/ \
    --input-format voc \
    --output-format coco \
    --output data/coco/
```

Supported format conversions:

| From | To |
|---|---|
| Pascal VOC XML | COCO JSON |
| YOLO TXT | COCO JSON |
| COCO JSON | YOLO TXT |
| LabelMe JSON | COCO JSON |
| CVAT XML | COCO JSON |
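These formats differ mainly in box encoding: Pascal VOC stores corner coordinates (xmin, ymin, xmax, ymax), COCO stores absolute [x, y, width, height], and YOLO stores a normalized center point with width and height. The per-box arithmetic behind the conversions is:

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Pascal VOC corner coordinates -> COCO [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def coco_to_yolo_bbox(x, y, w, h, img_w, img_h):
    """COCO absolute [x, y, w, h] -> YOLO normalized [cx, cy, w, h]."""
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

print(voc_to_coco_bbox(100, 50, 300, 250))             # [100, 50, 200, 200]
print(coco_to_yolo_bbox(100, 50, 200, 200, 640, 480))  # normalized center form
```

Mixing these conventions is a classic source of silently wrong training runs, which is why the pipeline script converts everything to one canonical format first.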

Step 4: Apply Augmentations

```bash
# Generate augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
    --augment \
    --aug-config configs/augmentation.yaml \
    --output data/augmented/
```

Recommended augmentations for detection:

```yaml
# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }   # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }
  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }
  advanced:
    - mosaic: { p: 0.5 }   # YOLO-style mosaic
    - mixup: { p: 0.1 }    # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
```

Step 5: Create Train/Val/Test Splits

```bash
python scripts/dataset_pipeline_builder.py data/augmented/ \
    --split 0.8 0.1 0.1 \
    --stratify \
    --seed 42 \
    --output data/final/
```

Split strategy guidelines:

| Dataset Size | Train | Val | Test |
|---|---|---|---|
| <1,000 images | 70% | 15% | 15% |
| 1,000-10,000 | 80% | 10% | 10% |
| >10,000 | 90% | 5% | 5% |
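The split itself reduces to a seeded shuffle followed by two cuts; fixing the seed makes the split reproducible across runs. A plain-Python sketch of the 80/10/10 case (per-class stratification, which the script's `--stratify` flag provides, is omitted here):

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle deterministically and cut into train/val/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)   # seeded -> reproducible
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(5000))
print(len(train), len(val), len(test))   # 4000 500 500
```

The important property is that the seed, not the filesystem order, determines membership, so reruns and teammates get identical splits.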

Step 6: Generate Dataset Configuration

```bash
# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config yolo \
    --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config detectron2 \
    --output detectron2_config.py
```

Object Detection Architectures

| Architecture | Speed | Accuracy | Best For |
|---|---|---|---|
| YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time |
| YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy |
| YOLOv8m | 4.2ms | 50.2 mAP | General purpose |
| YOLOv8l | 6.8ms | 52.9 mAP | High accuracy |
| YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy |
| RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS |
| Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality |
| DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |

Segmentation Architectures

| Architecture | Type | Speed | Best For |
|---|---|---|---|
| YOLOv8-seg | Instance | 4.5ms | Real-time instance seg |
| Mask R-CNN | Instance | 67ms | High-quality masks |
| SAM | Promptable | 50ms | Zero-shot segmentation |
| DeepLabV3+ | Semantic | 25ms | Scene parsing |
| SegFormer | Semantic | 15ms | Efficient semantic seg |

CNN vs Vision Transformer Trade-offs

| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) |
|---|---|---|
| Training data needed | 1K-10K images | 10K-100K+ images |
| Training time | Fast | Slow (needs more epochs) |
| Inference speed | Faster | Slower |
| Small objects | Good with FPN | Needs multi-scale |
| Global context | Limited | Excellent |
| Positional encoding | Implicit | Explicit |

Reference Documentation

→ See references/reference-docs-and-commands.md for details

Performance Targets

| Metric | Real-time | High Accuracy | Edge |
|---|---|---|---|
| FPS | >30 | >10 | >15 |
| mAP@50 | >0.6 | >0.8 | >0.5 |
| Latency P99 | <50ms | <150ms | <100ms |
| GPU Memory | <4GB | <8GB | <2GB |
| Model Size | <50MB | <200MB | <20MB |
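Latency P99 in the targets above means the 99th percentile over many requests, not the mean: a model can average 20 ms yet still blow a 50 ms budget on its slowest 1% of frames. A sketch of measuring it around any inference callable (the helper names are illustrative):

```python
import time

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ranked = sorted(samples)
    index = min(len(ranked) - 1, int(round(pct / 100 * len(ranked))) - 1)
    return ranked[max(index, 0)]

def benchmark(fn, iterations=100, warmup=10):
    """Time fn() repeatedly; return (mean, P99) latency in milliseconds."""
    for _ in range(warmup):            # warm caches / kernels before timing
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return sum(samples) / len(samples), percentile(samples, 99)

# Stand-in workload; replace with your model's forward pass.
mean_ms, p99_ms = benchmark(lambda: sum(range(10_000)))
print(f"mean {mean_ms:.3f} ms, P99 {p99_ms:.3f} ms")
```

Always report P99 alongside the mean when validating against the table, since tail latency is what real-time budgets actually constrain.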

Resources

  • Architecture Guide: references/computer_vision_architectures.md
  • Optimization Guide: references/object_detection_optimization.md
  • Deployment Guide: references/production_vision_systems.md
  • Scripts: scripts/ directory for automation tools

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
5 docs · 1 script
  • SKILL.md Primary doc
  • references/computer_vision_architectures.md Docs
  • references/object_detection_optimization.md Docs
  • references/production_vision_systems.md Docs
  • references/reference-docs-and-commands.md Docs
  • scripts/dataset_pipeline_builder.py Scripts