# Send Video Captions to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "video-captions",
    "name": "Video Captions",
    "source": "tencent",
    "type": "skill",
    "category": "内容创作",
    "sourceUrl": "https://clawhub.ai/ivangdavila/video-captions",
    "canonicalUrl": "https://clawhub.ai/ivangdavila/video-captions",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/video-captions",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=video-captions",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "SKILL.md",
      "engines.md",
      "formats.md",
      "platforms.md",
      "styling.md"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "video-captions",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T22:06:25.955Z",
      "expiresAt": "2026-05-07T22:06:25.955Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=video-captions",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=video-captions",
        "contentDisposition": "attachment; filename=\"video-captions-1.0.1.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "video-captions"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/video-captions"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/video-captions",
    "downloadUrl": "https://openagent3.xyz/downloads/video-captions",
    "agentUrl": "https://openagent3.xyz/skills/video-captions/agent",
    "manifestUrl": "https://openagent3.xyz/skills/video-captions/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/video-captions/agent.md"
  }
}
```
## Documentation

### When to Use

User needs captions or subtitles for video content. Agent handles transcription, timing, formatting, styling, translation, and burn-in across all major formats and platforms.

### Quick Reference

TopicFileTranscription enginesengines.mdOutput formatsformats.mdStyling presetsstyling.mdPlatform requirementsplatforms.md

### 1. Engine Selection by Context

ScenarioEngineWhyDefault (recommended)Whisper local100% offline, no data leaves machineApple SiliconMLX WhisperNative acceleration, still localWord timestampswhisper-timestampedDTW alignment, still local

Default: Whisper local (turbo model). See engines.md for optional cloud alternatives.

### 2. Format Selection by Platform

PlatformFormatNotesYouTubeVTT or SRTVTT preferredNetflix/ProTTMLStrict timing rulesSocial (TikTok, IG)Burn-in (ASS)Embedded in videoGeneralSRTUniversal compatibilityKaraoke/effectsASSAdvanced styling

Ask user's target platform if not specified.

### 3. Professional Timing Standards

Netflix-compliant (default):

Min duration: 5/6 second (0.833s)
Max duration: 7 seconds
Max chars/line: 42
Max lines: 2
Gap between subtitles: 2+ frames

Social media:

Shorter segments (2-4 words)
More frequent breaks
Centered or dynamic positioning

### 4. Segmentation Rules

Break lines:

After punctuation marks
Before conjunctions (and, but, or)
Before prepositions

Never separate:

Article from noun
Adjective from noun
First name from last name
Verb from subject pronoun
Auxiliary from verb

### 5. Word-Level Timestamps

Use word timestamps for:

Karaoke-style highlighting
Precise sync verification
TikTok/Instagram animated captions
Quality checking transcript accuracy

Enable with --word-timestamps flag.

### 6. Speaker Identification

For multi-speaker content:

Use diarization (pyannote local, or cloud APIs if configured)
Format: [Speaker 1] or [Name] if known
SDH format: JOHN: What do you think?

### 7. Quality Verification

Before delivering:

Check sync at start, middle, end
Verify character limits per line
Confirm speaker labels if multi-speaker
Test burn-in render quality

### Basic Transcription

# Auto-detect language, output SRT
whisper video.mp4 --model turbo --output_format srt

# Specify language
whisper video.mp4 --model turbo --language es --output_format srt

# Multiple formats
whisper video.mp4 --model turbo --output_format all

### Word-Level Timestamps

# Using whisper-timestamped
whisper_timestamped video.mp4 --model large-v3 --output_format srt

# With VAD pre-processing (reduces hallucinations)
whisper_timestamped video.mp4 --vad silero --accurate

### Styled Subtitles (ASS)

# Generate SRT first, then convert with style
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'" output.mp4

### Burn-In for Social Media

# TikTok/Instagram style (centered, bold)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Montserrat-Bold,FontSize=32,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=3,Shadow=0,Alignment=10,MarginV=50'" output.mp4

# Netflix style (bottom, clean)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Netflix Sans,FontSize=48,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'" output.mp4

### Translation

# Transcribe + translate to English
whisper video.mp4 --model turbo --task translate --output_format srt

### Format Conversion

# SRT to VTT
ffmpeg -i video.srt video.vtt

# SRT to ASS (for styling)
ffmpeg -i video.srt video.ass

### Caption Traps

Hallucinations on silence → Use VAD pre-processing or trim silent sections
Wrong language detection → Specify --language explicitly for mixed content
Timing drift in long videos → Use word timestamps + manual spot-check
Character limit violations → Set --max_line_width 42 for Netflix compliance
Missing speaker IDs → Enable diarization for multi-speaker content
Burn-in quality loss → Use high bitrate output (-b:v 8M)

### YouTube Video

Transcribe: whisper video.mp4 --output_format vtt
Upload .vtt to YouTube Studio
Review auto-sync suggestions

### TikTok/Instagram Reel

Transcribe with word timestamps
Apply bold animated style
Burn-in: ffmpeg -i video.mp4 -vf "subtitles=video.ass" -c:a copy output.mp4
Export at platform resolution

### Netflix/Professional

Use Whisper large-v3 for best local accuracy
Export TTML format
Verify: 42 chars/line, 2 lines max, timing gaps
Include translator credit as last subtitle

### Podcast/Interview

Enable speaker diarization
Format as dialogue: [SPEAKER]: text
SDH option: include [music], [laughter] descriptions

### Foreign Film Translation

Transcribe in original language
Translate: --task translate for English
Or use external translation + timing sync

### External Endpoints

Default: 100% LOCAL processing. No network calls.

EndpointData SentWhen UsedWhisper (local)None (local)Default — alwaysapi.assemblyai.comAudio fileOnly if user sets ASSEMBLYAI_API_KEYapi.deepgram.comAudio fileOnly if user sets DEEPGRAM_API_KEY

Cloud APIs are documented as alternatives but never used unless user explicitly provides API keys and requests cloud processing. By default, all processing stays on your machine.

### Security & Privacy

Default workflow is 100% offline:

Whisper runs locally on your machine
Generated subtitle files stay local
Burned-in videos stay local
No network calls made

Cloud APIs are OPTIONAL and OPT-IN:

Only used if you set ASSEMBLYAI_API_KEY or DEEPGRAM_API_KEY
Only triggered when you explicitly use cloud engine commands
If you never set these keys, no audio ever leaves your machine

This skill does NOT:

Upload anything by default
Require internet connection for basic use
Store data externally

### Related Skills

Install with clawhub install <slug> if user confirms:

ffmpeg — video/audio processing
video — general video tasks
video-edit — video editing
audio — audio processing

### Feedback

If useful: clawhub star video-captions
Stay updated: clawhub sync
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: ivangdavila
- Version: 1.0.1
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-04-30T22:06:25.955Z
- Expires at: 2026-05-07T22:06:25.955Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/video-captions)
- [Send to Agent page](https://openagent3.xyz/skills/video-captions/agent)
- [JSON manifest](https://openagent3.xyz/skills/video-captions/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/video-captions/agent.md)
- [Download page](https://openagent3.xyz/downloads/video-captions)