โ† All skills
Tencent SkillHub ยท Content Creation

Video Captions

Generate professional captions and subtitles with multi-engine transcription, word-level timing, styling presets, and burn-in.

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILL.md, engines.md, formats.md, platforms.md, styling.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.1

Documentation

Primary doc: SKILL.md (25 sections)

When to Use

Use this skill when the user needs captions or subtitles for video content. The agent handles transcription, timing, formatting, styling, translation, and burn-in across the major subtitle formats and platforms.

Quick Reference

| Topic | File |
|---|---|
| Transcription engines | engines.md |
| Output formats | formats.md |
| Styling presets | styling.md |
| Platform requirements | platforms.md |

1. Engine Selection by Context

| Scenario | Engine | Why |
|---|---|---|
| Default (recommended) | Whisper local | 100% offline, no data leaves machine |
| Apple Silicon | MLX Whisper | Native acceleration, still local |
| Word timestamps | whisper-timestamped | DTW alignment, still local |

Default: Whisper local (turbo model). See engines.md for optional cloud alternatives.

2. Format Selection by Platform

| Platform | Format | Notes |
|---|---|---|
| YouTube | VTT or SRT | VTT preferred |
| Netflix/Pro | TTML | Strict timing rules |
| Social (TikTok, IG) | Burn-in (ASS) | Embedded in video |
| General | SRT | Universal compatibility |
| Karaoke/effects | ASS | Advanced styling |

Ask for the user's target platform if it is not specified.
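The platform table can be expressed as a small lookup. This is only a sketch: the function name `pick_format` and the lowercase platform keys are illustrative assumptions, not part of the skill.

```python
# Platform -> subtitle format, copied from the table above.
# The dictionary keys and the function name are hypothetical.
PLATFORM_FORMATS = {
    "youtube": "vtt",     # VTT preferred, SRT also accepted
    "netflix": "ttml",
    "tiktok": "ass",      # burned in
    "instagram": "ass",   # burned in
    "karaoke": "ass",
}


def pick_format(platform):
    """Return the subtitle format for a platform, falling back to SRT."""
    if not platform:
        # The skill says to ask the user rather than guess.
        raise ValueError("target platform not specified; ask the user")
    return PLATFORM_FORMATS.get(platform.strip().lower(), "srt")
```

Unknown platforms fall back to SRT because the table lists it as the universally compatible choice.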

3. Professional Timing Standards

Netflix-compliant (default):
  • Min duration: 5/6 second (0.833 s)
  • Max duration: 7 seconds
  • Max chars/line: 42
  • Max lines: 2
  • Gap between subtitles: 2+ frames

Social media:
  • Shorter segments (2-4 words)
  • More frequent breaks
  • Centered or dynamic positioning
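The Netflix-style limits above can be checked mechanically. The following is a minimal sketch; the `Cue` class, the `check_cue` name, and the 24 fps default are assumptions made for illustration.

```python
from dataclasses import dataclass

MIN_DURATION = 5 / 6       # seconds
MAX_DURATION = 7.0         # seconds
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MIN_GAP_FRAMES = 2


@dataclass
class Cue:
    start: float   # seconds
    end: float     # seconds
    text: str      # lines separated by "\n"


def check_cue(cue, next_start=None, fps=24.0):
    """Return a list of violations for one cue (empty list means it passes)."""
    eps = 1e-9  # tolerance for floating-point rounding
    problems = []
    duration = cue.end - cue.start
    if duration < MIN_DURATION - eps:
        problems.append("too short")
    if duration > MAX_DURATION + eps:
        problems.append("too long")
    lines = cue.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append("too many lines")
    if any(len(line) > MAX_CHARS_PER_LINE for line in lines):
        problems.append("line too wide")
    # The gap rule is stated in frames, so convert it using the frame rate.
    if next_start is not None and next_start - cue.end < MIN_GAP_FRAMES / fps - eps:
        problems.append("gap too small")
    return problems
```

Pass the real frame rate of the video as `fps`, since the two-frame gap is about 0.083 s at 24 fps but only about 0.033 s at 60 fps.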

4. Segmentation Rules

Break lines:
  • After punctuation marks
  • Before conjunctions (and, but, or)
  • Before prepositions

Never separate:
  • Article from noun
  • Adjective from noun
  • First name from last name
  • Verb from subject pronoun
  • Auxiliary from verb
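A rough sketch of how some of these rules could drive a two-line split. It covers only the punctuation, conjunction, preposition, and article rules; the word lists, the scoring weights, and the name `split_caption` are assumptions, and the grammar-dependent rules (adjective/noun, names, auxiliaries) would need a part-of-speech tagger.

```python
CONJUNCTIONS = {"and", "but", "or"}
# Assumed, non-exhaustive word lists for illustration.
PREPOSITIONS = {"in", "on", "at", "to", "for", "with", "from", "of", "by"}
ARTICLES = {"a", "an", "the"}


def split_caption(text, max_chars=42):
    """Split text into at most two lines at the best-scoring break point.

    Returns [text] unchanged if it already fits or no valid break exists.
    """
    words = text.split()
    if len(text) <= max_chars or len(words) < 2:
        return [text]
    best = None
    for i in range(1, len(words)):  # candidate break before words[i]
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        if len(first) > max_chars or len(second) > max_chars:
            continue
        prev_word, next_word = words[i - 1], words[i].lower()
        if prev_word.lower() in ARTICLES:
            continue  # never separate an article from its noun
        score = abs(len(first) - len(second))  # prefer balanced lines
        if prev_word[-1] in ",.;:!?":
            score -= 100  # strongest preference: break after punctuation
        elif next_word in CONJUNCTIONS:
            score -= 50
        elif next_word in PREPOSITIONS:
            score -= 25
        if best is None or score < best[0]:
            best = (score, [first, second])
    return best[1] if best else [text]
```

If the function returns a single line longer than `max_chars`, the text needs to be split into two separate cues rather than two lines.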

5. Word-Level Timestamps

Use word timestamps for:
  • Karaoke-style highlighting
  • Precise sync verification
  • TikTok/Instagram animated captions
  • Quality checking transcript accuracy

Enable with --word_timestamps True (the openai-whisper CLI spells the flag with an underscore and takes a boolean value).
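Once word timings exist, short social-style cues can be built by grouping them. The sketch below assumes each word is a dict with `word`, `start`, and `end` keys (the shape openai-whisper emits; other engines may name the text key differently). The `max_gap` pause threshold is an assumed value.

```python
def group_words(words, max_words=3, max_gap=0.6):
    """Group word timings into short cues.

    A new cue starts when the current one is full or when the pause
    before the next word exceeds max_gap seconds.
    """
    groups, current = [], []
    for word in words:
        full = len(current) >= max_words
        paused = bool(current) and word["start"] - current[-1]["end"] > max_gap
        if current and (full or paused):
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return [
        {
            "start": group[0]["start"],
            "end": group[-1]["end"],
            # Whisper word strings often carry a leading space, so strip them.
            "text": " ".join(w["word"].strip() for w in group),
        }
        for group in groups
    ]
```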

6. Speaker Identification

For multi-speaker content:
  • Use diarization (pyannote local, or cloud APIs if configured)
  • Format: [Speaker 1] or [Name] if known
  • SDH format: JOHN: What do you think?
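The two label formats can be produced by one small helper. This is an illustrative sketch; `label_line` and its parameters are hypothetical names, and the speaker IDs are assumed to come from whichever diarization tool is in use.

```python
def label_line(text, speaker_id, names=None, sdh=False):
    """Prefix a caption line with a speaker label.

    names maps diarization speaker IDs to known names (optional).
    sdh=True produces the upper-case "NAME: text" form.
    """
    names = names or {}
    display = names.get(speaker_id) or f"Speaker {speaker_id}"
    if sdh:
        return f"{display.upper()}: {text}"
    return f"[{display}] {text}"
```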

7. Quality Verification

Before delivering:
  • Check sync at start, middle, end
  • Verify character limits per line
  • Confirm speaker labels if multi-speaker
  • Test burn-in render quality
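Two of these checks are easy to script against an SRT file: finding over-long lines and picking the cues to spot-check for sync. The parser below is a minimal sketch that assumes well-formed SRT; all function names are illustrative.

```python
import re


def to_seconds(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    hours, minutes, rest = stamp.strip().split(":")
    seconds, millis = rest.split(",")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(text):
    """Parse SRT text into a list of cue dicts."""
    cleaned = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed or empty cues
        start, end = lines[1].split("-->")
        cues.append({
            "index": int(lines[0].strip()),
            "start": to_seconds(start),
            "end": to_seconds(end),
            "lines": lines[2:],
        })
    return cues


def too_wide(cues, max_chars=42):
    """Return the indices of cues with any line longer than max_chars."""
    return [c["index"] for c in cues if any(len(l) > max_chars for l in c["lines"])]


def spot_check(cues):
    """Pick the first, middle, and last cue for a manual sync check."""
    if not cues:
        return []
    picks = []
    for cue in (cues[0], cues[len(cues) // 2], cues[-1]):
        if cue not in picks:
            picks.append(cue)
    return picks
```

The sync check itself still has to be done by watching the video at the returned timestamps; the script only tells you where to look.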

Basic Transcription

# Auto-detect language, output SRT
whisper video.mp4 --model turbo --output_format srt

# Specify language
whisper video.mp4 --model turbo --language es --output_format srt

# Multiple formats
whisper video.mp4 --model turbo --output_format all

Word-Level Timestamps

# Using whisper-timestamped
whisper_timestamped video.mp4 --model large-v3 --output_format srt

# With VAD pre-processing (reduces hallucinations)
whisper_timestamped video.mp4 --vad silero --accurate

Styled Subtitles (ASS)

# Generate the SRT first, then burn it in with an ASS style override
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'" output.mp4

Burn-In for Social Media

# TikTok/Instagram style (centered, bold)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Montserrat-Bold,FontSize=32,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=3,Shadow=0,Alignment=10,MarginV=50'" output.mp4

# Netflix style (bottom, clean)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Netflix Sans,FontSize=48,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'" output.mp4
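When the style varies per platform, it can help to build the ffmpeg arguments from a dictionary rather than editing a long quoted string. The sketch below only assembles the argument list; `burn_in_command` is a hypothetical helper, and it assumes file names without characters that need escaping in an ffmpeg filter (colons, quotes, backslashes).

```python
def burn_in_command(video, subs, output, style, bitrate="8M"):
    """Build an ffmpeg argument list that burns subtitles in with a style override."""
    force_style = ",".join(f"{key}={value}" for key, value in style.items())
    video_filter = f"subtitles={subs}:force_style='{force_style}'"
    return [
        "ffmpeg", "-i", video,
        "-vf", video_filter,
        "-b:v", bitrate,   # high bitrate to limit quality loss from re-encoding
        "-c:a", "copy",    # audio is untouched, so copy it
        output,
    ]


# Example style, taken from the TikTok/Instagram command above.
SOCIAL_STYLE = {
    "FontName": "Montserrat-Bold",
    "FontSize": 32,
    "PrimaryColour": "&HFFFFFF",
    "OutlineColour": "&H000000",
    "Outline": 3,
    "Shadow": 0,
    "Alignment": 10,
    "MarginV": 50,
}
```

Passing the list to `subprocess.run(cmd, check=True)` avoids a second layer of shell quoting around the filter string.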

Translation

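# Transcribe + translate to English
# (the turbo model is not trained for translation; use a multilingual model such as medium or large)
whisper video.mp4 --model medium --task translate --output_format srt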

Format Conversion

# SRT to VTT
ffmpeg -i video.srt video.vtt

# SRT to ASS (for styling)
ffmpeg -i video.srt video.ass
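For simple files, the SRT to VTT step can also be done without ffmpeg, since the two formats differ mainly in the header and the millisecond separator. This is a minimal sketch for plain SRT input; `srt_to_vtt` is an illustrative name, and styled or positioned cues are not handled.

```python
def srt_to_vtt(srt_text):
    """Convert plain SRT text to WebVTT."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    lines = []
    for line in body.split("\n"):
        if "-->" in line:
            # Only timing lines change: 00:00:01,000 becomes 00:00:01.000
            line = line.replace(",", ".")
        lines.append(line)
    # Numeric cue indices are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```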

Caption Traps

  • Hallucinations on silence → Use VAD pre-processing or trim silent sections
  • Wrong language detection → Specify --language explicitly for mixed content
  • Timing drift in long videos → Use word timestamps + manual spot-check
  • Character limit violations → Set --max_line_width 42 (requires --word_timestamps True) for Netflix compliance
  • Missing speaker IDs → Enable diarization for multi-speaker content
  • Burn-in quality loss → Use high bitrate output (-b:v 8M)

YouTube Video

  1. Transcribe: whisper video.mp4 --output_format vtt
  2. Upload .vtt to YouTube Studio
  3. Review auto-sync suggestions

TikTok/Instagram Reel

  1. Transcribe with word timestamps
  2. Apply bold animated style
  3. Burn-in: ffmpeg -i video.mp4 -vf "subtitles=video.ass" -c:a copy output.mp4
  4. Export at platform resolution

Netflix/Professional

  1. Use Whisper large-v3 for best local accuracy
  2. Export TTML format
  3. Verify: 42 chars/line, 2 lines max, timing gaps
  4. Include translator credit as last subtitle

Podcast/Interview

  1. Enable speaker diarization
  2. Format as dialogue: [SPEAKER]: text
  3. SDH option: include [music], [laughter] descriptions

Foreign Film Translation

  1. Transcribe in original language
  2. Translate: --task translate for English output (use a multilingual model such as medium or large; turbo does not translate)
  3. Or use external translation + timing sync

External Endpoints

Default: 100% LOCAL processing. No network calls.

| Endpoint | Data Sent | When Used |
|---|---|---|
| Whisper (local) | None (local) | Default (always) |
| api.assemblyai.com | Audio file | Only if user sets ASSEMBLYAI_API_KEY |
| api.deepgram.com | Audio file | Only if user sets DEEPGRAM_API_KEY |

Cloud APIs are documented as alternatives but are never used unless the user explicitly provides API keys and requests cloud processing. By default, all processing stays on your machine.
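The opt-in rule (a cloud engine needs both an explicit request and a configured key) can be sketched as a small guard. The engine names and `choose_engine` are hypothetical; only the two environment variable names come from the table above.

```python
import os

# Cloud engine name (assumed) -> environment variable from the table above.
CLOUD_KEYS = {
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}


def choose_engine(requested="whisper-local", env=None):
    """Return the engine to use, staying local unless cloud use is fully opted in."""
    env = os.environ if env is None else env
    key_name = CLOUD_KEYS.get(requested)
    if key_name is None:
        return requested          # a local engine was requested
    if not env.get(key_name):
        return "whisper-local"    # cloud requested but no key set: stay local
    return requested              # cloud requested and key present
```

A key that is set but not accompanied by an explicit cloud request never changes the result, which matches the default-local behaviour described above.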

Security & Privacy

Default workflow is 100% offline:
  • Whisper runs locally on your machine
  • Generated subtitle files stay local
  • Burned-in videos stay local
  • No network calls made

Cloud APIs are OPTIONAL and OPT-IN:
  • Only used if you set ASSEMBLYAI_API_KEY or DEEPGRAM_API_KEY
  • Only triggered when you explicitly use cloud engine commands
  • If you never set these keys, no audio ever leaves your machine

This skill does NOT:
  • Upload anything by default
  • Require internet connection for basic use
  • Store data externally

Related Skills

Install with clawhub install <slug> if user confirms:
  • ffmpeg: video/audio processing
  • video: general video tasks
  • video-edit: video editing
  • audio: audio processing

Feedback

  • If useful: clawhub star video-captions
  • Stay updated: clawhub sync

Category context

Writing, remixing, publishing, visual generation, and marketing content production.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
5 Docs
  • SKILL.md Primary doc
  • engines.md Docs
  • formats.md Docs
  • platforms.md Docs
  • styling.md Docs