{
  "schemaVersion": "1.0",
  "item": {
    "slug": "video-captions",
    "name": "Video Captions",
    "source": "tencent",
    "type": "skill",
    "category": "内容创作",
    "sourceUrl": "https://clawhub.ai/ivangdavila/video-captions",
    "canonicalUrl": "https://clawhub.ai/ivangdavila/video-captions",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/video-captions",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=video-captions",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md",
      "engines.md",
      "formats.md",
      "platforms.md",
      "styling.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/video-captions"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/video-captions",
    "agentPageUrl": "https://openagent3.xyz/skills/video-captions/agent",
    "manifestUrl": "https://openagent3.xyz/skills/video-captions/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/video-captions/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "When to Use",
        "body": "User needs captions or subtitles for video content. Agent handles transcription, timing, formatting, styling, translation, and burn-in across all major formats and platforms."
      },
      {
        "title": "Quick Reference",
        "body": "TopicFileTranscription enginesengines.mdOutput formatsformats.mdStyling presetsstyling.mdPlatform requirementsplatforms.md"
      },
      {
        "title": "1. Engine Selection by Context",
        "body": "ScenarioEngineWhyDefault (recommended)Whisper local100% offline, no data leaves machineApple SiliconMLX WhisperNative acceleration, still localWord timestampswhisper-timestampedDTW alignment, still local\n\nDefault: Whisper local (turbo model). See engines.md for optional cloud alternatives."
      },
      {
        "title": "2. Format Selection by Platform",
        "body": "PlatformFormatNotesYouTubeVTT or SRTVTT preferredNetflix/ProTTMLStrict timing rulesSocial (TikTok, IG)Burn-in (ASS)Embedded in videoGeneralSRTUniversal compatibilityKaraoke/effectsASSAdvanced styling\n\nAsk user's target platform if not specified."
      },
      {
        "title": "3. Professional Timing Standards",
        "body": "Netflix-compliant (default):\n\nMin duration: 5/6 second (0.833s)\nMax duration: 7 seconds\nMax chars/line: 42\nMax lines: 2\nGap between subtitles: 2+ frames\n\nSocial media:\n\nShorter segments (2-4 words)\nMore frequent breaks\nCentered or dynamic positioning"
      },
      {
        "title": "4. Segmentation Rules",
        "body": "Break lines:\n\nAfter punctuation marks\nBefore conjunctions (and, but, or)\nBefore prepositions\n\nNever separate:\n\nArticle from noun\nAdjective from noun\nFirst name from last name\nVerb from subject pronoun\nAuxiliary from verb"
      },
      {
        "title": "5. Word-Level Timestamps",
        "body": "Use word timestamps for:\n\nKaraoke-style highlighting\nPrecise sync verification\nTikTok/Instagram animated captions\nQuality checking transcript accuracy\n\nEnable with --word-timestamps flag."
      },
      {
        "title": "6. Speaker Identification",
        "body": "For multi-speaker content:\n\nUse diarization (pyannote local, or cloud APIs if configured)\nFormat: [Speaker 1] or [Name] if known\nSDH format: JOHN: What do you think?"
      },
      {
        "title": "7. Quality Verification",
        "body": "Before delivering:\n\nCheck sync at start, middle, end\nVerify character limits per line\nConfirm speaker labels if multi-speaker\nTest burn-in render quality"
      },
      {
        "title": "Basic Transcription",
        "body": "# Auto-detect language, output SRT\nwhisper video.mp4 --model turbo --output_format srt\n\n# Specify language\nwhisper video.mp4 --model turbo --language es --output_format srt\n\n# Multiple formats\nwhisper video.mp4 --model turbo --output_format all"
      },
      {
        "title": "Word-Level Timestamps",
        "body": "# Using whisper-timestamped\nwhisper_timestamped video.mp4 --model large-v3 --output_format srt\n\n# With VAD pre-processing (reduces hallucinations)\nwhisper_timestamped video.mp4 --vad silero --accurate"
      },
      {
        "title": "Styled Subtitles (ASS)",
        "body": "# Generate SRT first, then convert with style\nffmpeg -i video.mp4 -vf \"subtitles=video.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'\" output.mp4"
      },
      {
        "title": "Burn-In for Social Media",
        "body": "# TikTok/Instagram style (centered, bold)\nffmpeg -i video.mp4 -vf \"subtitles=video.srt:force_style='FontName=Montserrat-Bold,FontSize=32,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=3,Shadow=0,Alignment=10,MarginV=50'\" output.mp4\n\n# Netflix style (bottom, clean)\nffmpeg -i video.mp4 -vf \"subtitles=video.srt:force_style='FontName=Netflix Sans,FontSize=48,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'\" output.mp4"
      },
      {
        "title": "Translation",
        "body": "# Transcribe + translate to English\nwhisper video.mp4 --model turbo --task translate --output_format srt"
      },
      {
        "title": "Format Conversion",
        "body": "# SRT to VTT\nffmpeg -i video.srt video.vtt\n\n# SRT to ASS (for styling)\nffmpeg -i video.srt video.ass"
      },
      {
        "title": "Caption Traps",
        "body": "Hallucinations on silence → Use VAD pre-processing or trim silent sections\nWrong language detection → Specify --language explicitly for mixed content\nTiming drift in long videos → Use word timestamps + manual spot-check\nCharacter limit violations → Set --max_line_width 42 for Netflix compliance\nMissing speaker IDs → Enable diarization for multi-speaker content\nBurn-in quality loss → Use high bitrate output (-b:v 8M)"
      },
      {
        "title": "YouTube Video",
        "body": "Transcribe: whisper video.mp4 --output_format vtt\nUpload .vtt to YouTube Studio\nReview auto-sync suggestions"
      },
      {
        "title": "TikTok/Instagram Reel",
        "body": "Transcribe with word timestamps\nApply bold animated style\nBurn-in: ffmpeg -i video.mp4 -vf \"subtitles=video.ass\" -c:a copy output.mp4\nExport at platform resolution"
      },
      {
        "title": "Netflix/Professional",
        "body": "Use Whisper large-v3 for best local accuracy\nExport TTML format\nVerify: 42 chars/line, 2 lines max, timing gaps\nInclude translator credit as last subtitle"
      },
      {
        "title": "Podcast/Interview",
        "body": "Enable speaker diarization\nFormat as dialogue: [SPEAKER]: text\nSDH option: include [music], [laughter] descriptions"
      },
      {
        "title": "Foreign Film Translation",
        "body": "Transcribe in original language\nTranslate: --task translate for English\nOr use external translation + timing sync"
      },
      {
        "title": "External Endpoints",
        "body": "Default: 100% LOCAL processing. No network calls.\n\nEndpointData SentWhen UsedWhisper (local)None (local)Default — alwaysapi.assemblyai.comAudio fileOnly if user sets ASSEMBLYAI_API_KEYapi.deepgram.comAudio fileOnly if user sets DEEPGRAM_API_KEY\n\nCloud APIs are documented as alternatives but never used unless user explicitly provides API keys and requests cloud processing. By default, all processing stays on your machine."
      },
      {
        "title": "Security & Privacy",
        "body": "Default workflow is 100% offline:\n\nWhisper runs locally on your machine\nGenerated subtitle files stay local\nBurned-in videos stay local\nNo network calls made\n\nCloud APIs are OPTIONAL and OPT-IN:\n\nOnly used if you set ASSEMBLYAI_API_KEY or DEEPGRAM_API_KEY\nOnly triggered when you explicitly use cloud engine commands\nIf you never set these keys, no audio ever leaves your machine\n\nThis skill does NOT:\n\nUpload anything by default\nRequire internet connection for basic use\nStore data externally"
      },
      {
        "title": "Related Skills",
        "body": "Install with clawhub install <slug> if user confirms:\n\nffmpeg — video/audio processing\nvideo — general video tasks\nvideo-edit — video editing\naudio — audio processing"
      },
      {
        "title": "Feedback",
        "body": "If useful: clawhub star video-captions\nStay updated: clawhub sync"
      }
    ],
    "body": "When to Use\n\nUser needs captions or subtitles for video content. Agent handles transcription, timing, formatting, styling, translation, and burn-in across all major formats and platforms.\n\nQuick Reference\nTopic\tFile\nTranscription engines\tengines.md\nOutput formats\tformats.md\nStyling presets\tstyling.md\nPlatform requirements\tplatforms.md\nCore Rules\n1. Engine Selection by Context\nScenario\tEngine\tWhy\nDefault (recommended)\tWhisper local\t100% offline, no data leaves machine\nApple Silicon\tMLX Whisper\tNative acceleration, still local\nWord timestamps\twhisper-timestamped\tDTW alignment, still local\n\nDefault: Whisper local (turbo model). See engines.md for optional cloud alternatives.\n\n2. Format Selection by Platform\nPlatform\tFormat\tNotes\nYouTube\tVTT or SRT\tVTT preferred\nNetflix/Pro\tTTML\tStrict timing rules\nSocial (TikTok, IG)\tBurn-in (ASS)\tEmbedded in video\nGeneral\tSRT\tUniversal compatibility\nKaraoke/effects\tASS\tAdvanced styling\n\nAsk user's target platform if not specified.\n\n3. Professional Timing Standards\n\nNetflix-compliant (default):\n\nMin duration: 5/6 second (0.833s)\nMax duration: 7 seconds\nMax chars/line: 42\nMax lines: 2\nGap between subtitles: 2+ frames\n\nSocial media:\n\nShorter segments (2-4 words)\nMore frequent breaks\nCentered or dynamic positioning\n4. Segmentation Rules\n\nBreak lines:\n\nAfter punctuation marks\nBefore conjunctions (and, but, or)\nBefore prepositions\n\nNever separate:\n\nArticle from noun\nAdjective from noun\nFirst name from last name\nVerb from subject pronoun\nAuxiliary from verb\n5. Word-Level Timestamps\n\nUse word timestamps for:\n\nKaraoke-style highlighting\nPrecise sync verification\nTikTok/Instagram animated captions\nQuality checking transcript accuracy\n\nEnable with --word-timestamps flag.\n\n6. Speaker Identification\n\nFor multi-speaker content:\n\nUse diarization (pyannote local, or cloud APIs if configured)\nFormat: [Speaker 1] or [Name] if known\nSDH format: JOHN: What do you think?\n7. Quality Verification\n\nBefore delivering:\n\nCheck sync at start, middle, end\nVerify character limits per line\nConfirm speaker labels if multi-speaker\nTest burn-in render quality\nWorkflow\nBasic Transcription\n# Auto-detect language, output SRT\nwhisper video.mp4 --model turbo --output_format srt\n\n# Specify language\nwhisper video.mp4 --model turbo --language es --output_format srt\n\n# Multiple formats\nwhisper video.mp4 --model turbo --output_format all\n\nWord-Level Timestamps\n# Using whisper-timestamped\nwhisper_timestamped video.mp4 --model large-v3 --output_format srt\n\n# With VAD pre-processing (reduces hallucinations)\nwhisper_timestamped video.mp4 --vad silero --accurate\n\nStyled Subtitles (ASS)\n# Generate SRT first, then convert with style\nffmpeg -i video.mp4 -vf \"subtitles=video.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'\" output.mp4\n\nBurn-In for Social Media\n# TikTok/Instagram style (centered, bold)\nffmpeg -i video.mp4 -vf \"subtitles=video.srt:force_style='FontName=Montserrat-Bold,FontSize=32,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=3,Shadow=0,Alignment=10,MarginV=50'\" output.mp4\n\n# Netflix style (bottom, clean)\nffmpeg -i video.mp4 -vf \"subtitles=video.srt:force_style='FontName=Netflix Sans,FontSize=48,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'\" output.mp4\n\nTranslation\n# Transcribe + translate to English\nwhisper video.mp4 --model turbo --task translate --output_format srt\n\nFormat Conversion\n# SRT to VTT\nffmpeg -i video.srt video.vtt\n\n# SRT to ASS (for styling)\nffmpeg -i video.srt video.ass\n\nCaption Traps\nHallucinations on silence → Use VAD pre-processing or trim silent sections\nWrong language detection → Specify --language explicitly for mixed content\nTiming drift in long videos → Use word timestamps + manual spot-check\nCharacter limit violations → Set --max_line_width 42 for Netflix compliance\nMissing speaker IDs → Enable diarization for multi-speaker content\nBurn-in quality loss → Use high bitrate output (-b:v 8M)\nCommon Scenarios\nYouTube Video\nTranscribe: whisper video.mp4 --output_format vtt\nUpload .vtt to YouTube Studio\nReview auto-sync suggestions\nTikTok/Instagram Reel\nTranscribe with word timestamps\nApply bold animated style\nBurn-in: ffmpeg -i video.mp4 -vf \"subtitles=video.ass\" -c:a copy output.mp4\nExport at platform resolution\nNetflix/Professional\nUse Whisper large-v3 for best local accuracy\nExport TTML format\nVerify: 42 chars/line, 2 lines max, timing gaps\nInclude translator credit as last subtitle\nPodcast/Interview\nEnable speaker diarization\nFormat as dialogue: [SPEAKER]: text\nSDH option: include [music], [laughter] descriptions\nForeign Film Translation\nTranscribe in original language\nTranslate: --task translate for English\nOr use external translation + timing sync\nExternal Endpoints\n\nDefault: 100% LOCAL processing. No network calls.\n\nEndpoint\tData Sent\tWhen Used\nWhisper (local)\tNone (local)\tDefault — always\napi.assemblyai.com\tAudio file\tOnly if user sets ASSEMBLYAI_API_KEY\napi.deepgram.com\tAudio file\tOnly if user sets DEEPGRAM_API_KEY\n\nCloud APIs are documented as alternatives but never used unless user explicitly provides API keys and requests cloud processing. By default, all processing stays on your machine.\n\nSecurity & Privacy\n\nDefault workflow is 100% offline:\n\nWhisper runs locally on your machine\nGenerated subtitle files stay local\nBurned-in videos stay local\nNo network calls made\n\nCloud APIs are OPTIONAL and OPT-IN:\n\nOnly used if you set ASSEMBLYAI_API_KEY or DEEPGRAM_API_KEY\nOnly triggered when you explicitly use cloud engine commands\nIf you never set these keys, no audio ever leaves your machine\n\nThis skill does NOT:\n\nUpload anything by default\nRequire internet connection for basic use\nStore data externally\nRelated Skills\n\nInstall with clawhub install <slug> if user confirms:\n\nffmpeg — video/audio processing\nvideo — general video tasks\nvideo-edit — video editing\naudio — audio processing\nFeedback\nIf useful: clawhub star video-captions\nStay updated: clawhub sync"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/ivangdavila/video-captions",
    "publisherUrl": "https://clawhub.ai/ivangdavila/video-captions",
    "owner": "ivangdavila",
    "version": "1.0.1",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/video-captions",
    "downloadUrl": "https://openagent3.xyz/downloads/video-captions",
    "agentUrl": "https://openagent3.xyz/skills/video-captions/agent",
    "manifestUrl": "https://openagent3.xyz/skills/video-captions/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/video-captions/agent.md"
  }
}