Requirements
- Target platform
- OpenClaw
- Install method
- Manual import
- Extraction
- Extract archive
- Prerequisites
- OpenClaw
- Primary doc
- SKILL.md
Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
This skill follows the Agent Skills specification. Turn any script into a YouTube-ready voiceover, complete with numbered segments, a stitched master, chapter timestamps, SRT captions, and a review page. Drop the voiceover onto an existing video to dub it in one command. Built for YouTube creators who want studio-quality narration without the studio. Powered by Voice.ai.
| Scenario | Why it fits |
| --- | --- |
| YouTube long-form | Full narration with chapter markers and captions |
| YouTube Shorts | Quick hooks with punchy delivery |
| Course content | Professional narration for educational videos |
| Screen recordings | Dub a screencast with clean AI voiceover |
| Quick iteration | Smart caching: edit one section, only that segment re-renders |
| Batch production | Same voice, consistent quality across every video |
Have a script and a video? Dub it in one shot:

    node voiceai-vo.cjs build \
      --input my-script.md \
      --voice oliver \
      --title "My YouTube Video" \
      --video ./my-recording.mp4 \
      --mux \
      --template youtube

This renders the voiceover, stitches the master audio, and drops it onto your video, all in one command.

Output:

- out/my-youtube-video/muxed.mp4: your video dubbed with the AI voiceover
- out/my-youtube-video/master.wav: the standalone audio
- out/my-youtube-video/review.html: listen and review each segment
- out/my-youtube-video/chapters.txt: paste directly into your YouTube description
- out/my-youtube-video/captions.srt: upload to YouTube as subtitles
- out/my-youtube-video/description.txt: ready-made YouTube description with chapters

Use --sync pad if the audio is shorter than the video, or --sync trim to cut it to match.
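The output paths above suggest that the title is turned into a lowercase, hyphenated directory name. The exact rule the CLI applies is not documented here, so the following is only a minimal sketch of one plausible slug function; `slugifyTitle` is a hypothetical name, not part of the CLI.

```javascript
// Hypothetical helper: the CLI's real slug rules are not documented,
// this only reproduces the "My YouTube Video" -> "my-youtube-video" example.
function slugifyTitle(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

console.log(`out/${slugifyTitle("My YouTube Video")}/muxed.mp4`);
// prints "out/my-youtube-video/muxed.mp4"
```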
- Node.js 20+: runtime (no npm install needed; the CLI is a single bundled file)
- VOICE_AI_API_KEY: set as an environment variable. Get a key at voice.ai/dashboard.
- ffmpeg (optional): needed for master stitching, MP3 encoding, loudness normalization, and video dubbing. The pipeline still produces individual segments, the review page, chapters, and captions without it.
Set VOICE_AI_API_KEY as an environment variable before running:

    export VOICE_AI_API_KEY=your-key-here

The skill does not read .env files or access any files for credentials; it reads only the environment variable. Use --mock on any command to run the full pipeline without an API key (produces placeholder audio).
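If you wrap the CLI in your own script, a pre-flight check can fail early when the key is missing. This is a sketch under assumptions: `resolveCredentials` is a hypothetical helper, and the CLI's own validation logic is not shown in the package docs.

```javascript
// Hypothetical pre-flight check for a wrapper script; not the CLI's own code.
function resolveCredentials(env, argv) {
  if (argv.includes("--mock")) {
    return { mock: true, apiKey: null }; // mock mode needs no key
  }
  const apiKey = (env.VOICE_AI_API_KEY || "").trim();
  if (!apiKey) {
    throw new Error("VOICE_AI_API_KEY is not set; export it or pass --mock");
  }
  return { mock: false, apiKey };
}
```

A wrapper would typically call it as `resolveCredentials(process.env, process.argv.slice(2))` before spawning the build.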
    node voiceai-vo.cjs build \
      --input <script.md or script.txt> \
      --voice <voice-alias-or-uuid> \
      --title "My YouTube Video" \
      [--template youtube] \
      [--video input.mp4 --mux --sync shortest] \
      [--force] [--mock]

What it does:

1. Reads the script and splits it into segments (by ## headings for .md, or by sentence boundaries for .txt)
2. Optionally prepends/appends YouTube intro/outro segments
3. Renders each segment via Voice.ai TTS
4. Stitches a master audio file (if ffmpeg is available)
5. Generates YouTube chapters, SRT captions, a review page, and a ready-made description
6. Optionally dubs your video with the voiceover

Full options:

| Option | Description |
| --- | --- |
| -i, --input <path> | Script file (.txt or .md), required |
| -v, --voice <id> | Voice alias or UUID, required |
| -t, --title <title> | Video title (defaults to filename) |
| --template youtube | Auto-inject YouTube intro/outro |
| --mode <mode> | headings or auto (default: headings for .md) |
| --max-chars <n> | Max characters per auto-chunk (default: 1500) |
| --language <code> | Language code (default: en) |
| --video <path> | Input video to dub |
| --mux | Enable video dubbing (requires --video) |
| --sync <policy> | shortest, pad, or trim (default: shortest) |
| --force | Re-render all segments (ignore cache) |
| --mock | Mock mode: no API calls, placeholder audio |
| -o, --out <dir> | Custom output directory |
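To make the headings mode concrete, here is a minimal sketch of splitting a Markdown script at `## ` headings. `splitByHeadings` and the returned field names are hypothetical, and the CLI's actual parser may treat edge cases (text before the first heading, deeper heading levels) differently.

```javascript
// Sketch only: splits Markdown into segments at "## " headings.
function splitByHeadings(markdown) {
  const segments = [];
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^##\s+(.+)$/.exec(line);
    if (match) {
      if (current) segments.push(current);
      current = { title: match[1].trim(), lines: [] };
    } else if (current) {
      current.lines.push(line); // "###" and deeper headings stay in the body
    }
    // Assumption: text before the first "## " heading is ignored in this sketch.
  }
  if (current) segments.push(current);
  return segments.map((s, i) => ({
    index: i + 1,
    title: s.title,
    text: s.lines.join("\n").trim(),
  }));
}
```

Keeping one segment per heading is what makes per-section caching and chapter generation straightforward: each heading becomes both a cache unit and a chapter title.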
    node voiceai-vo.cjs replace-audio \
      --video ./my-video.mp4 \
      --audio ./out/my-video/master.wav \
      [--out ./out/my-video/dubbed.mp4] \
      [--sync shortest|pad|trim]

Requires ffmpeg. If not installed, generates helper shell/PowerShell scripts instead.

| Sync policy | Behavior |
| --- | --- |
| shortest (default) | Output ends when the shorter track ends |
| pad | Pad audio with silence to match video duration |
| trim | Trim audio to match video duration |

Video stream is copied without re-encoding (-c:v copy). Audio is encoded as AAC for YouTube compatibility.

Privacy: Video processing is entirely local. Only script text is sent to Voice.ai for TTS. Your video files never leave your machine.
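The sync policies can be pictured as different ffmpeg argument lists. The mapping below is an assumption for illustration, not the CLI's documented behavior: `buildMuxArgs` is a hypothetical helper, and only `-c:v copy` and AAC audio are confirmed by the text above. It uses standard ffmpeg options (`-map`, `-shortest`, the `apad` audio filter, `-t`).

```javascript
// Hypothetical mapping from a --sync policy to ffmpeg arguments.
// The real CLI may build its command differently.
function buildMuxArgs({ video, audio, out, sync = "shortest", videoDurationSec }) {
  const args = [
    "-i", video,
    "-i", audio,
    "-map", "0:v:0", // take video from the first input
    "-map", "1:a:0", // take audio from the second input
    "-c:v", "copy",  // copy the video stream without re-encoding
    "-c:a", "aac",   // encode audio as AAC
  ];
  switch (sync) {
    case "shortest":
      args.push("-shortest"); // stop at the end of the shorter stream
      break;
    case "pad":
      // apad appends silence indefinitely; -shortest then stops at the video's end
      args.push("-af", "apad", "-shortest");
      break;
    case "trim":
      if (!(videoDurationSec > 0)) {
        throw new Error("trim needs the video duration in seconds");
      }
      args.push("-t", String(videoDurationSec)); // cap output at the video length
      break;
    default:
      throw new Error(`Unknown sync policy: ${sync}`);
  }
  args.push(out);
  return args;
}
```

The resulting array can be passed to `child_process.spawn("ffmpeg", args)`; building an array rather than a shell string avoids quoting problems with paths that contain spaces.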
node voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]
Use short aliases or full UUIDs with --voice:

| Alias | Voice | Gender | Best for YouTube |
| --- | --- | --- | --- |
| ellie | Ellie | F | Vlogs, lifestyle, social content |
| oliver | Oliver | M | Tutorials, narration, explainers |
| lilith | Lilith | F | ASMR, calm walkthroughs |
| smooth | Smooth Calm Voice | M | Documentaries, long-form essays |
| corpse | Corpse Husband | M | Gaming, entertainment |
| skadi | Skadi | F | Anime, character content |
| zhongli | Zhongli | M | Gaming, dramatic intros |
| flora | Flora | F | Kids content, upbeat videos |
| chief | Master Chief | M | Gaming, action trailers |

The voices command also returns any additional voices available on the API. The voice list is cached for 10 minutes.
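The 10-minute voice-list cache can be sketched as a simple time-to-live wrapper. Everything here is illustrative: `createVoiceCache` and `fetchVoices` are hypothetical names, the fetch is synchronous to keep the example short (a real API call would be asynchronous), and where the CLI stores its cache is not documented.

```javascript
// Illustrative TTL cache; the CLI's actual cache storage is not documented.
const VOICE_CACHE_TTL_MS = 10 * 60 * 1000; // 10 minutes, per the note above

function createVoiceCache(fetchVoices, now = () => Date.now()) {
  let cached = null; // { voices, fetchedAt }
  return function getVoices() {
    if (cached && now() - cached.fetchedAt < VOICE_CACHE_TTL_MS) {
      return cached.voices; // still fresh: skip the fetch
    }
    cached = { voices: fetchVoices(), fetchedAt: now() };
    return cached.voices;
  };
}
```

Injecting the clock through `now` makes the expiry behavior testable without waiting ten minutes.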
After a build, the output directory contains everything you need to publish on YouTube:

    out/<title-slug>/
      segments/          # Numbered WAV files (001-intro.wav, 002-section.wav, …)
      master.wav         # Stitched voiceover (requires ffmpeg)
      master.mp3         # MP3 for upload (requires ffmpeg)
      muxed.mp4          # Dubbed video (if --video --mux used)
      chapters.txt       # Paste into YouTube description
      captions.srt       # Upload as YouTube subtitles
      description.txt    # Ready-made YouTube description with chapters
      review.html        # Interactive review page with audio players
      manifest.json      # Build metadata: voice, template, segment list
      timeline.json      # Segment durations and start times
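As an illustration of how chapters.txt relates to timeline.json, the sketch below turns segment start times into YouTube-style chapter lines. The timeline shape (`title`, `startSec`) and both function names are assumptions; the real timeline.json schema is not documented in this package description.

```javascript
// Assumed timeline entry shape: { title: string, startSec: number }.
function formatTimestamp(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const sec = String(s % 60).padStart(2, "0");
  // Use M:SS below one hour and H:MM:SS from one hour on.
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${sec}` : `${m}:${sec}`;
}

function buildChapters(timeline) {
  return timeline
    .map((seg) => `${formatTimestamp(seg.startSec)} ${seg.title}`)
    .join("\n");
}
```

YouTube expects the first chapter to start at 0:00, which falls out naturally when the first segment starts at the beginning of the audio.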
1. Run the build command
2. Upload muxed.mp4 (or your original video + master.mp3 as audio)
3. Paste chapters.txt content into your YouTube description
4. Upload captions.srt as subtitles in YouTube Studio
5. Done: professional narration, chapters, and captions in minutes
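For readers who want to check or post-process captions.srt before uploading it, this sketch shows the standard SubRip layout: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The cue shape (`startSec`, `endSec`, `text`) and the function names are assumptions; how the pipeline derives its cues is not described here.

```javascript
// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Assumed cue shape: { startSec, endSec, text }.
function buildSrt(cues) {
  return cues
    .map((c, i) => `${i + 1}\n${srtTime(c.startSec)} --> ${srtTime(c.endSec)}\n${c.text}\n`)
    .join("\n");
}
```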
Use --template youtube to auto-inject a branded intro and outro:

| Segment | Source file |
| --- | --- |
| Intro (prepended) | templates/youtube_intro.txt |
| Outro (appended) | templates/youtube_outro.txt |

Edit the files in templates/ to customize your channel's branding.
Segments are cached by a hash of: text content + voice ID + language.

- Unchanged segments are skipped on rebuild, which keeps iteration fast
- Modified segments are re-rendered automatically
- Use --force to re-render everything
- Cache manifest is stored in segments/.cache.json
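The caching rule can be sketched as a content hash over the three inputs named above. The hash algorithm (SHA-256), the manifest shape, and the names `segmentCacheKey` and `needsRender` are assumptions made for illustration; the actual format of segments/.cache.json is not documented here.

```javascript
const crypto = require("node:crypto");

// Assumption: SHA-256 over a JSON array of the three inputs.
// JSON encoding keeps field boundaries unambiguous ("ab"+"c" differs from "a"+"bc").
function segmentCacheKey({ text, voiceId, language }) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify([text, voiceId, language]))
    .digest("hex");
}

// Hypothetical manifest shape: { "<segment file name>": "<cache key>" }.
function needsRender(segment, manifest, force = false) {
  return force || manifest[segment.file] !== segmentCacheKey(segment);
}
```

Because the key depends only on text, voice, and language, editing one section changes one key, so only that segment is re-rendered.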
Voice.ai supports 11 languages, so you can dub your YouTube videos for global audiences: en, es, fr, de, it, pt, pl, ru, nl, sv, ca

    node voiceai-vo.cjs build \
      --input script-spanish.md \
      --voice ellie \
      --title "Mi Video" \
      --language es \
      --video ./my-video.mp4 \
      --mux

The pipeline auto-selects the multilingual TTS model for non-English languages.
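The language handling described above amounts to validating the code and flagging non-English input for the multilingual model. This sketch uses only the language list given in the text; `resolveLanguage` is a hypothetical helper, and no model identifiers are shown because the package description does not name them.

```javascript
// Language codes listed in the text above.
const SUPPORTED_LANGUAGES = ["en", "es", "fr", "de", "it", "pt", "pl", "ru", "nl", "sv", "ca"];

// Hypothetical helper: validates --language and reports whether the
// multilingual model would be needed (any language other than English).
function resolveLanguage(code = "en") {
  const language = code.toLowerCase();
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error(`Unsupported language "${code}"; expected one of ${SUPPORTED_LANGUAGES.join(", ")}`);
  }
  return { language, multilingual: language !== "en" };
}
```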
| Issue | Solution |
| --- | --- |
| ffmpeg missing | Pipeline still works: you get segments, review page, chapters, captions. Install ffmpeg for stitching and video dubbing. |
| Rate limits (429) | Segments render sequentially, which stays under most limits. Wait and retry. |
| Insufficient credits (402) | Top up at voice.ai/dashboard. Cached segments won't re-use credits on retry. |
| Long scripts | Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls. |
| Windows paths | Wrap paths with spaces in quotes: --input "C:\My Scripts\script.md" |

See references/TROUBLESHOOTING.md for more.
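The automatic splitting of long segment text can be pictured as greedy sentence packing under the 490-character limit mentioned in the table. How the CLI actually chooses split points is not documented, so `chunkText` below is a hypothetical sketch: it packs whole sentences where possible and hard-splits any single sentence that exceeds the limit.

```javascript
// Sketch: pack sentences into chunks of at most maxChars characters.
function chunkText(text, maxChars = 490) {
  // A sentence is text up to and including its terminators, or trailing text without one.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && (current + sentence).trim().length > maxChars) {
      chunks.push(current.trim()); // flush before the limit would be exceeded
      current = "";
    }
    current += sentence;
    // Hard-split a single sentence that is longer than the limit on its own.
    while (current.trim().length > maxChars) {
      const piece = current.slice(0, maxChars).trim();
      if (piece) chunks.push(piece);
      current = current.slice(maxChars);
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Splitting at sentence boundaries keeps each TTS request natural-sounding; the hard split is only a fallback for text with no punctuation.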
- Agent Skills Specification
- Voice.ai
- references/VOICEAI_API.md: API endpoints, audio formats, models
- references/TROUBLESHOOTING.md: Common issues and fixes