Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Replace a video's original audio with TTS-generated voice while maintaining precise timing alignment. Also supports generating subtitles from video using Whisper.
If you don't have an SRT file, generate one from the video using the included script:

```bash
# Generate subtitles from video (uses faster-whisper, free, local)
generate_subtitles.py video.mp4 -o subtitles.srt -l zh
```

Or manually with Python:

```bash
# Using faster-whisper (recommended, local, free)
pip install faster-whisper
python3 << 'EOF'
from faster_whisper import WhisperModel
from datetime import timedelta

model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("input_video.mp4", language="zh")

# Format seconds as an SRT timestamp: HH:MM:SS,mmm
def format_time(seconds):
    td = timedelta(seconds=seconds)
    return f"{td.seconds//3600:02d}:{(td.seconds%3600)//60:02d}:{td.seconds%60:02d},{td.microseconds//1000:03d}"

srt_content = ""
for i, seg in enumerate(segments, 1):
    start = format_time(seg.start)
    end = format_time(seg.end)
    srt_content += f"{i}\n{start} --> {end}\n{seg.text.strip()}\n\n"

with open("subtitles.srt", "w", encoding="utf-8") as f:
    f.write(srt_content)
EOF
```
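For reference, each entry the script writes follows the standard SRT layout: a sequence number, a start/end timestamp pair, and the text (the timestamps and text below are illustrative):

```
1
00:00:01,000 --> 00:00:03,500
First line of dialogue
```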
Use the generated SRT to create a new video with TTS voice.
- Dubbing videos with AI-generated voice
- Converting subtitle files to voice-over
- Creating multilingual video versions
- ElevenLabs: set the ELEVENLABS_API_KEY environment variable
- Edge TTS (free, no key needed): use --engine edge
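A typical setup for each engine might look like this (the key value is a placeholder; substitute your own):

```shell
# ElevenLabs engine (default): export the key before running
export ELEVENLABS_API_KEY="your-key-here"
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4

# Edge TTS engine: no key required, just select it at run time
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --engine edge
```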
- ffmpeg
- sox (optional, for advanced processing)
```bash
# ElevenLabs (default engine)
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --voice "Liam"

# Edge TTS (free, no API key)
video-audio-replace --video input.mp4 --srt subtitles.srt --output output.mp4 --engine edge --voice "zh-CN-YunxiNeural"
```
| Option | Description | Default |
| --- | --- | --- |
| --video | Input video file | Required |
| --srt | SRT subtitle file | Required |
| --output | Output video file | input_tts.mp4 |
| --voice | Voice ID or name | Liam (ElevenLabs) |
| --engine | TTS engine: elevenlabs, edge | elevenlabs |
| --speed-range | Speed adjustment range | 0.85-1.15 |
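The options above imply a CLI surface that could be sketched with argparse as follows; this is a hypothetical reconstruction for illustration, not the tool's actual parser:

```python
import argparse

# Mirror of the documented options and their defaults (names taken from the table)
parser = argparse.ArgumentParser(prog="video-audio-replace")
parser.add_argument("--video", required=True, help="Input video file")
parser.add_argument("--srt", required=True, help="SRT subtitle file")
parser.add_argument("--output", default="input_tts.mp4", help="Output video file")
parser.add_argument("--voice", default="Liam", help="Voice ID or name")
parser.add_argument("--engine", default="elevenlabs",
                    choices=["elevenlabs", "edge"], help="TTS engine")
parser.add_argument("--speed-range", default="0.85-1.15",
                    help="Speed adjustment range")

# Only the required flags need to be passed; everything else falls back to defaults
args = parser.parse_args(["--video", "input.mp4", "--srt", "subtitles.srt"])
```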
```bash
video-audio-replace --video 2028.mp4 --srt 2028.srt --output 2028_final.mp4 --voice "Liam"
video-audio-replace --video video.mp4 --srt subs.srt --output result.mp4 --engine edge --voice "zh-CN-YunxiNeural"
```
1. Extract original audio from video
2. Split audio into segments based on subtitle timestamps
3. Generate TTS audio for each subtitle segment
4. Adjust TTS speed (within 0.85-1.15x) to match original segment duration
5. Add silence padding to fill any remaining time gap
6. Merge all segments preserving original timing gaps
7. Replace video audio with aligned TTS audio
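The per-segment alignment in steps 4-5 can be sketched as below. This is a minimal illustration of the logic described, not the tool's actual code; the clamp bounds mirror the --speed-range default:

```python
def align_segment(tts_duration, target_duration, speed_min=0.85, speed_max=1.15):
    """Return (playback_speed, silence_padding) in seconds so a TTS clip
    fills its subtitle slot without distorting the voice too much."""
    if target_duration <= 0:
        return 1.0, 0.0
    # Speed that would make the TTS clip exactly match the original duration.
    ideal_speed = tts_duration / target_duration
    # Clamp to the allowed range to keep the voice natural-sounding.
    speed = min(max(ideal_speed, speed_min), speed_max)
    adjusted = tts_duration / speed
    # Fill any remaining gap with silence instead of more speed change.
    # (If the clip is still longer than the slot after clamping, it will
    # overrun slightly; padding is simply zero in that case.)
    padding = max(target_duration - adjusted, 0.0)
    return speed, padding
```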
- Liam - Energetic male (recommended)
- Sarah - Professional female
- Brian - Deep resonant male

Run curl with your API key to list all voices.
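The curl call referenced above would look roughly like this; the endpoint shown is the public ElevenLabs voices API, and the key comes from the ELEVENLABS_API_KEY variable set earlier:

```shell
# List all voices available to your ElevenLabs account (JSON response)
curl -s -H "xi-api-key: $ELEVENLABS_API_KEY" \
  https://api.elevenlabs.io/v1/voices
```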
- Chinese: zh-CN-XiaoxiaoNeural, zh-CN-YunxiNeural, zh-CN-YunyangNeural
- English: en-US-JennyNeural, en-US-GuyNeural
- Many more languages available