Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) Clone their voice or someone else's voice, (2) Generate speech that sounds like a specific person, (3) Create personalized voice messages, (4) Multi-lingual voice cloning (speak any language with cloned voice).
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
- DO NOT try to use Docker containers directly.
- DO NOT try to interact with the coqui-xtts container. It is broken and restarting.
- DO NOT try to use APIs or servers.
- ONLY USE THE SCRIPT: scripts/clonev.sh

The script handles everything automatically. Just call it with text, voice sample, and language.
Clones any voice from a short audio sample and generates new speech in that voice.

Input:
- Text to speak
- Voice sample (WAV file, 6-30 seconds)
- Language code

Output:
- OGG voice file (cloned voice speaking the text)

Works with: Any voice! Yours, a celebrity, a character, etc.
    scripts/clonev.sh "Your text here" /path/to/voice_sample.wav language

That's it! Nothing else needed. The script prints the path to the generated OGG file.
- Text to speak (from user)
- Path to voice sample WAV file (from user)
- Language code (from user, or default to en)
VOICE_FILE=$(scripts/clonev.sh "TEXT_HERE" "/path/to/sample.wav" LANGUAGE)
The variable $VOICE_FILE now contains the path to the generated OGG file.
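The call above can be wrapped so that the language defaults to en and a missing output file is reported as an error. This is only a sketch: the function name `clone_voice` and the `CLONEV_SCRIPT` variable are hypothetical and not part of the skill, and it assumes the script prints nothing but the output path on stdout.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around scripts/clonev.sh. The names clone_voice and
# CLONEV_SCRIPT are illustrative assumptions, not part of the skill itself.
CLONEV_SCRIPT="${CLONEV_SCRIPT:-scripts/clonev.sh}"

clone_voice() {
    local text="$1" sample="$2" lang="${3:-en}"   # language defaults to en
    local out

    if [ -z "$text" ] || [ -z "$sample" ]; then
        echo "usage: clone_voice TEXT SAMPLE_WAV [LANGUAGE]" >&2
        return 2
    fi

    # Capture the path the script prints on stdout.
    out=$("$CLONEV_SCRIPT" "$text" "$sample" "$lang") || return 1

    # Treat a missing output file as a failure.
    if [ ! -f "$out" ]; then
        echo "error: no voice file at '$out'" >&2
        return 1
    fi

    printf '%s\n' "$out"
}
```

Usage would then look like `VOICE_FILE=$(clone_voice "Hello" "/path/to/sample.wav")`, with the language argument left out when English is wanted.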
    # Generate cloned voice
    VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)

    # Send to Telegram (as voice message)
    message action=send channel=telegram asVoice=true filePath="$VOICE"
    # Generate Czech voice
    VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)

    # Send
    message action=send channel=telegram asVoice=true filePath="$VOICE"
    #!/bin/bash

    # Generate voice
    VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)

    # Verify file was created
    if [ -f "$VOICE" ]; then
        echo "Success! Voice file: $VOICE"
        ls -lh "$VOICE"
    else
        echo "Error: Voice file not created"
    fi
| Code | Language | Example Usage |
|------|----------|---------------|
| en | English | scripts/clonev.sh "Hello" sample.wav en |
| cs | Czech | scripts/clonev.sh "Ahoj" sample.wav cs |
| de | German | scripts/clonev.sh "Hallo" sample.wav de |
| fr | French | scripts/clonev.sh "Bonjour" sample.wav fr |
| es | Spanish | scripts/clonev.sh "Hola" sample.wav es |

Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
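Since a cloning run takes a while, it can help to reject an unknown language code before calling the script. The sketch below checks a code against the full list above; the names `SUPPORTED_LANGS` and `is_supported_lang` are hypothetical, and whether clonev.sh does its own validation is not stated in the package docs.

```shell
#!/usr/bin/env bash
# Codes copied from the "Full list" in this document.
# SUPPORTED_LANGS and is_supported_lang are illustrative names.
SUPPORTED_LANGS="en cs de fr es it pl pt tr ru nl ar zh ja hu ko"

is_supported_lang() {
    local code="$1" lang
    for lang in $SUPPORTED_LANGS; do
        # Exact match only, so "EN" or "eng" are rejected.
        [ "$lang" = "$code" ] && return 0
    done
    return 1
}
```

For example, `is_supported_lang "$LANGUAGE" || LANGUAGE=en` falls back to English when the requested code is not on the list.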
- Format: WAV file
- Length: 6-30 seconds (optimal: 10-15 seconds)
- Quality: Clear audio, no background noise
- Content: Any speech (the actual words don't matter)

Good samples:
- ✅ Recording of someone speaking clearly
- ✅ No music or noise in background
- ✅ Consistent volume

Bad samples:
- ❌ Music or songs
- ❌ Heavy background noise
- ❌ Very short (< 6 seconds)
- ❌ Very long (> 30 seconds)
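The length limits above can be checked before a run. The following sketch classifies a duration given in seconds; `check_sample_duration` is a hypothetical helper, and it truncates fractions, so 30.9 still counts as 30.

```shell
#!/usr/bin/env bash
# Classify a sample length (in seconds) against the 6-30 second guidance.
# check_sample_duration is an illustrative name, not part of the skill.
check_sample_duration() {
    # Drop any fractional part so plain integer tests work (12.5 -> 12).
    local secs="${1%%.*}"

    case "$secs" in
        ''|*[!0-9]*) echo "invalid"; return 2 ;;
    esac

    if [ "$secs" -lt 6 ]; then
        echo "too short"
        return 1
    elif [ "$secs" -gt 30 ]; then
        echo "too long"
        return 1
    fi
    echo "ok"
}
```

If ffprobe happens to be installed (an assumption, since the package docs do not mention it), the duration can be read with `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 /path/to/sample.wav` and passed to the function as its only argument.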
- First use downloads a ~1.87 GB model (one-time)
- Model is stored at: /mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/
- Status: ✅ Already downloaded
- Takes 20-40 seconds depending on text length
- This is normal: voice cloning is computationally intensive
Make sure you're in the skill directory, or use the full path:

    /home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en
- Check the path to the WAV file
- Use absolute paths (starting with /)
- Ensure the file exists: ls -la /path/to/sample.wav
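These path checks can be bundled into one pre-flight function. This is a minimal sketch: `check_sample_path` is a hypothetical name, and the extension test is only a rough stand-in for confirming that the file really is WAV audio.

```shell
#!/usr/bin/env bash
# Pre-flight checks for the voice sample path.
# check_sample_path is an illustrative helper, not part of the skill.
check_sample_path() {
    local path="$1"

    # Absolute paths start with a slash.
    case "$path" in
        /*) ;;
        *) echo "error: not an absolute path: $path" >&2; return 1 ;;
    esac

    # The file must exist and be a regular file.
    if [ ! -f "$path" ]; then
        echo "error: file not found: $path" >&2
        return 1
    fi

    # Extension check only; the file contents are not inspected.
    case "$path" in
        *.wav|*.WAV) ;;
        *) echo "error: not a .wav file: $path" >&2; return 1 ;;
    esac

    return 0
}
```

Quoting the argument matters for samples such as "/mnt/c/TEMP/Recording 25.wav", whose name contains a space.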
The model should auto-download. If not:

    cd /mnt/c/TEMP/Docker-containers/coqui-tts
    docker run --rm --entrypoint "" \
      -v "$(pwd)/models-xtts:/root/.local/share/tts" \
      ghcr.io/coqui-ai/tts:latest \
      python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"
- Use a clearer voice sample
- Ensure there is no background noise
- Try a different sample (some voices clone better)
USER: "Clone my voice and say 'hello'"
→ Get: sample path, text="hello", language="en"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains path to OGG file
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

USER: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language="cs"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"
Generated files are saved to:

    /mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg

The script returns this path, so you can use it directly.
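Because the output path is a single fixed file name, a later run may well overwrite an earlier result. That is an inference from the path, not something the package docs state. If earlier clips need to be kept, a copy can be saved under a timestamped name, as in this sketch; `keep_voice_copy` and its destination directory argument are hypothetical.

```shell
#!/usr/bin/env bash
# Copy a generated voice file to a timestamped name so a later run cannot
# replace it. keep_voice_copy is an illustrative helper, not part of the skill.
keep_voice_copy() {
    local src="$1" dest_dir="$2" dest

    if [ ! -f "$src" ]; then
        echo "error: no such file: $src" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1

    # Timestamp plus shell PID keeps names distinct across runs and shells.
    dest="$dest_dir/clonev_$(date +%Y%m%d_%H%M%S)_$$.ogg"

    # Print the new path only if the copy succeeded.
    cp -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

A call such as `SAVED=$(keep_voice_copy "$VOICE" "$HOME/clonev-archive")` leaves the original file in place and returns the path of the copy. Two calls from the same shell within one second would produce the same name.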
- ONLY use the script: scripts/clonev.sh
- NEVER try to use Docker containers directly
- NEVER try to interact with the coqui-xtts container
- Script handles everything automatically
- Returns path to OGG file ready to send

Simple. Just use the script.

Clone any voice. Speak any language. Just use the script.