Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Complete voice solution, with both TTS and STT through one API:
- TTS: Text-to-Speech (high-quality voices)
- STT: Speech-to-Text via Scribe (accurate transcription)
Set your API key:
    export ELEVENLABS_API_KEY="sk_..."
Or create a .env file in the workspace root.
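The lookup order described above (environment variable first, then a .env file) can be sketched as follows. This is a minimal illustration, not the packaged scripts' actual loader: the function name load_api_key and the .env parsing rules are assumptions.

```python
import os
from pathlib import Path


def load_api_key(env_path=".env", environ=None):
    """Return ELEVENLABS_API_KEY from the environment, else from a .env file.

    Hypothetical helper: the packaged scripts may resolve the key differently.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("ELEVENLABS_API_KEY")
    if key:
        return key

    path = Path(env_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            name = name.strip()
            if name.startswith("export "):
                name = name[len("export "):].strip()
            if name == "ELEVENLABS_API_KEY":
                # Drop optional surrounding quotes.
                return value.strip().strip("'\"")

    raise RuntimeError("ELEVENLABS_API_KEY is not set and no .env entry was found")
```

Failing early with a clear error tends to be easier to debug than an authentication failure on the first API call.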
Convert text to natural-sounding speech:
    python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
With a custom voice:
    python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
List the available voices:
    python scripts/elevenlabs_speech.py voices
In Python:
    from scripts.elevenlabs_speech import ElevenLabsClient

    client = ElevenLabsClient(api_key="sk_...")

    # Basic TTS
    result = client.text_to_speech(
        text="Hello from zerox",
        output_path="greeting.mp3"
    )

    # With custom settings
    result = client.text_to_speech(
        text="Your text here",
        voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
        stability=0.5,
        similarity_boost=0.75,
        output_path="output.mp3"
    )

    # Get available voices
    voices = client.get_voices()
    for voice in voices['voices']:
        print(f"{voice['name']}: {voice['voice_id']}")
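Since text_to_speech takes a voice ID rather than a name, a small lookup over the get_voices() response may be convenient. The sketch below assumes the response shape shown above (a dict with a 'voices' list of name and voice_id entries). find_voice_id is a hypothetical helper, and the sample dict stands in for a real API response.

```python
def find_voice_id(voices_response, name):
    """Return the voice_id whose name matches case-insensitively, or None."""
    wanted = name.strip().lower()
    for voice in voices_response.get("voices", []):
        if voice.get("name", "").strip().lower() == wanted:
            return voice.get("voice_id")
    return None


# Stand-in for client.get_voices(); the real response may carry more fields.
sample = {
    "voices": [
        {"name": "Rachel", "voice_id": "21m00Tcm4TlvDq8ikWAM"},
        {"name": "Josh", "voice_id": "TxGEqnHWrfWFTfGW9XjX"},
    ]
}
print(find_voice_id(sample, "josh"))  # TxGEqnHWrfWFTfGW9XjX
```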
Voice ID              Name    Description
21m00Tcm4TlvDq8ikWAM  Rachel  Natural, versatile (default)
AZnzlk1XvdvUeBnXmlld  Domi    Strong, energetic
EXAVITQu4vr4xnSDxMaL  Bella   Soft, soothing
ErXwobaYiN019PkySvjV  Antoni  Well-rounded
MF3mGyEYCl7XYWbV9V6O  Elli    Warm, friendly
TxGEqnHWrfWFTfGW9XjX  Josh    Deep, calm
VR6AewLTigWG4xSOukaG  Arnold  Authoritative
- stability (0-1): lower = more emotional, higher = more stable
- similarity_boost (0-1): higher = closer to the original voice
- Defaults: stability=0.5, similarity_boost=0.75
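Both settings are documented here as values between 0 and 1, so checking the range before calling the API can catch typos early. The sketch below is one way to do that. voice_settings is a hypothetical helper, and whether the packaged client validates these values itself is not stated.

```python
def voice_settings(stability=0.5, similarity_boost=0.75):
    """Validate the two settings and return them as a dict.

    Defaults mirror the documented ones (stability=0.5, similarity_boost=0.75).
    """
    for label, value in (("stability", stability),
                         ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be between 0 and 1, got {value}")
    return {"stability": stability, "similarity_boost": similarity_boost}
```

The returned dict can be unpacked into the call, for example client.text_to_speech(text="Hi", output_path="hi.mp3", **voice_settings(stability=0.3)), given the keyword names shown in the usage above.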
- eleven_turbo_v2_5: fast, high quality (default)
- eleven_multilingual_v2: best for non-English
- eleven_monolingual_v1: English only
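One possible way to turn the model notes above into a selection rule is sketched below. pick_model is a hypothetical helper and the rule is only a heuristic drawn from the descriptions here. How the packaged client accepts a model ID is not shown in this document, so passing the result on is left to the reader.

```python
def pick_model(language_code=None):
    """Pick a model ID from the documented list based on a language code.

    Heuristic only: the default model for English or unknown language,
    the multilingual model for everything else.
    """
    if language_code is None or language_code.strip().lower() in {"en", "eng"}:
        return "eleven_turbo_v2_5"
    return "eleven_multilingual_v2"
```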
When the user sends text and wants a voice reply:
    # Generate speech
    result = client.text_to_speech(text=user_text, output_path="reply.mp3")

    # Send via the Telegram message tool with the media path
    message(action="send", media="path/to/reply.mp3", as_voice=True)
Check https://elevenlabs.io/pricing for current rates. Free tier available!
Transcribe voice messages using ElevenLabs Scribe:
    python scripts/elevenlabs_scribe.py voice_message.ogg
With a specific language:
    python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
With speaker diarization (multiple speakers):
    python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
In Python:
    from scripts.elevenlabs_scribe import ElevenLabsScribe

    client = ElevenLabsScribe(api_key="sk_...")

    # Basic transcription
    result = client.transcribe("voice_message.ogg")
    print(result['text'])

    # With language hint (improves accuracy)
    result = client.transcribe("voice_message.ogg", language_code="ara")

    # With speaker detection
    result = client.transcribe("voice_message.ogg", num_speakers=2)
Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
Max file size: 100 MB
Works great with Telegram voice messages (.ogg)
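A pre-flight check against the format list and size limit above can avoid a wasted upload. The sketch below is illustrative. check_audio_file is a hypothetical helper, .ogg is included because Telegram voice messages are described as working, and 100 MB is read conservatively as 100,000,000 bytes, which is an assumption.

```python
import os

# Extensions listed in this document, plus .ogg for Telegram voice messages.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm", ".ogg",
}
# Assumption: "100 MB" interpreted as decimal megabytes.
MAX_BYTES = 100 * 1000 * 1000


def check_audio_file(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes is None:
        # Read the size from disk when the caller did not supply one.
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems
```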
Scribe supports 99 languages, including:
- Arabic (ara)
- English (eng)
- Spanish (spa)
- French (fra)
And many more. Without a language hint, it auto-detects the language.
User sends a voice message → you reply with voice:
    from scripts.elevenlabs_scribe import ElevenLabsScribe
    from scripts.elevenlabs_speech import ElevenLabsClient

    # 1. Transcribe user's voice message
    stt = ElevenLabsScribe()
    transcription = stt.transcribe("user_voice.ogg")
    user_text = transcription['text']

    # 2. Process/understand the text
    # ... your logic here ...

    # 3. Generate response text
    response_text = "Your response here"

    # 4. Convert to speech
    tts = ElevenLabsClient()
    tts.text_to_speech(response_text, output_path="reply.mp3")

    # 5. Send voice reply
    message(action="send", media="reply.mp3", as_voice=True)
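The transcribe, respond, speak steps above can be wrapped in one function that also handles an empty transcription. The sketch below is a hedged illustration. voice_reply and the respond callback are hypothetical, and it relies only on the transcribe and text_to_speech calls as they appear in this document. Sending the file is left to the caller.

```python
def voice_reply(stt, tts, audio_path, respond, output_path="reply.mp3"):
    """Transcribe audio_path, build a reply with respond(text), and speak it.

    stt needs transcribe(path) returning a dict with a 'text' key;
    tts needs text_to_speech(text, output_path=...).
    Returns the output path, or None when nothing was transcribed.
    """
    transcription = stt.transcribe(audio_path)
    user_text = transcription.get("text", "").strip()
    if not user_text:
        # Nothing intelligible was transcribed, so skip speech generation.
        return None
    tts.text_to_speech(respond(user_text), output_path=output_path)
    return output_path
```

Passing the clients in as arguments keeps the function easy to exercise with stand-in objects before spending API quota, for example voice_reply(ElevenLabsScribe(), ElevenLabsClient(), "user_voice.ogg", my_logic), where my_logic is your own function.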
Check https://elevenlabs.io/pricing for current rates:
- TTS (Text-to-Speech): free tier of 10,000 characters/month; paid plans available
- STT (Speech-to-Text) via Scribe: free tier available; check the website for current pricing
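To get a rough sense of how far the 10,000 characters/month TTS free tier stretches, the remaining budget can be estimated locally. The sketch below is an approximation. remaining_characters is a hypothetical helper, and it assumes usage is counted as plain string length, which may differ from how ElevenLabs meters characters.

```python
FREE_TIER_CHARS = 10_000  # TTS free tier quoted in this document


def remaining_characters(texts, monthly_limit=FREE_TIER_CHARS):
    """Estimate the characters left this month after synthesizing texts."""
    used = sum(len(text) for text in texts)
    return monthly_limit - used


print(remaining_characters(["Hello world"]))  # 9989
```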