Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chin...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Convert Chinese text to natural-sounding speech using Zhipu AI's GLM-TTS model.
1. Get your API key: get a key from the Zhipu AI Console.
2. Set it in your environment:
    export ZHIPU_API_KEY="your-key-here"
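Before calling the script, it can help to fail early when the key is missing. The sketch below is a minimal check, assuming only that the script reads `ZHIPU_API_KEY` from the environment as described above; the function name `require_api_key` is hypothetical and not part of the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the skill): return non-zero
# when ZHIPU_API_KEY is unset or empty, so a wrapper can stop early.
require_api_key() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set; export it before running the script" >&2
    return 1
  fi
}
```

A wrapper could then run `require_api_key && bash scripts/text_to_speech.sh "你好"`.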
- tongtong (彤彤): default voice, balanced tone
- chuichui (锤锤): male voice, deeper tone
- xiaochen (小陈): young professional voice
- jam: 动动动物圈 Jam voice
- kazi: 动动动物圈 Kazi voice
- douji: 动动动物圈 Douji voice
- luodo: 动动动物圈 Luodo voice
Convert text to speech with default settings (tongtong voice, normal speed, WAV format):
    bash scripts/text_to_speech.sh "你好,今天天气怎么样"
Specify voice, speed, format, and output filename:
    bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav
Parameters:
- text (required): Chinese text to convert (max 1024 characters)
- voice (optional): tongtong (default), chuichui, xiaochen, jam, kazi, douji, luodo
- speed (optional): Speech speed from 0.5 to 2.0 (default: 1.0)
- output_format (optional): wav (default), pcm
- output_file (optional): Output filename (default: output.{format})
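The parameter limits above can be checked locally before a request is sent. The following is a sketch under stated assumptions: `validate_tts_args` is a hypothetical helper, not part of the package, and whether the script itself performs similar checks is not stated in the source.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; the allowed values mirror the
# documented parameter list (voice, speed 0.5-2.0, wav or pcm).
validate_tts_args() {
  local voice=$1 speed=$2 format=$3

  case "$voice" in
    tongtong|chuichui|xiaochen|jam|kazi|douji|luodo) ;;
    *) echo "unknown voice: $voice" >&2; return 1 ;;
  esac

  # Speed must be a plain decimal number such as 1 or 1.25.
  if ! [[ $speed =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
    echo "speed is not a number: $speed" >&2; return 1
  fi
  # Bash has no floating-point comparison, so delegate the range test to awk.
  if ! awk -v s="$speed" 'BEGIN { exit !(s + 0 >= 0.5 && s + 0 <= 2.0) }'; then
    echo "speed out of range (0.5-2.0): $speed" >&2; return 1
  fi

  case "$format" in
    wav|pcm) ;;
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
}
```

For example, `validate_tts_args xiaochen 1.2 wav` succeeds, while `validate_tts_args xiaochen 2.5 wav` prints an error and returns 1.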
Choose tongtong (default) for:
- General purpose narration
- Professional presentations
- Balanced tone requirements
Choose chuichui for:
- Male voice needed
- Deeper, authoritative tone
- Documentary or formal content
Choose xiaochen for:
- Young, energetic tone
- Modern, casual content
- Friendly assistant vibe
Choose jam/kazi/douji/luodo for:
- Entertainment content
- Character voices
- Creative projects
Recommended speeds:
- 0.8-1.0: Clear, professional narration
- 1.0-1.2: Natural conversational pace (default: 1.0)
- 1.2-1.5: Energetic, upbeat delivery
- 1.5-2.0: Fast-paced summaries (may reduce clarity)
WAV (recommended):
- Standard audio format
- Widely compatible
- Better quality preservation
PCM:
- Raw audio format
- Smaller file size
- Requires additional processing for playback
Create a professional greeting:
    bash scripts/text_to_speech.sh "您好,感谢致电智能客服,请按1选择中文服务" tongtong 1.0 wav greeting.wav
Generate an energetic announcement:
    bash scripts/text_to_speech.sh "热烈欢迎各位嘉宾参加今天的活动!" xiaochen 1.3 wav announcement.wav
Create a calm narration:
    bash scripts/text_to_speech.sh "在这个宁静的夜晚,让我们一起欣赏美丽的星空" chuichui 0.9 wav narration.wav
- Maximum input: 1024 characters per request
- For longer texts, split into multiple segments
- Combine audio files post-generation
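Splitting a long text into segments that fit the 1024-character limit could look like the sketch below. `split_text` and the `CHUNKS` array are hypothetical names, and the helper cuts at fixed offsets rather than at sentence boundaries, so a real workflow would probably want to split on punctuation instead. Character counting relies on a UTF-8 locale; in the C locale bash counts bytes, which would cut Chinese characters apart.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array CHUNKS with consecutive
# pieces of the input, each at most max_len characters long.
split_text() {
  local text=$1 max_len=${2:-1024}
  CHUNKS=()
  while [ -n "$text" ]; do
    CHUNKS+=("${text:0:$max_len}")   # take the next piece
    text=${text:$max_len}            # drop it from the remainder
  done
}

# Possible usage with the documented script interface (not run here):
#   split_text "$long_text"
#   for i in "${!CHUNKS[@]}"; do
#     bash scripts/text_to_speech.sh "${CHUNKS[$i]}" tongtong 1.0 wav "part_$i.wav"
#   done
```

The resulting `part_N.wav` files still need to be joined with an audio tool of your choice; the source does not name one.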
Best practices:
- Use punctuation for natural pauses (commas, periods)
- Break long sentences into shorter segments
- Use appropriate line breaks for paragraph pauses
- Test speed settings for your specific content
Sample rate: Generated audio uses a 24000 Hz sampling rate.
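The 24000 Hz sample rate makes it possible to estimate how long a raw PCM file will play from its size alone. The sketch below assumes 16-bit (2-byte) mono samples, which the source does not state; both values are exposed as parameters of the hypothetical `pcm_duration_seconds` helper so they can be changed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: duration = bytes / (rate * bytes_per_sample * channels).
# Defaults: 24000 Hz from the documentation; 2 bytes per sample and
# 1 channel are ASSUMPTIONS, not documented facts.
pcm_duration_seconds() {
  local bytes=$1 rate=${2:-24000} bytes_per_sample=${3:-2} channels=${4:-1}
  awk -v b="$bytes" -v r="$rate" -v w="$bytes_per_sample" -v c="$channels" \
    'BEGIN { printf "%.2f\n", b / (r * w * c) }'
}
```

For a file on disk, the size can be passed in with `pcm_duration_seconds "$(wc -c < output.pcm)"`.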
Text Length Issues:
- Split texts longer than 1024 characters
- Process segments separately
- Combine using audio editing tools
Audio Quality Issues:
- Check text encoding (use UTF-8)
- Verify punctuation placement
- Adjust speed settings
- Try different voices
File Playback Issues:
- Ensure format compatibility with your player
- WAV format works on most systems
- PCM may require conversion
- Responses are returned as audio files
- Watermarking enabled by default (can be disabled in account settings)
- No strict rate limiting documented
- Audio generation typically completes in 1-3 seconds