
Zhipu AI TTS

Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chin...


Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform: OpenClaw
Install method: Manual import
Extraction: Extract archive
Prerequisites: OpenClaw
Primary doc: SKILL.md

Package facts

Download mode: Yavira redirect
Package format: ZIP package
Source platform: Tencent SkillHub
What's included: README.md, SKILL.md, package.json, scripts/text_to_speech.sh

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source: Tencent SkillHub
Verification: Indexed source record
Version: 1.0.0

Documentation

Primary doc: SKILL.md (13 sections)

Zhipu AI Text-to-Speech

Convert Chinese text to natural-sounding speech using Zhipu AI's GLM-TTS model.

Setup

  1. Get your API key from the Zhipu AI Console.
  2. Set it in your environment:

    export ZHIPU_API_KEY="your-key-here"
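A missing key is easy to catch before any request is made. The following is a minimal sketch, not part of the package: `require_api_key` is a hypothetical helper name, and only the `ZHIPU_API_KEY` variable comes from the setup steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the skill): fail early when
# ZHIPU_API_KEY is unset or empty, instead of failing inside the request.
require_api_key() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set; export it before running the script" >&2
    return 1
  fi
}

# Example use:
#   require_api_key && bash scripts/text_to_speech.sh "你好"
```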

System Voices (Pre-built)

  • tongtong (彤彤) - Default voice, balanced tone
  • chuichui (锤锤) - Male voice, deeper tone
  • xiaochen (小陈) - Young professional voice
  • jam - 动动动物圈 Jam voice
  • kazi - 动动动物圈 Kazi voice
  • douji - 动动动物圈 Douji voice
  • luodo - 动动动物圈 Luodo voice

Basic Text-to-Speech

Convert text to speech with default settings (tongtong voice, normal speed, WAV format):

    bash scripts/text_to_speech.sh "你好,今天天气怎么样"

Advanced Options

Specify voice, speed, format, and output filename:

    bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav

Parameters:
  • text (required): Chinese text to convert (max 1024 characters)
  • voice (optional): tongtong (default), chuichui, xiaochen, jam, kazi, douji, luodo
  • speed (optional): Speech speed from 0.5 to 2.0 (default: 1.0)
  • output_format (optional): wav (default), pcm
  • output_file (optional): Output filename (default: output.{format})
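The documented ranges can be checked before the script is called. This is a sketch under assumptions: `validate_tts_args` is a hypothetical wrapper, not something the package provides, and it only mirrors the voice names, speed range, and formats listed above. Whether the script itself performs similar checks is not stated in the documentation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight validator mirroring the documented parameters.
# Arguments: voice, speed, output_format (defaults match the documentation).
validate_tts_args() {
  local voice="${1:-tongtong}" speed="${2:-1.0}" format="${3:-wav}"

  case "$voice" in
    tongtong|chuichui|xiaochen|jam|kazi|douji|luodo) ;;
    *) echo "unknown voice: $voice" >&2; return 1 ;;
  esac

  case "$format" in
    wav|pcm) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac

  # awk does the floating-point comparison that the shell's test cannot.
  if ! awk -v s="$speed" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2.0) }'; then
    echo "speed must be a number from 0.5 to 2.0: $speed" >&2
    return 1
  fi
}

# Example use:
#   validate_tts_args xiaochen 1.2 wav &&
#     bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav
```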

Voice Selection Guide

Choose tongtong (default) for:
  • General purpose narration
  • Professional presentations
  • Balanced tone requirements

Choose chuichui for:
  • Male voice needed
  • Deeper, authoritative tone
  • Documentary or formal content

Choose xiaochen for:
  • Young, energetic tone
  • Modern, casual content
  • Friendly assistant vibe

Choose jam/kazi/douji/luodo for:
  • Entertainment content
  • Character voices
  • Creative projects

Speed Control

Recommended speeds:
  • 0.8-1.0: Clear, professional narration
  • 1.0-1.2: Natural conversational pace (default: 1.0)
  • 1.2-1.5: Energetic, upbeat delivery
  • 1.5-2.0: Fast-paced summaries (may reduce clarity)

Output Formats

WAV (recommended):
  • Standard audio format
  • Widely compatible
  • Better quality preservation

PCM:
  • Raw audio format
  • Smaller file size
  • Requires additional processing for playback
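The "additional processing" for PCM usually means wrapping the raw samples in a container. The sketch below uses ffmpeg's raw-PCM input options for that. It rests on assumptions: `pcm_to_wav` is a hypothetical helper, ffmpeg must be installed separately, and the sample layout (16-bit signed little-endian, mono) is a guess, since the skill documentation states only the 24000 Hz sample rate. If the converted audio sounds distorted, the layout assumption is the first thing to check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap a raw PCM file in a WAV container with ffmpeg.
# ASSUMPTION: samples are 16-bit signed little-endian, mono. Only the
# 24000 Hz rate comes from the skill documentation.
pcm_to_wav() {
  local src="$1" dst="${2:-${1%.pcm}.wav}"
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found on PATH" >&2
    return 127
  fi
  # -f/-ar/-ac describe the headerless input; -y overwrites an existing file.
  ffmpeg -nostdin -loglevel error -y -f s16le -ar 24000 -ac 1 -i "$src" "$dst"
}

# Example use:
#   bash scripts/text_to_speech.sh "你好" tongtong 1.0 pcm hello.pcm
#   pcm_to_wav hello.pcm hello.wav
```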

Examples

Create a professional greeting:

    bash scripts/text_to_speech.sh "您好,感谢致电智能客服,请按1选择中文服务" tongtong 1.0 wav greeting.wav

Generate an energetic announcement:

    bash scripts/text_to_speech.sh "热烈欢迎各位嘉宾参加今天的活动!" xiaochen 1.3 wav announcement.wav

Create a calm narration:

    bash scripts/text_to_speech.sh "在这个宁静的夜晚,让我们一起欣赏美丽的星空" chuichui 0.9 wav narration.wav

Character Limits

  • Maximum input: 1024 characters per request
  • For longer texts, split into multiple segments
  • Combine audio files post-generation
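Splitting a long text can be sketched as follows. `split_text` is a hypothetical helper, not part of the package. It cuts at a fixed character count, so a cut may fall mid-sentence; splitting at punctuation would sound more natural but is left out to keep the sketch short. It assumes single-line input and a UTF-8 locale, since bash counts characters rather than bytes only when the locale is multibyte-aware.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print TEXT as consecutive chunks of at most MAX
# characters, one chunk per line. MAX defaults to the documented 1024 limit.
# ASSUMPTIONS: single-line input; UTF-8 locale so offsets count characters.
split_text() {
  local text="$1" max="${2:-1024}"
  while [ -n "$text" ]; do
    printf '%s\n' "${text:0:max}"
    text="${text:max}"
  done
}

# Example use (one request and one output file per chunk):
#   i=0
#   split_text "$long_text" | while IFS= read -r chunk; do
#     i=$((i + 1))
#     bash scripts/text_to_speech.sh "$chunk" tongtong 1.0 wav "part_${i}.wav"
#   done
```

The resulting part files still have to be joined with an audio tool of your choice, as the list above notes.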

Audio Quality Tips

Best practices:
  • Use punctuation for natural pauses (commas, periods)
  • Break long sentences into shorter segments
  • Use appropriate line breaks for paragraph pauses
  • Test speed settings for your specific content

Sample rate: Generated audio uses a 24000 Hz sampling rate.

Troubleshooting

Text Length Issues:
  • Split texts longer than 1024 characters
  • Process segments separately
  • Combine using audio editing tools

Audio Quality Issues:
  • Check text encoding (use UTF-8)
  • Verify punctuation placement
  • Adjust speed settings
  • Try different voices

File Playback Issues:
  • Ensure format compatibility with your player
  • WAV format works on most systems
  • PCM may require conversion

API Notes

  • Responses are returned as audio files
  • Watermarking enabled by default (can be disabled in account settings)
  • No strict rate limiting documented
  • Audio generation typically completes in 1-3 seconds

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 Docs, 1 Script, 1 Config
  • SKILL.md Primary doc
  • README.md Docs
  • scripts/text_to_speech.sh Scripts
  • package.json Config