Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Text-to-speech generation via Qwen3-TTS over SSH. Preset voices, voice cloning, voice design. Use when the user wants to generate speech audio, clone voices,...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
YAML-driven text-to-speech over SSH using Qwen3-TTS models. For installation and deployment, see references/setup.md.
Use scripts/qwenspeak.sh for all commands. It handles host, port, and host key acceptance via the QWENSPEAK_HOST and QWENSPEAK_PORT env vars.

```bash
scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file
```
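For example (the host and port values below are placeholders; substitute your own server):

```bash
# Point the wrapper at your TTS server -- example placeholder values
export QWENSPEAK_HOST=tts.example.com
export QWENSPEAK_PORT=22

# Any documented command then goes through the wrapper
scripts/qwenspeak.sh "list-files"
```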
Submit YAML, get a job UUID back immediately, poll for progress. Jobs run sequentially: one at a time, the rest queue up.

```bash
# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml

# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}

# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"

# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"

# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav
```
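A minimal polling sketch, assuming jq is available, that the submit response is JSON with an `id` field as shown above, and that the `tts get-job` output mentions the current status by name:

```bash
# Submit a job and capture the UUID from the JSON response
JOB_ID=$(scripts/qwenspeak.sh "tts" < job.yaml | jq -r '.id')

# Poll until the job leaves the queued/running states
# (assumes `tts get-job` output contains the status word)
while scripts/qwenspeak.sh "tts get-job $JOB_ID" | grep -qE 'queued|running'; do
  sleep 5
done

# Fetch the result once the job is done
scripts/qwenspeak.sh "get hello.wav" > hello.wav
```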
Global settings + list of steps. Each step loads a model, runs all its generations, then unloads. Settings cascade: global > step > generation.

```yaml
steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav

  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav

  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav
```
- custom-voice – Pick from 9 preset speakers. 1.7B supports emotion/style via instruct.
- voice-design – Describe the voice in natural language via instruct. 1.7B only.
- voice-clone – Clone from reference audio. Set ref_audio and ref_text at step level to reuse across generations. x_vector_only: true skips the transcript (see the sketch below).
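As a sketch of the transcript-free clone path (the reference path, job file name, and output name here are illustrative, not part of the package):

```bash
# Write a clone step that uses only the speaker embedding -- no ref_text needed
cat > clone_job.yaml <<'EOF'
steps:
  - mode: voice-clone
    model_size: 1.7b
    ref_audio: refs/me.wav
    x_vector_only: true
    generate:
      - text: "Cloned without a transcript"
        output: clone_xvec.wav
EOF

# Submit and note the returned job id
scripts/qwenspeak.sh "tts" < clone_job.yaml
```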
Upload references with different emotions, use separate steps:

```bash
scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav
```

```yaml
steps:
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav

  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav
```
scripts/qwenspeak.sh "tts list-jobs" # list all scripts/qwenspeak.sh "tts list-jobs --json" # JSON output scripts/qwenspeak.sh "tts get-job <id>" # job details scripts/qwenspeak.sh "tts get-job-log <id>" # view log scripts/qwenspeak.sh "tts get-job-log <id> -f" # follow log scripts/qwenspeak.sh "tts cancel-job <id>" # cancel Statuses: queued β running β completed | failed | cancelled Completed jobs auto-cleaned after 1 day, all jobs after 1 week. UUID prefixes work (e.g. first 8 chars).
All paths are relative to the work directory. Traversal is blocked.

| Command | Description |
|---|---|
| `put <path>` | Upload file from stdin |
| `get <path>` | Download file to stdout |
| `list-files [--json]` | List directory |
| `remove-file <path>` | Delete a file |
| `create-dir <path>` | Create directory |
| `remove-dir <path>` | Remove empty directory |
| `move-file <src> <dst>` | Move or rename |
| `copy-file <src> <dst>` | Copy a file |
| `file-exists <path>` | Check if file exists (true/false) |
| `search-files <glob>` | Glob search (** recursive) |
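A short sequence tying a few of these together (file names are examples only):

```bash
# Upload a local reference clip into a new remote directory
scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/sample.wav" < sample.wav

# Confirm it landed, then find all wav files recursively
scripts/qwenspeak.sh "file-exists refs/sample.wav"
scripts/qwenspeak.sh "search-files **/*.wav"

# Pull a generated file back down
scripts/qwenspeak.sh "get hello.wav" > hello.wav
```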
| Speaker | Gender | Language | Description |
|---|---|---|---|
| Vivian | Female | Chinese | Bright, slightly edgy young voice |
| Serena | Female | Chinese | Warm, gentle young voice |
| Uncle_Fu | Male | Chinese | Seasoned, low mellow timbre |
| Dylan | Male | Chinese | Youthful Beijing dialect, clear natural timbre |
| Eric | Male | Chinese | Lively Chengdu/Sichuan dialect, slightly husky |
| Ryan | Male | English | Dynamic with strong rhythmic drive |
| Aiden | Male | English | Sunny American, clear midrange |
| Ono_Anna | Female | Japanese | Playful, light nimble timbre |
| Sohee | Female | Korean | Warm with rich emotion |
All settings cascade: global > step > generation.

| Field | Default | Description |
|---|---|---|
| dtype | float32 | float32, float16, bfloat16 (float16/bfloat16 GPU only) |
| flash_attn | auto | FlashAttention-2: auto-detects, auto-switches float32 → bfloat16 |
| temperature | 0.9 | Sampling temperature |
| top_k | 50 | Top-k sampling |
| top_p | 1.0 | Top-p / nucleus sampling |
| repetition_penalty | 1.05 | Repetition penalty |
| max_new_tokens | 2048 | Max codec tokens to generate |
| no_sample | false | Greedy decoding |
| streaming | false | Streaming mode (lower latency) |
| mode | required | Step only: custom-voice, voice-design, or voice-clone |
| model_size | 1.7b | Step only: 1.7b or 0.6b |
| text | required | Text to synthesize |
| output | required | Output file path |
| speaker | Vivian | custom-voice: speaker name |
| language | Auto | Language for synthesis |
| instruct | - | custom-voice: emotion/style; voice-design: voice description |
| ref_audio | - | voice-clone: reference audio file path |
| ref_text | - | voice-clone: transcript of reference audio |
| x_vector_only | false | voice-clone: use speaker embedding only |
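A sketch of how the cascade resolves in practice. The values are illustrative, and the placement of global keys at the top level of the YAML is an assumption based on the "global settings + list of steps" layout above; confirm against the template from `tts print-yaml`.

```bash
# Global settings cascade down; step and generation levels override them
cat > cascade_job.yaml <<'EOF'
temperature: 0.7          # global: applies to every generation below
language: English         # global

steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    temperature: 0.9      # step override: this step runs hotter
    generate:
      - text: "Uses the step temperature of 0.9"
        output: step_temp.wav
      - text: "Overridden again at the generation level"
        temperature: 0.5  # generation override wins for this line only
        output: gen_temp.wav
EOF

scripts/qwenspeak.sh "tts" < cascade_job.yaml
```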