Tencent SkillHub · AI

qwenspeak

Text-to-speech generation via Qwen3-TTS over SSH. Preset voices, voice cloning, voice design. Use when the user wants to generate speech audio, clone voices,...

skill · openclaw · clawhub · Free
0 Downloads
0 Stars
0 Installs
0 Score
High Signal

⬇ 0 downloads · ★ 0 stars · Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILL.md, references/setup.md, scripts/qwenspeak.sh

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.5.0

Documentation

ClawHub primary doc: SKILL.md (10 sections)

qwenspeak

YAML-driven text-to-speech over SSH using Qwen3-TTS models. For installation and deployment, see references/setup.md.

SSH Wrapper

Use scripts/qwenspeak.sh for all commands. It handles host, port, and host key acceptance via QWENSPEAK_HOST and QWENSPEAK_PORT env vars.

    scripts/qwenspeak.sh <command> [args]
    scripts/qwenspeak.sh <command> < input_file
    scripts/qwenspeak.sh <command> > output_file
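Before the first call it can help to confirm that the two variables the wrapper reads are present. The pre-flight check below is a minimal sketch under one assumption the source does not confirm: that both variables must be set explicitly (the wrapper may supply defaults; references/setup.md is the authority). The function name check_qwenspeak_env and the example host are hypothetical.

```shell
# Hypothetical pre-flight check; not part of the qwenspeak package.
# Assumption: QWENSPEAK_HOST and QWENSPEAK_PORT both need to be set.
check_qwenspeak_env() {
  if [ -z "${QWENSPEAK_HOST:-}" ]; then
    echo "QWENSPEAK_HOST is not set" >&2
    return 1
  fi
  case "${QWENSPEAK_PORT:-}" in
    ''|*[!0-9]*)
      echo "QWENSPEAK_PORT must be a non-empty number" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage (host and port values are placeholders):
#   export QWENSPEAK_HOST=tts.example.internal
#   export QWENSPEAK_PORT=2222
#   check_qwenspeak_env && scripts/qwenspeak.sh "list-files"
```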

TTS Generation

Submit YAML, get a job UUID back immediately, poll for progress. Jobs run sequentially: one at a time, the rest queue up.

    # Get the YAML template
    scripts/qwenspeak.sh "tts print-yaml" > job.yaml

    # Submit job
    scripts/qwenspeak.sh "tts" < job.yaml
    # {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}

    # Check progress
    scripts/qwenspeak.sh "tts get-job 550e8400"

    # Follow job log
    scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"

    # Download result
    scripts/qwenspeak.sh "get hello.wav" > hello.wav
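The submit, poll, download sequence can be scripted. The sketch below pulls the job id out of the submit response with sed and then polls "tts get-job" until a terminal status appears. Two things are assumptions rather than documented behaviour: the QWENSPEAK variable (a hypothetical override for the wrapper path) and the idea that the "tts get-job" output contains the status word as plain text. The helper names are illustrative only.

```shell
# Hypothetical override so the wrapper path can be swapped (e.g. for a stub).
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Read the submit response on stdin and print the value of "id".
job_id_from_response() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll "tts get-job" until a terminal status shows up.
# Assumption: the job details contain the status word as plain text.
# Returns 0 on completed, 1 on failed/cancelled, 2 if the wrapper call fails.
wait_for_job() {
  local id="$1" interval="${2:-5}" details
  while true; do
    details="$("$QWENSPEAK" "tts get-job $id")" || return 2
    case "$details" in
      *completed*) return 0 ;;
      *failed*|*cancelled*) printf '%s\n' "$details" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage:
#   id="$("$QWENSPEAK" "tts" < job.yaml | job_id_from_response)"
#   wait_for_job "$id" && "$QWENSPEAK" "get hello.wav" > hello.wav
```

If the real "tts get-job" output is structured differently, only the case patterns in wait_for_job need to change.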

YAML Structure

Global settings + list of steps. Each step loads a model, runs all its generations, then unloads. Settings cascade: global > step > generation.

    steps:
      - mode: custom-voice
        model_size: 1.7b
        speaker: Ryan
        language: English
        generate:
          - text: "Hello world"
            output: hello.wav
          - text: "I cannot believe this!"
            speaker: Vivian
            instruct: "Speak angrily"
            output: angry.wav

      - mode: voice-design
        generate:
          - text: "Welcome to our store."
            instruct: "A warm, friendly young female voice with a cheerful tone"
            output: welcome.wav

      - mode: voice-clone
        model_size: 1.7b
        ref_audio: ref.wav
        ref_text: "Transcript of reference"
        generate:
          - text: "First line in cloned voice"
            output: clone1.wav
          - text: "Second line"
            output: clone2.wav

Modes

  • custom-voice: Pick from 9 preset speakers. 1.7B supports emotion/style via instruct.
  • voice-design: Describe the voice in natural language via instruct. 1.7B only.
  • voice-clone: Clone from reference audio. Set ref_audio and ref_text at step level to reuse across generations. x_vector_only: true skips transcript.
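The mode rules can be checked locally before a job is queued. The function below is a small sketch that encodes only what the mode descriptions state: three valid mode names, and voice-design being limited to the 1.7b model. The function name check_step is hypothetical, and the server may enforce further rules that are not listed here.

```shell
# Hypothetical local check; mirrors the documented mode constraints only.
# Usage: check_step <mode> [model_size]   (model_size defaults to 1.7b)
check_step() {
  local mode="$1" size="${2:-1.7b}"
  case "$mode" in
    custom-voice|voice-design|voice-clone) ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
  # voice-design is documented as 1.7B only.
  if [ "$mode" = "voice-design" ] && [ "$size" != "1.7b" ]; then
    echo "voice-design requires model_size 1.7b" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_step voice-design 0.6b || echo "fix the step before submitting"
```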

Emotion trick for cloned voices

Upload references with different emotions, use separate steps:

    scripts/qwenspeak.sh "create-dir refs"
    scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
    scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav

    steps:
      - mode: voice-clone
        ref_audio: refs/happy.wav
        ref_text: "transcript of happy ref"
        generate:
          - text: "Great news everyone!"
            output: happy1.wav

      - mode: voice-clone
        ref_audio: refs/angry.wav
        ref_text: "transcript of angry ref"
        generate:
          - text: "This is unacceptable"
            output: angry1.wav
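With more than a couple of emotion references, the uploads can be looped. This sketch sends every .wav file in a local directory to refs/ on the server, keeping the file name. The QWENSPEAK variable and the upload_refs name are hypothetical, and how create-dir behaves when refs already exists is not documented, so its exit status is ignored here.

```shell
# Hypothetical override so the wrapper path can be swapped (e.g. for a stub).
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Upload every *.wav in a local directory to refs/<same name> on the server.
upload_refs() {
  local dir="$1" f name
  # Behaviour on an existing directory is undocumented, so do not fail on it.
  "$QWENSPEAK" "create-dir refs" || true
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal
    name="$(basename "$f")"
    "$QWENSPEAK" "put refs/$name" < "$f" || return 1
  done
  return 0
}

# Usage:
#   upload_refs ./my_refs   # then point ref_audio at refs/<name>.wav
```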

Job Management

    scripts/qwenspeak.sh "tts list-jobs"              # list all
    scripts/qwenspeak.sh "tts list-jobs --json"       # JSON output
    scripts/qwenspeak.sh "tts get-job <id>"           # job details
    scripts/qwenspeak.sh "tts get-job-log <id>"       # view log
    scripts/qwenspeak.sh "tts get-job-log <id> -f"    # follow log
    scripts/qwenspeak.sh "tts cancel-job <id>"        # cancel

Statuses: queued → running → completed | failed | cancelled

Completed jobs auto-cleaned after 1 day, all jobs after 1 week. UUID prefixes work (e.g. first 8 chars).
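Two small helpers follow from the status list and the UUID prefix rule. The sketch below shortens a full job UUID to its first 8 characters and tells terminal statuses apart from queued and running. Both function names are hypothetical, and the sample UUID in the usage comment is made up for illustration.

```shell
# Print the first 8 characters of a job UUID (prefixes are accepted).
job_prefix() {
  printf '%s\n' "$1" | cut -c1-8
}

# Succeed only for statuses after which a job no longer changes.
is_terminal_status() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (sample UUID is illustrative):
#   short="$(job_prefix 550e8400-e29b-41d4-a716-446655440000)"
#   scripts/qwenspeak.sh "tts get-job-log $short -f"
```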

File Operations

All paths relative to the work directory. Traversal blocked.

| Command | Description |
| --- | --- |
| put <path> | Upload file from stdin |
| get <path> | Download file to stdout |
| list-files [--json] | List directory |
| remove-file <path> | Delete a file |
| create-dir <path> | Create directory |
| remove-dir <path> | Remove empty directory |
| move-file <src> <dst> | Move or rename |
| copy-file <src> <dst> | Copy a file |
| file-exists <path> | Check if file exists (true/false) |
| search-files <glob> | Glob search (** recursive) |
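The file commands compose into simple idempotent operations. The sketch below uploads a file only when file-exists reports that it is absent. It assumes file-exists prints the literal word true or false on stdout, which the table suggests but does not spell out. The QWENSPEAK variable and the upload_if_missing name are hypothetical.

```shell
# Hypothetical override so the wrapper path can be swapped (e.g. for a stub).
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Upload <local-src> to <remote-dst> unless the remote file already exists.
# Assumption: "file-exists" prints the literal word true or false.
upload_if_missing() {
  local src="$1" dst="$2" exists
  exists="$("$QWENSPEAK" "file-exists $dst")" || return 1
  if [ "$exists" = "true" ]; then
    echo "skip: $dst already exists"
  else
    "$QWENSPEAK" "put $dst" < "$src"
  fi
}

# Usage:
#   upload_if_missing me_happy.wav refs/happy.wav
```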

Speakers

| Speaker | Gender | Language | Description |
| --- | --- | --- | --- |
| Vivian | Female | Chinese | Bright, slightly edgy young voice |
| Serena | Female | Chinese | Warm, gentle young voice |
| Uncle_Fu | Male | Chinese | Seasoned, low mellow timbre |
| Dylan | Male | Chinese | Youthful Beijing dialect, clear natural timbre |
| Eric | Male | Chinese | Lively Chengdu/Sichuan dialect, slightly husky |
| Ryan | Male | English | Dynamic with strong rhythmic drive |
| Aiden | Male | English | Sunny American, clear midrange |
| Ono_Anna | Female | Japanese | Playful, light nimble timbre |
| Sohee | Female | Korean | Warm with rich emotion |
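When jobs are generated by a script, the speaker list can be kept as a lookup. The sketch below maps each language in the speaker listing to its preset speaker names, copied directly from that listing. The function name speakers_for_language is hypothetical, and the mapping says nothing about whether a speaker can also synthesize other languages.

```shell
# Print the preset speakers listed for a language, space-separated.
# Returns non-zero for a language with no listed speaker.
speakers_for_language() {
  case "$1" in
    Chinese)  echo "Vivian Serena Uncle_Fu Dylan Eric" ;;
    English)  echo "Ryan Aiden" ;;
    Japanese) echo "Ono_Anna" ;;
    Korean)   echo "Sohee" ;;
    *)        return 1 ;;
  esac
}

# Usage:
#   for s in $(speakers_for_language English); do echo "speaker: $s"; done
```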

YAML Options

All settings cascade: global > step > generation.

| Field | Default | Description |
| --- | --- | --- |
| dtype | float32 | float32, float16, bfloat16 (float16/bfloat16 GPU only) |
| flash_attn | auto | FlashAttention-2: auto-detects, auto-switches float32 → bfloat16 |
| temperature | 0.9 | Sampling temperature |
| top_k | 50 | Top-k sampling |
| top_p | 1.0 | Top-p / nucleus sampling |
| repetition_penalty | 1.05 | Repetition penalty |
| max_new_tokens | 2048 | Max codec tokens to generate |
| no_sample | false | Greedy decoding |
| streaming | false | Streaming mode (lower latency) |
| mode | required | Step only: custom-voice, voice-design, or voice-clone |
| model_size | 1.7b | Step only: 1.7b or 0.6b |
| text | required | Text to synthesize |
| output | required | Output file path |
| speaker | Vivian | custom-voice: speaker name |
| language | Auto | Language for synthesis |
| instruct | - | custom-voice: emotion/style; voice-design: voice description |
| ref_audio | - | voice-clone: reference audio file path |
| ref_text | - | voice-clone: transcript of reference audio |
| x_vector_only | false | voice-clone: use speaker embedding only |
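The cascade is easiest to see with one field set at each level. The sketch below writes a small job file in which temperature is set globally, speaker is set on the step, and one generation sets its own speaker, mirroring the Ryan/Vivian pattern in the YAML Structure example. Two points are assumptions: that global settings are top-level keys beside steps, and that the more specific level wins. The output of "tts print-yaml" is the authoritative layout. The function name write_cascade_job is hypothetical.

```shell
# Write an illustrative job file to the path given as $1.
# Assumptions: global keys sit beside "steps"; the more specific level wins.
write_cascade_job() {
  cat > "$1" <<'YAML'
temperature: 0.7           # global: applies to every step and generation
steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan          # step level: default for this step's generations
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav  # inherits speaker Ryan from the step
      - text: "I cannot believe this!"
        speaker: Vivian    # generation level: used instead of Ryan
        instruct: "Speak angrily"
        output: angry.wav
YAML
}

# Usage:
#   write_cascade_job job.yaml
#   scripts/qwenspeak.sh "tts" < job.yaml
```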

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 Docs, 1 Script
  • SKILL.md Primary doc
  • references/setup.md Docs
  • scripts/qwenspeak.sh Scripts