
Qwen3-TTS MLX

Local Qwen3-TTS speech synthesis on Apple Silicon via MLX. Use for offline narration, audiobooks, video voiceovers, and multilingual TTS.


Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform: OpenClaw
Install method: Manual import
Extraction: Extract archive
Prerequisites: OpenClaw
Primary doc: SKILL.md

Package facts

Download mode: Yavira redirect
Package format: ZIP package
Source platform: Tencent SkillHub
What's included: SKILL.md, references/dubbing_format.md, scripts/batch_dubbing.py, scripts/run_tts.py

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source: Tencent SkillHub
Verification: Indexed source record
Version: 2.1.0

Documentation


Qwen3-TTS MLX

Run Qwen3-TTS locally on Apple Silicon (M1/M2/M3/M4) using MLX. Supports 11 languages, 9 built-in voices, voice cloning, and voice design from text descriptions.

When to Use

  • Generate speech fully offline on a Mac
  • Produce narration, audiobooks, podcasts, or video voiceovers
  • Create multilingual TTS with controllable style and emotion
  • Clone a voice from a short audio sample
  • Design custom voices from text descriptions

Install

    pip install mlx-audio
    brew install ffmpeg

Basic Usage

    python scripts/run_tts.py custom-voice \
      --text "Hello, welcome to local text to speech." \
      --voice Ryan \
      --output output.wav

With Style Control

    python scripts/run_tts.py custom-voice \
      --text "Breaking news: local AI model achieves human-level speech." \
      --voice Uncle_Fu \
      --instruct "news anchor tone, calm and authoritative" \
      --output news.wav

Model Variants

| Variant | Model | Size | Memory | Use Case |
|---|---|---|---|---|
| CustomVoice | mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit | ~1 GB | ~4 GB | Built-in voices + style control (recommended) |
| VoiceDesign | mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-5bit | ~2 GB | ~5 GB | Create voices from text descriptions |
| Base | mlx-community/Qwen3-TTS-12Hz-0.6B-Base-4bit | ~1 GB | ~4 GB | Voice cloning from reference audio |
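The table above can be read as a lookup from mode to model. A minimal sketch follows, assuming the three CLI subcommands (custom-voice, voice-design, voice-clone) map to the CustomVoice, VoiceDesign, and Base rows respectively. The helper name `resolve_model` is hypothetical, and whether scripts/run_tts.py uses exactly these defaults is an assumption.

```python
# Model IDs copied from the Model Variants table.
# The mode-to-model pairing is an assumption based on the table's "Use Case" column.
DEFAULT_MODELS = {
    "custom-voice": "mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit",
    "voice-design": "mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-5bit",
    "voice-clone": "mlx-community/Qwen3-TTS-12Hz-0.6B-Base-4bit",
}


def resolve_model(mode, override=None):
    """Return the model ID for a mode, preferring an explicit override."""
    if override:
        return override
    if mode not in DEFAULT_MODELS:
        raise ValueError(f"unknown mode: {mode!r}")
    return DEFAULT_MODELS[mode]


if __name__ == "__main__":
    print(resolve_model("voice-design"))
```

An explicit override plays the same role as the --model flag: it wins over the per-mode default.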

Supported Languages

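| Language | Code | Notes |
|---|---|---|
| Auto-detect | auto | Default, detects from text |
| Chinese | Chinese | Mandarin |
| English | English | |
| Japanese | Japanese | |
| Korean | Korean | |
| French | French | |
| German | German | |
| Spanish | Spanish | |
| Portuguese | Portuguese | |
| Italian | Italian | |
| Russian | Russian | |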

Built-in Voices

| Voice | Language | Character |
|---|---|---|
| Vivian | Chinese | Female, bright, young |
| Serena | Chinese | Female, gentle, soft |
| Uncle_Fu | Chinese | Male, authoritative, news anchor |
| Dylan | Chinese | Male, Beijing dialect |
| Eric | Chinese | Male, Sichuan dialect |
| Ryan | English | Male, energetic |
| Aiden | English | Male, clear, neutral |
| Ono_Anna | Japanese | Female |
| Sohee | Korean | Female |

Voice Selection Guide:

| Scenario | Recommended Voice |
|---|---|
| Chinese news/narration | Uncle_Fu |
| Chinese casual/lively | Eric |
| Chinese female, professional | Vivian |
| Chinese female, storytelling | Serena |
| English energetic content | Ryan |
| English neutral/educational | Aiden |
| Japanese content | Ono_Anna |
| Korean content | Sohee |
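For scripts that choose a voice programmatically, the voice table can be kept as a small lookup. This is only a sketch: the voice names and languages come from the table above, while the helper `voices_for_language` is a hypothetical name and not part of the package.

```python
# Built-in voices and their languages, copied from the Built-in Voices table.
BUILTIN_VOICES = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",
    "Eric": "Chinese",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def voices_for_language(language):
    """List built-in voices for a language (case-insensitive), in table order."""
    wanted = language.lower()
    return [name for name, lang in BUILTIN_VOICES.items() if lang.lower() == wanted]


if __name__ == "__main__":
    print(voices_for_language("english"))
```

Languages without a built-in voice (for example French) return an empty list, which a caller could treat as a cue to try VoiceDesign or VoiceClone instead.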

1) CustomVoice

Use built-in voices with optional emotion/style control via --instruct.

    python scripts/run_tts.py custom-voice \
      --text "This is amazing news!" \
      --voice Vivian \
      --instruct "excited and happy" \
      --output excited.wav

Style instruction examples:

  • "calm and warm" - Soft, friendly delivery
  • "news anchor, authoritative" - Professional broadcast style
  • "excited and energetic" - High energy, enthusiastic
  • "sad and melancholic" - Emotional, somber tone
  • "whispering, intimate" - Quiet, close-mic feel

2) VoiceDesign

Create a completely new voice by describing it in natural language.

    python scripts/run_tts.py voice-design \
      --text "Welcome to our podcast." \
      --instruct "warm, mature male narrator with low pitch and gentle tone" \
      --output podcast_intro.wav

Voice description examples:

  • "young cheerful female with high pitch"
  • "elderly wise male with deep resonant voice"
  • "professional female news anchor, clear articulation"
  • "friendly young male, casual and relaxed"

3) VoiceClone

Clone any voice from a reference audio sample (5-10 seconds recommended).

    python scripts/run_tts.py voice-clone \
      --text "This is my cloned voice speaking new content." \
      --ref_audio reference.wav \
      --ref_text "The exact transcript of the reference audio" \
      --output cloned.wav

Tips for voice cloning:

  • Use clean audio without background noise
  • 5-10 seconds of speech works best
  • Provide an accurate transcript of the reference
  • Reference and output language should match
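The 5-10 second recommendation can be checked before running a clone. The sketch below uses only Python's standard-library `wave` module, so it assumes the reference is an uncompressed PCM WAV file. The function names are hypothetical and not part of the package.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by frame rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference(path, min_seconds=5.0, max_seconds=10.0):
    """Return (ok, duration), where ok means the clip is within the recommended range."""
    duration = wav_duration_seconds(path)
    return min_seconds <= duration <= max_seconds, duration
```

A clip outside the range is not necessarily unusable. The check only flags references that fall outside the range the tips above recommend.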

CLI Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --text | Yes | - | Text to synthesize |
| --voice | No | Vivian | Built-in voice (CustomVoice only) |
| --lang_code | No | auto | Language code |
| --instruct | No | - | Style control or voice description |
| --speed | No | 1.0 | Speech speed multiplier |
| --temperature | No | 0.7 | Sampling temperature (higher = more variation) |
| --model | No | (per mode) | Override default model |
| --output | No | - | Output file path |
| --out-dir | No | ./outputs | Output directory when --output not set |
| --ref_audio | VoiceClone | - | Reference audio file |
| --ref_text | VoiceClone | - | Reference audio transcript |
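When driving the CLI from another Python program, the parameters above can be assembled into an argument list. This is a sketch under assumptions: `build_tts_command` is a hypothetical helper, it covers only a subset of the flags, and it assumes the script is invoked as `python scripts/run_tts.py <mode>` as in the examples in this document.

```python
def build_tts_command(mode, text, voice=None, lang_code=None, instruct=None,
                      speed=None, temperature=None, ref_audio=None,
                      ref_text=None, output=None):
    """Build an argv list for scripts/run_tts.py from the documented flags."""
    # The parameter table marks --ref_audio and --ref_text as required for VoiceClone.
    if mode == "voice-clone" and (ref_audio is None or ref_text is None):
        raise ValueError("voice-clone needs both ref_audio and ref_text")

    cmd = ["python", "scripts/run_tts.py", mode, "--text", text]
    optional = {
        "--voice": voice,
        "--lang_code": lang_code,
        "--instruct": instruct,
        "--speed": speed,
        "--temperature": temperature,
        "--ref_audio": ref_audio,
        "--ref_text": ref_text,
        "--output": output,
    }
    # Omit flags that were not set so the script's own defaults apply.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    print(build_tts_command("custom-voice", "Hello", voice="Ryan", output="out.wav"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems in the text and instruct values.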

Using generate_audio (recommended)

    from mlx_audio.tts.generate import generate_audio

    # CustomVoice with style control
    generate_audio(
        text="Hello from Qwen3-TTS!",
        model="mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit",
        voice="Ryan",
        lang_code="english",
        instruct="friendly and warm",
        output_path=".",
        file_prefix="hello",
        audio_format="wav",
        join_audio=True,
        verbose=True,
    )

Using Model directly

    from mlx_audio.tts.utils import load
    import soundfile as sf
    import numpy as np

    # Load model
    model = load("mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit")

    # Generate audio (returns a generator)
    audio_chunks = []
    for chunk in model.generate_custom_voice(
        text="Hello from Qwen3-TTS.",
        speaker="Ryan",
        language="english",
        instruct="clear, steady delivery"
    ):
        if hasattr(chunk, 'audio') and chunk.audio is not None:
            audio_chunks.append(chunk.audio)

    # Combine and save
    audio = np.concatenate(audio_chunks)
    sf.write("output.wav", audio, 24000)

VoiceDesign

    from mlx_audio.tts.generate import generate_audio

    generate_audio(
        text="Welcome to the show.",
        model="mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-5bit",
        instruct="warm, friendly female narrator with medium pitch",
        lang_code="english",
        output_path=".",
        file_prefix="voice_design",
        join_audio=True,
    )

VoiceClone

    from mlx_audio.tts.generate import generate_audio

    generate_audio(
        text="New content in the cloned voice.",
        model="mlx-community/Qwen3-TTS-12Hz-0.6B-Base-4bit",
        ref_audio="reference.wav",
        ref_text="Transcript of the reference audio",
        output_path=".",
        file_prefix="cloned",
        join_audio=True,
    )

Batch Processing

Use scripts/batch_dubbing.py for processing multiple lines:

    python scripts/batch_dubbing.py \
      --input dubbing.json \
      --out-dir outputs

See references/dubbing_format.md for the JSON format.

Performance

| Metric | Value |
|---|---|
| Sample rate | 24,000 Hz |
| Real-time factor | ~0.7x (faster than real-time) |
| Peak memory | ~4-6 GB |
| First run | Downloads model (~1-2 GB) |
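As a rough worked example of these figures, the sketch below converts a sample count to seconds at 24,000 Hz and estimates generation time. It assumes the real-time factor means compute time divided by audio duration, which is the usual reading of "0.7x, faster than real-time". Actual speed depends on the machine and the model, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 24_000   # from the Performance table
REAL_TIME_FACTOR = 0.7    # approximate value from the Performance table


def audio_seconds(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Length of a clip in seconds, given its sample count."""
    return num_samples / sample_rate


def estimated_generation_seconds(clip_seconds, rtf=REAL_TIME_FACTOR):
    """Estimated compute time, assuming RTF = compute time / audio duration."""
    return clip_seconds * rtf


if __name__ == "__main__":
    clip = audio_seconds(240_000)  # 240,000 samples at 24 kHz
    print(clip, estimated_generation_seconds(clip))
```

Under that assumption, a 10-second clip (240,000 samples) would take roughly 7 seconds to generate, and a one-hour audiobook chapter roughly 42 minutes.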

Troubleshooting

| Issue | Solution |
|---|---|
| Slow generation | Use 4-bit CustomVoice model |
| Unnatural pauses | Add punctuation, keep sentences short |
| Wrong language detected | Specify --lang_code explicitly |
| Voice cloning quality | Use cleaner reference audio, accurate transcript |
| Tokenizer warnings | Harmless, can be ignored |
| Out of memory | Close other apps, use 4-bit model |
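One way to act on the "keep sentences short" advice is to split long input into sentences before synthesis and generate each one separately. The sketch below is a simple regex-based splitter and not part of the package. `split_sentences` is a hypothetical helper, and it handles only basic Latin and CJK sentence-ending punctuation.

```python
import re

# Split after Latin sentence punctuation followed by whitespace,
# or directly after CJK sentence punctuation (which has no trailing space).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation on each one."""
    parts = _SENTENCE_BOUNDARY.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("Hello there. How are you? Fine."):
        print(sentence)
```

Requiring whitespace after Latin punctuation keeps decimals such as 3.14 intact. Abbreviations like "Dr. Smith" would still be split, so review the result for text that contains them.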


Package contents

Included in package
2 Docs, 2 Scripts
  • SKILL.md Primary doc
  • references/dubbing_format.md Docs
  • scripts/batch_dubbing.py Scripts
  • scripts/run_tts.py Scripts