
CosyVoice3 macOS

Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon. Supports Chinese, English, Japanese, Korean, and 18+ Chinese dialects. Provides zero-...

skill · openclaw · clawhub · Free

0 Downloads

0 Stars

0 Installs

0 Score

High Signal


Unverified but indexed

Install for OpenClaw

Quick setup

Download the package from Yavira.
Extract the archive and review SKILL.md first.
Import or place the package into your OpenClaw setup.

Requirements

Target platform: OpenClaw
Install method: Manual import
Extraction: Extract archive
Prerequisites: OpenClaw
Primary doc: SKILL.md

Package facts

Download mode: Yavira redirect
Package format: ZIP package
Source platform: Tencent SkillHub
What's included: SKILL.md, scripts/download_models.py, scripts/install.sh, scripts/tts.py

Validation

Use the Yavira download entry.
Review SKILL.md after the package is downloaded.
Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

Download the package from Yavira.
Extract it into a folder your agent can access.
Paste one of the prompts below and point your agent at the extracted folder.

New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.


Trust & source

Release facts

Source: Tencent SkillHub
Verification: Indexed source record
Version: 1.0.0

Provenance

Publisher: lhuaizhong

Documentation

Primary doc: SKILL.md (17 sections)

CosyVoice3 TTS

Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon.

Overview

CosyVoice3 is an advanced TTS system based on large language models. It supports:
9 languages: Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian
18+ Chinese dialects: Cantonese, Sichuan, Dongbei, Shanghai, etc.
Zero-shot voice cloning: clone any voice from 3-10 seconds of audio
Cross-lingual synthesis: speak Chinese in an English voice, or vice versa
Fine-grained control: emotions, speed, and volume via text tags

Prerequisites

macOS with Apple Silicon (M1/M2/M3)
Python 3.10
Conda installed
~5GB of disk space for models
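Before running the installer, the prerequisites above can be checked from Python. This is a minimal sketch using only the standard library; the function name `preflight` and the exact checks (for example, treating `platform.machine() == "arm64"` as "Apple Silicon" and 5 GB as decimal gigabytes) are assumptions, not part of the skill's scripts. Run it with the interpreter you plan to use so the Python version check is meaningful.

```python
import platform
import shutil
import sys

# Thresholds taken from the prerequisites list; the checks themselves are assumptions.
REQUIRED_PYTHON = (3, 10)
REQUIRED_FREE_GB = 5


def preflight(path: str = ".") -> dict:
    """Return a mapping of prerequisite name -> True/False for this machine."""
    free_gb = shutil.disk_usage(path).free / 1e9  # decimal GB
    return {
        "macos": platform.system() == "Darwin",
        "apple_silicon": platform.machine() == "arm64",
        "python_3_10": sys.version_info[:2] == REQUIRED_PYTHON,
        "conda_on_path": shutil.which("conda") is not None,
        "disk_space": free_gb >= REQUIRED_FREE_GB,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:15} {'ok' if ok else 'MISSING'}")
```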

Installation

Run the installation script:
```bash
cd ~/.openclaw/workspace/skills/cosyvoice3/scripts
bash install.sh
```
This will:
Create the conda environment `cosyvoice`
Install PyTorch (CPU version for Apple Silicon)
Install the CosyVoice dependencies
Download the Fun-CosyVoice3-0.5B model (~2GB)

Quick Start - Basic TTS

Important: CosyVoice3 requires the <|endofprompt|> token in the reference (prompt) text!
```bash
cd ~/.openclaw/workspace/cosyvoice3-repo
export PATH="$HOME/miniconda3/bin:$PATH"
conda activate cosyvoice
python -c "
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B')
for i, j in enumerate(cosyvoice.inference_zero_shot(
    '你好，这是CosyVoice3语音合成测试。',            # text to synthesize
    '希望你以后能够做的比我还好呦。<|endofprompt|>',  # reference transcript: note this token!
    'asset/zero_shot_prompt.wav'                     # reference audio
)):
    # The generator may yield several segments; number them so none is overwritten.
    torchaudio.save(f'output_{i}.wav', j['tts_speech'], cosyvoice.sample_rate)
    print(f'Generated: output_{i}.wav')
"
```
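Because a missing <|endofprompt|> token is easy to overlook, a small guard can normalize the reference transcript before it is passed to `inference_zero_shot`. This is a hypothetical helper (`with_endofprompt` is not part of CosyVoice); it assumes the placement used in the quick start, with the token at the end of the reference text.

```python
ENDOFPROMPT = "<|endofprompt|>"


def with_endofprompt(prompt_text: str) -> str:
    """Append the <|endofprompt|> token to a reference transcript if it is absent.

    Hypothetical helper: assumes the token goes at the end of the reference
    text, as in the quick-start example.
    """
    text = prompt_text.strip()
    if ENDOFPROMPT in text:
        return text  # already tagged; leave unchanged
    return text + ENDOFPROMPT
```

Usage: pass `with_endofprompt(transcript)` as the second argument to `cosyvoice.inference_zero_shot(...)` instead of the raw transcript.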

Using the TTS Script

Generate speech from text:
```bash
cd ~/.openclaw/workspace/skills/cosyvoice3/scripts
conda activate cosyvoice

# Basic TTS with the default voice
python tts.py "你好，这是一个测试。"

# Voice cloning with custom reference audio
python tts.py "你好，这是克隆的声音。" --reference /path/to/reference.wav

# Cross-lingual (English text with a Chinese voice)
python tts.py "Hello, this is cross-lingual synthesis." --reference asset/zero_shot_prompt.wav --lang en

# Speed control
python tts.py "这是一段快速的语音。" --speed 1.5

# Save to a specific path
python tts.py "你好。" --output ~/Desktop/greeting.wav
```
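When an agent or another Python program drives `tts.py`, it helps to build the argument list programmatically rather than by string concatenation. The sketch below uses only the flags documented above (`--reference`, `--lang`, `--speed`, `--output`); the helper name `build_tts_command` is hypothetical, and it does not know `tts.py`'s defaults, so omitted options are simply not passed.

```python
import os
import shlex
from typing import List, Optional


def build_tts_command(
    text: str,
    reference: Optional[str] = None,
    lang: Optional[str] = None,
    speed: Optional[float] = None,
    output: Optional[str] = None,
    script: str = "tts.py",
) -> List[str]:
    """Assemble an argv list for tts.py using only the documented flags."""
    cmd = ["python", script, text]
    if reference is not None:
        # subprocess does not expand "~" the way a shell does, so expand it here.
        cmd += ["--reference", os.path.expanduser(reference)]
    if lang is not None:
        cmd += ["--lang", lang]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if output is not None:
        cmd += ["--output", os.path.expanduser(output)]
    return cmd


if __name__ == "__main__":
    cmd = build_tts_command("你好。", output="~/Desktop/greeting.wav")
    print(shlex.join(cmd))  # shell-quoted form, for copy/paste
    # To run it inside the activated cosyvoice env:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing a list (not a single string) to `subprocess.run` avoids quoting problems with Chinese text and punctuation.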

Available Assets

Reference audio files in cosyvoice3-repo/asset/:
zero_shot_prompt.wav - Default Chinese female voice
cross_lingual_prompt.wav - English prompt for cross-lingual synthesis

Voice Cloning

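Clone a voice from 3-10 seconds of reference audio:
```python
import sys
sys.path.append('third_party/Matcha-TTS')  # run from the cosyvoice3-repo root
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B')

# Clone the voice and generate speech
for i, j in enumerate(cosyvoice.inference_zero_shot(
    '这是克隆后的声音在说话。',                       # text to synthesize
    'Reference text transcription<|endofprompt|>',  # transcript of the reference audio, with the required token
    '/path/to/reference.wav'
)):
    torchaudio.save(f'cloned_{i}.wav', j['tts_speech'], cosyvoice.sample_rate)
```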
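`inference_zero_shot` is iterated as a generator and can yield more than one segment for longer text. If each segment is written to its own numbered WAV file, the files can be joined with the standard-library `wave` module. This is a sketch under one important assumption: `wave` only reads integer PCM, while torchaudio typically writes 32-bit float WAV for float tensors, so save the segments with `torchaudio.save(..., encoding="PCM_S", bits_per_sample=16)` first. The helper name `concat_wavs` is hypothetical.

```python
import wave
from typing import Sequence


def concat_wavs(inputs: Sequence[str], output: str) -> int:
    """Join PCM WAV files that share one format into a single file.

    Returns the total number of frames written.
    """
    if not inputs:
        raise ValueError("no input files")
    with wave.open(inputs[0], "rb") as first:
        fmt = (first.getnchannels(), first.getsampwidth(), first.getframerate())
    total = 0
    with wave.open(output, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for path in inputs:
            with wave.open(path, "rb") as src:
                # Refuse to mix sample rates or widths; that would corrupt the audio.
                if (src.getnchannels(), src.getsampwidth(), src.getframerate()) != fmt:
                    raise ValueError(f"{path}: format differs from {inputs[0]}")
                n = src.getnframes()
                out.writeframes(src.readframes(n))
                total += n
    return total
```

Usage: `concat_wavs(["cloned_0.wav", "cloned_1.wav"], "cloned.wav")`.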

Fine-Grained Control

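Control prosody with special tags:
```python
# Add laughter
"他突然[laughter]笑了起来[laughter]。"
# Add breathing
"他说完这句话[breath]，深吸一口气。"
# Strong emphasis (wrap the emphasized words in <strong></strong>)
"这是<strong>非常</strong>重要的。"
# Combined
"在面对挑战时，他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>[breath]。"
```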
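The bracketed tags are instructions to the model, not words to be spoken, so any text shown to users (subtitles, logs) should have them removed. A minimal sketch, assuming the tags are always lowercase words in square brackets such as `[laughter]` and `[breath]`; the helper names are hypothetical.

```python
import re
from typing import List

# Matches bracketed control tags such as [laughter] and [breath] (assumed lowercase).
_BRACKET_TAG = re.compile(r"\[[a-z_]+\]")


def plain_text(tagged: str) -> str:
    """Return the spoken text with bracketed control tags removed."""
    return _BRACKET_TAG.sub("", tagged)


def tags_used(tagged: str) -> List[str]:
    """List the control tags in order of appearance."""
    return _BRACKET_TAG.findall(tagged)
```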

Dialect Support

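Use instruct mode for dialects. Note that this example loads CosyVoice-300M-Instruct rather than the default Fun-CosyVoice3-0.5B, so that model must be present in pretrained_models/:
```python
import sys
sys.path.append('third_party/Matcha-TTS')  # run from the cosyvoice3-repo root
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

cosyvoice = AutoModel(model_dir='pretrained_models/CosyVoice-300M-Instruct')
for i, j in enumerate(cosyvoice.inference_instruct(
    '你好，这是测试语音。',            # text to synthesize
    '中文男',                         # built-in speaker ID (Chinese male)
    '用四川话说这句话<|endofprompt|>'  # instruction: "say this in Sichuanese"
)):
    torchaudio.save(f'sichuan_{i}.wav', j['tts_speech'], cosyvoice.sample_rate)
```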
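The instruction string follows a fixed pattern ("用<dialect>说这句话" plus the end-of-prompt token), so it can be generated from a short dialect name. This is a hypothetical helper: the English keys and the Chinese dialect names in `DIALECTS` are assumptions chosen to match the dialects listed in the overview, and whether the model honors each instruction has not been verified here.

```python
ENDOFPROMPT = "<|endofprompt|>"

# Assumed mapping from short English names to the Chinese dialect names used in instructions.
DIALECTS = {
    "cantonese": "粤语",
    "sichuan": "四川话",
    "dongbei": "东北话",
    "shanghai": "上海话",
}


def dialect_instruction(name: str) -> str:
    """Build an instruct prompt like '用四川话说这句话<|endofprompt|>'."""
    try:
        dialect = DIALECTS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown dialect {name!r}; known: {sorted(DIALECTS)}") from None
    return f"用{dialect}说这句话{ENDOFPROMPT}"
```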

Model not found

If you get "model not found" errors, download the model manually:
```bash
cd ~/.openclaw/workspace/cosyvoice3-repo
export PATH="$HOME/miniconda3/bin:$PATH"
conda activate cosyvoice
python -c "
from modelscope import snapshot_download
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
"
```
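To avoid re-running the download unnecessarily, a script can first check whether the model directory is already populated. This is only a rough sketch: `model_present` is a hypothetical helper that treats "directory exists and contains at least one file" as present; it does not verify that the download is complete.

```python
from pathlib import Path


def model_present(model_dir: str) -> bool:
    """True if model_dir exists and contains at least one file (rough check only)."""
    path = Path(model_dir)
    return path.is_dir() and any(p.is_file() for p in path.rglob("*"))
```

Usage: call `snapshot_download(...)` only when `model_present('pretrained_models/Fun-CosyVoice3-0.5B')` returns `False`.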

Memory issues

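For long text, split it into sentences and synthesize them one at a time:
```python
text = "很长的文本..."  # "a very long text..."
sentences = text.split('。')  # split on the Chinese full stop (the full stop itself is dropped)
for sent in sentences:
    if sent.strip():
        # Synthesize each sentence here, e.g. with cosyvoice.inference_zero_shot(...)
        pass
```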
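A plain `split('。')` drops the punctuation and ignores `！` and `？`. The sketch below is one alternative, not the skill's own method: it splits after Chinese or Western sentence-ending marks while keeping them, then packs consecutive sentences into chunks of at most `max_chars` characters so each inference call stays small. The names and the default of 80 characters are assumptions; a single sentence longer than `max_chars` is kept whole. The Western period is deliberately excluded so decimals like "0.5" are not split.

```python
import re
from typing import List

# Zero-width split point right after a sentence-ending mark (requires Python 3.7+).
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text: str, max_chars: int = 80) -> List[str]:
    """Split text into sentences, then pack them into chunks of <= max_chars where possible."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sent in sentences:
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and len(current) + len(sent) > max_chars:
            chunks.append(current)
            current = ""
        current += sent
    if current:
        chunks.append(current)
    return chunks
```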

Audio format

Reference audio requirements:
Format: WAV or MP3
Sample rate: 16kHz or higher (automatically resampled)
Duration: 3-10 seconds is optimal
Content: clear speech with minimal background noise
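A quick pre-check of a WAV reference clip against these requirements can catch problems before a slow inference run. This sketch uses the standard-library `wave` module, so it handles PCM WAV only (not MP3 or float WAV); the helper name and the choice to return warnings rather than raise are assumptions.

```python
import wave
from typing import List

# Limits from the reference-audio requirements.
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0
MIN_RATE = 16000


def check_reference(path: str) -> List[str]:
    """Return human-readable warnings; an empty list means the clip looks suitable."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        seconds = w.getnframes() / rate
    warnings = []
    if rate < MIN_RATE:
        warnings.append(f"sample rate {rate} Hz is below {MIN_RATE} Hz")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        warnings.append(f"duration {seconds:.1f} s is outside {MIN_SECONDS:g}-{MAX_SECONDS:g} s")
    return warnings
```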

Scripts

install.sh - Installation script for macOS
tts.py - Main TTS script with a CLI interface
download_models.py - Downloads pretrained models

References

CosyVoice GitHub
Fun-CosyVoice3 Demo

Model Files

Located in cosyvoice3-repo/pretrained_models/:
Fun-CosyVoice3-0.5B/ - Main model (recommended)
CosyVoice2-0.5B/ - Previous version
CosyVoice-300M/ - Lighter model
CosyVoice-300M-SFT/ - SFT version
CosyVoice-300M-Instruct/ - Instruct version

Notes

First inference takes ~30 seconds (model warmup); subsequent inferences are faster
Apple Silicon uses CPU mode (no CUDA)
RTF (real-time factor) is ~0.3-0.5 on M-series chips
Model files are cached locally after the first download
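The real-time factor is synthesis time divided by the duration of the audio produced, so lower is faster and an RTF below 1 means faster than real time. The small sketch below turns the quoted 0.3-0.5 range into expected wait times; the function names are illustrative, and the one-off ~30 second warmup is not included in the estimate.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def expected_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Invert the definition: estimated synthesis time for a given output length."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # With RTF 0.3-0.5, one minute of speech should take roughly 18-30 s after warmup.
    for rtf in (0.3, 0.5):
        print(f"RTF {rtf}: ~{expected_processing_seconds(60, rtf):.0f} s for 60 s of audio")
```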


Package contents

Included in package

3 scripts · 1 doc

SKILL.md Primary doc
scripts/download_models.py Scripts
scripts/install.sh Scripts
scripts/tts.py Scripts