# Send CosyVoice3 macOS to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "cosyvoice3-macos",
    "name": "CosyVoice3 macOS",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/lhuaizhong/cosyvoice3-macos",
    "canonicalUrl": "https://clawhub.ai/lhuaizhong/cosyvoice3-macos",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/cosyvoice3-macos",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=cosyvoice3-macos",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "SKILL.md",
      "scripts/download_models.py",
      "scripts/install.sh",
      "scripts/tts.py"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "cosyvoice3-macos",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T20:55:15.630Z",
      "expiresAt": "2026-05-07T20:55:15.630Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=cosyvoice3-macos",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=cosyvoice3-macos",
        "contentDisposition": "attachment; filename=\"cosyvoice3-macos-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "cosyvoice3-macos"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/cosyvoice3-macos"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/cosyvoice3-macos",
    "downloadUrl": "https://openagent3.xyz/downloads/cosyvoice3-macos",
    "agentUrl": "https://openagent3.xyz/skills/cosyvoice3-macos/agent",
    "manifestUrl": "https://openagent3.xyz/skills/cosyvoice3-macos/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/cosyvoice3-macos/agent.md"
  }
}
```
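The `installChecklist` above asks you to confirm that the extracted package contains the expected setup assets. A minimal sketch of that check follows; the asset list is copied from `includedAssets`, while the helper name `missing_assets` is hypothetical and not part of the package.

```python
from pathlib import Path

# Asset list copied from the "includedAssets" field of the manifest above.
EXPECTED_ASSETS = [
    "SKILL.md",
    "scripts/download_models.py",
    "scripts/install.sh",
    "scripts/tts.py",
]


def missing_assets(package_dir, expected=EXPECTED_ASSETS):
    """Return the expected asset paths that are absent from package_dir."""
    root = Path(package_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list means every listed asset was found; anything else is worth reporting back before the agent runs `install.sh`.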
## Documentation

### CosyVoice3 TTS

Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon.

### Overview

CosyVoice3 is an advanced TTS system based on large language models, supporting:

- 9 languages: Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian
- 18+ Chinese dialects: Cantonese, Sichuan, Dongbei, Shanghai, etc.
- Zero-shot voice cloning: Clone any voice from 3-10 seconds of audio
- Cross-lingual synthesis: Speak Chinese with English voice or vice versa
- Fine-grained control: Emotions, speed, volume via text tags

### Prerequisites

- macOS with Apple Silicon (M1/M2/M3)
- Python 3.10
- Conda installed
- ~5GB disk space for models
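
The prerequisites above can be checked from a script before running the installer. This is a rough sketch using only the standard library; `check_prerequisites` is a hypothetical helper, the Python check applies to whichever interpreter runs the script (the installer creates its own conda environment), and `platform.machine()` is assumed to report `arm64` on Apple Silicon when not running under Rosetta.

```python
import platform
import shutil
import sys


def check_prerequisites(path=".", min_free_gb=5):
    """Return a dict of prerequisite name -> bool for the current machine."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "macos": platform.system() == "Darwin",
        "apple_silicon": platform.machine() == "arm64",
        "python_3_10": sys.version_info[:2] == (3, 10),
        "conda": shutil.which("conda") is not None,
        "disk_space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```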

### Installation

Run the installation script:

```bash
cd /Users/lhz/.openclaw/workspace/skills/cosyvoice3/scripts
bash install.sh
```

This will:

- Create conda environment cosyvoice
- Install PyTorch (CPU version for Apple Silicon)
- Install CosyVoice dependencies
- Download Fun-CosyVoice3-0.5B model (~2GB)
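
One way to confirm that the first step took effect is to ask conda which environments exist. The sketch below assumes `conda` is on `PATH` and relies on `conda env list --json`, which reports environment paths under an `envs` key; `env_names` and `conda_env_exists` are hypothetical helpers, not part of the package.

```python
import json
import subprocess
from pathlib import Path


def env_names(conda_env_list_json):
    """Extract environment directory names from `conda env list --json` output."""
    envs = json.loads(conda_env_list_json).get("envs", [])
    return {Path(p).name for p in envs}


def conda_env_exists(name="cosyvoice"):
    """Return True if a conda environment with this directory name is listed."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return name in env_names(result.stdout)
```

Matching on the last path component works for named environments such as `cosyvoice`; the base environment appears under its install directory name instead.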

### Quick Start - Basic TTS

Important: CosyVoice3 requires the <|endofprompt|> marker in the reference text!

```bash
cd /Users/lhz/.openclaw/workspace/cosyvoice3-repo
export PATH="$HOME/miniconda3/bin:$PATH"
conda activate cosyvoice

python -c "
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B')
for i, j in enumerate(cosyvoice.inference_zero_shot(
    '你好，这是CosyVoice3语音合成测试。',
    '希望你以后能够做的比我还好呦。<|endofprompt|>',  # Note this marker
    'asset/zero_shot_prompt.wav'
)):
    torchaudio.save('output.wav', j['tts_speech'], cosyvoice.sample_rate)
print('Generated: output.wav')
"
```

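Because a missing marker is easy to overlook, a small guard can add it before the reference text reaches the model. This is a sketch only: `with_end_of_prompt` is a hypothetical helper, and placing the marker at the end of the text simply mirrors the Quick Start example rather than any documented rule.

```python
END_OF_PROMPT = "<|endofprompt|>"


def with_end_of_prompt(prompt_text):
    """Append the marker to prompt_text unless it is already present."""
    if END_OF_PROMPT in prompt_text:
        return prompt_text
    return prompt_text + END_OF_PROMPT
```

Calling it on text that already carries the marker returns the text unchanged, so it is safe to apply unconditionally.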
### Using the TTS Script

Generate speech from text:

```bash
cd /Users/lhz/.openclaw/workspace/skills/cosyvoice3/scripts
conda activate cosyvoice

# Basic TTS with default voice
python tts.py "你好，这是一个测试。"

# With custom reference audio for voice cloning
python tts.py "你好，这是克隆的声音。" --reference /path/to/reference.wav

# Cross-lingual (English text with Chinese voice)
python tts.py "Hello, this is cross-lingual synthesis." --reference asset/zero_shot_prompt.wav --lang en

# With speed control
python tts.py "这是一段快速的语音。" --speed 1.5

# Save to specific path
python tts.py "你好。" --output ~/Desktop/greeting.wav
```
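
When an agent drives the script from Python rather than a shell, the same invocations can be assembled as an argument list. The sketch below uses only the flags shown above (`--reference`, `--lang`, `--speed`, `--output`); `build_tts_command` and `run_tts` are hypothetical wrappers, and the `cosyvoice` environment is assumed to be active already.

```python
import subprocess


def build_tts_command(text, reference=None, lang=None, speed=None, output=None):
    """Build the argv list for tts.py from the options shown above."""
    cmd = ["python", "tts.py", text]
    if reference is not None:
        cmd += ["--reference", str(reference)]
    if lang is not None:
        cmd += ["--lang", lang]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd


def run_tts(text, scripts_dir, **options):
    """Run tts.py inside scripts_dir; raises CalledProcessError on failure."""
    return subprocess.run(build_tts_command(text, **options), cwd=scripts_dir, check=True)
```

Passing a list avoids shell quoting problems with Chinese text, but it also means `~` is not expanded; use `os.path.expanduser` on output paths first.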

### Available Assets

Reference audio files in cosyvoice3-repo/asset/:

- zero_shot_prompt.wav - Default Chinese female voice
- cross_lingual_prompt.wav - English prompt for cross-lingual

### Voice Cloning

Clone a voice from 3-10 seconds of reference audio:

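```python
import sys
# As in the Quick Start; run from the cosyvoice3-repo directory
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B')

# Clone voice and generate
for i, j in enumerate(cosyvoice.inference_zero_shot(
    '这是克隆后的声音在说话。',
    # Transcription of the reference audio, with the marker required by the Quick Start note
    'Reference text transcription<|endofprompt|>',
    '/path/to/reference.wav'
)):
    torchaudio.save('cloned.wav', j['tts_speech'], cosyvoice.sample_rate)
```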

### Fine-Grained Control

Control prosody with special tags:

```python
# Add laughter
"他突然[laughter]笑了起来[laughter]。"

# Add breathing
"他说完这句话[breath]，深吸一口气。"

# Strong emphasis
"这是<strong>非常重要</strong>的。"

# Combined
"在面对挑战时，他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>[breath]。"
```
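
If the tagged text is built programmatically, small helpers keep the tag spelling consistent. This sketch covers only the three tags shown above; the names `LAUGHTER`, `BREATH`, and `strong` are illustrative and not part of CosyVoice.

```python
LAUGHTER = "[laughter]"
BREATH = "[breath]"


def strong(text):
    """Wrap text in the <strong> emphasis tag."""
    return f"<strong>{text}</strong>"


sentence = f"这是{strong('非常重要')}的{BREATH}。"
print(sentence)  # 这是<strong>非常重要</strong>的[breath]。
```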

### Dialect Support

Use instruct mode for dialects:

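```python
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

cosyvoice = AutoModel(model_dir='pretrained_models/CosyVoice-300M-Instruct')

# Arguments: text to speak, speaker ID, instruction ("say this sentence in Sichuan dialect")
for i, j in enumerate(cosyvoice.inference_instruct(
    '你好，这是测试语音。',
    '中文男',
    '用四川话说这句话<|endofprompt|>'
)):
    torchaudio.save('sichuan.wav', j['tts_speech'], cosyvoice.sample_rate)
```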
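
The instruction string follows a fixed pattern, so it can be generated from a dialect name. Only the Sichuan form appears in this document; the helper `dialect_instruction` is hypothetical, and whether other dialect names are understood by the model is an assumption to verify.

```python
END_OF_PROMPT = "<|endofprompt|>"


def dialect_instruction(dialect):
    """Build an instruct string such as '用四川话说这句话<|endofprompt|>'."""
    return f"用{dialect}说这句话{END_OF_PROMPT}"
```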

### Model not found

If you get "model not found" errors, download models manually:

```bash
cd /Users/lhz/.openclaw/workspace/cosyvoice3-repo
export PATH="$HOME/miniconda3/bin:$PATH"
conda activate cosyvoice

python -c "
from modelscope import snapshot_download
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
"
```

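Before triggering a multi-gigabyte download, it can help to check whether the model directory is already populated. A minimal sketch, assuming a non-empty directory means the download completed; `model_is_present` is a hypothetical helper and does not validate individual model files.

```python
from pathlib import Path


def model_is_present(model_dir):
    """Return True if model_dir exists and contains at least one entry."""
    path = Path(model_dir)
    return path.is_dir() and any(path.iterdir())
```

For example, `model_is_present('pretrained_models/Fun-CosyVoice3-0.5B')` can gate the `snapshot_download` call when run from the repository directory.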
### Memory issues

For long text, split into sentences:

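```python
text = "很长的文本..."
# Split on the Chinese full stop and skip empty fragments
sentences = text.split('。')
for sent in sentences:
    if sent.strip():
        # Process each sentence, e.g. pass it to inference_zero_shot
        print(sent.strip())
```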
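
Splitting on `。` alone drops the punctuation and ignores sentences that end in other marks. A slightly more careful sketch splits after any of `。！？!?` and keeps the mark attached; `split_sentences` is a hypothetical helper, and the choice of punctuation set is an assumption.

```python
import re

# Zero-width split point after Chinese or Western sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text):
    """Return non-empty sentences, each keeping its closing punctuation."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]
```

Each returned sentence can then be synthesized on its own, which keeps per-call memory bounded by the longest sentence rather than the whole text.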

### Audio format

Reference audio requirements:

- Format: WAV, MP3
- Sample rate: 16kHz+ (automatically resampled)
- Duration: 3-10 seconds optimal
- Content: Clear speech, minimal background noise
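
For WAV references, the sample rate and duration guidance above can be checked with the standard library before cloning. This sketch handles PCM WAV only (the `wave` module cannot read MP3), and `check_reference_wav` is a hypothetical helper; since 3-10 seconds is described as optimal rather than mandatory, treat its findings as warnings.

```python
import wave


def check_reference_wav(path, min_rate=16000, min_seconds=3.0, max_seconds=10.0):
    """Return a list of problems found in a WAV reference clip (empty if none)."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    problems = []
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if not min_seconds <= seconds <= max_seconds:
        problems.append(f"duration {seconds:.1f}s is outside {min_seconds}-{max_seconds}s")
    return problems
```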

### Scripts

- install.sh - Installation script for macOS
- tts.py - Main TTS script with CLI interface
- download_models.py - Download pretrained models

### References

- CosyVoice GitHub
- Fun-CosyVoice3 Demo

### Model Files

Located in cosyvoice3-repo/pretrained_models/:

- Fun-CosyVoice3-0.5B/ - Main model (recommended)
- CosyVoice2-0.5B/ - Previous version
- CosyVoice-300M/ - Lighter model
- CosyVoice-300M-SFT/ - SFT version
- CosyVoice-300M-Instruct/ - Instruct version

### Notes

- First inference takes ~30 seconds (model warmup)
- Subsequent inferences are faster
- Apple Silicon uses CPU mode (no CUDA)
- RTF (real-time factor) ~0.3-0.5 on M-series chips
- Model files are cached locally after first download
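
The real-time factor is the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. At the quoted RTF of 0.3-0.5, a 10-second clip would take roughly 3-5 seconds after warmup. The helper names below are illustrative only.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds, rtf):
    """Estimate compute time for a clip of the given length at a given RTF."""
    return audio_seconds * rtf
```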
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: lhuaizhong
- Version: 1.0.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-04-30T20:55:15.630Z
- Expires at: 2026-05-07T20:55:15.630Z
- Recommended action: Download for OpenClaw
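
The health record carries an expiry timestamp, so an agent can decide whether to trust it or re-check the download. A minimal sketch, assuming timestamps always use the ISO 8601 form with a trailing `Z` shown above; `parse_utc` and `health_is_fresh` are hypothetical helpers.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp ending in 'Z' into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def health_is_fresh(expires_at, now=None):
    """Return True while the health record has not yet expired."""
    now = now or datetime.now(timezone.utc)
    return now < parse_utc(expires_at)
```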
## Links
- [Detail page](https://openagent3.xyz/skills/cosyvoice3-macos)
- [Send to Agent page](https://openagent3.xyz/skills/cosyvoice3-macos/agent)
- [JSON manifest](https://openagent3.xyz/skills/cosyvoice3-macos/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/cosyvoice3-macos/agent.md)
- [Download page](https://openagent3.xyz/downloads/cosyvoice3-macos)