Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Local offline text-to-speech via Piper TTS. Self-contained setup, automatic language detection, per-call voice selection. Extensible to any language. Writes outputs into the OpenClaw workspace.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Local (offline) text-to-speech via Piper. Purpose: generate audio files (OGG/Opus by default) from text, fully offline. No sending is performed by the skill — sending is handled by the agent after the file is ready.
- Fully offline (no API keys)
- Self-contained setup via setup() — installs Piper into an isolated venv, no system-wide changes
- Automatic language detection for 20+ languages, with English as the default
- Per-call voice selection via the voice parameter
- On-demand voice download via downloadVoices() — no models bundled, choose what you need
- Voice removal via removeVoice() — clean up voices you no longer want
- Extensible: add any language by installing a Piper .onnx model
- Writes outputs into the OpenClaw workspace
Follow this sequence exactly when the user asks to use TTS for the first time in a setup context.
Step 1:

```js
const s = await status();
```
Step 2. If s.stage is not-setup or no-piper:
- Tell the user: "To use local TTS I need to install piper-tts into the skill's venv (~30 seconds, one-time). OK to proceed?"
- Wait for confirmation, then call setup().
- If setup returns a step containing "WARNING: espeak-ng not found", relay the warning and install instructions to the user.
- Call status() again after setup completes.
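A minimal sketch of Steps 1–2, assuming setup() resolves to an object whose steps array holds human-readable progress strings (the actual return shape may differ):

```js
// Check state, then install Piper only after the user has confirmed.
const state = await status();

if (state.stage === 'not-setup' || state.stage === 'no-piper') {
  // (Ask the user for confirmation here before proceeding.)
  const result = await setup();

  // Relay the espeak-ng warning if setup reported one.
  // Assumes result.steps is an array of strings; adjust to the real shape.
  const warning = (result.steps || []).find(s => String(s).includes('WARNING: espeak-ng not found'));
  if (warning) {
    // Pass the warning and install instructions on to the user (see Prerequisites below).
  }

  await status(); // re-check before continuing with Step 3
}
```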
Step 3. If s.stage is no-model (Piper installed but no .onnx files):

3a. Offer English defaults. Explain that two English voices are available as defaults (~65 MB each):
- en_US-ryan-medium — male, American
- en_US-amy-medium — female, American

Ask which they want, or both: "Which English voice(s) should I download? Ryan (male), Amy (female), or both?"

3b. Ask about other languages. After the English choice, ask: "Do you need any other languages? For example German, French, Spanish, Polish, Italian, Portuguese, Russian… Just tell me and I'll check what's available." If the user names a language, look up the available models at https://github.com/rhasspy/piper/blob/master/VOICES.md and list the options. Download whatever the user picks using the same downloadVoices() call.

3c. Download everything at once:

```js
const result = await downloadVoices(['en_US-ryan-medium', 'en_US-amy-medium', /* + any others */]);
// result.downloaded — succeeded
// result.failed — [{stem, error}] if any failed
```

Each voice requires internet access. Download takes ~1–2 min per voice on a typical connection. If any downloads fail:
- Check internet connectivity
- Verify the stem exists at https://github.com/rhasspy/piper/blob/master/VOICES.md
- Offer to retry
Step 4. After downloading, generate a short audio sample for each downloaded voice and send it to the user. For each voice, use a greeting in the voice's language:
- English: "Hello, I'm [name]. How can I help you today?"
- German: "Hallo, ich heiße [Name]. Wie kann ich Ihnen helfen?"
- French: "Bonjour, je m'appelle [prénom]. Comment puis-je vous aider?"
- Spanish: "Hola, me llamo [nombre]. ¿Cómo puedo ayudarte?"
- Polish: "Cześć, mam na imię [imię]. Jak mogę Ci pomóc?"
- Italian: "Ciao, mi chiamo [nome]. Come posso aiutarti?"
- Portuguese: "Olá, meu nome é [nome]. Como posso ajudar?"
- Russian: "Привет, меня зовут [имя]. Чем могу помочь?"
- Other languages: use an equivalent native greeting.

Replace [name] with the voice name (e.g. Ryan, Amy, Thorsten).

```js
const sample = await tts({ text: 'Hello, I\'m Ryan. How can I help you today?', voice: 'en_US-ryan-medium' });
// send sample.path to the user as a voice message
```

Send all samples, then ask: "Which voice do you prefer? Or shall I download a different one?"
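A sketch of the sampling loop, assuming result.downloaded holds the stems returned in Step 3c; the greetings map and the stem parsing are illustrative:

```js
// Generate one short greeting sample per downloaded voice.
const greetings = {
  en: name => `Hello, I'm ${name}. How can I help you today?`,
  de: name => `Hallo, ich heiße ${name}. Wie kann ich Ihnen helfen?`,
  // ...add the other languages from the list above as needed
};

for (const stem of result.downloaded) {
  const lang = stem.slice(0, 2);                  // 'en' from 'en_US-ryan-medium'
  const rawName = stem.split('-')[1] || 'there';  // 'ryan'
  const name = rawName.charAt(0).toUpperCase() + rawName.slice(1);
  const text = (greetings[lang] || greetings.en)(name);

  const sample = await tts({ text, voice: stem });
  // send sample.path to the user as a voice message
}
```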
Step 5. After the user picks a voice, ask: "How fast should I speak? Normal is 100%. Some options: 125% (faster), 115% (slightly faster), 100% (normal), 80% (slower) — or tell me a percentage."

Always present speed as a percentage to the user. Never mention lengthScale directly. lengthScale is the internal duration multiplier — lower = faster. To convert: lengthScale = 1 / (speed% / 100). Examples:
- 125% speed → lengthScale 0.8
- 115% speed → lengthScale 0.87
- 100% speed → lengthScale 1.0 (default)
- 80% speed → lengthScale 1.25

Generate a short sample at the chosen speed so the user can hear the difference:

```js
const sample = await tts({ text: 'This is how I sound at this speed.', voice: 'chosen-voice', lengthScale: 0.8 });
// send sample.path to the user
```

Confirm with the user, then offer to save it permanently: "Should I save this as your default speed? It'll be used automatically every session." If the user agrees:

```js
await saveConfig({ lengthScale: 0.8 });
```

Once saved, tts() reads it from config.json in the skill directory automatically — no need to pass lengthScale on every call.
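If a conversion helper is handy, a one-liner sketch (the function name is illustrative, not part of the skill's API):

```js
// User-facing speed percentage -> Piper lengthScale (duration multiplier, lower = faster).
function speedToLengthScale(speedPercent) {
  return 1 / (speedPercent / 100); // 125 -> 0.8, 115 -> ~0.87, 100 -> 1.0, 80 -> 1.25
}
```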
Once confirmed, remember both voice and lengthScale for the session. Pass them to every subsequent tts() call unless the user asks to change them.
Always call status() before the first tts() call in a session to determine what is needed.

| stage | Meaning | What to do |
| --- | --- | --- |
| ready | Fully installed, at least one voice model present | Proceed with tts() |
| not-setup | Piper not installed | Ask user for confirmation, then call setup() |
| no-piper | Venv exists but piper binary missing | Ask user for confirmation, then call setup() |
| no-model | Piper installed but no voice model downloaded | Follow Steps 3–5 of the first-run flow above |

IMPORTANT: Always ask the user for confirmation before calling setup(). It installs the piper-tts package from PyPI into a venv inside the skill directory.
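The same decision expressed as a minimal branch (agent-side sketch; the confirmation prompts are summarized in comments):

```js
// Decide what to do before the first tts() call of a session.
const { stage } = await status();

switch (stage) {
  case 'ready':
    // At least one voice model is present; proceed with tts().
    break;
  case 'not-setup':
  case 'no-piper':
    // Ask the user for confirmation, then:
    await setup();
    break;
  case 'no-model':
    // Follow Steps 3–5 of the first-run flow above (offer and download voices).
    break;
}
```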
- Input: text, optional format ("ogg" or "wav"), optional voice (model stem), optional lengthScale (speech speed, default 1.0)
- Output: path to the generated file (usually .ogg)
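Two illustrative calls, assuming the defaults described above (auto-detected language, OGG output) and that a German voice is installed:

```js
// Minimal call: language auto-detected, default format (OGG/Opus).
const out = await tts({ text: 'The report is ready.' });
// out.path -> generated .ogg in the workspace tts/ folder

// Explicit options: WAV output, a specific voice, slightly faster speech.
const wav = await tts({
  text: 'Der Bericht ist fertig.',
  format: 'wav',
  voice: 'de_DE-thorsten-medium',
  lengthScale: 0.87,
});
```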
To list installed voices, call listVoices() — returns the stems of all installed .onnx models. Never assume a fixed list; it varies per user and installation.

Auto-detection (no voice param): the script detects language from the text using character and script analysis:
- Non-Latin scripts: Cyrillic (Russian, Ukrainian, Bulgarian), Greek, Arabic, Persian, Chinese, Japanese, Korean, Georgian
- Latin-script languages: Vietnamese, Polish, Romanian, Turkish, Czech, Slovak, Hungarian, Portuguese, Spanish, Catalan, German, Finnish, Scandinavian (Swedish, Norwegian, Danish), French, Italian
- Fallback: English keywords → first English model → any installed model

Auto-detection is best-effort. For reliable results with a specific language, always pass the voice parameter explicitly.

Explicit override: set the PIPER_VOICE_MODEL env var to a full .onnx path (overrides everything).

When the user requests a specific voice or language (see the sketch after this list):
- Call listVoices() to see what is installed
- Pass the matching stem as voice to tts(), e.g. voice: "en_US-amy-medium"
- If the requested voice is not installed, offer to download it with downloadVoices([stem])

To switch back to auto-detect, omit the voice parameter.
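A sketch of honoring an explicit voice request with the calls described above:

```js
// Prefer an installed voice that matches the request; otherwise offer to download it.
const installed = await listVoices();  // e.g. ['en_US-ryan-medium', 'de_DE-thorsten-medium']
const requested = 'en_US-amy-medium';

if (installed.includes(requested)) {
  const out = await tts({ text: 'Sure, switching voices.', voice: requested });
} else {
  // Ask the user first, then: await downloadVoices([requested]);
}
```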
The user may say things like "I don't like this voice, use a female one" or "Download a German voice". When this happens:
1. Find the model at https://github.com/rhasspy/piper/blob/master/VOICES.md
2. Confirm the stem (e.g. de_DE-thorsten-medium) and call downloadVoices([stem])
3. Generate a sample and send it to the user
4. Confirm with listVoices() — the new voice is immediately usable
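The same flow as a sketch, using the example stem from the steps above:

```js
// User asked for a German voice; stem confirmed against VOICES.md.
const stem = 'de_DE-thorsten-medium';
const dl = await downloadVoices([stem]);

if (dl.failed && dl.failed.length > 0) {
  // Report dl.failed[0].error and offer to retry.
} else {
  const sample = await tts({ text: 'Hallo, ich heiße Thorsten. Wie kann ich Ihnen helfen?', voice: stem });
  // Send sample.path, then confirm the voice is installed:
  const voices = await listVoices(); // should now include 'de_DE-thorsten-medium'
}
```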
The user may say "remove that voice" or "I don't need the German voice anymore". When this happens: Call listVoices() to confirm which voices are installed Confirm with the user which voice to remove Call removeVoice(stem) — e.g. removeVoice('de_DE-thorsten-medium') Returns { removed, filesDeleted } on success If the removed voice was the user's preferred voice, ask them to pick a new one Never remove the last remaining voice without warning the user that TTS will stop working.
The user may say things like "speak faster", "too slow", or "speed it up". When this happens: Ask what speed they want in %, or suggest: 125% (faster), 115%, 100% (normal), 80% (slower) Convert their % to lengthScale: lengthScale = 1 / (speed% / 100) Generate a short sample: await tts({ text: '...', voice: 'current-voice', lengthScale: 0.8 }) Send the sample and confirm Offer to persist: "Save this as default?" — if yes, call saveConfig({ lengthScale: 0.8 }) Use the new lengthScale for all subsequent tts() calls in the session
- OPENCLAW_WORKSPACE/tts/ if the OPENCLAW_WORKSPACE env var is set
- otherwise: ~/.openclaw/workspace/tts/
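The resolution order as a sketch (path handling shown for illustration; the skill resolves this internally):

```js
// Where generated audio lands: the OpenClaw workspace, or the default workspace path.
const os = require('os');
const path = require('path');

const workspace = process.env.OPENCLAW_WORKSPACE
  || path.join(os.homedir(), '.openclaw', 'workspace');
const ttsDir = path.join(workspace, 'tts');
```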
- python3 (3.8+) — required for setup() to create the venv
- ffmpeg — for WAV → OGG/Opus conversion
- espeak-ng — system library used by Piper internally; setup() checks for it and warns if missing. Install: sudo apt install espeak-ng (Debian/Ubuntu), sudo dnf install espeak-ng (Fedora), brew install espeak (macOS)
- At least one Piper .onnx + .onnx.json voice model pair in the skill directory
- Linux x86_64: fully supported
- macOS x86_64 / arm64: fully supported
- Linux ARM: may require building piper-tts from source
- Windows: not supported
```sh
rm -rf ~/.openclaw/skills/local-piper-tts-multilang-secure
```

This removes everything: skill code, venv, and all voice models.