
Local Voice (FluidAudio TTS/STT)

Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.

skill · openclaw · clawhub · Free

0 Downloads

0 Stars

0 Installs

0 Score

High Signal

Unverified but indexed

Install for OpenClaw

Quick setup

Download the package from Yavira.
Extract the archive and review SKILL.md first.
Import or place the package into your OpenClaw setup.

Requirements

Target platform: OpenClaw
Install method: Manual import
Extraction: Extract archive
Prerequisites: OpenClaw
Primary doc: SKILL.md

Package facts

Download mode: Yavira redirect
Package format: ZIP package
Source platform: Tencent SkillHub
What's included: SKILL.md, references/VOICES.md, scripts/setup.sh, scripts/stella-tts.sh, sources/Package.swift, sources/Sources/StellaVoice/main.swift

Validation

Use the Yavira download entry.
Review SKILL.md after the package is downloaded.
Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

Download the package from Yavira.
Extract it into a folder your agent can access.
Paste one of the prompts below and point your agent at the extracted folder.

New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.


Trust & source

Release facts

Source: Tencent SkillHub
Verification: Indexed source record
Version: 1.0.1

Provenance

Publisher: TrondW
Source page: View original listing
Canonical URL: Open canonical page

Documentation

Primary doc: SKILL.md (19 sections)

Local Voice (FluidAudio TTS/STT)

Sub-second local voice AI for Apple Silicon Macs using FluidAudio's CoreML models.

Features

TTS: Kokoro model with 54 voices, ~0.6-0.8s latency
STT: Parakeet TDT v3, ~0.2-0.3s latency, 25 languages
100% local: No cloud, no cost, works offline
Neural Engine: Runs on Apple's ANE for efficiency

Requirements

macOS 14+ on Apple Silicon (M1/M2/M3/M4)
Swift 5.9+
espeak-ng (for TTS phoneme fallback)
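As a rough preflight sketch, the platform part of these requirements can be checked from Node.js before building. The function name `checkHost` is hypothetical and not part of the package. It only looks at the platform and architecture strings that Node reports for its own binary; it does not verify the macOS version, the Swift toolchain, or espeak-ng.

```javascript
// Hypothetical helper (not part of the skill package).
// Pure function so it can be exercised on any machine; under Node.js,
// pass process.platform and process.arch.
function checkHost(platform, arch) {
  const problems = [];
  if (platform !== "darwin") {
    problems.push(`expected macOS (darwin), got ${platform}`);
  }
  if (arch !== "arm64") {
    // An x64 Node binary on Apple Silicon means it is running under Rosetta.
    problems.push(`expected arm64, got ${arch}`);
  }
  return problems;
}

// Usage under Node.js:
//   const problems = checkHost(process.platform, process.arch);
//   if (problems.length > 0) console.error(problems.join("\n"));
```

An empty array means the host looks suitable as far as this check can tell.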

1. Install Dependencies

brew install espeak-ng

2. Build the Daemon

cd /path/to/skill/sources
swift build -c release

3. Install Binary and Framework

mkdir -p ~/clawd/bin
cp .build/release/StellaVoice ~/clawd/bin/
cp -R .build/arm64-apple-macosx/release/ESpeakNG.framework ~/clawd/bin/
install_name_tool -add_rpath @executable_path ~/clawd/bin/StellaVoice

4. Create LaunchAgent

# Create the target directories first so the plist and log files can be written.
mkdir -p ~/Library/LaunchAgents ~/.clawdbot/logs

# The heredoc delimiter is unquoted so the shell expands $HOME to an absolute path;
# launchd does not expand variables inside a plist.
cat > ~/Library/LaunchAgents/com.stella.tts.plist << EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.stella.tts</string>
    <key>ProgramArguments</key>
    <array>
        <string>$HOME/clawd/bin/StellaVoice</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>$HOME/.clawdbot/logs/stella-tts.log</string>
    <key>StandardErrorPath</key>
    <string>$HOME/.clawdbot/logs/stella-tts.err.log</string>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.stella.tts.plist

API Endpoints

The daemon listens on http://127.0.0.1:18790:

TTS - Text to Speech

# Simple text to WAV
curl -X POST http://127.0.0.1:18790/synthesize -d "Hello world" -o output.wav

# With speed control (0.5-2.0)
curl -X POST "http://127.0.0.1:18790/synthesize?speed=1.2" -d "Fast!" -o output.wav

# JSON endpoint
curl -X POST http://127.0.0.1:18790/synthesize/json \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello", "speed": 1.0, "deEss": true}'
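The same plain-text request can be issued from JavaScript. The sketch below assumes a runtime with a global `fetch` (for example Node.js 18+) and the base URL given in this document. The function names `synthesizeUrl` and `synthesize` are illustrative, not part of the package, and the response is assumed to be raw WAV bytes, as the `-o output.wav` usage suggests.

```javascript
// Build the /synthesize URL. The optional speed is sent as a query
// parameter, mirroring the curl example that uses ?speed=1.2.
function synthesizeUrl(speed, baseUrl = "http://127.0.0.1:18790") {
  const url = new URL("/synthesize", baseUrl);
  if (speed !== undefined) {
    url.searchParams.set("speed", String(speed));
  }
  return url.toString();
}

// POST plain text and return the response body as bytes.
// Assumption: the body is WAV audio, as in the curl examples.
async function synthesize(text, speed) {
  const res = await fetch(synthesizeUrl(speed), { method: "POST", body: text });
  if (!res.ok) {
    throw new Error(`synthesize failed: HTTP ${res.status}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

Keeping URL construction separate from the network call makes the speed handling easy to check without a running daemon.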

STT - Speech to Text

curl -X POST http://127.0.0.1:18790/transcribe \
  --data-binary @audio.wav \
  -H "Content-Type: audio/wav"
# Returns: {"text": "transcribed text"}
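A client will usually want to validate the response before using it. The following sketch assumes only the `{"text": ...}` shape shown above; `parseTranscript` and `transcribe` are hypothetical names, and trimming the transcript is a choice made here, not something the daemon is known to require.

```javascript
// Extract the transcript from the documented response shape {"text": "..."}.
function parseTranscript(payload) {
  if (payload === null || typeof payload !== "object" || typeof payload.text !== "string") {
    throw new Error("unexpected /transcribe response shape");
  }
  return payload.text.trim();
}

// POST WAV bytes (Uint8Array or ArrayBuffer) and return the transcript.
// Assumes a global fetch, e.g. Node.js 18+ or a browser.
async function transcribe(wavBytes, baseUrl = "http://127.0.0.1:18790") {
  const res = await fetch(`${baseUrl}/transcribe`, {
    method: "POST",
    headers: { "Content-Type": "audio/wav" },
    body: wavBytes,
  });
  if (!res.ok) {
    throw new Error(`transcribe failed: HTTP ${res.status}`);
  }
  return parseTranscript(await res.json());
}
```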

Health Check

curl http://127.0.0.1:18790/health
# Returns: ok
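Because the daemon may still be starting when a client launches, it can help to poll the health endpoint before sending real requests. This is a minimal sketch: `waitForHealthy`, its option names, and the default retry counts are all assumptions chosen for illustration. The `fetchFn` option exists only so the loop can be exercised without a running daemon.

```javascript
// Poll GET /health until the body is "ok" or the attempts run out.
// Resolves to true when healthy, false otherwise.
async function waitForHealthy(options = {}) {
  const {
    baseUrl = "http://127.0.0.1:18790",
    attempts = 20,
    delayMs = 500,
    fetchFn = fetch, // injectable for testing; defaults to the global fetch
  } = options;

  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/health`);
      if (res.ok && (await res.text()).trim() === "ok") {
        return true;
      }
    } catch {
      // Connection errors are expected while the daemon is not listening yet.
    }
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```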

Voice Options

Default voice is af_sky. Change by modifying the source code.

Top Kokoro voices (American female):
af_heart (A grade) - warm, natural
af_bella (A-) - expressive
af_sky (C-) - clear, light

All 54 voices: See references/VOICES.md

Speed Control

speed=0.8 → Calm, relaxed
speed=1.0 → Natural pace
speed=1.2 → Energetic, upbeat
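Since the documented speed range is 0.5-2.0, a client may want to clamp values before sending them. How the daemon treats out-of-range values is not stated here, so the clamping below is a defensive assumption; `clampSpeed` and `jsonSynthesisBody` are hypothetical helpers. The body fields (`text`, `speed`, `deEss`) are the ones shown in the JSON endpoint example.

```javascript
// Keep speed inside the documented 0.5-2.0 range; fall back to 1.0
// (natural pace) when the input is not a finite number.
function clampSpeed(speed) {
  if (!Number.isFinite(speed)) return 1.0;
  return Math.min(2.0, Math.max(0.5, speed));
}

// Build the request body for POST /synthesize/json.
function jsonSynthesisBody(text, speed = 1.0, deEss = true) {
  return JSON.stringify({ text, speed: clampSpeed(speed), deEss });
}

// Usage (assuming a global fetch):
//   await fetch("http://127.0.0.1:18790/synthesize/json", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: jsonSynthesisBody("Hello", 1.2),
//   });
```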

Punctuation (automatic)

! → Excited tone
? → Rising intonation
. → Neutral, falling
... → Pauses

SSML Tags

<phoneme ph="kəkˈɔɹO">Kokoro</phoneme>
<sub alias="Doctor">Dr.</sub>
<say-as interpret-as="date">2024-01-15</say-as>
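When these tags are generated from user-supplied text, building them through small helpers avoids malformed markup. The sketch below covers only the three tags listed above. The helper names are hypothetical, and XML-escaping the content is an assumption about what the daemon's parser expects rather than documented behavior.

```javascript
// Escape characters that would otherwise break tag or attribute syntax.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// <sub alias="...">: speak `alias` in place of the written `text`.
function sub(alias, text) {
  return `<sub alias="${escapeXml(alias)}">${escapeXml(text)}</sub>`;
}

// <phoneme ph="...">: give an explicit phoneme string for `text`.
function phoneme(ph, text) {
  return `<phoneme ph="${escapeXml(ph)}">${escapeXml(text)}</phoneme>`;
}

// <say-as interpret-as="...">: hint how `text` should be read.
function sayAs(interpretAs, text) {
  return `<say-as interpret-as="${escapeXml(interpretAs)}">${escapeXml(text)}</say-as>`;
}
```

For example, `sub("Doctor", "Dr.") + " Smith"` yields a string that can be sent as the synthesis text.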

Helper Script

See scripts/stella-tts.sh for a convenient wrapper:

scripts/stella-tts.sh "Hello world" output.wav
scripts/stella-tts.sh "Hello world" output.mp3  # Auto-converts

Integration Example

For voice assistants, update your voice proxy to use local endpoints:

// STT: send recorded WAV bytes, receive {"text": "..."}
const response = await fetch('http://127.0.0.1:18790/transcribe', {
  method: 'POST',
  headers: { 'Content-Type': 'audio/wav' },
  body: audioData
});
const { text } = await response.json();

// TTS: send plain text; `audio` is the fetch Response carrying the WAV data
const audio = await fetch('http://127.0.0.1:18790/synthesize', {
  method: 'POST',
  body: textToSpeak
});
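The two calls can be combined into a single conversational turn: transcribe the incoming audio, compute a reply, then synthesize it. This is a sketch under assumptions: `voiceTurn`, the `respond` callback, and the `fetchFn` option are hypothetical names introduced here, and the reply logic is left entirely to the caller.

```javascript
// One voice-assistant turn: audio in -> transcript -> reply text -> audio out.
// `respond` is a caller-supplied async function mapping heard text to reply text.
async function voiceTurn(audioIn, respond, options = {}) {
  const { baseUrl = "http://127.0.0.1:18790", fetchFn = fetch } = options;

  // Speech to text
  const sttRes = await fetchFn(`${baseUrl}/transcribe`, {
    method: "POST",
    headers: { "Content-Type": "audio/wav" },
    body: audioIn,
  });
  if (!sttRes.ok) throw new Error(`transcribe failed: HTTP ${sttRes.status}`);
  const { text } = await sttRes.json();

  // Application logic decides what to say back.
  const reply = await respond(text);

  // Text to speech
  const ttsRes = await fetchFn(`${baseUrl}/synthesize`, { method: "POST", body: reply });
  if (!ttsRes.ok) throw new Error(`synthesize failed: HTTP ${ttsRes.status}`);
  const audioOut = new Uint8Array(await ttsRes.arrayBuffer());

  return { heard: text, reply, audioOut };
}
```

Injecting `fetchFn` keeps the flow testable without the daemon; in normal use the option can be omitted.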

Troubleshooting

Library not loaded (ESpeakNG)
Ensure ESpeakNG.framework is in the same directory as the binary
Run install_name_tool -add_rpath @executable_path /path/to/binary

Slow first request
First request loads models (~8-10s)
Subsequent requests are sub-second

x86 vs ARM
Must build and run on ARM64 native (not Rosetta)
Check with uname -m (should show arm64)
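Given the slow first request, a client can send one throwaway synthesis at startup with a generous timeout so that later requests hit already-loaded models. The sketch below is illustrative only: `withTimeout`, `warmUp`, the 30-second default, and the warm-up text are assumptions, not part of the package.

```javascript
// Run an async operation with an AbortSignal that fires after timeoutMs.
async function withTimeout(run, timeoutMs) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Send one small synthesis request and report how long it took (ms).
async function warmUp(options = {}) {
  const {
    baseUrl = "http://127.0.0.1:18790",
    timeoutMs = 30000, // well above the ~8-10s model load mentioned above
    fetchFn = fetch,   // injectable for testing
  } = options;

  const started = Date.now();
  const res = await withTimeout(
    (signal) => fetchFn(`${baseUrl}/synthesize`, { method: "POST", body: "Warm up.", signal }),
    timeoutMs
  );
  if (!res.ok) throw new Error(`warm-up failed: HTTP ${res.status}`);
  return Date.now() - started;
}
```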

Source Code

The daemon source is in the sources/ directory. It's a Swift package using:
FluidAudio (TTS + STT models)
Hummingbird (HTTP server)

Rebuild after modifying:
cd sources && swift build -c release

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package

2 Docs · 2 Scripts · 2 Files

SKILL.md Primary doc
references/VOICES.md Docs
scripts/setup.sh Scripts
scripts/stella-tts.sh Scripts
sources/Package.swift Files
sources/Sources/StellaVoice/main.swift Files
