Tencent SkillHub · AI

Local Voice (FluidAudio TTS/STT)

Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.




Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform: OpenClaw
Install method: Manual import
Extraction: Extract archive
Prerequisites: OpenClaw
Primary doc: SKILL.md

Package facts

Download mode: Yavira redirect
Package format: ZIP package
Source platform: Tencent SkillHub
What's included: SKILL.md, references/VOICES.md, scripts/setup.sh, scripts/stella-tts.sh, sources/Package.swift, sources/Sources/StellaVoice/main.swift

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of working through the steps manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source: Tencent SkillHub
Verification: Indexed source record
Version: 1.0.1

Documentation

Primary doc: SKILL.md (19 sections)

Local Voice (FluidAudio TTS/STT)

Sub-second local voice AI for Apple Silicon Macs using FluidAudio's CoreML models.

Features

  • TTS: Kokoro model with 54 voices, ~0.6-0.8 s latency
  • STT: Parakeet TDT v3, ~0.2-0.3 s latency, 25 languages
  • 100% local: no cloud, no cost, works offline
  • Neural Engine: runs on Apple's ANE for efficiency

Requirements

  • macOS 14+ on Apple Silicon (M1/M2/M3/M4)
  • Swift 5.9+
  • espeak-ng (for TTS phoneme fallback)
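Before building, it can save a failed compile to check the requirements above up front. A minimal sketch, not part of the package (the function name is hypothetical; each test maps to one requirement):

```shell
# Hypothetical preflight check: one test per requirement listed above.
preflight() {
  local ok=0
  [ "$(uname -m)" = "arm64" ] || { echo "need Apple Silicon (arm64, not Rosetta)"; ok=1; }
  command -v swift >/dev/null 2>&1 || { echo "need Swift 5.9+ (xcode-select --install)"; ok=1; }
  command -v espeak-ng >/dev/null 2>&1 || { echo "need espeak-ng (brew install espeak-ng)"; ok=1; }
  return "$ok"
}
```

Run `preflight` in the shell you will build from; a non-zero exit means at least one prerequisite is missing.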

1. Install Dependencies

brew install espeak-ng

2. Build the Daemon

cd /path/to/skill/sources
swift build -c release

3. Install Binary and Framework

mkdir -p ~/clawd/bin
cp .build/release/StellaVoice ~/clawd/bin/
cp -R .build/arm64-apple-macosx/release/ESpeakNG.framework ~/clawd/bin/
install_name_tool -add_rpath @executable_path ~/clawd/bin/StellaVoice

4. Create LaunchAgent

cat > ~/Library/LaunchAgents/com.stella.tts.plist << EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.stella.tts</string>
    <key>ProgramArguments</key>
    <array>
        <string>$HOME/clawd/bin/StellaVoice</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>$HOME/.clawdbot/logs/stella-tts.log</string>
    <key>StandardErrorPath</key>
    <string>$HOME/.clawdbot/logs/stella-tts.err.log</string>
</dict>
</plist>
EOF
launchctl load ~/Library/LaunchAgents/com.stella.tts.plist

Note: the heredoc delimiter is deliberately unquoted so $HOME expands to an absolute path when the file is written; launchd does not expand environment variables inside plist strings.
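Once the agent is loaded you will occasionally want to restart it (e.g. after rebuilding the binary) or confirm it is running. These helpers are hypothetical, not part of the package; they only wrap standard launchctl subcommands:

```shell
# Hypothetical helpers for managing the LaunchAgent created above.
STELLA_PLIST="$HOME/Library/LaunchAgents/com.stella.tts.plist"

stella_restart() {
  launchctl unload "$STELLA_PLIST" 2>/dev/null
  launchctl load "$STELLA_PLIST"
}

stella_status() {
  # Prints the job entry if loaded, or a short message otherwise.
  launchctl list | grep com.stella.tts || echo "not loaded"
}
```

`stella_status` showing a PID in the first column means the daemon process is alive; logs land in the paths configured in the plist.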

API Endpoints

The daemon listens on http://127.0.0.1:18790:

TTS - Text to Speech

# Simple text to WAV
curl -X POST http://127.0.0.1:18790/synthesize -d "Hello world" -o output.wav

# With speed control (0.5-2.0)
curl -X POST "http://127.0.0.1:18790/synthesize?speed=1.2" -d "Fast!" -o output.wav

# JSON endpoint
curl -X POST http://127.0.0.1:18790/synthesize/json \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello", "speed": 1.0, "deEss": true}'

STT - Speech to Text

curl -X POST http://127.0.0.1:18790/transcribe \
  --data-binary @audio.wav \
  -H "Content-Type: audio/wav"
# Returns: {"text": "transcribed text"}
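The docs don't specify what input formats the daemon accepts beyond audio/wav, and Parakeet-family models are typically trained on 16 kHz mono audio. If transcription quality is poor on recordings in other formats, converting first is a safe bet. This helper is a sketch, and ffmpeg is an assumption here (e.g. brew install ffmpeg), not a package dependency:

```shell
# Hypothetical converter: any audio input -> 16 kHz mono 16-bit PCM WAV,
# a conservative format for STT. ffmpeg assumed installed separately.
to_wav16k() {
  ffmpeg -y -loglevel error -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}
```

Usage: `to_wav16k memo.m4a audio.wav`, then POST audio.wav to /transcribe as shown above.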

Health Check

curl http://127.0.0.1:18790/health
# Returns: ok
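Because the first request triggers model loading (see Troubleshooting), a startup script may want to poll /health before sending any audio. A minimal sketch, assuming the default port; the function name is hypothetical:

```shell
# Poll /health until the daemon answers "ok", up to N tries (default 30).
wait_for_stella() {
  local tries="${1:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(curl -sf http://127.0.0.1:18790/health 2>/dev/null)" = "ok" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Call it right after `launchctl load`; a non-zero exit means the daemon never became ready within the window.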

Voice Options

Default voice is af_sky; change it by modifying the source code.

Top Kokoro voices (American female):
  • af_heart (A grade) - warm, natural
  • af_bella (A-) - expressive
  • af_sky (C-) - clear, light

All 54 voices: see references/VOICES.md

Speed Control

  • speed=0.8 → calm, relaxed
  • speed=1.0 → natural pace
  • speed=1.2 → energetic, upbeat

Punctuation (automatic)

  • ! → excited tone
  • ? → rising intonation
  • . → neutral, falling
  • ... → pauses

SSML Tags

<phoneme ph="kəkˈɔɹO">Kokoro</phoneme>
<sub alias="Doctor">Dr.</sub>
<say-as interpret-as="date">2024-01-15</say-as>

Helper Script

See scripts/stella-tts.sh for a convenient wrapper:

scripts/stella-tts.sh "Hello world" output.wav
scripts/stella-tts.sh "Hello world" output.mp3  # Auto-converts
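If you can't ship the packaged script, its auto-convert behavior can be approximated with a few lines. This is a sketch of the idea, not the actual scripts/stella-tts.sh; the function name is made up, and ffmpeg is assumed for the MP3 branch:

```shell
# Hypothetical approximation of scripts/stella-tts.sh: write WAV directly,
# or convert via ffmpeg when the output name ends in .mp3.
tts_to_file() {
  local text="$1" out="$2" ext="${2##*.}"
  if [ "$ext" = "mp3" ]; then
    local tmp="${out%.mp3}.wav"
    curl -sf -X POST http://127.0.0.1:18790/synthesize -d "$text" -o "$tmp" \
      && ffmpeg -y -loglevel error -i "$tmp" "$out" \
      && rm -f "$tmp"
  else
    curl -sf -X POST http://127.0.0.1:18790/synthesize -d "$text" -o "$out"
  fi
}
```

The `${2##*.}` expansion keys the branch off the output file's extension, which is how the packaged script's auto-convert is described.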

Integration Example

For voice assistants, update your voice proxy to use local endpoints:

// STT
const response = await fetch('http://127.0.0.1:18790/transcribe', {
  method: 'POST',
  headers: { 'Content-Type': 'audio/wav' },
  body: audioData
});
const { text } = await response.json();

// TTS
const audio = await fetch('http://127.0.0.1:18790/synthesize', {
  method: 'POST',
  body: textToSpeak
});
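A quick way to smoke-test both endpoints together before wiring up a proxy is a loopback: synthesize a phrase, then feed the audio straight back into the transcriber. A hypothetical helper, default port assumed:

```shell
# Loopback smoke test (sketch): TTS output piped back into STT.
roundtrip() {
  local phrase="$1" wav
  wav="$(mktemp).wav"
  curl -sf -X POST http://127.0.0.1:18790/synthesize -d "$phrase" -o "$wav" || return 1
  curl -sf -X POST http://127.0.0.1:18790/transcribe \
    --data-binary @"$wav" -H "Content-Type: audio/wav"
  rm -f "$wav"
}
```

`roundtrip "testing one two three"` should print a JSON transcript close to the input phrase; a large mismatch points at an audio-format or model-loading problem.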

Troubleshooting

Library not loaded (ESpeakNG)
  • Ensure ESpeakNG.framework is in the same directory as the binary
  • Run install_name_tool -add_rpath @executable_path /path/to/binary

Slow first request
  • The first request loads the models (~8-10 s); subsequent requests are sub-second

x86 vs ARM
  • Build and run natively on arm64 (not under Rosetta)
  • Check with uname -m (should show arm64)

Source Code

The daemon source is in the sources/ directory. It's a Swift package using:
  • FluidAudio (TTS + STT models)
  • Hummingbird (HTTP server)

Rebuild after modifying:

cd sources && swift build -c release

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 docs · 2 scripts · 2 files
  • SKILL.md Primary doc
  • references/VOICES.md Docs
  • scripts/setup.sh Scripts
  • scripts/stella-tts.sh Scripts
  • sources/Package.swift Files
  • sources/Sources/StellaVoice/main.swift Files