Tencent SkillHub · AI

Jetson CUDA Voice Pipeline

Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Piper TTS.



Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

  • Target platform: OpenClaw
  • Install method: Manual import
  • Extraction: Extract archive
  • Prerequisites: OpenClaw
  • Primary doc: SKILL.md

Package facts

  • Download mode: Yavira redirect
  • Package format: ZIP package
  • Source platform: Tencent SkillHub
  • What's included: BUILD.md, SKILL.md, pipeline/led.py, pipeline/manage.sh, pipeline/setup.sh, pipeline/voice_pipeline.py

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief, rather than working through the install steps yourself.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

  • Source: Tencent SkillHub
  • Verification: Indexed source record
  • Version: 1.1.0

Documentation

Primary doc: SKILL.md (13 sections)

Jetson CUDA Voice Pipeline

Fully offline, GPU-accelerated local voice assistant for NVIDIA Jetson devices. No cloud for STT or TTS — only the LLM call uses the internet (OpenRouter or any OpenAI-compatible endpoint).

Architecture

```
ReSpeaker mic (hw:Array,0, S24_3LE, 16kHz)
    ↓ arecord raw stream — never restarted mid-conversation
openWakeWord — "Hey Jarvis" detection (~32ms chunks)
    ↓ wake word triggered → two-tone beep
_measure_ambient() — 480ms median RMS → dynamic VAD thresholds
    ↓
transcribe_stream() — VAD + whisper.cpp CUDA HTTP (~2-4s per utterance)
    ↓
ask_llm() — OpenRouter or local OpenAI-compatible API (~1-2s)
    ↓
Piper TTS — offline neural TTS, hot-loaded at startup → aplay
    ↓
ReSpeaker LEDs: 🔵 blue=listening 🩵 cyan=thinking ⚫ off=done 🔴 red=error
```

Total latency: ~5-8 seconds from wake word to first spoken word.
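The ambient-calibration step (median RMS over ~480ms of mic audio, turned into dynamic VAD thresholds) can be sketched as follows. This is a minimal illustration, not the code in voice_pipeline.py: the function name mirrors the docs, but the multiplier values are assumptions; only the fallback floors (400/250) come from the documented defaults.

```python
import numpy as np

def measure_ambient(chunks, speech_factor=3.0, silence_factor=1.5):
    """Derive dynamic VAD thresholds from ~480ms of raw mic chunks.

    `chunks` is a list of int16 NumPy arrays taken from the arecord
    stream. The multipliers are illustrative; the floors match the
    documented VOICE_SPEECH_RMS / VOICE_SILENCE_RMS fallbacks.
    """
    # per-chunk RMS, then the median as a robust noise-floor estimate
    rms = [float(np.sqrt(np.mean(c.astype(np.float64) ** 2))) for c in chunks]
    ambient = float(np.median(rms))
    return {
        "ambient": ambient,
        "speech_rms": max(ambient * speech_factor, 400.0),
        "silence_rms": max(ambient * silence_factor, 250.0),
    }
```

Using the median rather than the mean keeps a single loud transient (a door slam, a beep tail) from inflating the noise-floor estimate.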

Key Features

  • Zero mic-restart gap — the same arecord pipe feeds wake word detection and STT
  • Dynamic ambient calibration — measures the room noise floor on every wake word trigger (adapts to fans, AC, time of day)
  • Conversation history — 20-turn rolling context for natural follow-ups
  • Auto language detection — whisper -l auto, works multilingually
  • ReSpeaker LED ring — visual state feedback (silent no-op if the device is not present)
  • Fully configurable — all paths and thresholds via environment variables
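The 20-turn rolling context described above is naturally expressed as a bounded deque. A sketch under that assumption follows; the class and method names are mine, not taken from voice_pipeline.py:

```python
from collections import deque

class ConversationHistory:
    """Rolling conversation context: keeps only the most recent
    `max_turns` user/assistant exchanges for the LLM prompt."""

    def __init__(self, max_turns=20):
        # one turn = one user message + one assistant reply
        self._msgs = deque(maxlen=max_turns * 2)

    def add(self, role, content):
        # oldest messages fall off automatically once maxlen is hit
        self._msgs.append({"role": role, "content": content})

    def messages(self, system_prompt):
        # OpenAI-style chat payload: system prompt first, then history
        return [{"role": "system", "content": system_prompt}, *self._msgs]
```

A `deque(maxlen=...)` drops the oldest entries on append, so the prompt size stays bounded without any manual trimming between turns.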

Hardware Requirements

| Component | Tested | Notes |
|---|---|---|
| Jetson Xavier NX | ✅ | ARM64, sm_72, 8GB, JetPack 5.1.4 |
| ReSpeaker USB Mic Array v1.0 | ✅ | 2886:0007, S24_3LE, 16kHz |
| Any ALSA speaker | ✅ | tested with Creative MUVO 2c |
| Other Jetson models | ✅ | change CMAKE_CUDA_ARCHITECTURES |

Quick Start

```sh
# 1. Install Python deps
pip install openwakeword piper-tts numpy requests pyusb

# 2. Build whisper.cpp with CUDA (see BUILD.md — ~45 min, one-time)
#    Then place the binary at ~/.local/bin/whisper-server-gpu

# 3. Download a Piper voice model
mkdir -p ~/.local/share/piper/voices && cd ~/.local/share/piper/voices
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# 4. Install and start the services
export OPENROUTER_API_KEY=your-key-here
bash pipeline/setup.sh
bash pipeline/manage.sh start

# Say "Hey Jarvis" — blue LED = listening
```

Build whisper.cpp with CUDA

See BUILD.md for full instructions. Critical flags:

```sh
cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=72 -DCMAKE_BUILD_TYPE=Release
make -j4   # ~45 min — detach with nohup if needed
```

⚠️ CMAKE_CUDA_ARCHITECTURES=72 (sm_72 = Xavier NX) is critical: the default multi-arch compilation OOMs on an 8GB Jetson.

Architecture map:

  • Xavier NX / AGX Xavier → 72
  • Orin → 87
  • TX2 → 62
  • Nano → 53

Piper Voice Models

```sh
mkdir -p ~/.local/share/piper/voices && cd "$_"

# English (required)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# Greek (optional — any language from huggingface.co/rhasspy/piper-voices works)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/el/el_GR/rapunzelina/medium/el_GR-rapunzelina-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/el/el_GR/rapunzelina/medium/el_GR-rapunzelina-medium.onnx.json
```

Service Install

setup.sh writes and enables the systemd user services automatically:

```sh
bash pipeline/setup.sh [/path/to/voice_pipeline.py] [API_KEY]
```

Or with an environment variable:

```sh
OPENROUTER_API_KEY=sk-... bash pipeline/setup.sh
```

Re-run to update an existing install.

ReSpeaker Mic Gain & USB Autosuspend

```sh
# Optimal gain (no clipping, RMS ~180 ambient)
amixer -c 0 set Mic 90

# Prevent USB autosuspend (the mic sleeps after 2s idle without this)
sudo tee /etc/udev/rules.d/99-usb-audio-nosuspend.rules << 'EOF'
ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="0007", \
  ATTR{power/control}="on", ATTR{power/autosuspend}="-1"
EOF
sudo udevadm control --reload-rules
```

Management

```sh
bash pipeline/manage.sh start      # start both services
bash pipeline/manage.sh stop       # stop both services
bash pipeline/manage.sh restart    # restart both
bash pipeline/manage.sh status     # systemd status
bash pipeline/manage.sh logs       # tail live log
bash pipeline/manage.sh test-mic   # record 4s + play back
bash pipeline/manage.sh test-stt   # record 4s + transcribe
bash pipeline/manage.sh test-tts   # speak a test phrase
```

Environment Variables

| Variable | Default | Description |
|---|---|---|
| OPENROUTER_API_KEY | (required) | API key for OpenRouter (or any OpenAI-compatible provider) |
| VOICE_MIC | hw:Array,0 | ALSA mic device name |
| VOICE_SPEAKER | hw:C2c,0 | ALSA speaker device name |
| VOICE_LLM_URL | OpenRouter | LLM API endpoint |
| VOICE_LLM_MODEL | anthropic/claude-3.5-haiku | Model name |
| VOICE_WAKE_THRESHOLD | 0.5 | Wake word confidence (0.0–1.0) |
| VOICE_SPEECH_RMS | 400 | Fallback speech RMS threshold |
| VOICE_SILENCE_RMS | 250 | Fallback silence RMS threshold |
| VOICE_UTC_OFFSET | 0 | Timezone offset in hours for LLM context |
| PIPER_VOICES_DIR | ~/.local/share/piper/voices | Piper voice models directory |
| WHISPER_URL | http://127.0.0.1:8181/inference | whisper-server endpoint |
| WHISPER_BIN | ~/.local/bin/whisper-server-gpu | whisper-server binary (used by setup.sh) |
| WHISPER_MODEL | ~/.local/share/whisper/models/ggml-base.bin | Whisper model (used by setup.sh) |

Troubleshooting

Mic records silence
  • Check gain: amixer -c 0 set Mic 90
  • Use the card name, not the number (hw:Array,0, not hw:0,0) — numbers shift on reboot
  • The ReSpeaker requires S24_3LE format, not S16_LE
  • Disable USB autosuspend (see setup above)

Records the full 6s timeout, never cuts off
  • Room ambient noise exceeds the VOICE_SILENCE_RMS fallback. Dynamic calibration handles this automatically; if it is still an issue, set VOICE_SILENCE_RMS slightly above your measured ambient floor.

[BEEPING] or (bell dings) in the transcript
  • The speaker beep is being picked up by the mic. The 0.3s drain buffer after the beep handles this; check speaker/mic distance and speaker volume.

Whisper OOM during build
  • You must use -DCMAKE_CUDA_ARCHITECTURES=72 — the default multi-arch build exhausts 8GB RAM. Use -j4, not -j6.

LED not lighting up
  • Install pyusb: pip install pyusb
  • Only supported on the ReSpeaker USB Mic Array v1.0 (2886:0007)
  • All LED errors are silent — the pipeline continues without it.

Wake word triggers constantly (false positives)
  • Raise VOICE_WAKE_THRESHOLD (e.g. from the 0.5 default to 0.7 or higher). Ensure no TV/radio is playing phrases close to "Hey Jarvis".
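When debugging "mic records silence", it helps to compute the RMS of the raw stream yourself. The ReSpeaker delivers S24_3LE frames (3-byte little-endian signed samples), which need sign extension before any math. The helper below is illustrative and not part of the package:

```python
import numpy as np

def s24_3le_to_int32(raw: bytes) -> np.ndarray:
    """Decode S24_3LE audio (3 bytes per sample, little-endian,
    signed 24-bit) into an int32 array."""
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    # assemble the three bytes, then sign-extend bit 23
    vals = (b[:, 0].astype(np.int32)
            | (b[:, 1].astype(np.int32) << 8)
            | (b[:, 2].astype(np.int32) << 16))
    vals[vals >= 1 << 23] -= 1 << 24
    return vals

def rms(raw: bytes) -> float:
    """RMS of a raw S24_3LE buffer, comparable to the VOICE_*_RMS thresholds
    only up to scaling (the pipeline's internal sample width may differ)."""
    samples = s24_3le_to_int32(raw).astype(np.float64)
    return float(np.sqrt(np.mean(samples ** 2)))
```

If the RMS of a captured buffer is near zero while you are speaking, the problem is gain, format, or autosuspend, not the pipeline logic.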

File Structure

```
jetson-cuda-voice/
├── SKILL.md              ← this file
├── BUILD.md              ← whisper.cpp CUDA build guide
└── pipeline/
    ├── voice_pipeline.py ← main pipeline
    ├── led.py            ← ReSpeaker LED control (optional)
    ├── setup.sh          ← one-command service installer
    └── manage.sh         ← start/stop/status/test
```

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
4 scripts · 2 docs
  • SKILL.md Primary doc
  • BUILD.md Docs
  • pipeline/led.py Scripts
  • pipeline/manage.sh Scripts
  • pipeline/setup.sh Scripts
  • pipeline/voice_pipeline.py Scripts