Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Provision a local STT backend used by voice skills.
- Python venv for faster-whisper
- transcribe-server.py HTTP endpoint at http://127.0.0.1:18790/transcribe
- systemd user service: openclaw-transcribe.service
On first startup, faster-whisper downloads model weights from Hugging Face (~1.5 GB for medium). This requires internet access and disk space. After the initial download, models are cached locally and the service runs fully offline.

| Model | Download size | RAM usage |
|---|---|---|
| tiny | ~75 MB | ~400 MB |
| base | ~150 MB | ~500 MB |
| small | ~500 MB | ~800 MB |
| medium | ~1.5 GB | ~1.4 GB |
| large-v3 | ~3.0 GB | ~3.5 GB |

To pre-download models in an air-gapped environment, see the faster-whisper docs.
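The size/RAM trade-off can be captured in a small helper that picks the largest model fitting a memory budget. This is an illustrative sketch using the figures from the table; the function and dictionary names are not part of the package:

```python
# Approximate footprints from the table: (download MB, RAM MB).
MODEL_FOOTPRINT = {
    "tiny": (75, 400),
    "base": (150, 500),
    "small": (500, 800),
    "medium": (1500, 1400),
    "large-v3": (3000, 3500),
}

def pick_model(ram_budget_mb: int) -> str:
    """Return the largest model whose RAM usage fits the budget."""
    fitting = [
        (ram, name)
        for name, (_dl, ram) in MODEL_FOOTPRINT.items()
        if ram <= ram_budget_mb
    ]
    if not fitting:
        raise ValueError("no model fits; need at least ~400 MB of RAM")
    return max(fitting)[1]
```

For example, a 1.5 GB budget selects `medium`, while anything under ~400 MB raises an error rather than silently picking an oversized model.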
- Binds to 127.0.0.1 only; not reachable from the network.
- CORS restricted to a single origin (https://127.0.0.1:8443 by default).
- No credentials, API keys, or secrets are used or stored.
- Upload size limit: requests exceeding the configured limit are rejected before processing (HTTP 413). Default: 50 MB, configurable via MAX_UPLOAD_MB.
- Magic-byte check: only files with recognized audio signatures (WAV, OGG, FLAC, MP3, WebM, M4A) are accepted. Unrecognized formats are rejected (HTTP 415) before reaching GStreamer.
- Subprocess safety: all arguments to gst-launch-1.0 are passed as a list; no shell expansion or injection is possible.
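A magic-byte pre-filter of this kind can be sketched as follows. This is an illustrative re-implementation, not the server's actual code, and the exact signature list the server checks is an assumption:

```python
# Well-known container/file signatures: (offset, magic bytes).
AUDIO_SIGNATURES = [
    (0, b"RIFF"),             # WAV (RIFF container)
    (0, b"OggS"),             # OGG
    (0, b"fLaC"),             # FLAC
    (0, b"ID3"),              # MP3 with an ID3 tag
    (0, b"\x1aE\xdf\xa3"),    # WebM (EBML header)
    (4, b"ftyp"),             # M4A / MP4 container
]

def looks_like_audio(payload: bytes) -> bool:
    """Accept only payloads whose leading bytes match a known audio signature."""
    if any(payload[off:off + len(sig)] == sig for off, sig in AUDIO_SIGNATURES):
        return True
    # Raw MP3 frames have no ID3 tag; they start with an 11-bit sync word (0xFFE...).
    return len(payload) >= 2 and payload[0] == 0xFF and (payload[1] & 0xE0) == 0xE0
```

A payload that fails this check can be rejected with 415 before any media library ever parses it.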
The service uses GStreamer's decodebin for audio format conversion. Like any media library, GStreamer's parsers process binary data and should be kept up to date. Mitigation: install gst-launch-1.0 from your OS vendor's trusted packages and apply security updates regularly. The magic-byte pre-filter above reduces the attack surface by rejecting non-audio payloads before they reach GStreamer.
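The list-form subprocess invocation mentioned above might look like this. The pipeline elements shown (decodebin, audioconvert, wavenc, and so on) are an assumption about a typical decode-to-WAV pipeline, not the server's exact command line:

```python
import subprocess

def build_gst_argv(src: str, dst: str) -> list:
    """Build a gst-launch-1.0 command as an argv list. Because no shell is
    involved, a malicious filename stays a single opaque argument and cannot
    inject extra commands."""
    return [
        "gst-launch-1.0", "-q",
        "filesrc", f"location={src}",
        "!", "decodebin",
        "!", "audioconvert",
        "!", "audioresample",
        "!", "wavenc",
        "!", "filesink", f"location={dst}",
    ]

def transcode_to_wav(src: str, dst: str) -> None:
    # shell=False is the default: arguments are passed to the process verbatim.
    subprocess.run(build_gst_argv(src, dst), check=True)
```

Even a hostile input path like `in.ogg; rm -rf /` is passed through as one literal argument, which is the injection guarantee the section describes.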
- No outbound network calls (after the initial model download).
- No telemetry, analytics, or phone-home behavior.
- Temporary files are created in a per-request TemporaryDirectory and cleaned up immediately.
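The per-request cleanup guarantee can be sketched as follows (illustrative only; the handler name and return value are not the server's actual code):

```python
import os
import tempfile

def handle_upload(payload: bytes) -> bool:
    """Sketch of per-request temp handling: all scratch files live in one
    TemporaryDirectory that is removed as soon as the request finishes."""
    with tempfile.TemporaryDirectory(prefix="transcribe-") as workdir:
        audio_path = os.path.join(workdir, "input.bin")
        with open(audio_path, "wb") as f:
            f.write(payload)
        # ... decoding and transcription would happen here ...
        existed_during_request = os.path.exists(audio_path)
    # Leaving the with-block deletes the directory and everything in it.
    return existed_during_request and not os.path.exists(audio_path)
```

Nothing from one request survives into the next, so no uploaded audio accumulates on disk.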
- Pinned package: faster-whisper==1.1.1 (override via env)
- Explicit dependency check for gst-launch-1.0
- CORS restricted to one origin by default
- Configurable workspace/service paths (no hardcoded user path)
```bash
bash scripts/deploy.sh
```

With custom settings:

```bash
WORKSPACE=~/.openclaw/workspace \
TRANSCRIBE_PORT=18790 \
WHISPER_MODEL_SIZE=medium \
WHISPER_LANGUAGE=auto \
TRANSCRIBE_ALLOWED_ORIGIN=https://10.0.0.42:8443 \
bash scripts/deploy.sh
```
Default: auto (auto-detect language). Set WHISPER_LANGUAGE=de for German-only, en for English-only, etc. A fixed language is faster and more accurate if you only use one language. The deploy script is idempotent: safe to run repeatedly.
| What | Path | Action |
|---|---|---|
| Python venv | $WORKSPACE/.venv-faster-whisper/ | Creates venv, installs faster-whisper via pip |
| Transcribe server | $WORKSPACE/voice-input/transcribe-server.py | Writes server script |
| Systemd service | ~/.config/systemd/user/openclaw-transcribe.service | Creates + enables persistent service |
| Model cache | ~/.cache/huggingface/ | Downloads model weights on first run |
```bash
systemctl --user stop openclaw-transcribe.service
systemctl --user disable openclaw-transcribe.service
rm -f ~/.config/systemd/user/openclaw-transcribe.service
systemctl --user daemon-reload
```

Optional full cleanup:

```bash
rm -rf ~/.openclaw/workspace/.venv-faster-whisper
rm -f ~/.openclaw/workspace/voice-input/transcribe-server.py
```
```bash
bash scripts/status.sh
```

Expected:
- service active
- endpoint responds (HTTP 200/500 acceptable for an invalid sample payload)
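Beyond status.sh, the endpoint can be exercised from Python. This sketch builds a multipart POST by hand; the form field name `file` and the response shape are assumptions about the server, so adjust them to match transcribe-server.py:

```python
import urllib.request

def build_transcribe_request(audio: bytes,
                             url: str = "http://127.0.0.1:18790/transcribe"):
    """Build a multipart/form-data POST for the /transcribe endpoint.
    The "file" field name is an assumption about the server's API."""
    boundary = "----openclaw-stt"
    body = (
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="file"; filename="clip.wav"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode()
        + audio
        + f"\r\n--{boundary}--\r\n".encode()
    )
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# To actually send it (requires the service to be running):
# with open("clip.wav", "rb") as f:
#     req = build_transcribe_request(f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```

A garbage payload should come back as HTTP 415 or 500 rather than a transcript, which is consistent with the "200/500 acceptable" note above.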
This skill provides backend transcription only. Pair with webchat-voice-proxy for browser mic + HTTPS/WSS integration. For one-step install, use webchat-voice-full-stack (deploys backend + proxy in order).