Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Local speech-to-text using faster-whisper. High-performance transcription with GPU acceleration support. Includes word-level timestamps and distilled models.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
High-performance local speech-to-text using faster-whisper.
Execute the setup script to create a virtual environment and install dependencies. It automatically detects NVIDIA GPUs for CUDA acceleration.
./setup.sh
Requirements:
- Python 3.10 or later
- ffmpeg (installed on the system)
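If you want to confirm the prerequisites before running the script, a minimal check might look like the sketch below. The version threshold mirrors the stated requirement; nothing here is taken from setup.sh itself.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info) -> bool:
    """True when the interpreter meets the Python 3.10+ requirement."""
    return (version_info[0], version_info[1]) >= (3, 10)


def ffmpeg_ok() -> bool:
    """True when an ffmpeg binary is available on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print(f"python: {'ok' if python_ok() else 'too old (need 3.10+)'}")
    print(f"ffmpeg: {'ok' if ffmpeg_ok() else 'missing'}")
```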
Use the transcription script to process audio files.
./scripts/transcribe audio.mp3
Specific model:
./scripts/transcribe audio.mp3 --model large-v3-turbo
Word timestamps:
./scripts/transcribe audio.mp3 --word-timestamps
JSON output:
./scripts/transcribe audio.mp3 --json
VAD (silence removal):
./scripts/transcribe audio.mp3 --vad
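The JSON output pairs well with word timestamps for downstream processing. The sketch below renders word-level timing as readable lines; the input shape (segments containing words with `start`, `end`, and `word` keys) is an assumption about the `--json` output, not a documented schema.

```python
def words_to_lines(result: dict) -> list[str]:
    """Render word-level timestamps as "[start-end] word" lines.

    Assumes a segments -> words structure with start/end/word keys;
    verify against the script's actual --json output before relying on it.
    """
    lines = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            lines.append(f"[{w['start']:.2f}-{w['end']:.2f}] {w['word'].strip()}")
    return lines


# Hypothetical sample mimicking the assumed output shape.
sample = {
    "segments": [
        {"words": [
            {"start": 0.0, "end": 0.42, "word": " Hello"},
            {"start": 0.42, "end": 0.80, "word": " world"},
        ]}
    ]
}
print("\n".join(words_to_lines(sample)))
```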
- distil-large-v3 (default): best balance of speed and accuracy.
- large-v3-turbo: recommended for multilingual or highest-accuracy tasks.
- medium.en, small.en: faster, English-only versions.
- No GPU detected: ensure NVIDIA drivers and CUDA are correctly installed. CPU transcription is significantly slower.
- OOM error: use a smaller model (e.g., small or base) or pass --compute-type int8.
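One way to act on this advice programmatically is a small heuristic that downgrades the model and compute type as memory shrinks. The VRAM thresholds below are illustrative assumptions, not values from the package:

```python
from typing import Optional


def choose_settings(vram_mb: Optional[int]) -> tuple:
    """Pick a (model, compute_type) pair for the available memory.

    None means no GPU was detected, so fall back to CPU-friendly int8.
    Thresholds are illustrative guesses, not measured requirements.
    """
    if vram_mb is None:
        return ("small", "int8")            # CPU fallback: slow but safe
    if vram_mb >= 8000:
        return ("distil-large-v3", "float16")
    if vram_mb >= 4000:
        return ("small", "int8")            # mid-range card: shrink both knobs
    return ("base", "int8")                 # very little memory: smallest model


print(choose_settings(None))
print(choose_settings(12000))
```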