Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration.
- GPU Accelerated: Uses NVIDIA CUDA for blazing-fast transcription
- 100% Local: No data leaves your machine. Complete privacy.
- Free Forever: No API costs. Run unlimited transcriptions.
- Multilingual: Supports 99 languages with automatic detection
- Multiple Formats: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, JSON
- Multiple Models: From tiny (fast) to large-v3 (most accurate)
- Subtitle Generation: Create SRT files with word-level timestamps
- NVIDIA GPU with CUDA support (recommended: 4GB+ VRAM)
- Or CPU-only mode (slower but works on any machine)
- Python 3.8+
- NVIDIA drivers (for GPU support)
- CUDA Toolkit 11.8+ or 12.x
```bash
# Install dependencies
pip install faster-whisper torch

# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```
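If torch reports CUDA as available but transcription still lands on the CPU, it can help to confirm that CTranslate2 (the inference engine faster-whisper runs on) also sees the GPU. A minimal check, assuming ctranslate2 was pulled in as a faster-whisper dependency:

```python
# Check that CTranslate2, the engine behind faster-whisper, can see the GPU.
# ctranslate2 is installed automatically as a dependency of faster-whisper.
import ctranslate2

gpu_count = ctranslate2.get_cuda_device_count()
print(f"CTranslate2 CUDA devices: {gpu_count}")
if gpu_count == 0:
    print("No CUDA device visible to CTranslate2; transcription will run on CPU.")
```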
```bash
# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3

# Specify language explicitly
python transcribe.py audio.mp3 --language pt

# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt

# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3
```
```
python transcribe.py <audio_file> [options]

Options:
  --model {tiny,base,small,medium,large-v1,large-v2,large-v3}
                        Model size to use (default: base)
  --language LANG       Language code (e.g., 'pt', 'en', 'es').
                        Auto-detect if not specified.
  --format {txt,srt,json,vtt}
                        Output format (default: txt)
  --output FILE         Output file path (default: stdout)
  --device {cuda,cpu}   Device to use (default: cuda if available)
  --compute_type {int8,int8_float16,int16,float16,float32}
                        Computation precision (default: float16)
  --task {transcribe,translate}
                        Task: transcribe or translate to English
                        (default: transcribe)
  --vad_filter          Enable voice activity detection filter
  --vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
                        VAD parameters as comma-separated values
  --condition_on_previous_text
                        Condition on previous text (default: True)
  --initial_prompt PROMPT
                        Initial prompt to guide transcription
  --word_timestamps     Include word-level timestamps (for SRT/JSON)
  --hotwords WORDS      Comma-separated hotwords to boost recognition
```
Portuguese Transcription with SRT Output

```bash
python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt
```

English Translation from Any Language

```bash
python transcribe.py japanese_audio.mp3 --task translate --format txt
```

High-Accuracy Mode with Large Model

```bash
python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps
```

CPU-Only Mode (no GPU)

```bash
python transcribe.py audio.mp3 --device cpu --compute_type int8
```
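For whole folders, a thin wrapper around the CLI covers batch jobs. A sketch using only the flags documented above; the recordings/ folder name is hypothetical:

```python
# Batch-transcribe every MP3 in a folder by shelling out to the CLI above.
# The folder name "recordings/" is a placeholder; adjust flags to taste.
import subprocess
from pathlib import Path

for audio in sorted(Path("recordings").glob("*.mp3")):
    srt_path = audio.with_suffix(".srt")
    subprocess.run(
        ["python", "transcribe.py", str(audio),
         "--format", "srt", "--output", str(srt_path)],
        check=True,  # stop on the first failed file
    )
    print(f"Wrote {srt_path}")
```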
```python
from faster_whisper import WhisperModel

# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")

# Transcribe
segments, info = model.transcribe("audio.mp3", language="pt")

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
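Because each segment carries start and end times, writing SRT output from the API takes only a little glue. A minimal sketch; format_ts is our own hypothetical helper, not part of the faster-whisper API:

```python
# Turn faster-whisper segments into a minimal SRT file.
# format_ts() is a hypothetical helper, not part of faster-whisper.
from faster_whisper import WhisperModel

def format_ts(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SRT expects."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = WhisperModel("base", device="cuda", compute_type="float16")
segments, _ = model.transcribe("audio.mp3")

with open("audio.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(segments, start=1):
        f.write(f"{i}\n{format_ts(seg.start)} --> {format_ts(seg.end)}\n{seg.text.strip()}\n\n")
```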
| Model | Parameters | VRAM Required | Relative Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39 M | ~1 GB | ~32x | Basic |
| base | 74 M | ~1 GB | ~16x | Good |
| small | 244 M | ~2 GB | ~6x | Better |
| medium | 769 M | ~5 GB | ~2x | Great |
| large-v3 | 1550 M | ~10 GB | 1x | Best |

Benchmarks measured on NVIDIA RTX 4090.
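If a script should choose a model automatically, the VRAM column above maps directly onto a lookup. A rough sketch assuming torch is installed, with thresholds read conservatively from the table:

```python
# Pick the largest model from the table above that fits in available VRAM.
# Thresholds are conservative readings of the "VRAM Required" column.
import torch

def pick_model() -> str:
    if not torch.cuda.is_available():
        return "base"  # CPU fallback: small enough to stay responsive
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    for name, needed_gb in [("large-v3", 10), ("medium", 5), ("small", 2), ("base", 1)]:
        if vram_gb >= needed_gb:
            return name
    return "tiny"

print(f"Selected model: {pick_model()}")
```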
Faster Whisper supports 99 languages including:

- Portuguese (pt)
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Japanese (ja)
- Chinese (zh)
- Russian (ru)
- And 90+ more...
```bash
# Use smaller model
python transcribe.py audio.mp3 --model tiny

# Or use CPU
python transcribe.py audio.mp3 --device cpu

# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
```
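The same fallback can be automated when using the Python API. A sketch that retries on CPU with int8 when loading the GPU model fails; the broad exception handling is a deliberate simplification:

```python
# Try the GPU first, then fall back to CPU with int8 if the GPU path fails
# (e.g., out of VRAM or missing CUDA libraries). The broad except is a
# simplification: CTranslate2 surfaces these failures as RuntimeError.
from faster_whisper import WhisperModel

def load_model(size: str = "base") -> WhisperModel:
    try:
        return WhisperModel(size, device="cuda", compute_type="float16")
    except Exception as exc:
        print(f"GPU load failed ({exc}); falling back to CPU int8.")
        return WhisperModel(size, device="cpu", compute_type="int8")

model = load_model()
segments, info = model.transcribe("audio.mp3")
```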
Models are automatically downloaded on first use to ~/.cache/huggingface/hub/. To store them in a custom location instead, set: export HF_HOME=/path/to/custom/cache
- Ensure the GPU is being used: check nvidia-smi during transcription
- Use a smaller model for faster results
- Enable the VAD filter to skip silent parts (see the sketch below)
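A minimal sketch of enabling the VAD filter through the Python API; vad_filter and vad_parameters are faster-whisper's own transcribe options, and the 500 ms silence threshold is only an example value:

```python
# Skip long silences with the built-in voice activity detection (VAD) filter.
# The 500 ms threshold is an example value, not a recommendation.
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    vad_parameters={"min_silence_duration_ms": 500},
)
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```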
Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Submit a pull request
MIT License - See LICENSE for details. Faster Whisper is developed by SYSTRAN and based on OpenAI's Whisper.
- OpenAI Whisper - Original model
- Faster Whisper - Optimized implementation
- CTranslate2 - Fast inference engine

Made with ❤️ for the OpenClaw community