Tencent SkillHub · AI

Faster Whisper Gpu

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...

0 Downloads
0 Stars
0 Installs
0 Score
High Signal


⬇ 0 downloads ★ 0 stars · Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
EXAMPLES.md, README.md, SKILL.md, requirements.txt, transcribe.py

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
0.1.0

Documentation

Primary doc: SKILL.md (17 sections)

🎙️ Faster Whisper GPU

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration.

✨ Features

  • 🚀 GPU Accelerated: Uses NVIDIA CUDA for blazing-fast transcription
  • 🔒 100% Local: No data leaves your machine. Complete privacy.
  • 💰 Free Forever: No API costs. Run unlimited transcriptions.
  • 🌍 Multilingual: Supports 99 languages with automatic detection
  • 📁 Multiple Formats: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, JSON
  • 🎯 Multiple Models: From tiny (fast) to large-v3 (most accurate)
  • 🎬 Subtitle Generation: Create SRT files with word-level timestamps

Hardware

  • NVIDIA GPU with CUDA support (recommended: 4GB+ VRAM)
  • Or CPU-only mode (slower, but works on any machine)

Software

  • Python 3.8+
  • NVIDIA drivers (for GPU support)
  • CUDA Toolkit 11.8+ or 12.x

Installation

```bash
# Install dependencies
pip install faster-whisper torch

# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

Basic Usage

```bash
# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3

# Specify language explicitly
python transcribe.py audio.mp3 --language pt

# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt

# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3
```

Command Line Options

```
python transcribe.py <audio_file> [options]

Options:
  --model {tiny,base,small,medium,large-v1,large-v2,large-v3}
                        Model size to use (default: base)
  --language LANG       Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
  --format {txt,srt,json,vtt}
                        Output format (default: txt)
  --output FILE         Output file path (default: stdout)
  --device {cuda,cpu}   Device to use (default: cuda if available)
  --compute_type {int8,int8_float16,int16,float16,float32}
                        Computation precision (default: float16)
  --task {transcribe,translate}
                        Task: transcribe or translate to English (default: transcribe)
  --vad_filter          Enable voice activity detection filter
  --vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
                        VAD parameters as comma-separated values
  --condition_on_previous_text
                        Condition on previous text (default: True)
  --initial_prompt PROMPT
                        Initial prompt to guide transcription
  --word_timestamps     Include word-level timestamps (for SRT/JSON)
  --hotwords WORDS      Comma-separated hotwords to boost recognition
```
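The actual `transcribe.py` ships in the package; for readers curious how such a CLI is typically wired up, a minimal `argparse` skeleton mirroring the documented flags might look like this (flag names and defaults follow the option list above; the real script may differ):

```python
import argparse

def build_parser():
    # Hypothetical skeleton of the documented CLI, not the shipped script.
    p = argparse.ArgumentParser(description="Transcribe audio with Faster Whisper")
    p.add_argument("audio_file")
    p.add_argument("--model", default="base",
                   choices=["tiny", "base", "small", "medium",
                            "large-v1", "large-v2", "large-v3"])
    p.add_argument("--language", default=None, help="Auto-detect if omitted")
    p.add_argument("--format", default="txt", choices=["txt", "srt", "json", "vtt"])
    p.add_argument("--output", default=None, help="Defaults to stdout")
    p.add_argument("--device", default="cuda", choices=["cuda", "cpu"])
    p.add_argument("--compute_type", default="float16",
                   choices=["int8", "int8_float16", "int16", "float16", "float32"])
    p.add_argument("--task", default="transcribe", choices=["transcribe", "translate"])
    p.add_argument("--vad_filter", action="store_true")
    p.add_argument("--word_timestamps", action="store_true")
    return p

args = build_parser().parse_args(["audio.mp3", "--model", "small", "--format", "srt"])
print(args.model, args.format, args.device)
```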

Examples

Portuguese Transcription with SRT Output

```bash
python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt
```

English Translation from Any Language

```bash
python transcribe.py japanese_audio.mp3 --task translate --format txt
```

High-Accuracy Mode with Large Model

```bash
python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps
```

CPU-Only Mode (no GPU)

```bash
python transcribe.py audio.mp3 --device cpu --compute_type int8
```

🐍 Python API

```python
from faster_whisper import WhisperModel

# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")

# Transcribe
segments, info = model.transcribe("audio.mp3", language="pt")
print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
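The segments yielded by the API can be turned into an SRT file with plain Python. A minimal sketch of that step, assuming segments expose `start`, `end`, and `text` as above (the `HH:MM:SS,mmm` timestamp form is the standard SRT layout; the helper names here are illustrative, not part of the faster-whisper API):

```python
def srt_timestamp(seconds):
    # Convert seconds to the standard SRT timestamp form HH:MM:SS,mmm.
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    # segments: iterable of (start, end, text) tuples.
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)

print(segments_to_srt([(0.0, 2.5, "Hello there."), (2.5, 5.0, "Second line.")]))
```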

📊 Model Sizes & VRAM Requirements

| Model    | Parameters | VRAM Required | Relative Speed | Accuracy |
|----------|------------|---------------|----------------|----------|
| tiny     | 39 M       | ~1 GB         | ~32x           | Basic    |
| base     | 74 M       | ~1 GB         | ~16x           | Good     |
| small    | 244 M      | ~2 GB         | ~6x            | Better   |
| medium   | 769 M      | ~5 GB         | ~2x            | Great    |
| large-v3 | 1550 M     | ~10 GB        | 1x             | Best     |

Benchmarks measured on NVIDIA RTX 4090.
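The VRAM column suggests a simple heuristic for choosing a model size on a given card. A sketch, using the thresholds from the table with roughly 1 GB of headroom (the function name and headroom value are our assumptions, not part of the package):

```python
def pick_model(vram_gb):
    # Largest model from the table that fits, leaving ~1 GB headroom
    # for activations and other processes sharing the GPU.
    for model, needed_gb in [("large-v3", 10), ("medium", 5), ("small", 2), ("base", 1)]:
        if vram_gb >= needed_gb + 1:
            return model
    return "tiny"

print(pick_model(4))   # a 4 GB card, as in the recommended minimum
```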

🔍 Supported Languages

Faster Whisper supports 99 languages, including: Portuguese (pt), English (en), Spanish (es), French (fr), German (de), Italian (it), Japanese (ja), Chinese (zh), Russian (ru), and 90+ more.

CUDA Out of Memory

```bash
# Use a smaller model
python transcribe.py audio.mp3 --model tiny

# Or use CPU
python transcribe.py audio.mp3 --device cpu

# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
```

Model Download Issues

Models are automatically downloaded on first use to ~/.cache/huggingface/hub/. To store them in a custom location, set:

```bash
export HF_HOME=/path/to/custom/cache
```

Slow Transcription

  • Ensure the GPU is being used: check nvidia-smi during transcription
  • Use a smaller model for faster results
  • Enable the VAD filter to skip silent parts

🤝 Contributing

Contributions are welcome! Please:
  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

📜 License

MIT License - See LICENSE for details. Faster Whisper is developed by SYSTRAN and based on OpenAI's Whisper.

🙏 Acknowledgments

  • OpenAI Whisper - original model
  • Faster Whisper - optimized implementation
  • CTranslate2 - fast inference engine

Made with ❤️ for the OpenClaw community.

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
3 docs · 1 script · 1 file
  • SKILL.md Primary doc
  • EXAMPLES.md Docs
  • README.md Docs
  • transcribe.py Scripts
  • requirements.txt Files