Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Use when transcribing audio recordings, podcasts, voice messages, lectures, meetings, or any audio/video file to text. Handles mp3, wav, m4a, ogg, flac, webm, opus, aac formats.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Transcribe audio with scripts/transcribe.sh:

    # Basic (auto-detect language, base model)
    scripts/transcribe.sh recording.mp3

    # German, small model, SRT subtitles
    scripts/transcribe.sh --model small --language de --format srt lecture.wav

    # Batch process, all formats
    scripts/transcribe.sh --format all --output-dir ./transcripts/ *.mp3

    # Word-level timestamps
    scripts/transcribe.sh --timestamps interview.m4a
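The contents of scripts/transcribe.sh are not shown here, so the following is only a sketch of how such a wrapper might translate its options into a call to the `whisper` CLI. The function name `build_whisper_cmd` and its positional argument order are hypothetical. The flags it emits (`--model`, `--language`, `--output_format`, `--output_dir`, `--word_timestamps`) are options of the openai-whisper command-line tool. The function prints the command instead of running it, so it can be inspected safely.

```shell
#!/bin/sh
# Hypothetical helper: prints the whisper command that a wrapper such as
# scripts/transcribe.sh might run. The real script may behave differently.
# Usage: build_whisper_cmd FILE [MODEL] [LANGUAGE] [FORMAT] [OUTPUT_DIR] [TIMESTAMPS]
# Pass an empty string ("") to skip an optional argument.
# Note: the printed string is for display only; paths with spaces are not quoted.
build_whisper_cmd() {
    file=$1
    model=${2:-base}      # "base" is the default model named in this document
    language=${3:-}       # empty: let whisper auto-detect the language
    format=${4:-}         # empty: leave the output format to whisper
    output_dir=${5:-}     # empty: leave the output directory to whisper
    timestamps=${6:-no}   # "yes" requests word-level timestamps

    cmd="whisper $file --model $model"
    if [ -n "$language" ]; then
        cmd="$cmd --language $language"
    fi
    if [ -n "$format" ]; then
        cmd="$cmd --output_format $format"
    fi
    if [ -n "$output_dir" ]; then
        cmd="$cmd --output_dir $output_dir"
    fi
    if [ "$timestamps" = "yes" ]; then
        cmd="$cmd --word_timestamps True"
    fi
    printf '%s\n' "$cmd"
}

# German, small model, SRT subtitles
build_whisper_cmd lecture.wav small de srt
```

Printing the command first is a cheap way to check which options would be passed before committing to a long transcription run.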
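| Model  | RAM   | Speed | Accuracy | Best for                     |
|--------|-------|-------|----------|------------------------------|
| tiny   | ~1GB  | ⚡⚡⚡   | ★★       | Quick drafts, known language |
| base   | ~1GB  | ⚡⚡    | ★★★      | General use (default)        |
| small  | ~2GB  | ⚡     | ★★★★     | Good accuracy                |
| medium | ~5GB  | 🐢    | ★★★★★    | High accuracy                |
| large  | ~10GB | 🐌    | ★★★★★    | Best accuracy (slow on Pi)   |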
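The RAM figures in the model table can be turned into a simple selection rule. The sketch below uses a hypothetical helper, `pick_model`, which is not part of the package. It takes the available memory in whole gigabytes and prints the largest model whose approximate requirement fits. The thresholds are the rough values from the table, so leaving some headroom is advisable.

```shell
#!/bin/sh
# Hypothetical helper: choose the largest model whose approximate RAM need
# (taken from the model table) fits within the given whole gigabytes.
# tiny and base both need about 1GB, so base (the default) is the fallback.
pick_model() {
    ram_gb=$1
    if [ "$ram_gb" -ge 10 ]; then
        echo large
    elif [ "$ram_gb" -ge 5 ]; then
        echo medium
    elif [ "$ram_gb" -ge 2 ]; then
        echo small
    else
        echo base
    fi
}

pick_model 4
```

The printed name could then be passed to the `--model` option shown in the usage examples.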
- txt: Plain text transcript
- srt: SubRip subtitles (for video)
- vtt: WebVTT subtitles
- json: Detailed JSON with timestamps and confidence
- all: Generate all formats at once
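One practical difference between the two subtitle formats is the timestamp notation: SubRip separates seconds from milliseconds with a comma, while WebVTT uses a period. The sketch below illustrates this with a hypothetical helper, `format_timestamp`, that is not part of the package and takes a time in whole milliseconds.

```shell
#!/bin/sh
# Hypothetical helper: format a time given in whole milliseconds as
# HH:MM:SS<sep>mmm, where <sep> is "," for SRT and "." for WebVTT.
format_timestamp() {
    total_ms=$1
    sep=$2
    h=$((total_ms / 3600000))
    m=$((total_ms % 3600000 / 60000))
    s=$((total_ms % 60000 / 1000))
    ms=$((total_ms % 1000))
    printf '%02d:%02d:%02d%s%03d\n' "$h" "$m" "$s" "$sep" "$ms"
}

format_timestamp 3661500 ,   # SRT style:    01:01:01,500
format_timestamp 3661500 .   # WebVTT style: 01:01:01.500
```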
- whisper CLI (pip install openai-whisper)
- ffmpeg (for audio decoding)
- First run downloads the model (~150MB for base)
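Before the first transcription it can help to confirm that both command-line tools are on the PATH. The sketch below is one way to do that in POSIX shell. The helper name `missing_tools` is hypothetical and not part of the package.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each listed command that is
# not found on PATH, one per line. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing=$(missing_tools whisper ffmpeg)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "whisper and ffmpeg are both available"
fi
```

If `whisper` is reported missing, the install command listed above (`pip install openai-whisper`) applies. ffmpeg is usually installed through the system package manager.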