Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Speech-to-Text - Transcribe audio files to text with OpenAI Whisper
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Recognizes audio/voice files and converts them to text.
When the user sends a voice/audio file, the skill automatically: recognizes the speech, converts it to text, and sends the text to Feishu.
Once the user sends an audio file, the skill handles it automatically.
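The auto-processing flow above can be sketched as a small function, with the transcription and Feishu-sending steps passed in as callables. Note that `process_audio` and both stand-ins are illustrative names, not the skill's actual API.

```python
# Illustrative sketch of the skill's flow; process_audio and the
# injected callables are hypothetical, not the skill's real API.
def process_audio(path, transcribe, send):
    """Transcribe an audio file and forward the text."""
    text = transcribe(path)  # e.g. a Whisper transcription call
    send(text)               # e.g. posting the text to a Feishu webhook
    return text

# Wiring check with stand-in functions:
sent = []
text = process_audio(
    "voice.mp3",
    transcribe=lambda p: f"transcript of {p}",
    send=sent.append,
)
print(text)  # transcript of voice.mp3
```

Injecting the two steps keeps the wiring testable without audio files or network access.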
MP3, WAV, M4A, OGG, FLAC, WebM - any audio format FFmpeg supports.
| Model | Size | Speed | Accuracy |
|--------|-------|---------|------------|
| tiny | ~1GB | Fastest | Basic |
| base | ~1GB | Fast | Fair |
| small | ~2GB | Medium | Good |
| medium | ~5GB | Slower | Very good |
| large | ~10GB | Slowest | Best |
| turbo | ~6GB | Fast | Near large |
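One rough way to act on these trade-offs is to filter models by the approximate download sizes listed. The sizes and the helper below are illustrative, taken from the table above rather than from Whisper's own documentation.

```python
# Approximate download sizes (GB) from the table above (illustrative).
MODEL_SIZES_GB = {"tiny": 1, "base": 1, "small": 2,
                  "medium": 5, "large": 10, "turbo": 6}

def models_fitting(budget_gb):
    """Names of models whose approximate size fits the given budget."""
    return [name for name, gb in MODEL_SIZES_GB.items() if gb <= budget_gb]

print(models_fitting(2))  # ['tiny', 'base', 'small']
```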
```python
import whisper

# Load the model (downloaded on first use)
model = whisper.load_model("base")  # options: tiny/base/small/medium/large/turbo

# Transcribe the audio
result = model.transcribe("audio.mp3")

# Print the recognized text
print(result["text"])
```
- Python 3.8+
- PyTorch
- openai-whisper
- ffmpeg
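These prerequisites can be sanity-checked before the first run with a short script; `check_prereqs` is a hypothetical helper for illustration, not part of the package.

```python
import importlib.util
import shutil
import sys

def check_prereqs():
    """Return a list of missing prerequisites (empty if all look fine)."""
    issues = []
    if sys.version_info < (3, 8):
        issues.append("Python 3.8+ required")
    for module in ("torch", "whisper"):  # PyTorch and openai-whisper
        if importlib.util.find_spec(module) is None:
            issues.append(f"Python package missing: {module}")
    if shutil.which("ffmpeg") is None:
        issues.append("ffmpeg not found on PATH")
    return issues

print(check_prereqs())
```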
- The model is downloaded on first use (1-10GB)
- Larger models need more memory
- Chinese recognition quality is very good