Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Download, transcribe, and analyze videos from YouTube, X/Twitter, and TikTok with local Whisper processing. Perfect for extracting TL;DRs, timestamps, and ac...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
A tool to download, transcribe, and analyze videos from any platform using a smart two-tier system (yt-dlp for fast subtitles, local whisper-cpp for robust fallback).
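The two-tier strategy above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name and the injected callables are assumptions standing in for the yt-dlp and whisper-cpp backends.

```python
def get_transcript(url, fetch_subtitles, run_whisper):
    """Return a transcript, preferring pre-existing platform subtitles.

    fetch_subtitles(url) -> str | None   (tier 1, e.g. backed by yt-dlp)
    run_whisper(url)     -> str          (tier 2, e.g. backed by whisper-cpp)
    """
    subtitles = fetch_subtitles(url)
    if subtitles:
        # Tier 1: fast path, subtitles already exist on the platform.
        return subtitles
    # Tier 2: robust fallback, transcribe the audio locally.
    return run_whisper(url)


# Stub backends standing in for the real tools:
with_subs = get_transcript("https://example.com/a",
                           lambda u: "[00:01] hi", lambda u: "whisper text")
without_subs = get_transcript("https://example.com/b",
                              lambda u: None, lambda u: "whisper text")
```

Injecting the backends as callables keeps the tiering logic testable without network access or local models.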
When the user asks you to summarize, transcribe, or download a video/audio from a URL, use the bundled Python script: `uv run {baseDir}/scripts/analyze_video.py --action <ACTION> --url "<URL>" [--quality <normal|max>] [--lang <en|it|etc>]`
- transcript: Extracts the text with timestamps. Use this when the user asks for a summary or transcript.
- download-video: Downloads the video as MP4 to the Desktop.
- download-audio: Downloads the audio as M4A/MP3 to the Desktop.
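The interface quoted above can be mirrored with a small argparse sketch. This only illustrates the flags documented here (`--action`, `--url`, `--quality`, `--lang`); it is not the script's actual implementation.

```python
import argparse

def build_parser():
    # Flags and allowed values taken from the usage line and action list above.
    p = argparse.ArgumentParser(prog="analyze_video.py")
    p.add_argument("--action", required=True,
                   choices=["transcript", "download-video", "download-audio"])
    p.add_argument("--url", required=True)
    p.add_argument("--quality", choices=["normal", "max"], default="normal")
    p.add_argument("--lang", default="en")
    return p

args = build_parser().parse_args(
    ["--action", "transcript", "--url", "https://example.com/v", "--quality", "max"]
)
```

The `choices` lists make invalid actions or quality levels fail fast with a usage message instead of reaching the download logic.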
If the script needs to fall back to Whisper (e.g., for X/Twitter videos), it uses normal quality by default:
- normal: Fast (~1 min for a 30-min video). Default.
- max: Best quality (~5 min for a 30-min video). Use --quality max when accuracy is critical.
All Whisper models are multilingual by default. The skill can transcribe videos in any language (Italian, Spanish, Japanese, etc.). IMPORTANT: Always respond to the user in THEIR language, not the video's language. If the user speaks Italian but sends an English video, give them the summary in Italian.
The transcript includes precise timestamps like [05:53] text.... If the user asks "When do they talk about X?", grep the transcript and return the exact timestamp from the file.
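The timestamp lookup described above can be sketched like this. The `[MM:SS]` line format is taken from the document; the helper name and matching logic are illustrative assumptions.

```python
import re

def find_timestamp(transcript, query):
    """Return the [MM:SS] timestamp of the first line mentioning query, or None."""
    for line in transcript.splitlines():
        # Lines look like "[05:53] text...", per the transcript format above.
        m = re.match(r"\[(\d{2}:\d{2})\]\s*(.*)", line)
        if m and query.lower() in m.group(2).lower():
            return m.group(1)
    return None

sample = "[00:10] intro\n[05:53] pricing details\n[09:01] wrap-up"
hit = find_timestamp(sample, "pricing")
miss = find_timestamp(sample, "quantum")
```

A real lookup over a long transcript might tokenize or fuzzy-match the query, but a plain substring scan already answers "When do they talk about X?" for exact terms.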