Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
YouTube video summarizer with speaker detection, formatted documents, and audio output. Works out of the box with macOS built-in TTS. Optional recommended tools (pandoc, ffmpeg, mlx-audio) enhance quality. Requires internet for YouTube access. No paid APIs or subscriptions. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video.
Turn any YouTube video into a polished document + audio summary. Drop in a YouTube link to get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.
- No subscriptions or API keys: works out of the box
- Local processing: transcription, speaker detection, and TTS run on your machine
- Network access: fetching from YouTube (captions, metadata, comments) requires internet
- No data uploaded: nothing is sent to external services; all processing stays on your machine
- Safe sub-agent: the spawned sub-agent has strict instructions (no software installation, no network calls beyond YouTube)
- Transcript with summary and key quotes: export as DOCX, HTML, or Markdown
- Smart Speaker Detection: automatically identifies participants
- Audio Summaries: listen to key points (MP3/WAV)
- Clickable Timestamps: every quote links directly to that moment in the video
- YouTube Comments: viewer sentiment analysis and best comments
- Queue Support: send multiple links, they get processed in order
- Non-Blocking Workflow: conversation continues while the video processes in the background
- Interviews & podcasts (multi-speaker detection)
- Lectures & tutorials (single speaker)
- Music videos (lyrics extraction)
- News & documentaries
- Any YouTube content with captions
When the user sends a YouTube URL:
1. Spawn a sub-agent with the full pipeline task immediately
2. Reply: "🎬 TubeScribe is processing - I'll let you know when it's ready!"
3. Continue the conversation (don't wait!)
4. The sub-agent notification will announce completion with title and details

DO NOT BLOCK: spawn and move on instantly.
Run setup to check dependencies and configure defaults:

    python skills/tubescribe/scripts/setup.py

This checks: summarize CLI, pandoc, ffmpeg, Kokoro TTS
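As a rough illustration of what such a dependency check could look like, the sketch below looks up each command-line tool on the PATH and reports what is missing without installing anything. It is not the actual `setup.py`; the function names `check_tools` and `missing_tools` are hypothetical, and it assumes each tool is an executable named as listed (Kokoro TTS is left out because it is not a single PATH binary).

```python
import shutil

# Tool names come from the setup description above; treating each one as a
# PATH executable of the same name is an assumption of this sketch.
TOOLS = ["summarize", "pandoc", "ffmpeg"]


def check_tools(names, which=shutil.which):
    """Return {tool: resolved path or None}. Never installs anything."""
    return {name: which(name) for name in names}


def missing_tools(report):
    """List the tools whose lookup returned None, sorted by name."""
    return sorted(name for name, path in report.items() if path is None)


if __name__ == "__main__":
    report = check_tools(TOOLS)
    for name, path in report.items():
        print(f"{name}: {path or 'not found'}")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it is `shutil.which`.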
Spawn ONE sub-agent that does the entire pipeline:

sessions_spawn( task=f"""
## TubeScribe: Process {youtube_url}

⚠️ CRITICAL: Do NOT install any software. No pip, brew, curl, venv, or binary downloads. If a tool is missing, STOP and report what's needed.

Run the COMPLETE pipeline: do not stop until all steps are done.

### Step 1: Extract
```bash
python3 skills/tubescribe/scripts/tubescribe.py "{youtube_url}"
```
Note the Source and Output paths printed by the script. Use those exact paths in subsequent steps.
Read the Source path from Step 1 output and note:
- metadata.title (for filename)
- metadata.video_id
- metadata.channel, upload_date, duration_string
Write to the Output path from Step 1:
- `# **<title>**`
- Video info block: Channel, Date, Duration, URL (clickable). Empty line between each field.
- `## **Participants**`: table with bold headers:

  | **Name** | **Role** | **Description** |
  |----------|----------|-----------------|

- `## **Summary**`: 3-5 paragraphs of prose
- `## **Key Quotes**`: 5 best with clickable YouTube timestamps. Format each as:

  "Quote text here." - [12:34](https://www.youtube.com/watch?v=ID&t=754s)

  "Another quote." - [25:10](https://www.youtube.com/watch?v=ID&t=1510s)

  Use a regular dash `-`, NOT an em dash. Do NOT use blockquotes `>`. Plain paragraphs only.
- `## **Viewer Sentiment**` (if comments exist)
- `## **Best Comments**` (if comments exist): Top 5, NO lines between them:

  Comment text here.
  *- ▲ 123 @AuthorName*

  Next comment text here.
  *- ▲ 45 @AnotherAuthor*

  Attribution line: dash + italic. Just a blank line between comments, NO `---` separators.
- `## **Full Transcript**`: merge segments, speaker labels, clickable timestamps
Clean the title for filename (remove special chars), then:

    pandoc <output_path> -o ~/Documents/TubeScribe/<safe_title>.docx
Write the summary text to a temp file, then use TubeScribe's built-in audio generation:

    # Write summary to temp file (use python3 to write, avoids shell escaping issues)
    python3 -c "
    text = '''YOUR SUMMARY TEXT HERE'''
    with open('<temp_dir>/tubescribe_<video_id>_summary.txt', 'w') as f:
        f.write(text)
    "

    # Generate audio (auto-detects engine, voice, format from config)
    python3 skills/tubescribe/scripts/tubescribe.py \
      --generate-audio <temp_dir>/tubescribe_<video_id>_summary.txt \
      --audio-output ~/Documents/TubeScribe/<safe_title>_summary

This reads ~/.tubescribe/config.json and uses the configured TTS engine (mlx/kokoro/builtin), voice blend, and speed automatically. Output format (mp3/wav) comes from config.
python3 skills/tubescribe/scripts/tubescribe.py --cleanup <video_id>
open ~/Documents/TubeScribe/
Tell what was created: DOCX name, MP3 name + duration, video stats.
""", label="tubescribe", runTimeoutSeconds=900, cleanup="delete" )

**After spawning, reply immediately:**

> 🎬 TubeScribe is processing - I'll let you know when it's ready!

Then continue the conversation. The sub-agent notification announces completion.

## Configuration

Config file: `~/.tubescribe/config.json`

```json
{
  "output": {
    "folder": "~/Documents/TubeScribe",
    "open_folder_after": true,
    "open_document_after": false,
    "open_audio_after": false
  },
  "document": {
    "format": "docx",
    "engine": "pandoc"
  },
  "audio": {
    "enabled": true,
    "format": "mp3",
    "tts_engine": "mlx"
  },
  "mlx_audio": {
    "path": "~/.openclaw/tools/mlx-audio",
    "model": "mlx-community/Kokoro-82M-bf16",
    "voice": "af_heart",
    "lang_code": "a",
    "speed": 1.05
  },
  "kokoro": {
    "path": "~/.openclaw/tools/kokoro",
    "voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
    "speed": 1.05
  },
  "processing": {
    "subagent_timeout": 600,
    "cleanup_temp_files": true
  }
}
```
| Option | Default | Description |
|--------|---------|-------------|
| output.folder | ~/Documents/TubeScribe | Where to save files |
| output.open_folder_after | true | Open output folder when done |
| output.open_document_after | false | Auto-open generated document |
| output.open_audio_after | false | Auto-open generated audio summary |

| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| document.format | docx | docx, html, md | Output format |
| document.engine | pandoc | pandoc | Converter for DOCX (falls back to HTML) |

| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| audio.enabled | true | true, false | Generate audio summary |
| audio.format | mp3 | mp3, wav | Audio format (mp3 needs ffmpeg) |
| audio.tts_engine | mlx | mlx, kokoro, builtin | TTS engine (mlx = fastest on Apple Silicon) |

| Option | Default | Description |
|--------|---------|-------------|
| mlx_audio.path | ~/.openclaw/tools/mlx-audio | mlx-audio venv location |
| mlx_audio.model | mlx-community/Kokoro-82M-bf16 | MLX model to use |
| mlx_audio.voice | af_heart | Voice preset (used if no voice_blend) |
| mlx_audio.voice_blend | {af_heart: 0.6, af_sky: 0.4} | Custom voice mix (weighted blend) |
| mlx_audio.lang_code | a | Language code (a = US English) |
| mlx_audio.speed | 1.05 | Playback speed (1.0 = normal, 1.05 = 5% faster) |

| Option | Default | Description |
|--------|---------|-------------|
| kokoro.path | ~/.openclaw/tools/kokoro | Kokoro repo location |
| kokoro.voice_blend | {af_heart: 0.6, af_sky: 0.4} | Custom voice mix |
| kokoro.speed | 1.05 | Playback speed (1.0 = normal, 1.05 = 5% faster) |

| Option | Default | Description |
|--------|---------|-------------|
| processing.subagent_timeout | 600 | Seconds for sub-agent (increase for long videos) |
| processing.cleanup_temp_files | true | Remove /tmp files after completion |

| Option | Default | Description |
|--------|---------|-------------|
| comments.max_count | 50 | Number of comments to fetch |
| comments.timeout | 90 | Timeout for comment fetching (seconds) |

| Option | Default | Description |
|--------|---------|-------------|
| queue.stale_minutes | 30 | Consider a processing job stale after this many minutes |
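To make the relationship between the config file and the documented defaults concrete, here is a minimal sketch of how a partial `~/.tubescribe/config.json` could be layered over defaults. It is an illustration only, not TubeScribe's actual loader: `DEFAULTS` copies a subset of the values listed above, and the `merge` and `load_config` names are hypothetical.

```python
import json
from pathlib import Path

# Subset of the documented defaults; the real tool may store these differently.
DEFAULTS = {
    "output": {"folder": "~/Documents/TubeScribe", "open_folder_after": True},
    "document": {"format": "docx", "engine": "pandoc"},
    "audio": {"enabled": True, "format": "mp3", "tts_engine": "mlx"},
    "processing": {"subagent_timeout": 600, "cleanup_temp_files": True},
    "comments": {"max_count": 50, "timeout": 90},
    "queue": {"stale_minutes": 30},
}


def merge(defaults, overrides):
    """Recursively overlay overrides on defaults without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path="~/.tubescribe/config.json"):
    """Read the user's config if it exists and fill gaps from DEFAULTS."""
    file = Path(path).expanduser()
    user = json.loads(file.read_text()) if file.exists() else {}
    return merge(DEFAULTS, user)
```

With this approach a config file containing only `{"audio": {"format": "wav"}}` would change the audio format while every other option keeps its default.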
    ~/Documents/TubeScribe/
    ├── {Video Title}.html          # Formatted document (or .docx / .md)
    └── {Video Title}_summary.mp3   # Audio summary (or .wav)

After generation, TubeScribe opens the folder (not individual files) so you can access everything.
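Output file names are based on the video title, which can contain characters that are invalid or awkward in file names. The sketch below shows one way a title might be cleaned; the exact rules TubeScribe applies are not documented here, so the character set, the length cap, and the `safe_title` helper itself are assumptions.

```python
import re


def safe_title(title, max_length=120):
    """Turn a video title into a conservative file-name stem."""
    # Replace characters rejected by common file systems, plus control chars.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', " ", title)
    # Collapse whitespace runs and trim stray spaces and dots at the edges.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Cap the length; fall back to a placeholder if nothing usable is left.
    return cleaned[:max_length].rstrip(" .") or "untitled"
```

For example, `safe_title("AI: What's Next? / Part 1")` yields `AI What's Next Part 1`.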
Required:
- summarize CLI: `brew install steipete/tap/summarize`
- Python 3.8+

Optional (better quality):
- pandoc (DOCX output): `brew install pandoc`
- ffmpeg (MP3 audio): `brew install ffmpeg`
- yt-dlp (YouTube comments): `brew install yt-dlp`
- mlx-audio (fastest TTS on Apple Silicon): `pip install mlx-audio` (uses MLX backend for Kokoro)
- Kokoro TTS (PyTorch fallback): see https://github.com/hexgrad/kokoro
TubeScribe looks for yt-dlp in these locations (in order):

| Priority | Path | Source |
|----------|------|--------|
| 1 | which yt-dlp | System PATH |
| 2 | /opt/homebrew/bin/yt-dlp | Homebrew (Apple Silicon) |
| 3 | /usr/local/bin/yt-dlp | Homebrew (Intel) / Linux |
| 4 | ~/.local/bin/yt-dlp | pip install --user |
| 5 | ~/.local/pipx/venvs/yt-dlp/bin/yt-dlp | pipx |
| 6 | ~/.openclaw/tools/yt-dlp/yt-dlp | TubeScribe auto-install |

If yt-dlp is not found, setup downloads a standalone binary to the tools directory. The tools directory version doesn't conflict with system installations.
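The lookup order above can be sketched as a PATH search followed by a list of fixed fallback locations. This is an illustrative reconstruction rather than TubeScribe's own code; `find_yt_dlp` and `FALLBACK_PATHS` are hypothetical names, and the download step that setup performs when nothing is found is deliberately left out.

```python
import os
import shutil
from pathlib import Path

# Fallback locations in the documented priority order (2 through 6).
FALLBACK_PATHS = [
    "/opt/homebrew/bin/yt-dlp",
    "/usr/local/bin/yt-dlp",
    "~/.local/bin/yt-dlp",
    "~/.local/pipx/venvs/yt-dlp/bin/yt-dlp",
    "~/.openclaw/tools/yt-dlp/yt-dlp",
]


def find_yt_dlp(which=shutil.which, candidates=FALLBACK_PATHS):
    """Return the first usable yt-dlp path, or None if none is found."""
    found = which("yt-dlp")  # priority 1: system PATH
    if found:
        return found
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file() and os.access(path, os.X_OK):
            return str(path)
    return None
```

Checking the PATH first means a user-managed installation always wins over the copy in the tools directory.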
When user sends multiple YouTube URLs while one is processing:
python skills/tubescribe/scripts/tubescribe.py --queue-status
    # Add to queue instead of starting parallel processing
    python skills/tubescribe/scripts/tubescribe.py --queue-add "NEW_URL"
    # → Replies: "Added to queue (position 2)"
    # Check if more in queue
    python skills/tubescribe/scripts/tubescribe.py --queue-next
    # → Automatically pops and processes next URL
| Command | Description |
|---------|-------------|
| --queue-status | Show what's processing + queued items |
| --queue-add URL | Add URL to queue |
| --queue-next | Process next item from queue |
| --queue-clear | Clear entire queue |
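The queue commands behave like a first-in, first-out list with one job running at a time. The in-memory model below is a sketch of those semantics only; the real queue is managed by `tubescribe.py` and presumably persisted somewhere, and the `VideoQueue` class, its method names, and the position numbering (which counts the running job, to match the "position 2" reply) are assumptions.

```python
import time
from collections import deque


class VideoQueue:
    """Toy model of --queue-add / --queue-next / --queue-clear."""

    def __init__(self, stale_minutes=30, clock=time.time):
        self.pending = deque()
        self.current = None  # (url, started_at) while a job is running
        self.stale_seconds = stale_minutes * 60
        self.clock = clock

    def add(self, url):
        """Queue a URL and return its 1-based position, counting the running job."""
        self.pending.append(url)
        return len(self.pending) + (1 if self.current else 0)

    def next(self):
        """Pop the oldest queued URL and mark it as the running job."""
        if not self.pending:
            self.current = None
            return None
        url = self.pending.popleft()
        self.current = (url, self.clock())
        return url

    def is_stale(self):
        """True if the running job started more than stale_minutes ago."""
        if self.current is None:
            return False
        return self.clock() - self.current[1] > self.stale_seconds

    def clear(self):
        """Drop all queued URLs; the running job is left untouched."""
        self.pending.clear()
```

The injectable `clock` makes the staleness rule (`queue.stale_minutes`) easy to exercise without waiting.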
    python skills/tubescribe/scripts/tubescribe.py url1 url2 url3

Processes all URLs sequentially with a summary at the end.
The script detects and reports these errors with clear messages:

| Error | Message |
|-------|---------|
| Invalid URL | Not a valid YouTube URL |
| Private video | Video is private - can't access |
| Video removed | Video not found or removed |
| No captions | No captions available for this video |
| Age-restricted | Age-restricted video - can't access without login |
| Region-blocked | Video blocked in your region |
| Live stream | Live streams not supported - wait until it ends |
| Network error | Network error - check your connection |
| Timeout | Request timed out - try again later |

When an error occurs, report it to the user and don't proceed with that video.
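The "Invalid URL" case can be caught before any network request is made. The sketch below shows one plausible check that also extracts the video ID; it is not the script's actual validation. The accepted hosts, the path forms, and the 11-character ID pattern reflect common YouTube URL conventions and are assumptions, as is the `extract_video_id` name.

```python
import re
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}
VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")  # conventional ID shape, not a guarantee


def extract_video_id(url):
    """Return the video ID for a recognised YouTube URL, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = None
    if host == "youtu.be":
        candidate = parts[0] if parts else None
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            candidate = parts[1]
    if candidate and VIDEO_ID.match(candidate):
        return candidate
    return None
```

A `None` result would map to the "Not a valid YouTube URL" message; the other errors in the table can only be detected once YouTube has been contacted.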
- For long videos (>30 min), increase sub-agent timeout to 900s
- Speaker detection works best with clear interview/podcast formats
- Single-speaker videos (tutorials, lectures) skip speaker labels automatically
- Timestamps link directly to YouTube at that moment
- Use batch mode for multiple videos: tubescribe url1 url2 url3
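Timestamp links follow the `[12:34](https://www.youtube.com/watch?v=ID&t=754s)` pattern used for key quotes: the label is minutes and seconds, and the `t` parameter is the same offset in whole seconds. A small sketch of that conversion follows; the helper names are hypothetical, and the `H:MM:SS` label for videos longer than an hour is an assumption, since the document only shows `MM:SS` examples.

```python
def format_timestamp(seconds):
    """Render an offset in seconds as M:SS, or H:MM:SS past one hour."""
    seconds = int(seconds)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timestamp_link(video_id, seconds):
    """Build a Markdown link that opens the video at the given offset."""
    seconds = int(seconds)
    url = f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"
    return f"[{format_timestamp(seconds)}]({url})"
```

For instance, `timestamp_link("ID", 754)` produces `[12:34](https://www.youtube.com/watch?v=ID&t=754s)`, matching the quote format shown earlier.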