Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Professional text-to-speech skill with emotion detection, voice cloning, and audio processing capabilities powered by MiniMax Voice API and FFmpeg.
| Area | Features |
|---|---|
| TTS | Sync (HTTP/WebSocket), async (long text), streaming |
| Segment-based | Multi-voice, multi-emotion synthesis from segments.json, auto merge |
| Voice | Cloning (10s–5min), design (text prompt), management |
| Audio | Format conversion, merge, normalize, trim, remove silence (FFmpeg) |
```
mmVoice_Maker/
├── SKILL.md                 # This overview
├── mmvoice.py               # CLI tool (recommended for Agents)
├── check_environment.py     # Environment verification
├── requirements.txt
├── scripts/                 # Entry: scripts/__init__.py
│   ├── utils.py             # Config, data classes
│   ├── sync_tts.py          # HTTP/WebSocket TTS
│   ├── async_tts.py         # Long text TTS
│   ├── segment_tts.py       # Segment-based TTS (multi-voice, multi-emotion)
│   ├── voice_clone.py       # Voice cloning
│   ├── voice_design.py      # Voice design
│   ├── voice_management.py  # List/delete voices
│   └── audio_processing.py  # FFmpeg audio tools
└── reference/               # Load as needed
    ├── cli-guide.md         # CLI usage guide
    ├── getting-started.md   # Setup and quick test
    ├── tts-guide.md         # Sync/async TTS workflows
    ├── voice-guide.md       # Clone/design/manage
    ├── audio-guide.md       # Audio processing
    ├── script-examples.md   # Runnable code snippets
    ├── troubleshooting.md   # Common issues
    ├── api_documentation.md # Complete API reference
    └── voice_catalog.md     # Voice selection guide
```
6-step workflow:

1. Verify environment
2. Process text into a script → `<cwd>/audio/segments.json`
   - ⚠️ NOTE: Before processing the text, you must read voice_catalog.md for voice selection.
   - Note: Step 2.4 is especially important; check it twice before sending the script to the user.
   - Step 2.5: ⚠️ Generate a preview for user confirmation (highly recommended for multi-voice content)
3. Present the plan to the user for confirmation
4. Validate segments.json
5. Generate and merge audio → intermediate files in `<cwd>/audio/tmp/`, final output in `<cwd>/audio/output.mp3`
6. ⚠️ CRITICAL: the user confirms audio quality FIRST, THEN clean up temp files (only after the user is satisfied)

`<cwd>` is Claude's current working directory (not the skill directory). Audio files are saved relative to where Claude is running commands.
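As a rough illustration of Step 2's output, a script file might be assembled like this. The authoritative schema is whatever `python mmvoice.py validate` accepts; the field names below (`text`, `voice_id`, `emotion`) are assumptions based on the features described in this document:

```python
import json

# Hypothetical segments.json content -- field names are assumptions,
# not the skill's documented schema. Each entry pairs a piece of text
# with a voice and an emotion for segment-based synthesis.
segments = [
    {"text": "Welcome to the show.", "voice_id": "host_voice", "emotion": "happy"},
    {"text": "Thanks for having me.", "voice_id": "guest_voice", "emotion": "calm"},
]

# Write the script where Step 2 expects it (shown here as ./segments.json).
with open("segments.json", "w", encoding="utf-8") as f:
    json.dump(segments, f, ensure_ascii=False, indent=2)
```

The real file would live at `<cwd>/audio/segments.json` and then go through Steps 3–4 before any audio is generated.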
```bash
python check_environment.py
```

Checks:
- Python 3.8+
- Required packages (requests, websockets)
- FFmpeg installation
- MINIMAX_VOICE_API_KEY environment variable

If the API key is not set, ask the user for it and set it:

```bash
export MINIMAX_VOICE_API_KEY="your-api-key-here"
```
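The checks above can be sketched in a few lines of Python. This is a minimal sketch, not the bundled `check_environment.py`, which is authoritative and may verify more:

```python
import os
import shutil
import sys

def check_environment_sketch():
    """Return a list of problems; an empty list means the environment looks OK."""
    problems = []
    # Python version check
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ required")
    # Required packages
    for pkg in ("requests", "websockets"):
        try:
            __import__(pkg)
        except ImportError:
            problems.append("missing package: " + pkg)
    # FFmpeg on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    # API key
    if not os.environ.get("MINIMAX_VOICE_API_KEY"):
        problems.append("MINIMAX_VOICE_API_KEY is not set")
    return problems
```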
Before generating audio, validate the segments file:

```bash
# Default: speech-2.8-hd (auto emotion matching)
python mmvoice.py validate <cwd>/audio/segments.json

# Specify model for context-specific validation
python mmvoice.py validate <cwd>/audio/segments.json --model speech-2.6-hd

# Validate voice_ids against available voices (slower, requires API call)
python mmvoice.py validate <cwd>/audio/segments.json --validate-voices

# Combined options (recommended)
python mmvoice.py validate <cwd>/audio/segments.json --model speech-2.6-hd --validate-voices

# Use --verbose to see segment details
python mmvoice.py validate <cwd>/audio/segments.json --model speech-2.6-hd --validate-voices --verbose
```

Emotion validation checks:

| Model | Emotion validation |
|---|---|
| speech-2.8-hd/turbo | Emotion can be empty (auto emotion matching) |
| speech-2.6-hd/turbo | All 9 emotions supported |
| Older models | happy, sad, angry, fearful, disgusted, surprised, calm (7 emotions) |

Voice ID validation with `--validate-voices`:
- Calls the API once to get all available voices
- Validates each voice_id against the list
- Shows errors for invalid voice_ids (blocks validation)
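The per-model emotion rules in the table above amount to a small lookup. The sketch below encodes them for illustration; matching models by name prefix is an assumption, and the skill's validator is authoritative:

```python
# The 9 emotions listed in this document; older models support the first 7.
EMOTIONS_9 = {"happy", "sad", "angry", "fearful", "disgusted",
              "surprised", "calm", "fluent", "whisper"}
EMOTIONS_7 = EMOTIONS_9 - {"fluent", "whisper"}

def emotion_ok(model, emotion):
    """Return True if `emotion` is acceptable for `model` per the table above.

    Assumption: speech-2.8 models also accept explicit emotions in
    addition to the empty string (auto matching).
    """
    if model.startswith("speech-2.8"):
        return emotion == "" or emotion in EMOTIONS_9
    if model.startswith("speech-2.6"):
        return emotion in EMOTIONS_9
    return emotion in EMOTIONS_7
```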
Generate audio for all segments and merge into the final output.

File placement (default behavior if the user doesn't specify):

```
<cwd>/                            # Claude's current working directory
└── audio/                        # Created automatically
    ├── tmp/                      # Intermediate segment files
    │   ├── segment_0000.mp3
    │   ├── segment_0001.mp3
    │   └── ...
    └── <custom_audio_name>.mp3   # Final merged audio, name can be customized
```

Where `<cwd>` is Claude's current working directory (where commands are executed):
- If -o is not specified, output goes to <cwd>/audio/output.mp3
- Intermediate files go to <cwd>/audio/tmp/
- After the user confirms the final audio, ask whether to delete <cwd>/audio/tmp/

Basic usage:

```bash
# Default: speech-2.8-hd, output to <cwd>/audio/output.mp3
python mmvoice.py generate <cwd>/audio/segments.json

# Specify output path
python mmvoice.py generate <cwd>/audio/segments.json -o <cwd>/audio/<custom_audio_name>.mp3

# Specify model if needed
python mmvoice.py generate <cwd>/audio/segments.json --model speech-2.6-hd
```

Skip existing segments (for rate limit retries):

```bash
# Only generate segments that don't exist yet - skips already-generated files
python mmvoice.py generate <cwd>/audio/segments.json --skip-existing
```

Error handling:
- If a segment fails, the script reports which segment failed and why
- Use --continue-on-error to generate the remaining segments despite failures
- Use --skip-existing to skip already successfully generated segments (recommended for retries after rate limits)
- The script automatically falls back to a simpler merge if the FFmpeg filter_complex merge fails
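The `--skip-existing` behavior described above can be sketched as a filter over segment indices. This is an illustration of the described behavior, not the actual implementation in `scripts/segment_tts.py`; the `segment_NNNN.mp3` naming follows the tmp/ layout shown in this document:

```python
from pathlib import Path

def segments_to_generate(n_segments, tmp_dir, skip_existing):
    """Return the indices that still need audio.

    With skip_existing=True, a segment is skipped when its
    segment_NNNN.mp3 already exists in tmp_dir -- this is what makes
    retries after rate limits cheap.
    """
    pending = []
    for i in range(n_segments):
        out = Path(tmp_dir) / f"segment_{i:04d}.mp3"
        if skip_existing and out.exists():
            continue
        pending.append(i)
    return pending
```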
⚠️ CRITICAL: Never delete temp files until the user confirms!

After generation completes, you MUST follow this exact sequence:

Step 6.1: Report the generation result to the user

```
✅ Audio saved to: <output_path>
Generated: X/Y segments
Intermediate files in: <cwd>/audio/tmp/
```

Step 6.2: Ask the user to listen to the audio and confirm:
- Is the audio quality satisfactory?
- Are all voices appropriate?
- Any adjustments needed?

Step 6.3: Wait for the user's response.

Step 6.4: Only after the user confirms, offer cleanup. Temporary files can then be deleted with:

```bash
rm -rf <cwd>/audio/tmp/
```

NEVER execute rm -rf on temp files without explicit user confirmation!

If the user is NOT satisfied:
- Do NOT delete temp files
- Discuss what needs to be adjusted
- Re-generate the affected segments if needed
- Ask for confirmation again
Use the following when the task involves voice creation, single-voice TTS (sync/async), or audio processing instead of the main segment-based workflow. Each subsection gives CLI commands, script paths, and the reference doc to open for details.
Purpose: Create custom voices from audio (clone) or from a text description (design); list system and custom voices.

CLI (entry point: mmvoice.py):

```bash
python mmvoice.py clone AUDIO_FILE --voice-id VOICE_ID   # Clone from 10s-5min audio
python mmvoice.py design "DESCRIPTION" --voice-id ID     # Design from text
python mmvoice.py list-voices                            # List all voices
```

Scripts: scripts/voice_clone.py (clone), scripts/voice_design.py (design), scripts/voice_management.py (list/manage).

Documentation: reference/voice-guide.md → cloning (quick + high-quality + step-by-step), design workflow, management.
Purpose: Single-voice TTS: sync for short text (≤10k chars), async for long text (up to 1M chars); optional streaming.

CLI:

```bash
python mmvoice.py tts "TEXT" -o OUTPUT.mp3 [-v VOICE_ID] [--model MODEL]
```

Scripts: scripts/sync_tts.py (HTTP/WebSocket sync), scripts/async_tts.py (async task + poll).

Documentation: reference/tts-guide.md → sync TTS, async TTS, streaming, segment-based production.
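Choosing between sync and async comes down to the length limits stated above. A minimal dispatcher sketch (the limits are from this document; the function and its return values are illustrative, not part of the skill's API):

```python
SYNC_LIMIT = 10_000       # sync TTS handles up to 10k chars
ASYNC_LIMIT = 1_000_000   # async TTS handles up to 1M chars

def pick_tts_mode(text):
    """Return 'sync' or 'async' based on the documented length limits."""
    n = len(text)
    if n <= SYNC_LIMIT:
        return "sync"
    if n <= ASYNC_LIMIT:
        return "async"
    raise ValueError(f"text too long for async TTS: {n} chars")
```

Text over the async limit would need to be split by the caller before submission.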
Purpose: Merge files (with optional crossfade), convert format, normalize loudness, trim.

CLI:

```bash
python mmvoice.py merge FILE1 [FILE2 ...] -o OUTPUT [--crossfade MS]
python mmvoice.py convert INPUT -o OUTPUT [--format FORMAT]
```

Script: scripts/audio_processing.py (merge, convert, normalize, trim).

Documentation: reference/audio-guide.md → format conversion, merging (filter_complex + concat demuxer fallback), normalization, trimming, optimization.
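The concat demuxer fallback mentioned above works by writing a file list and handing it to FFmpeg. A sketch of that command construction, assuming same-codec inputs so `-c copy` is safe; the skill's `audio_processing.py` is the authoritative implementation:

```python
from pathlib import Path

def concat_demuxer_cmd(inputs, output, list_file="concat.txt"):
    """Build an FFmpeg concat-demuxer command for lossless merging.

    Writes the file list the demuxer reads, then returns the argv to run
    (e.g. via subprocess.run). `-safe 0` permits arbitrary paths in the
    list; `-c copy` avoids re-encoding.
    """
    Path(list_file).write_text(
        "".join(f"file '{p}'\n" for p in inputs), encoding="utf-8"
    )
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]
```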
CLI: validate and generate as in Steps 4β5 above. Script: scripts/segment_tts.py. Documentation: reference/cli-guide.md, reference/api_documentation.md.
Open these when you need concrete usage, parameters, or troubleshooting. Paths are relative to the skill root.

| Document | Content for the Agent |
|---|---|
| reference/cli-guide.md | All CLI commands (validate, generate, tts, clone, design, list-voices, merge, convert, check-env) with options and examples. Use for correct CLI invocation. |
| reference/getting-started.md | Environment setup (venv, pip install, FFmpeg), MINIMAX_VOICE_API_KEY, basic synthesis test. Use for first-time setup or "env not working". |
| reference/tts-guide.md | Sync TTS (short text), async TTS (long text), streaming TTS, multi-segment production. Use for sync/async/streaming logic and parameters. |
| reference/voice-guide.md | Voice cloning (quick, high-quality with prompt audio, step-by-step), voice design, voice management. Use for custom voice creation flows. |
| reference/audio-guide.md | Format conversion, merging (including crossfade and fallback), normalization, trimming, optimization. Use for merge/convert/normalize behavior and options. |
| reference/script-examples.md | Copy-paste runnable examples for sync TTS, async TTS, segment-based TTS, audio processing, voice clone/design/management. Use for quick Python snippets. |
| reference/troubleshooting.md | Environment (API key, FFmpeg), API errors, segment-based TTS, audio, voice. Use when an error message or unexpected behavior appears. |
| reference/api_documentation.md | Full API reference: config, sync/async TTS, emotion parameter, segment-based TTS, voice clone/design/management, audio processing, common parameters, error handling. Use for exact function signatures and parameter details. |
| reference/voice_catalog.md | System voices list (male/female/beta), selection guide, voice parameters, custom voices, voice IDs. Use to choose or look up voice_id. |
- Python: 3.8 or higher
- API key: MINIMAX_VOICE_API_KEY environment variable must be set
- FFmpeg: required for audio processing (merge, convert, normalize). Install with `brew install ffmpeg` (macOS) or `sudo apt install ffmpeg` (Ubuntu)
- Text length: sync TTS ≤10,000 chars; async TTS ≤1,000,000 chars
- Voice cloning: audio must be 10s–5min in duration, ≤20MB, in mp3/wav/m4a format
- Voice expiration: custom voices (cloned/designed) expire after 7 days if not used with TTS
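The clone-source limits above make a natural pre-flight check before uploading audio. A sketch, assuming 20MB means 20·1024·1024 bytes; the caller would measure duration and size with ffprobe and os.path.getsize (not shown), and the API's own validation is authoritative:

```python
CLONE_FORMATS = {"mp3", "wav", "m4a"}

def clone_audio_ok(duration_s, size_bytes, filename):
    """True if a clone source meets the 10s-5min, <=20MB, format limits."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return (10 <= duration_s <= 300          # 10s to 5 minutes
            and size_bytes <= 20 * 1024 * 1024  # assumed binary 20MB
            and ext in CLONE_FORMATS)
```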
- Pause insertion: use `<#x#>` in the text, where x is the pause duration in seconds (0.01–99.99). Example: "Hello<#1.5#>world" creates a 1.5s pause between the words.
- Supported emotions: happy, sad, angry, fearful, disgusted, surprised, calm, fluent, whisper
  - speech-2.8: automatic matching; speech-2.6: all 9; older models: first 7
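A script can sanity-check pause tags before sending text to the API. The sketch below only validates tags that match the documented `<#x#>` shape (malformed tags pass through untouched); the API's own parsing may be stricter:

```python
import re

# Matches <#x#> where x looks like a number with up to 2 digits
# on each side of the decimal point, e.g. <#1.5#>, <#99.99#>, <#2#>.
PAUSE_TAG = re.compile(r"<#(\d{1,2}(?:\.\d{1,2})?)#>")

def pause_tags_valid(text):
    """Check that every well-formed <#x#> tag has x in the 0.01-99.99 range."""
    for match in PAUSE_TAG.finditer(text):
        x = float(match.group(1))
        if not (0.01 <= x <= 99.99):
            return False
    return True
```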
- Run `python check_environment.py` to diagnose setup issues
- See reference/troubleshooting.md for common problems and solutions
- Check reference/getting-started.md for detailed setup instructions