{
  "schemaVersion": "1.0",
  "item": {
    "slug": "her-voice",
    "name": "Her Voice",
    "source": "tencent",
    "type": "skill",
    "category": "AI Intelligence",
    "sourceUrl": "https://clawhub.ai/matusvojtek/her-voice",
    "canonicalUrl": "https://clawhub.ai/matusvojtek/her-voice",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/her-voice",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=her-voice",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "CHANGELOG.md",
      "SKILL.md",
      "assets/HerVoice.swift",
      "scripts/config.py",
      "scripts/daemon.py",
      "scripts/setup.py"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/her-voice"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/her-voice",
    "agentPageUrl": "https://openagent3.xyz/skills/her-voice/agent",
    "manifestUrl": "https://openagent3.xyz/skills/her-voice/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/her-voice/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Her Voice 🎙️",
        "body": "Give your agent a voice. Audio responses powered by Kokoro TTS — a compact, naturally expressive model running entirely on-device."
      },
      {
        "title": "✨ Features",
        "body": "Highly optimized response time thanks to on-the-fly audio streaming technology. 100% free, no API keys required. Inspired by Samantha and Sky.\n\n⚡ On-the-fly Streaming — Audio plays as it generates, very low latency\n👄 The Voice of an angel — Cutting-edge local text-to-speech model Kokoro TTS\n🧠 TTS Daemon — Keep the model warm in RAM for instant responses (can be disabled to save RAM)\n🖥️ Persist Mode — Drag & drop audio, paste text, use as a voice station\n🔧 Fully Configurable — Voice, speed, visualizer, notification sounds\n🍎 MLX + PyTorch — Native Metal acceleration on Apple Silicon, PyTorch fallback everywhere else\n🎨 Real-time Visualizer — Floating 60fps LED bars that react to speech (macOS only)"
      },
      {
        "title": "First-Run Setup",
        "body": "python3 SKILL_DIR/scripts/setup.py\n\nNote: SKILL_DIR is the root directory of this skill — the agent resolves it automatically when running commands.\n\nThe setup wizard will:\n\nDetect platform and select TTS engine (MLX on Apple Silicon, PyTorch elsewhere)\nFind or install the appropriate TTS backend (mlx-audio or kokoro)\nInstall espeak-ng (Homebrew on macOS, apt on Linux)\nPatch espeak loader if needed (macOS compatibility)\nCompile the native visualizer binary (macOS only)\nDownload the Kokoro model\nCreate config at ~/.her-voice/config.json\n\nCheck status anytime:\n\npython3 SKILL_DIR/scripts/setup.py status"
      },
      {
        "title": "Post-Setup: Names & Pronunciation",
        "body": "After setup, configure the agent and user names:\n\npython3 SKILL_DIR/scripts/config.py set agent_name \"Jackie\"\npython3 SKILL_DIR/scripts/config.py set user_name \"Matúš\"\npython3 SKILL_DIR/scripts/config.py set user_name_tts \"Mah-toosh\"\n\nTTS pronunciation tip: If the user's name is non-English, figure out a phonetic English spelling that Kokoro will pronounce correctly. Store it in user_name_tts and use that spelling whenever speaking the name aloud. The real name stays in user_name for display purposes."
      },
      {
        "title": "Speaking Text",
        "body": "# Basic usage\npython3 SKILL_DIR/scripts/speak.py \"Hello, world!\"\n\n# Skip visualizer for this call\npython3 SKILL_DIR/scripts/speak.py --no-viz \"Quick note\"\n\n# Save to file instead of playing\npython3 SKILL_DIR/scripts/speak.py --save /tmp/output.wav \"Save this\"\n\n# Override voice or speed\npython3 SKILL_DIR/scripts/speak.py --voice af_bella --speed 1.2 \"Faster!\"\n\n# Pipe text from stdin\necho \"Piped text\" | python3 SKILL_DIR/scripts/speak.py"
      },
      {
        "title": "Options",
        "body": "Flag\tDescription\n--no-viz\tSkip the visualizer for this call\n--persist\tKeep visualizer open after playback ends\n--save PATH\tSave audio to WAV file instead of playing\n--voice NAME\tOverride the configured voice\n--speed N\tOverride the configured speed multiplier\n--mode MODE\tOverride visualizer mode (v2 or classic)"
      },
      {
        "title": "Agent Workflow",
        "body": "When the user wants voice responses:\n\nCheck voice mode — is voice enabled or did the user ask for it?\nPlay notification sound (instant feedback while TTS generates):\nafplay /System/Library/Sounds/Blow.aiff &\n\n\nSpeak the response:\npython3 SKILL_DIR/scripts/speak.py \"Response text here\"\n\n\nAlways provide text alongside voice — accessibility matters."
      },
      {
        "title": "Notification Sound",
        "body": "The notification sound plays instantly (~0.1s) while TTS generates (~0.3-3s). This gives the user immediate feedback that the agent is responding.\n\nConfigure in ~/.her-voice/config.json:\n\n{\n  \"notification_sound\": {\n    \"enabled\": true,\n    \"sound\": \"Blow\"\n  }\n}\n\nAvailable macOS sounds: Blow, Bottle, Frog, Funk, Glass, Hero, Morse, Ping, Pop, Purr, Sosumi, Submarine, Tink. Located in /System/Library/Sounds/."
      },
      {
        "title": "TTS Daemon",
        "body": "The daemon keeps the Kokoro model warm in RAM, eliminating ~1.1s of startup overhead per call.\n\nThe daemon auto-resolves the mlx-audio venv — no need to find the venv Python manually.\n\n# Start (persists in background)\nnohup python3 SKILL_DIR/scripts/daemon.py start > /tmp/her-voice-daemon.log 2>&1 & disown\n\n# Status\npython3 SKILL_DIR/scripts/daemon.py status\n\n# Stop\npython3 SKILL_DIR/scripts/daemon.py stop\n\n# Restart\npython3 SKILL_DIR/scripts/daemon.py restart\n\nspeak.py auto-detects the daemon: uses it if available, falls back to direct model loading.\n\nThe daemon is optional. Without it, speech still works — just ~1s slower per call as the model loads each time. Skip the daemon to save ~2.3GB RAM.\n\nNote: The daemon writes its PID file and socket after the model is fully loaded and ready to accept connections. They live in ~/.her-voice/ with restricted permissions (owner-only access). The daemon won't survive a reboot — start it again after restart if needed."
      },
      {
        "title": "Visualizer",
        "body": "A floating overlay with three animated LED bars that react to speech in real-time. 60fps, native macOS (Cocoa + AVFoundation). macOS only — on other platforms, audio plays without the visualizer."
      },
      {
        "title": "Modes",
        "body": "v2 (default) — Three-tier pure red, center raw amplitude, sides with lag\nclassic — Original smooth gradient look"
      },
      {
        "title": "Controls",
        "body": "Key\tAction\nESC\tQuit\nSpace\tPause/Resume (file mode)\n← →\tSeek ±5s (file mode)\n⌘V\tPaste text to speak (persist mode)"
      },
      {
        "title": "Persist Mode",
        "body": "Keep the visualizer on screen between playbacks. Use as a standalone voice station:\n\n# Launch in persist mode (stays open, idle breathing animation)\n~/.her-voice/bin/her-voice-viz --persist\n\n# Stream mode + persist (stays open after speech ends)\npython3 SKILL_DIR/scripts/speak.py --persist \"Hello!\"\n\nIn persist mode:\n\nDrag & drop audio files (.wav, .mp3, .aiff, .m4a) onto the visualizer to play them\n⌘V pastes clipboard text → streams directly from TTS daemon with full visualizer animation\nIdle breathing — subtle center bar pulse when waiting for input"
      },
      {
        "title": "Standalone Usage",
        "body": "# Play a file with visualizer\n~/.her-voice/bin/her-voice-viz --audio /path/to/file.wav\n\n# Demo mode (simulated audio)\n~/.her-voice/bin/her-voice-viz --demo\n\n# Stream raw PCM\ncat audio.raw | ~/.her-voice/bin/her-voice-viz --stream --sample-rate 24000"
      },
      {
        "title": "Disable Visualizer",
        "body": "python3 SKILL_DIR/scripts/config.py set visualizer.enabled false"
      },
      {
        "title": "Configuration",
        "body": "Config file: ~/.her-voice/config.json\n\n# View all settings\npython3 SKILL_DIR/scripts/config.py status\n\n# Get a value\npython3 SKILL_DIR/scripts/config.py get voice\n\n# Set a value (dot notation for nested keys)\npython3 SKILL_DIR/scripts/config.py set speed 1.1\npython3 SKILL_DIR/scripts/config.py set visualizer.mode classic"
      },
      {
        "title": "Key Settings",
        "body": "Key\tDefault\tDescription\nagent_name\t\"\"\tAgent's name (e.g. \"Jackie\")\nuser_name\t\"\"\tUser's real name\nuser_name_tts\t\"\"\tPhonetic spelling for TTS (e.g. \"Mah-toosh\" for Matúš)\nvoice\taf_heart\tBase voice name\nvoice_blend\t{af_heart: 0.6, af_sky: 0.4}\tVoice blend weights\nspeed\t1.05\tSpeech speed multiplier\nlanguage\ten\tLanguage code\ntts_engine\tauto\tTTS engine: auto, mlx, or pytorch\nmodel\tmlx-community/Kokoro-82M-bf16\tModel identifier (MLX)\nvisualizer.enabled\ttrue\tShow visualizer overlay\nvisualizer.mode\tv2\tAnimation mode (v2/classic)\nvisualizer.remember_position\ttrue\tSave window position between sessions\nnotification_sound.enabled\ttrue\tPlay sound before speaking\nnotification_sound.sound\tBlow\tmacOS system sound name\ndaemon.auto_start\ttrue\tAdvisory flag only — the daemon never self-starts. When true, the agent should start it on first voice use (saves ~1s/call, costs ~2.3GB RAM)\ndaemon.socket_path\t~/.her-voice/tts.sock\tUnix socket path"
      },
      {
        "title": "Voice Blending",
        "body": "Mix multiple voices for a unique sound. Configure voice_blend in config:\n\n{\n  \"voice_blend\": {\"af_heart\": 0.6, \"af_sky\": 0.4}\n}\n\nThe blended voice is stored as a .safetensors file in the model's voices directory (e.g., af_heart_60_af_sky_40.safetensors). Create it by running TTS once — speak.py looks for the pre-blended file automatically."
      },
      {
        "title": "Error Handling",
        "body": "Error\tCause\tFix\n\"mlx-audio not found\"\tVenv missing or broken\tRun setup.py\n\"espeak-ng not found\"\tPhonemizer missing\tbrew install espeak-ng\nCompilation failed\tXcode tools missing\txcode-select --install\n\"Model not found\"\tFirst run, no download\tRun setup.py or speak once\nDaemon \"not running\"\tCrashed or rebooted\tStart daemon again\nNo sound output\tmacOS audio permissions\tCheck System Settings → Sound → Output\nVisualizer not showing\tBinary not compiled\tRun setup.py\n\"kokoro not found\"\tPyTorch venv missing\tRun setup.py\nPyTorch CUDA error\tGPU driver mismatch\tpip install torch --force-reinstall in kokoro venv\n\"soundfile not found\"\tMissing dependency\tpip install soundfile in kokoro venv"
      },
      {
        "title": "Requirements",
        "body": "macOS + Apple Silicon recommended for best experience (MLX engine + visualizer + notification sounds)\nLinux/Intel Mac supported via PyTorch Kokoro engine (no visualizer)\nWindows is not supported\nXcode Command Line Tools for visualizer on macOS (xcode-select --install)\nespeak-ng for phonemization (brew install espeak-ng on macOS, apt install espeak-ng on Linux)\n~500MB disk (model + venv)\n~2.3GB RAM when daemon is running"
      },
      {
        "title": "Uninstall",
        "body": "Remove all Her Voice data (config, venvs, compiled binary, daemon state):\n\npython3 SKILL_DIR/scripts/daemon.py stop\nrm -rf ~/.her-voice"
      },
      {
        "title": "How It Works",
        "body": "Kokoro 82M — A compact neural TTS model with two backends: MLX (Apple's framework for native Metal GPU acceleration on Apple Silicon) and PyTorch (works everywhere). The engine is auto-detected based on platform, or can be forced via the tts_engine config option (auto, mlx, or pytorch)\nStreaming — Audio generates and plays simultaneously. First sound in ~0.3s (with daemon) vs ~3s batch\nVisualizer — Native macOS app (Swift/Cocoa) reads raw PCM from stdin, plays via AVAudioEngine with real-time amplitude metering\nDaemon — Unix socket server holding the model in RAM. Eliminates Python import + model load overhead on every call"
      }
    ],
    "body": "Her Voice 🎙️\n\nGive your agent a voice. Audio responses powered by Kokoro TTS — a compact, naturally expressive model running entirely on-device.\n\n✨ Features\n\nHighly optimized response time thanks to on-the-fly audio streaming technology. 100% free, no API keys required. Inspired by Samantha and Sky.\n\n⚡ On-the-fly Streaming — Audio plays as it generates, very low latency\n👄 The Voice of an angel — Cutting-edge local text-to-speech model Kokoro TTS\n🧠 TTS Daemon — Keep the model warm in RAM for instant responses (can be disabled to save RAM)\n🖥️ Persist Mode — Drag & drop audio, paste text, use as a voice station\n🔧 Fully Configurable — Voice, speed, visualizer, notification sounds\n🍎 MLX + PyTorch — Native Metal acceleration on Apple Silicon, PyTorch fallback everywhere else\n🎨 Real-time Visualizer — Floating 60fps LED bars that react to speech (macOS only)\nFirst-Run Setup\npython3 SKILL_DIR/scripts/setup.py\n\n\nNote: SKILL_DIR is the root directory of this skill — the agent resolves it automatically when running commands.\n\nThe setup wizard will:\n\nDetect platform and select TTS engine (MLX on Apple Silicon, PyTorch elsewhere)\nFind or install the appropriate TTS backend (mlx-audio or kokoro)\nInstall espeak-ng (Homebrew on macOS, apt on Linux)\nPatch espeak loader if needed (macOS compatibility)\nCompile the native visualizer binary (macOS only)\nDownload the Kokoro model\nCreate config at ~/.her-voice/config.json\n\nCheck status anytime:\n\npython3 SKILL_DIR/scripts/setup.py status\n\nPost-Setup: Names & Pronunciation\n\nAfter setup, configure the agent and user names:\n\npython3 SKILL_DIR/scripts/config.py set agent_name \"Jackie\"\npython3 SKILL_DIR/scripts/config.py set user_name \"Matúš\"\npython3 SKILL_DIR/scripts/config.py set user_name_tts \"Mah-toosh\"\n\n\nTTS pronunciation tip: If the user's name is non-English, figure out a phonetic English spelling that Kokoro will pronounce correctly. Store it in user_name_tts and use that spelling whenever speaking the name aloud. The real name stays in user_name for display purposes.\n\nSpeaking Text\n# Basic usage\npython3 SKILL_DIR/scripts/speak.py \"Hello, world!\"\n\n# Skip visualizer for this call\npython3 SKILL_DIR/scripts/speak.py --no-viz \"Quick note\"\n\n# Save to file instead of playing\npython3 SKILL_DIR/scripts/speak.py --save /tmp/output.wav \"Save this\"\n\n# Override voice or speed\npython3 SKILL_DIR/scripts/speak.py --voice af_bella --speed 1.2 \"Faster!\"\n\n# Pipe text from stdin\necho \"Piped text\" | python3 SKILL_DIR/scripts/speak.py\n\nOptions\nFlag\tDescription\n--no-viz\tSkip the visualizer for this call\n--persist\tKeep visualizer open after playback ends\n--save PATH\tSave audio to WAV file instead of playing\n--voice NAME\tOverride the configured voice\n--speed N\tOverride the configured speed multiplier\n--mode MODE\tOverride visualizer mode (v2 or classic)\nAgent Workflow\n\nWhen the user wants voice responses:\n\nCheck voice mode — is voice enabled or did the user ask for it?\nPlay notification sound (instant feedback while TTS generates):\nafplay /System/Library/Sounds/Blow.aiff &\n\nSpeak the response:\npython3 SKILL_DIR/scripts/speak.py \"Response text here\"\n\nAlways provide text alongside voice — accessibility matters.\nNotification Sound\n\nThe notification sound plays instantly (~0.1s) while TTS generates (~0.3-3s). This gives the user immediate feedback that the agent is responding.\n\nConfigure in ~/.her-voice/config.json:\n\n{\n  \"notification_sound\": {\n    \"enabled\": true,\n    \"sound\": \"Blow\"\n  }\n}\n\n\nAvailable macOS sounds: Blow, Bottle, Frog, Funk, Glass, Hero, Morse, Ping, Pop, Purr, Sosumi, Submarine, Tink. Located in /System/Library/Sounds/.\n\nTTS Daemon\n\nThe daemon keeps the Kokoro model warm in RAM, eliminating ~1.1s of startup overhead per call.\n\nThe daemon auto-resolves the mlx-audio venv — no need to find the venv Python manually.\n\n# Start (persists in background)\nnohup python3 SKILL_DIR/scripts/daemon.py start > /tmp/her-voice-daemon.log 2>&1 & disown\n\n# Status\npython3 SKILL_DIR/scripts/daemon.py status\n\n# Stop\npython3 SKILL_DIR/scripts/daemon.py stop\n\n# Restart\npython3 SKILL_DIR/scripts/daemon.py restart\n\n\nspeak.py auto-detects the daemon: uses it if available, falls back to direct model loading.\n\nThe daemon is optional. Without it, speech still works — just ~1s slower per call as the model loads each time. Skip the daemon to save ~2.3GB RAM.\n\nNote: The daemon writes its PID file and socket after the model is fully loaded and ready to accept connections. They live in ~/.her-voice/ with restricted permissions (owner-only access). The daemon won't survive a reboot — start it again after restart if needed.\n\nVisualizer\n\nA floating overlay with three animated LED bars that react to speech in real-time. 60fps, native macOS (Cocoa + AVFoundation). macOS only — on other platforms, audio plays without the visualizer.\n\nModes\nv2 (default) — Three-tier pure red, center raw amplitude, sides with lag\nclassic — Original smooth gradient look\nControls\nKey\tAction\nESC\tQuit\nSpace\tPause/Resume (file mode)\n← →\tSeek ±5s (file mode)\n⌘V\tPaste text to speak (persist mode)\nPersist Mode\n\nKeep the visualizer on screen between playbacks. Use as a standalone voice station:\n\n# Launch in persist mode (stays open, idle breathing animation)\n~/.her-voice/bin/her-voice-viz --persist\n\n# Stream mode + persist (stays open after speech ends)\npython3 SKILL_DIR/scripts/speak.py --persist \"Hello!\"\n\n\nIn persist mode:\n\nDrag & drop audio files (.wav, .mp3, .aiff, .m4a) onto the visualizer to play them\n⌘V pastes clipboard text → streams directly from TTS daemon with full visualizer animation\nIdle breathing — subtle center bar pulse when waiting for input\nStandalone Usage\n# Play a file with visualizer\n~/.her-voice/bin/her-voice-viz --audio /path/to/file.wav\n\n# Demo mode (simulated audio)\n~/.her-voice/bin/her-voice-viz --demo\n\n# Stream raw PCM\ncat audio.raw | ~/.her-voice/bin/her-voice-viz --stream --sample-rate 24000\n\nDisable Visualizer\npython3 SKILL_DIR/scripts/config.py set visualizer.enabled false\n\nConfiguration\n\nConfig file: ~/.her-voice/config.json\n\n# View all settings\npython3 SKILL_DIR/scripts/config.py status\n\n# Get a value\npython3 SKILL_DIR/scripts/config.py get voice\n\n# Set a value (dot notation for nested keys)\npython3 SKILL_DIR/scripts/config.py set speed 1.1\npython3 SKILL_DIR/scripts/config.py set visualizer.mode classic\n\nKey Settings\nKey\tDefault\tDescription\nagent_name\t\"\"\tAgent's name (e.g. \"Jackie\")\nuser_name\t\"\"\tUser's real name\nuser_name_tts\t\"\"\tPhonetic spelling for TTS (e.g. \"Mah-toosh\" for Matúš)\nvoice\taf_heart\tBase voice name\nvoice_blend\t{af_heart: 0.6, af_sky: 0.4}\tVoice blend weights\nspeed\t1.05\tSpeech speed multiplier\nlanguage\ten\tLanguage code\ntts_engine\tauto\tTTS engine: auto, mlx, or pytorch\nmodel\tmlx-community/Kokoro-82M-bf16\tModel identifier (MLX)\nvisualizer.enabled\ttrue\tShow visualizer overlay\nvisualizer.mode\tv2\tAnimation mode (v2/classic)\nvisualizer.remember_position\ttrue\tSave window position between sessions\nnotification_sound.enabled\ttrue\tPlay sound before speaking\nnotification_sound.sound\tBlow\tmacOS system sound name\ndaemon.auto_start\ttrue\tAdvisory flag only — the daemon never self-starts. When true, the agent should start it on first voice use (saves ~1s/call, costs ~2.3GB RAM)\ndaemon.socket_path\t~/.her-voice/tts.sock\tUnix socket path\nVoice Selection\nVoice Blending\n\nMix multiple voices for a unique sound. Configure voice_blend in config:\n\n{\n  \"voice_blend\": {\"af_heart\": 0.6, \"af_sky\": 0.4}\n}\n\n\nThe blended voice is stored as a .safetensors file in the model's voices directory (e.g., af_heart_60_af_sky_40.safetensors). Create it by running TTS once — speak.py looks for the pre-blended file automatically.\n\nError Handling\nError\tCause\tFix\n\"mlx-audio not found\"\tVenv missing or broken\tRun setup.py\n\"espeak-ng not found\"\tPhonemizer missing\tbrew install espeak-ng\nCompilation failed\tXcode tools missing\txcode-select --install\n\"Model not found\"\tFirst run, no download\tRun setup.py or speak once\nDaemon \"not running\"\tCrashed or rebooted\tStart daemon again\nNo sound output\tmacOS audio permissions\tCheck System Settings → Sound → Output\nVisualizer not showing\tBinary not compiled\tRun setup.py\n\"kokoro not found\"\tPyTorch venv missing\tRun setup.py\nPyTorch CUDA error\tGPU driver mismatch\tpip install torch --force-reinstall in kokoro venv\n\"soundfile not found\"\tMissing dependency\tpip install soundfile in kokoro venv\nRequirements\nmacOS + Apple Silicon recommended for best experience (MLX engine + visualizer + notification sounds)\nLinux/Intel Mac supported via PyTorch Kokoro engine (no visualizer)\nWindows is not supported\nXcode Command Line Tools for visualizer on macOS (xcode-select --install)\nespeak-ng for phonemization (brew install espeak-ng on macOS, apt install espeak-ng on Linux)\n~500MB disk (model + venv)\n~2.3GB RAM when daemon is running\nUninstall\n\nRemove all Her Voice data (config, venvs, compiled binary, daemon state):\n\npython3 SKILL_DIR/scripts/daemon.py stop\nrm -rf ~/.her-voice\n\nHow It Works\nKokoro 82M — A compact neural TTS model with two backends: MLX (Apple's framework for native Metal GPU acceleration on Apple Silicon) and PyTorch (works everywhere). The engine is auto-detected based on platform, or can be forced via the tts_engine config option (auto, mlx, or pytorch)\nStreaming — Audio generates and plays simultaneously. First sound in ~0.3s (with daemon) vs ~3s batch\nVisualizer — Native macOS app (Swift/Cocoa) reads raw PCM from stdin, plays via AVAudioEngine with real-time amplitude metering\nDaemon — Unix socket server holding the model in RAM. Eliminates Python import + model load overhead on every call"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/matusvojtek/her-voice",
    "publisherUrl": "https://clawhub.ai/matusvojtek/her-voice",
    "owner": "matusvojtek",
    "version": "1.0.2",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/her-voice",
    "downloadUrl": "https://openagent3.xyz/downloads/her-voice",
    "agentUrl": "https://openagent3.xyz/skills/her-voice/agent",
    "manifestUrl": "https://openagent3.xyz/skills/her-voice/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/her-voice/agent.md"
  }
}