{
  "schemaVersion": "1.0",
  "item": {
    "slug": "lb-pocket-tts-skill",
    "name": "Pocket TTS Complete Documentation",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/leonaaardob/lb-pocket-tts-skill",
    "canonicalUrl": "https://clawhub.ai/leonaaardob/lb-pocket-tts-skill",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/lb-pocket-tts-skill",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=lb-pocket-tts-skill",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "README.md",
      "SKILL.md",
      "docs/export_voice.md",
      "docs/generate.md",
      "docs/python-api.md",
      "docs/serve.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/lb-pocket-tts-skill"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Confirm that the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/lb-pocket-tts-skill",
    "agentPageUrl": "https://openagent3.xyz/skills/lb-pocket-tts-skill/agent",
    "manifestUrl": "https://openagent3.xyz/skills/lb-pocket-tts-skill/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/lb-pocket-tts-skill/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Pocket TTS",
        "body": "Lightweight CPU-friendly text-to-speech with voice cloning. No GPU required."
      },
      {
        "title": "When to Use",
        "body": "Generating speech from text on a CPU, without a GPU\nCloning a voice from a short audio sample\nStreaming audio generation with low latency\nRunning TTS locally without external API dependencies\nReal-time speech synthesis (generation runs ~6x faster than real time)"
      },
      {
        "title": "Key Features",
        "body": "100M parameters - Small, efficient model\nCPU-optimized - No GPU needed; uses only 2 CPU cores\n~6x real time - Generates audio about 6x faster than playback speed on modern CPUs\n~200ms latency - Time to the first audio chunk when streaming\nVoice cloning - From 3-10 s audio samples\n24 kHz mono WAV - High-quality output\nEnglish only - More languages are planned"
      },
      {
        "title": "Installation",
        "body": "pip install pocket-tts\n# or\nuv add pocket-tts"
      },
      {
        "title": "Generate Speech",
        "body": "# Basic generation (default voice)\npocket-tts generate --text \"Hello world\"\n\n# Custom voice (local file, URL, or safetensors)\npocket-tts generate --voice ./my_voice.wav\npocket-tts generate --voice \"hf://kyutai/tts-voices/alba-mackenna/casual.wav\"\npocket-tts generate --voice ./voice.safetensors\n\n# Quality tuning\npocket-tts generate --temperature 0.7 --lsd-decode-steps 3\n\nSee docs/generate.md for the full CLI reference."
      },
      {
        "title": "Start Web Server",
        "body": "# Start FastAPI server with web UI\npocket-tts serve\n\n# Custom host/port\npocket-tts serve --host localhost --port 8080\n\nSee docs/serve.md for server options."
      },
      {
        "title": "Export Voice Embeddings",
        "body": "Convert audio files to .safetensors for faster loading:\n\n# Single file\npocket-tts export-voice voice.mp3 voice.safetensors\n\n# Batch conversion\npocket-tts export-voice voices/ embeddings/ --truncate\n\nSee docs/export_voice.md for export options."
      },
      {
        "title": "Basic Usage",
        "body": "from pocket_tts import TTSModel\nimport scipy.io.wavfile\n\n# Load model\nmodel = TTSModel.load_model()\n\n# Get voice state\nvoice = model.get_state_for_audio_prompt(\n    \"hf://kyutai/tts-voices/alba-mackenna/casual.wav\"\n)\n\n# Generate audio\naudio = model.generate_audio(voice, \"Hello world!\")\n\n# Save\nscipy.io.wavfile.write(\"output.wav\", model.sample_rate, audio.numpy())"
      },
      {
        "title": "Load Model",
        "body": "from pocket_tts import TTSModel\n\nmodel = TTSModel.load_model(\n    config=\"b6369a24\",       # Model variant\n    temp=0.7,                # Sampling temperature (0.5-1.0)\n    lsd_decode_steps=1,      # LSD decode steps (1-5)\n    eos_threshold=-4.0       # End-of-sequence threshold\n)"
      },
      {
        "title": "Voice State",
        "body": "# From audio file/URL\nvoice = model.get_state_for_audio_prompt(\"./voice.wav\")\nvoice = model.get_state_for_audio_prompt(\"hf://kyutai/tts-voices/alba-mackenna/casual.wav\")\n\n# From safetensors (fast loading)\nvoice = model.get_state_for_audio_prompt(\"./voice.safetensors\")"
      },
      {
        "title": "Streaming Generation",
        "body": "# Stream audio chunks\nfor chunk in model.generate_audio_stream(voice, \"Long text...\"):\n    # Process, save, or play each chunk as soon as it is generated\n    print(f\"Chunk: {chunk.shape[0]} samples\")"
      },
      {
        "title": "Multi-Voice Management",
        "body": "# Preload multiple voices\nvoices = {\n    \"casual\": model.get_state_for_audio_prompt(\"hf://kyutai/tts-voices/alba-mackenna/casual.wav\"),\n    \"announcer\": model.get_state_for_audio_prompt(\"./announcer.safetensors\"),\n}\n\n# Use different voices\naudio1 = model.generate_audio(voices[\"casual\"], \"Hey there!\")\naudio2 = model.generate_audio(voices[\"announcer\"], \"Breaking news!\")\n\nSee docs/python-api.md for the complete API reference."
      },
      {
        "title": "Available Voices",
        "body": "Pre-made voices from hf://kyutai/tts-voices/:\n\nalba-mackenna/casual.wav (default, female)\njessica-jian/casual.wav (female)\nvoice-donations/Selfie.wav (male, marius)\nvoice-donations/Butter.wav (male, javert)\nears/p010/freeform_speech_01.wav (male, jean)\nvctk/p244_023.wav (female, fantine)\nvctk/p262_023.wav (female, eponine)\nvctk/p303_023.wav (female, azelma)\n\nOr clone any voice from your own audio samples."
      },
      {
        "title": "Voice Cloning Tips",
        "body": "Clean audio - Remove background noise (for example, with Adobe Podcast Enhance)\nLength - 3-10 seconds of speech is ideal\nQuality - Output quality depends on input quality\nFormat - WAV, MP3, and other common audio formats are supported"
      },
      {
        "title": "Performance Tips",
        "body": "CPU-only - A GPU provides no speedup (the model is small and runs at batch size 1)\n2 cores - Runs efficiently on just 2 CPU cores\nStreaming - Low latency (~200ms to the first chunk)\nSafetensors - Pre-process voices into .safetensors files for faster loading"
      },
      {
        "title": "Output Format",
        "body": "All commands output WAV files:\n\nSample rate: 24 kHz\nChannels: Mono\nBit depth: 16-bit PCM"
      },
      {
        "title": "Links",
        "body": "GitHub\nTech Report\nPaper (arXiv)\nHuggingFace Model\nVoice Repository\nLive Demo"
      }
    ],
    "body": "Pocket TTS\n\nLightweight CPU-friendly text-to-speech with voice cloning. No GPU required.\n\nWhen to Use\nGenerating speech from text on a CPU, without a GPU\nCloning a voice from a short audio sample\nStreaming audio generation with low latency\nRunning TTS locally without external API dependencies\nReal-time speech synthesis (generation runs ~6x faster than real time)\nKey Features\n100M parameters - Small, efficient model\nCPU-optimized - No GPU needed; uses only 2 CPU cores\n~6x real time - Generates audio about 6x faster than playback speed on modern CPUs\n~200ms latency - Time to the first audio chunk when streaming\nVoice cloning - From 3-10 s audio samples\n24 kHz mono WAV - High-quality output\nEnglish only - More languages are planned\nInstallation\npip install pocket-tts\n# or\nuv add pocket-tts\n\nCLI Commands\nGenerate Speech\n# Basic generation (default voice)\npocket-tts generate --text \"Hello world\"\n\n# Custom voice (local file, URL, or safetensors)\npocket-tts generate --voice ./my_voice.wav\npocket-tts generate --voice \"hf://kyutai/tts-voices/alba-mackenna/casual.wav\"\npocket-tts generate --voice ./voice.safetensors\n\n# Quality tuning\npocket-tts generate --temperature 0.7 --lsd-decode-steps 3\n\nSee docs/generate.md for the full CLI reference.\n\nStart Web Server\n# Start FastAPI server with web UI\npocket-tts serve\n\n# Custom host/port\npocket-tts serve --host localhost --port 8080\n\nSee docs/serve.md for server options.\n\nExport Voice Embeddings\n\nConvert audio files to .safetensors for faster loading:\n\n# Single file\npocket-tts export-voice voice.mp3 voice.safetensors\n\n# Batch conversion\npocket-tts export-voice voices/ embeddings/ --truncate\n\nSee docs/export_voice.md for export options.\n\nPython API\nBasic Usage\nfrom pocket_tts import TTSModel\nimport scipy.io.wavfile\n\n# Load model\nmodel = TTSModel.load_model()\n\n# Get voice state\nvoice = model.get_state_for_audio_prompt(\n    \"hf://kyutai/tts-voices/alba-mackenna/casual.wav\"\n)\n\n# Generate audio\naudio = model.generate_audio(voice, \"Hello world!\")\n\n# Save\nscipy.io.wavfile.write(\"output.wav\", model.sample_rate, audio.numpy())\n\nLoad Model\nfrom pocket_tts import TTSModel\n\nmodel = TTSModel.load_model(\n    config=\"b6369a24\",       # Model variant\n    temp=0.7,                # Sampling temperature (0.5-1.0)\n    lsd_decode_steps=1,      # LSD decode steps (1-5)\n    eos_threshold=-4.0       # End-of-sequence threshold\n)\n\nVoice State\n# From audio file/URL\nvoice = model.get_state_for_audio_prompt(\"./voice.wav\")\nvoice = model.get_state_for_audio_prompt(\"hf://kyutai/tts-voices/alba-mackenna/casual.wav\")\n\n# From safetensors (fast loading)\nvoice = model.get_state_for_audio_prompt(\"./voice.safetensors\")\n\nStreaming Generation\n# Stream audio chunks\nfor chunk in model.generate_audio_stream(voice, \"Long text...\"):\n    # Process, save, or play each chunk as soon as it is generated\n    print(f\"Chunk: {chunk.shape[0]} samples\")\n\nMulti-Voice Management\n# Preload multiple voices\nvoices = {\n    \"casual\": model.get_state_for_audio_prompt(\"hf://kyutai/tts-voices/alba-mackenna/casual.wav\"),\n    \"announcer\": model.get_state_for_audio_prompt(\"./announcer.safetensors\"),\n}\n\n# Use different voices\naudio1 = model.generate_audio(voices[\"casual\"], \"Hey there!\")\naudio2 = model.generate_audio(voices[\"announcer\"], \"Breaking news!\")\n\nSee docs/python-api.md for the complete API reference.\n\nAvailable Voices\n\nPre-made voices from hf://kyutai/tts-voices/:\n\nalba-mackenna/casual.wav (default, female)\njessica-jian/casual.wav (female)\nvoice-donations/Selfie.wav (male, marius)\nvoice-donations/Butter.wav (male, javert)\nears/p010/freeform_speech_01.wav (male, jean)\nvctk/p244_023.wav (female, fantine)\nvctk/p262_023.wav (female, eponine)\nvctk/p303_023.wav (female, azelma)\n\nOr clone any voice from your own audio samples.\n\nVoice Cloning Tips\nClean audio - Remove background noise (for example, with Adobe Podcast Enhance)\nLength - 3-10 seconds of speech is ideal\nQuality - Output quality depends on input quality\nFormat - WAV, MP3, and other common audio formats are supported\nPerformance Tips\nCPU-only - A GPU provides no speedup (the model is small and runs at batch size 1)\n2 cores - Runs efficiently on just 2 CPU cores\nStreaming - Low latency (~200ms to the first chunk)\nSafetensors - Pre-process voices into .safetensors files for faster loading\nOutput Format\n\nAll commands output WAV files:\n\nSample rate: 24 kHz\nChannels: Mono\nBit depth: 16-bit PCM\nLinks\nGitHub\nTech Report\nPaper (arXiv)\nHuggingFace Model\nVoice Repository\nLive Demo"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/leonaaardob/lb-pocket-tts-skill",
    "publisherUrl": "https://clawhub.ai/leonaaardob/lb-pocket-tts-skill",
    "owner": "leonaaardob",
    "version": "0.1.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/lb-pocket-tts-skill",
    "downloadUrl": "https://openagent3.xyz/downloads/lb-pocket-tts-skill",
    "agentUrl": "https://openagent3.xyz/skills/lb-pocket-tts-skill/agent",
    "manifestUrl": "https://openagent3.xyz/skills/lb-pocket-tts-skill/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/lb-pocket-tts-skill/agent.md"
  }
}