{
  "schemaVersion": "1.0",
  "item": {
    "slug": "talking-head-production",
    "name": "Talking Head Production",
    "source": "tencent",
    "type": "skill",
    "category": "内容创作",
    "sourceUrl": "https://clawhub.ai/okaris/talking-head-production",
    "canonicalUrl": "https://clawhub.ai/okaris/talking-head-production",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/talking-head-production",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=talking-head-production",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/talking-head-production"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/talking-head-production",
    "agentPageUrl": "https://openagent3.xyz/skills/talking-head-production/agent",
    "manifestUrl": "https://openagent3.xyz/skills/talking-head-production/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/talking-head-production/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Talking Head Production",
        "body": "Create talking head videos with AI avatars and lipsync via inference.sh CLI."
      },
      {
        "title": "Quick Start",
        "body": "curl -fsSL https://cli.inference.sh | sh && infsh login\n\n# Generate dialogue audio\ninfsh app run falai/dia-tts --input '{\n  \"prompt\": \"[S1] Welcome to our product tour. Today I will show you three features that will save you hours every week.\"\n}'\n\n# Create talking head video with OmniHuman\ninfsh app run bytedance/omnihuman-1-5 --input '{\n  \"image\": \"path/to/portrait.png\",\n  \"audio\": \"path/to/dialogue.mp3\"\n}'\n\nInstall note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available."
      },
      {
        "title": "Portrait Requirements",
        "body": "The source portrait image is critical. Poor portraits = poor video output."
      },
      {
        "title": "Must Have",
        "body": "RequirementWhySpecCenter-framedAvatar needs face in predictable positionFace centered in frameHead and shouldersBody visible for natural gesturesCrop below chestEyes to cameraCreates connection with viewerDirect frontal gazeNeutral expressionStarting point for animationSlight smile OK, not laughing/frowningClear faceModel needs to detect featuresNo sunglasses, heavy shadows, or obstructionsHigh resolutionDetail preservationMin 512x512 face region, ideally 1024x1024+"
      },
      {
        "title": "Background",
        "body": "TypeWhen to UseSolid colorProfessional, clean, easy to compositeSoft bokehNatural, lifestyle feelOffice/studioBusiness contextTransparent (via bg removal)Compositing into other scenes\n\n# Generate a professional portrait background\ninfsh app run falai/flux-dev-lora --input '{\n  \"prompt\": \"professional headshot photograph of a friendly business person, soft studio lighting, clean grey background, head and shoulders, direct eye contact, neutral pleasant expression, high quality portrait photography\"\n}'\n\n# Or remove background from existing portrait\ninfsh app run <bg-removal-app> --input '{\n  \"image\": \"path/to/portrait-with-background.png\"\n}'"
      },
      {
        "title": "Audio Quality",
        "body": "Audio quality directly impacts lipsync accuracy. Clean audio = accurate lip movement."
      },
      {
        "title": "Requirements",
        "body": "ParameterTargetWhyBackground noiseNone/minimalNoise confuses lipsync timingVolumeConsistent throughoutPrevents sync driftSample rate44.1kHz or 48kHzStandard qualityFormatMP3 128kbps+ or WAVCompatible with all tools"
      },
      {
        "title": "Generating Audio",
        "body": "# Simple narration\ninfsh app run falai/dia-tts --input '{\n  \"prompt\": \"[S1] Hi there! I am excited to share something with you today. We have been working on a feature that our users have been requesting for months... and it is finally here.\"\n}'\n\n# With emotion and pacing\ninfsh app run falai/dia-tts --input '{\n  \"prompt\": \"[S1] You know what is frustrating? Spending hours on tasks that should take minutes. (sighs) We have all been there. But what if I told you... there is a better way?\"\n}'"
      },
      {
        "title": "Model Selection",
        "body": "ModelApp IDBest ForMax DurationOmniHuman 1.5bytedance/omnihuman-1-5Multi-character, gestures, high quality~30s per clipOmniHuman 1.0bytedance/omnihuman-1-0Single character, simpler~30s per clipPixVerse Lipsyncfalai/pixverse-lipsyncQuick lipsync on existing videoShort clipsFabricfalai/fabric-1-0Cloth/fabric animation on portraitsShort clips"
      },
      {
        "title": "Basic: Portrait + Audio -> Video",
        "body": "# 1. Generate or prepare audio\ninfsh app run falai/dia-tts --input '{\n  \"prompt\": \"[S1] Your narration script here.\"\n}'\n\n# 2. Generate talking head\ninfsh app run bytedance/omnihuman-1-5 --input '{\n  \"image\": \"portrait.png\",\n  \"audio\": \"narration.mp3\"\n}'"
      },
      {
        "title": "With Captions",
        "body": "# 1-2. Same as above\n\n# 3. Add captions to the talking head video\ninfsh app run infsh/caption-videos --input '{\n  \"video\": \"talking-head.mp4\",\n  \"caption_file\": \"captions.srt\"\n}'"
      },
      {
        "title": "Long-Form (Stitched Clips)",
        "body": "For content longer than 30 seconds, split into segments:\n\n# Generate audio segments\ninfsh app run falai/dia-tts --input '{\"prompt\": \"[S1] Segment one script.\"}' --no-wait\ninfsh app run falai/dia-tts --input '{\"prompt\": \"[S1] Segment two script.\"}' --no-wait\ninfsh app run falai/dia-tts --input '{\"prompt\": \"[S1] Segment three script.\"}' --no-wait\n\n# Generate talking head for each segment (same portrait for consistency)\ninfsh app run bytedance/omnihuman-1-5 --input '{\"image\": \"portrait.png\", \"audio\": \"segment1.mp3\"}' --no-wait\ninfsh app run bytedance/omnihuman-1-5 --input '{\"image\": \"portrait.png\", \"audio\": \"segment2.mp3\"}' --no-wait\ninfsh app run bytedance/omnihuman-1-5 --input '{\"image\": \"portrait.png\", \"audio\": \"segment3.mp3\"}' --no-wait\n\n# Merge all segments\ninfsh app run infsh/media-merger --input '{\n  \"media\": [\"segment1.mp4\", \"segment2.mp4\", \"segment3.mp4\"]\n}'"
      },
      {
        "title": "Multi-Character Conversation",
        "body": "OmniHuman 1.5 supports up to 2 characters:\n\n# 1. Generate dialogue with two speakers\ninfsh app run falai/dia-tts --input '{\n  \"prompt\": \"[S1] So tell me about the new feature. [S2] Sure! We built a dashboard that shows real-time analytics. [S1] That sounds great. How long did it take? [S2] About two weeks from concept to launch.\"\n}'\n\n# 2. Create video with two characters\ninfsh app run bytedance/omnihuman-1-5 --input '{\n  \"image\": \"two-person-portrait.png\",\n  \"audio\": \"dialogue.mp3\"\n}'"
      },
      {
        "title": "Framing Guidelines",
        "body": "┌─────────────────────────────────┐\n│          Headroom (minimal)     │\n│  ┌───────────────────────────┐  │\n│  │                           │  │\n│  │     ● ─ ─ Eyes at 1/3 ─ ─│─ │ ← Eyes at top 1/3 line\n│  │    /|\\                    │  │\n│  │     |   Head & shoulders  │  │\n│  │    / \\  visible           │  │\n│  │                           │  │\n│  └───────────────────────────┘  │\n│       Crop below chest          │\n└─────────────────────────────────┘"
      },
      {
        "title": "Common Mistakes",
        "body": "MistakeProblemFixLow-res portraitBlurry face, poor lipsyncUse 1024x1024+ face regionProfile/side angleLipsync can't track mouth wellUse frontal or near-frontalNoisy audioLipsync drifts, looks unnaturalRecord clean or use TTSToo-long clipsQuality degrades after 30sSplit into segments, stitchSunglasses/obstructionFace features hiddenClear face requiredInconsistent lightingUncanny when animatedEven, soft lightingNo captionsLoses silent/mobile viewersAlways add captions"
      },
      {
        "title": "Related Skills",
        "body": "npx skills add inference-sh/skills@ai-avatar-video\nnpx skills add inference-sh/skills@ai-video-generation\nnpx skills add inference-sh/skills@text-to-speech\n\nBrowse all apps: infsh app list"
      }
    ]
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/okaris/talking-head-production",
    "publisherUrl": "https://clawhub.ai/okaris/talking-head-production",
    "owner": "okaris",
    "version": "0.1.5",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/talking-head-production",
    "downloadUrl": "https://openagent3.xyz/downloads/talking-head-production",
    "agentUrl": "https://openagent3.xyz/skills/talking-head-production/agent",
    "manifestUrl": "https://openagent3.xyz/skills/talking-head-production/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/talking-head-production/agent.md"
  }
}