{
  "schemaVersion": "1.0",
  "item": {
    "slug": "speech-recognition",
    "name": "speech-recognition",
    "source": "tencent",
    "type": "skill",
    "category": "Developer Tools",
    "sourceUrl": "https://clawhub.ai/demo112/speech-recognition",
    "canonicalUrl": "https://clawhub.ai/demo112/speech-recognition",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/speech-recognition",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=speech-recognition",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md",
      "skill.json"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-07T17:22:31.273Z",
      "expiresAt": "2026-05-14T17:22:31.273Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=afrexai-annual-report",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=afrexai-annual-report",
        "contentDisposition": "attachment; filename=\"afrexai-annual-report-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/speech-recognition"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/speech-recognition",
    "agentPageUrl": "https://openagent3.xyz/skills/speech-recognition/agent",
    "manifestUrl": "https://openagent3.xyz/skills/speech-recognition/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/speech-recognition/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "General Speech Recognition",
        "body": "Transcribes speech with the SiliconFlow SenseVoice API and supports multiple audio formats."
      },
      {
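        "title": "Activation Conditions",
        "body": "Trigger\tDescription\nUser sends a voice message\t.ogg / .mp3 / .wav / .m4a file\nUser asks for a transcription\t\"Transcribe this audio\", \"speech to text\"\nAudio file processing\tText needs to be extracted from the audio"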
      },
      {
        "title": "API Key",
        "body": "Configure it in ~/.openclaw/openclaw.json:\n\n{\n  \"providers\": {\n    \"siliconflow\": {\n      \"apiKey\": \"sk-xxx\"\n    }\n  }\n}"
      },
      {
        "title": "API Endpoint",
        "body": "POST https://api.siliconflow.cn/v1/audio/transcriptions"
      },
      {
        "title": "Supported Models",
        "body": "Model\tDescription\nFunAudioLLM/SenseVoiceSmall\tDefault; performs well on Chinese"
      },
      {
        "title": "Method 1: Call the API Directly",
        "body": "import requests\n\napi_key = \"sk-xxx\"  # placeholder; replace with your SiliconFlow API key\n\n# Read the audio file as bytes\nwith open(\"/path/to/audio.mp3\", \"rb\") as f:\n    audio_data = f.read()\n\n# Upload the audio as multipart form data\nresponse = requests.post(\n    \"https://api.siliconflow.cn/v1/audio/transcriptions\",\n    headers={\"Authorization\": f\"Bearer {api_key}\"},\n    files={\"file\": (\"audio.mp3\", audio_data, \"audio/mpeg\")},\n    data={\"model\": \"FunAudioLLM/SenseVoiceSmall\"},\n    timeout=60\n)\nresponse.raise_for_status()  # surface 401, 413, and other HTTP errors\n\nprint(response.json().get(\"text\", \"\"))"
      },
      {
        "title": "Method 2: Handle a User's Voice Message",
        "body": "When the user sends an .ogg voice message:\n\n# 1. Convert the format (if the file is ogg)\nffmpeg -i /path/to/audio.ogg -ar 16000 -ac 1 /tmp/audio.mp3 -y\n\n# 2. Call the SiliconFlow API (the API key is read from an environment variable)\npython3 -c \"\nimport requests\nimport os\n\napi_key = os.environ.get('SILICONFLOW_API_KEY')\nif not api_key:\n    raise ValueError('Set the SILICONFLOW_API_KEY environment variable')\n\nwith open('/tmp/audio.mp3', 'rb') as f:\n    audio_data = f.read()\n\nresponse = requests.post(\n    'https://api.siliconflow.cn/v1/audio/transcriptions',\n    headers={'Authorization': f'Bearer {api_key}'},\n    files={'file': ('audio.mp3', audio_data, 'audio/mpeg')},\n    data={'model': 'FunAudioLLM/SenseVoiceSmall'},\n    timeout=60\n)\nresponse.raise_for_status()\nprint(response.json().get('text', ''))\n\""
      },
      {
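        "title": "Supported Audio Formats",
        "body": "Format\tExtension\tNotes\nMP3\t.mp3\tRecommended; broadly compatible\nOGG\t.ogg\tTelegram/Signal voice format; needs conversion\nWAV\t.wav\tUncompressed; large files\nM4A\t.m4a\tiOS recording format\nFLAC\t.flac\tLossless compression"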
      },
      {
        "title": "Format Conversion",
        "body": "If the audio is not in MP3 format, convert it with FFmpeg:\n\n# OGG → MP3\nffmpeg -i input.ogg -ar 16000 -ac 1 output.mp3 -y\n\n# WAV → MP3\nffmpeg -i input.wav -ar 16000 -ac 1 output.mp3 -y\n\n# M4A → MP3\nffmpeg -i input.m4a -ar 16000 -ac 1 output.mp3 -y\n\nParameters:\n\n-ar 16000: 16 kHz sample rate (recommended for speech recognition)\n-ac 1: mono (reduces file size)\n-y: overwrite an existing output file"
      },
      {
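        "title": "Error Handling",
        "body": "Error\tCause\tFix\n401 Unauthorized\tInvalid API key\tCheck the configuration\n413 Payload Too Large\tFile too large\tCompress or split the audio\ntimeout\tNetwork timeout\tRetry or check the network\nInvalid audio format\tUnsupported format\tConvert with FFmpeg"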
      },
      {
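        "title": "Notes",
        "body": "File size: under 10 MB recommended\nDuration: under 5 minutes recommended\nLanguage support: works best on Chinese; English is also supported\nPrivacy: audio is uploaded to SiliconFlow servers"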
      },
      {
        "title": "Related Skills",
        "body": "Skill\tDescription\ndouyin-video\tExtracts speech from Douyin videos\ncosyvoice-tts\tText to speech\n\nVersion: 1.0.0\nCreated: 2026-02-26"
      }
    ],
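    "body": "General Speech Recognition\n\nTranscribes speech with the SiliconFlow SenseVoice API and supports multiple audio formats.\n\nActivation Conditions\nTrigger\tDescription\nUser sends a voice message\t.ogg / .mp3 / .wav / .m4a file\nUser asks for a transcription\t\"Transcribe this audio\", \"speech to text\"\nAudio file processing\tText needs to be extracted from the audio\nConfiguration\nAPI Key\n\nConfigure it in ~/.openclaw/openclaw.json:\n\n{\n  \"providers\": {\n    \"siliconflow\": {\n      \"apiKey\": \"sk-xxx\"\n    }\n  }\n}\n\nAPI Endpoint\nPOST https://api.siliconflow.cn/v1/audio/transcriptions\n\nSupported Models\nModel\tDescription\nFunAudioLLM/SenseVoiceSmall\tDefault; performs well on Chinese\nUsage\nMethod 1: Call the API Directly\nimport requests\n\napi_key = \"sk-xxx\"  # placeholder; replace with your SiliconFlow API key\n\n# Read the audio file as bytes\nwith open(\"/path/to/audio.mp3\", \"rb\") as f:\n    audio_data = f.read()\n\n# Upload the audio as multipart form data\nresponse = requests.post(\n    \"https://api.siliconflow.cn/v1/audio/transcriptions\",\n    headers={\"Authorization\": f\"Bearer {api_key}\"},\n    files={\"file\": (\"audio.mp3\", audio_data, \"audio/mpeg\")},\n    data={\"model\": \"FunAudioLLM/SenseVoiceSmall\"},\n    timeout=60\n)\nresponse.raise_for_status()  # surface 401, 413, and other HTTP errors\n\nprint(response.json().get(\"text\", \"\"))\n\nMethod 2: Handle a User's Voice Message\n\nWhen the user sends an .ogg voice message:\n\n# 1. Convert the format (if the file is ogg)\nffmpeg -i /path/to/audio.ogg -ar 16000 -ac 1 /tmp/audio.mp3 -y\n\n# 2. Call the SiliconFlow API (the API key is read from an environment variable)\npython3 -c \"\nimport requests\nimport os\n\napi_key = os.environ.get('SILICONFLOW_API_KEY')\nif not api_key:\n    raise ValueError('Set the SILICONFLOW_API_KEY environment variable')\n\nwith open('/tmp/audio.mp3', 'rb') as f:\n    audio_data = f.read()\n\nresponse = requests.post(\n    'https://api.siliconflow.cn/v1/audio/transcriptions',\n    headers={'Authorization': f'Bearer {api_key}'},\n    files={'file': ('audio.mp3', audio_data, 'audio/mpeg')},\n    data={'model': 'FunAudioLLM/SenseVoiceSmall'},\n    timeout=60\n)\nresponse.raise_for_status()\nprint(response.json().get('text', ''))\n\"\n\nSupported Audio Formats\nFormat\tExtension\tNotes\nMP3\t.mp3\tRecommended; broadly compatible\nOGG\t.ogg\tTelegram/Signal voice format; needs conversion\nWAV\t.wav\tUncompressed; large files\nM4A\t.m4a\tiOS recording format\nFLAC\t.flac\tLossless compression\nFormat Conversion\n\nIf the audio is not in MP3 format, convert it with FFmpeg:\n\n# OGG → MP3\nffmpeg -i input.ogg -ar 16000 -ac 1 output.mp3 -y\n\n# WAV → MP3\nffmpeg -i input.wav -ar 16000 -ac 1 output.mp3 -y\n\n# M4A → MP3\nffmpeg -i input.m4a -ar 16000 -ac 1 output.mp3 -y\n\n\nParameters:\n\n-ar 16000: 16 kHz sample rate (recommended for speech recognition)\n-ac 1: mono (reduces file size)\n-y: overwrite an existing output file\nError Handling\nError\tCause\tFix\n401 Unauthorized\tInvalid API key\tCheck the configuration\n413 Payload Too Large\tFile too large\tCompress or split the audio\ntimeout\tNetwork timeout\tRetry or check the network\nInvalid audio format\tUnsupported format\tConvert with FFmpeg\nNotes\nFile size: under 10 MB recommended\nDuration: under 5 minutes recommended\nLanguage support: works best on Chinese; English is also supported\nPrivacy: audio is uploaded to SiliconFlow servers\nRelated Skills\nSkill\tDescription\ndouyin-video\tExtracts speech from Douyin videos\ncosyvoice-tts\tText to speech\n\nVersion: 1.0.0 Created: 2026-02-26"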
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/demo112/speech-recognition",
    "publisherUrl": "https://clawhub.ai/demo112/speech-recognition",
    "owner": "demo112",
    "version": "1.0.1",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/speech-recognition",
    "downloadUrl": "https://openagent3.xyz/downloads/speech-recognition",
    "agentUrl": "https://openagent3.xyz/skills/speech-recognition/agent",
    "manifestUrl": "https://openagent3.xyz/skills/speech-recognition/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/speech-recognition/agent.md"
  }
}