# Send Zhipu AI ASR to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "zhipu-asr",
    "name": "Zhipu AI ASR",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/franklu0819-lang/zhipu-asr",
    "canonicalUrl": "https://clawhub.ai/franklu0819-lang/zhipu-asr",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/zhipu-asr",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=zhipu-asr",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "README.md",
      "SKILL.md",
      "_meta.json",
      "package.json",
      "scripts/speech_to_text.sh"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/zhipu-asr"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/zhipu-asr",
    "downloadUrl": "https://openagent3.xyz/downloads/zhipu-asr",
    "agentUrl": "https://openagent3.xyz/skills/zhipu-asr/agent",
    "manifestUrl": "https://openagent3.xyz/skills/zhipu-asr/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/zhipu-asr/agent.md"
  }
}
```
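
The `includedAssets` list above can be turned into a quick completeness check on the extracted folder. The following is a minimal sketch: `check_assets` is a hypothetical helper that is not part of the package, and the file names are copied from the manifest.

```shell
# Check that an extracted package folder contains the files listed under
# "includedAssets". check_assets is a hypothetical helper for illustration.
check_assets() {
    dir=$1
    missing=0
    for asset in README.md SKILL.md _meta.json package.json scripts/speech_to_text.sh; do
        if [ ! -f "$dir/$asset" ]; then
            echo "missing: $asset" >&2
            missing=1
        fi
    done
    # 0 when every asset is present, 1 otherwise.
    return "$missing"
}
```

For example, `check_assets ./zhipu-asr && echo "package looks complete"`, where `./zhipu-asr` stands for wherever you extracted the ZIP.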
## Documentation

### Zhipu AI Automatic Speech Recognition (ASR)

Transcribe Chinese audio files to text using Zhipu AI's GLM-ASR model.

### Setup

1. Get an API key from the Zhipu AI Console.
2. Set it in your environment: `export ZHIPU_API_KEY="your-key-here"`
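
Before the first transcription it can help to confirm the environment is ready. This is a sketch only: `preflight` is a hypothetical helper, and the ffmpeg check assumes the script relies on an `ffmpeg` binary on `PATH` for format conversion, as described under Supported Audio Formats.

```shell
# Preflight check before calling scripts/speech_to_text.sh.
# preflight is a hypothetical helper; it only inspects the local environment.
preflight() {
    status=0
    if [ -z "${ZHIPU_API_KEY:-}" ]; then
        echo "ZHIPU_API_KEY is not set" >&2
        status=1
    fi
    # A missing ffmpeg is reported as a warning only (assumption: it is
    # needed just for inputs that are not already WAV or MP3).
    if ! command -v ffmpeg >/dev/null 2>&1; then
        echo "warning: ffmpeg not found; non-WAV/MP3 inputs may not be converted" >&2
    fi
    return "$status"
}
```

Run `preflight` once per shell session; it returns non-zero only when the API key is missing.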

### Supported Audio Formats

- WAV: Recommended, best quality
- MP3: Widely supported
- OGG: Auto-converted to MP3
- M4A: Auto-converted to MP3
- AAC: Auto-converted to MP3
- FLAC: Auto-converted to MP3
- WMA: Auto-converted to MP3

Note: The script automatically converts unsupported formats to MP3 using ffmpeg. Only WAV and MP3 are accepted by the API, but you can use any format that ffmpeg supports.
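
If you want to see or control the conversion yourself, the rule in the note can be sketched as below. Both `needs_conversion` and `to_mp3` are hypothetical helpers that mirror the described behavior; the packaged script may implement it differently.

```shell
# Decide from the file extension whether a file needs converting before upload.
needs_conversion() {
    # Lowercase the extension so .WAV and .wav are treated alike.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        wav|mp3) return 1 ;;   # accepted by the API as-is
        *)       return 0 ;;   # convert first
    esac
}

# Convert one file to MP3 next to the original (requires ffmpeg).
to_mp3() {
    # ffmpeg infers the MP3 output format from the .mp3 extension;
    # -y overwrites an existing output file.
    ffmpeg -loglevel error -y -i "$1" "${1%.*}.mp3"
}
```

A typical use would be `needs_conversion voice.ogg && to_mp3 voice.ogg`, which writes `voice.mp3` alongside the input.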

### File Constraints

- Maximum file size: 25 MB
- Maximum duration: 30 seconds
- Recommended sample rate: 16000 Hz or higher
- Audio channels: Mono or stereo
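
The size and duration limits can be checked locally before a request is sent. The sketch below makes two assumptions: "25 MB" is read as 25 × 1024 × 1024 bytes, and `ffprobe` (shipped with ffmpeg) is available for reading the duration. All function names are hypothetical.

```shell
MAX_BYTES=$((25 * 1024 * 1024))   # 25 MB, read here as 25 MiB (assumption)
MAX_SECONDS=30

# Succeeds when the file is no larger than MAX_BYTES.
size_ok() {
    bytes=$(wc -c < "$1" | tr -d '[:space:]')
    [ "$bytes" -le "$MAX_BYTES" ]
}

# Succeeds when $1, a duration in seconds (may be fractional), is within the limit.
duration_ok() {
    awk -v d="$1" -v max="$MAX_SECONDS" 'BEGIN { exit !(d <= max) }'
}

# Prints the duration of an audio file in seconds (requires ffprobe).
probe_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

Combined, a check might read `size_ok recording.wav && duration_ok "$(probe_duration recording.wav)"`.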

### Basic Transcription

Transcribe an audio file with default settings:

    bash scripts/speech_to_text.sh recording.wav

### Transcription with Context

Provide previous transcription or context for better accuracy:

    bash scripts/speech_to_text.sh recording.wav "这是之前的转录内容，有助于提高准确性"

### Transcription with Hotwords

Use custom vocabulary to improve recognition of specific terms:

    bash scripts/speech_to_text.sh recording.mp3 "" "人名,地名,专业术语,公司名称"

### Full Options

Combine context and hotwords:

    bash scripts/speech_to_text.sh recording.wav "会议记录片段" "张三,李四,项目名称"

Parameters:

- `audio_file` (required): Path to the audio file (.wav or .mp3; other formats are converted to MP3 with ffmpeg)
- `prompt` (optional): Previous transcription or context text (max 8000 characters)
- `hotwords` (optional): Comma-separated list of specific terms (max 100 terms)
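
The two limits above are easy to exceed when arguments are assembled programmatically, so a local check can catch problems before the API call. This is a sketch with hypothetical helper names; note that `${#prompt}` counts characters in bash under a UTF-8 locale but may count bytes in other shells.

```shell
MAX_PROMPT_CHARS=8000
MAX_HOTWORDS=100

# Prints the number of non-empty, comma-separated terms in $1.
count_hotwords() {
    printf '%s' "$1" | awk -F, '
        { for (i = 1; i <= NF; i++) if ($i != "") n++ }
        END { print n + 0 }'
}

# Succeeds when the prompt ($1) and hotwords ($2) are within the documented limits.
check_args() {
    prompt=$1
    hotwords=$2
    if [ "${#prompt}" -gt "$MAX_PROMPT_CHARS" ]; then
        echo "prompt exceeds $MAX_PROMPT_CHARS characters" >&2
        return 1
    fi
    if [ "$(count_hotwords "$hotwords")" -gt "$MAX_HOTWORDS" ]; then
        echo "more than $MAX_HOTWORDS hotwords" >&2
        return 1
    fi
}
```

For example, `check_args "会议记录片段" "张三,李四,项目名称" && bash scripts/speech_to_text.sh recording.wav "会议记录片段" "张三,李四,项目名称"`.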

### Context Prompts

Why use context prompts:

- Improves accuracy in long conversations
- Helps with domain-specific terminology
- Maintains consistency across multiple segments

When to use:

- Multi-part conversations or meetings
- Technical or specialized content
- Continuing from previous transcriptions

Example:

    bash scripts/speech_to_text.sh part2.wav "第一部分的转录内容：讨论了项目进展和下一步计划"

### Hotwords

What are hotwords:
A custom vocabulary list that boosts recognition accuracy for specific terms.

Best use cases:

- Proper names (people, places)
- Domain-specific terminology
- Company names and products
- Technical jargon
- Industry-specific terms

Examples:

    # Medical transcription
    bash scripts/speech_to_text.sh medical.wav "" "患者,症状,诊断,治疗方案"

    # Business meeting
    bash scripts/speech_to_text.sh meeting.wav "" "张经理,李总,项目代号,预算"

    # Tech discussion
    bash scripts/speech_to_text.sh tech.wav "" "API,数据库,算法,框架"

### Transcribe a Meeting

    # Part 1
    bash scripts/speech_to_text.sh meeting_part1.wav

    # Part 2 with context
    bash scripts/speech_to_text.sh meeting_part2.wav "第一部分讨论了项目进度" "张总,李经理,项目名称"

    # Part 3 with context
    bash scripts/speech_to_text.sh meeting_part3.wav "前两部分讨论了项目进度和预算" "张总,李经理,项目名称"
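
When earlier transcripts are fed back as context across many parts, the accumulated text can grow past the 8000-character prompt limit. One possible approach, sketched here with a hypothetical `trim_context` helper, is to keep only the most recent portion. Whether awk counts characters or bytes depends on the implementation and locale, so treat the cut-off as approximate for Chinese text.

```shell
# Print only the last $2 characters of $1 (default 8000).
# trim_context is a hypothetical helper, not part of the package.
trim_context() {
    printf '%s' "$1" | awk -v n="${2:-8000}" '
        { buf = (NR == 1) ? $0 : buf "\n" $0 }   # rejoin multi-line input
        END {
            len = length(buf)
            out = (len > n) ? substr(buf, len - n + 1) : buf
            print out
        }'
}
```

The trimmed text can then be passed as the second argument, for example `bash scripts/speech_to_text.sh meeting_part3.wav "$(trim_context "$previous_text")" "张总,李经理,项目名称"`, where `previous_text` is a variable you maintain yourself.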

### Transcribe a Lecture

    bash scripts/speech_to_text.sh lecture.wav "" "教授,课程名称,专业术语1,专业术语2"

### Process Multiple Files

    for file in recording_*.wav; do
        bash scripts/speech_to_text.sh "$file"
    done
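
The loop above prints every result to the terminal. A variant that saves each result to its own file and keeps going after a failure could look like the sketch below. It assumes the script exits non-zero when a transcription fails, which the source does not confirm; `TRANSCRIBE` and `transcribe_all` are names introduced here for illustration.

```shell
# Command used for each file; overridable so the loop can be dry-run.
TRANSCRIBE=${TRANSCRIBE:-"bash scripts/speech_to_text.sh"}

# Transcribe every file given as an argument, writing <name>.json next to it.
transcribe_all() {
    failed=0
    for file in "$@"; do
        [ -f "$file" ] || continue            # skips an unmatched glob pattern
        # $TRANSCRIBE is intentionally unquoted so it splits into command + args.
        if $TRANSCRIBE "$file" > "${file%.*}.json"; then
            echo "ok: $file" >&2
        else
            echo "failed: $file" >&2
            rm -f "${file%.*}.json"           # do not keep partial output
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Call it as `transcribe_all recording_*.wav`; the function succeeds only if every file was transcribed.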

### Audio Quality Tips

Best practices for accurate transcription:

- Clear audio source
  - Minimize background noise
  - Use a good-quality microphone
  - Speak clearly and at a moderate pace
- Optimal audio settings
  - Sample rate: 16000 Hz or higher
  - Bit depth: 16-bit or higher
  - Single channel (mono) is sufficient
- File preparation
  - Remove silence from the beginning and end
  - Normalize audio levels
  - Ensure consistent volume

### Output Format

The script outputs JSON with:

- `id`: Task ID
- `created`: Request timestamp (Unix timestamp)
- `request_id`: Unique request identifier
- `model`: Model name used
- `text`: Transcribed text

Example output:

    {
      "id": "task-12345",
      "created": 1234567890,
      "request_id": "req-abc123",
      "model": "glm-asr-2512",
      "text": "你好，这是转录的文本内容"
    }
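
Most callers only need the `text` field. A minimal sketch for pulling it out of the JSON on standard input follows; `extract_text` is a hypothetical helper. It uses `jq` when available and otherwise falls back to a naive `sed` match, which breaks if the transcript contains escaped double quotes.

```shell
# Read the script's JSON output on stdin and print the "text" field.
extract_text() {
    if command -v jq >/dev/null 2>&1; then
        jq -r '.text'
    else
        # Fallback: naive pattern match, not a real JSON parser.
        sed -n 's/.*"text"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
    fi
}
```

For example, `bash scripts/speech_to_text.sh recording.wav | extract_text > recording.txt`.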

### Troubleshooting

File Size Issues:

- Split audio files larger than 25 MB
- Reduce sample rate or bit depth
- Use compression (MP3) for smaller files

Duration Issues:

- Split recordings longer than 30 seconds
- Process segments separately
- Use context prompts to maintain continuity
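
Splitting can be done with ffmpeg's segment muxer. The sketch below assumes a WAV input (so the audio can be stream-copied without re-encoding) and uses 29-second segments to leave a small margin under the 30-second limit. `segments_needed` and `split_audio` are hypothetical helpers.

```shell
SEGMENT_SECONDS=29   # one second under the 30-second limit to leave margin

# Print how many segments a recording of $1 whole seconds will produce.
segments_needed() {
    echo $(( ($1 + SEGMENT_SECONDS - 1) / SEGMENT_SECONDS ))
}

# Split a WAV file into <name>_000.wav, <name>_001.wav, ... (requires ffmpeg).
split_audio() {
    ffmpeg -loglevel error -i "$1" -f segment -segment_time "$SEGMENT_SECONDS" \
        -c copy "${1%.*}_%03d.wav"
}
```

After `split_audio meeting.wav`, each `meeting_NNN.wav` segment can be transcribed in order, passing the previous segment's text as the context prompt.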

Poor Accuracy:

- Improve audio quality
- Use hotwords for specific terms
- Provide context prompts
- Ensure clear speech and minimal noise

Format Issues:

- Ensure the file is .wav or .mp3, or a format that ffmpeg can convert
- Check that the file is not corrupted
- Verify that the audio plays in a standard player

### Limitations

- Maximum audio duration: 30 seconds per request
- File size limit: 25 MB
- Maximum hotwords: 100 terms
- Context prompt limit: 8000 characters
- Best performance with Chinese-language audio

### Performance Notes

- Typical transcription time: 1-3 seconds
- Real-time or faster for most audio
- Processing time scales with audio quality and length
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: franklu0819-lang
- Version: 1.0.2
## Source health
- Status: healthy
- Source download looks usable.
- Yavira can redirect you to the upstream package for this source.
- Health scope: source
- Reason: direct_download_ok
- Checked at: 2026-04-30T16:55:25.780Z
- Expires at: 2026-05-07T16:55:25.780Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/zhipu-asr)
- [Send to Agent page](https://openagent3.xyz/skills/zhipu-asr/agent)
- [JSON manifest](https://openagent3.xyz/skills/zhipu-asr/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/zhipu-asr/agent.md)
- [Download page](https://openagent3.xyz/downloads/zhipu-asr)