Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Real-time speech synthesis with Alibaba Cloud Model Studio Qwen TTS Realtime models. Use when low-latency interactive speech is required, including instruction-controlled realtime synthesis.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Use realtime TTS models for low-latency streaming speech output.
Use one of these exact model strings:
- qwen3-tts-flash-realtime
- qwen3-tts-instruct-flash-realtime
- qwen3-tts-instruct-flash-realtime-2026-01-22
- qwen3-tts-vd-realtime-2026-01-15
- qwen3-tts-vc-realtime-2026-01-15
Install the SDK in a virtual environment:
    python3 -m venv .venv
    . .venv/bin/activate
    python -m pip install dashscope
Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
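The two credential sources above can be checked in order with a small helper. This is a minimal sketch, not part of the package: the function name `resolve_dashscope_api_key` is hypothetical, and it assumes that ~/.alibabacloud/credentials is an INI-style file in which `dashscope_api_key` may appear under any section. The source does not document the file layout, so verify it against your own file.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_dashscope_api_key(
    env: Optional[Mapping[str, str]] = None,
    credentials_path: Optional[Path] = None,
) -> Optional[str]:
    """Return the key from DASHSCOPE_API_KEY, else from the credentials file."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key

    path = credentials_path or Path.home() / ".alibabacloud" / "credentials"
    if not path.is_file():
        return None

    # Assumption: the file is INI-style. Interpolation is disabled so that
    # '%' characters inside a key are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
    except configparser.Error:
        return None

    for section in parser.sections():
        value = parser.get(section, "dashscope_api_key", fallback="").strip()
        if value:
            return value
    return None
```

Calling `resolve_dashscope_api_key()` with no arguments reads the real environment and home directory; a return value of `None` means neither source supplied a key.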
Request fields:
- text (string, required)
- voice (string, required)
- instruction (string, optional)
- sample_rate (int, optional)
Response fields:
- audio_base64_pcm_chunks (array<string>)
- sample_rate (int)
- finish_reason (string)
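To turn the `audio_base64_pcm_chunks` and `sample_rate` fields above into a playable file, the chunks can be decoded and wrapped in a WAV container. The sketch below uses only the Python standard library. It assumes the PCM data is 16-bit mono, which the source does not state, and the helper name `pcm_chunks_to_wav_bytes` is hypothetical.

```python
import base64
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav_bytes(chunks: Iterable[str], sample_rate: int) -> bytes:
    """Decode base64 PCM chunks in order and wrap them in a WAV container."""
    pcm = b"".join(base64.b64decode(chunk) for chunk in chunks)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono audio
        wav.setsampwidth(2)   # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Given a response parsed into a dict named `response` (a hypothetical variable), `pcm_chunks_to_wav_bytes(response["audio_base64_pcm_chunks"], response["sample_rate"])` returns bytes that can be written straight to a `.wav` file.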
- Use a WebSocket or streaming endpoint for realtime mode.
- Keep each utterance short for lower latency.
- For instruction models, keep the instruction explicit and concise.
- Some SDK/runtime combinations may reject realtime model calls over MultiModalConversation; use the probe script below to verify compatibility.
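One way to keep utterances short is to split the input text at sentence boundaries before sending it, and to break any overlong sentence on word boundaries. This is an illustrative sketch only: `split_utterances` and the `max_chars` limit of 80 are hypothetical choices rather than values from the package, and the splitter only recognizes ASCII sentence punctuation followed by whitespace.

```python
import re
from typing import List


def split_utterances(text: str, max_chars: int = 80) -> List[str]:
    """Split text into sentence-aligned utterances of at most max_chars.

    A single word longer than max_chars is kept whole rather than cut.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")

    utterances: List[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                utterances.append(current)
                current = word
        # Flush at each sentence boundary so utterances end on natural pauses.
        if current:
            utterances.append(current)
            current = ""
    return utterances
```

Each returned string can then be synthesized as its own request, so playback of the first utterance can begin while later ones are still being generated.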
Use the probe script to verify realtime compatibility in your current SDK/runtime, and optionally fall back to a non-realtime model for immediate output:
    .venv/bin/python skills/ai/audio/alicloud-ai-audio-tts-realtime/scripts/realtime_tts_demo.py \
      --text "This is a realtime speech demo." \
      --fallback \
      --output output/ai-audio-tts-realtime/audio/fallback-demo.wav
Strict mode (for CI / gating):
    .venv/bin/python skills/ai/audio/alicloud-ai-audio-tts-realtime/scripts/realtime_tts_demo.py \
      --text "realtime health check" \
      --strict
Default output: output/ai-audio-tts-realtime/audio/
Override the base dir with OUTPUT_DIR.
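The override rule can be read as "OUTPUT_DIR replaces the leading `output` directory". That reading is an assumption, as is treating OUTPUT_DIR as an environment variable; the source does not spell out either point. Under those assumptions, a hypothetical helper might look like this:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_audio_output_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the audio output directory, honoring an OUTPUT_DIR override."""
    env = os.environ if env is None else env
    # Assumption: OUTPUT_DIR replaces only the base "output" component.
    base = Path(env.get("OUTPUT_DIR") or "output")
    return base / "ai-audio-tts-realtime" / "audio"
```

If the bundled scripts treat OUTPUT_DIR differently, for example as the full audio directory, follow SKILL.md instead.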
    mkdir -p output/alicloud-ai-audio-tts-realtime
    for f in skills/ai/audio/alicloud-ai-audio-tts-realtime/scripts/*.py; do
      python3 -m py_compile "$f"
    done
    echo "py_compile_ok" > output/alicloud-ai-audio-tts-realtime/validate.txt
Pass criteria: the command exits 0 and output/alicloud-ai-audio-tts-realtime/validate.txt is generated.
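The same validation can be sketched in Python with the standard `py_compile` module. The function name `validate_scripts` is hypothetical. Unlike the shell loop, which writes the marker file even if one script fails to compile, this version raises on the first failure and writes the marker only when every script compiles.

```python
import py_compile
from pathlib import Path


def validate_scripts(scripts_dir: Path, evidence_file: Path) -> int:
    """Byte-compile every *.py file in scripts_dir, then write a marker file.

    Returns the number of scripts compiled. Raises py_compile.PyCompileError
    on the first script that fails, in which case no marker is written.
    """
    compiled = 0
    for script in sorted(scripts_dir.glob("*.py")):
        py_compile.compile(str(script), doraise=True)
        compiled += 1

    evidence_file.parent.mkdir(parents=True, exist_ok=True)
    evidence_file.write_text("py_compile_ok\n", encoding="utf-8")
    return compiled
```

A call matching the paths above would be `validate_scripts(Path("skills/ai/audio/alicloud-ai-audio-tts-realtime/scripts"), Path("output/alicloud-ai-audio-tts-realtime/validate.txt"))`.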
Save artifacts, command outputs, and API response summaries under output/alicloud-ai-audio-tts-realtime/. Include key parameters (region/resource id/time range) in evidence files for reproducibility.
- Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
- Run one minimal read-only query first to verify connectivity and permissions.
- Execute the target operation with explicit parameters and bounded scope.
- Verify results and save output/evidence files.
references/sources.md