Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Pronunciation coaching with real voice analysis using Azure Speech Services. Analyzes audio files for phoneme-level accuracy, fluency, prosody, and intonatio...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Analyze spoken English pronunciation using Azure Speech Services and provide actionable coaching feedback.

Privacy note: this skill reads local voice messages from ~/.openclaw/media/inbound/ and transmits them to Microsoft Azure Speech Services for processing.
Prerequisites:
- Azure Speech API key: set the AZURE_SPEECH_KEY environment variable
- Azure Speech region: set the AZURE_SPEECH_REGION environment variable (e.g., southeastasia)
- ffmpeg: required for audio format conversion (must be on PATH)
- Node.js: required for report generation
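A quick preflight check can catch a missing key or tool before any audio is processed. The sketch below is a hypothetical helper (check_prereqs is not part of the skill) that tests the two environment variables and looks up ffmpeg and node on PATH, reporting everything that is missing rather than stopping at the first problem.

```shell
#!/bin/sh
# Hypothetical preflight check for the prerequisites above (not part of the skill).
# Reports every missing item on stderr; returns 1 if anything is missing.
check_prereqs() {
  local missing=0 cmd
  # Azure credentials come from the environment.
  if [ -z "${AZURE_SPEECH_KEY:-}" ]; then
    echo "missing env var: AZURE_SPEECH_KEY" >&2
    missing=1
  fi
  if [ -z "${AZURE_SPEECH_REGION:-}" ]; then
    echo "missing env var: AZURE_SPEECH_REGION" >&2
    missing=1
  fi
  # ffmpeg (format conversion) and node (report generation) must be on PATH.
  for cmd in ffmpeg node; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command on PATH: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_prereqs && echo "ready"`. It only checks presence; an invalid key or a mistyped region will still fail later at the Azure request.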
Voice messages from Telegram are stored in ~/.openclaw/media/inbound/. Find the latest .ogg file whose timestamp matches the message; this lists the five most recent:

  ls -lt ~/.openclaw/media/inbound/*.ogg | head -5
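When a script needs just one path rather than a listing, the same `ls -t` ordering can be reduced to the single newest file. This is a hypothetical helper (latest_voice_message is an illustrative name); "newest" is only a heuristic, so the file's timestamp should still be checked against the message as described above.

```shell
#!/bin/sh
# Hypothetical helper: print the newest .ogg in a directory (default: the
# OpenClaw inbound folder), or nothing if there is none.
latest_voice_message() {
  local dir="${1:-$HOME/.openclaw/media/inbound}"
  # -t sorts by modification time, newest first; -1 prints one name per line.
  # If the glob matches nothing, ls fails quietly and the output is empty.
  ls -1t "$dir"/*.ogg 2>/dev/null | head -n 1
}
```

Usage: `audio=$(latest_voice_message)`, then confirm `[ -n "$audio" ]` before running the assessment.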
Run the assessment:

  scripts/pronunciation-assess.sh <audio_file> "<reference_text>"

- audio_file: path to the voice message (ogg/wav/mp3/m4a)
- reference_text: what the speaker intended to say (from the transcript)

The script automatically converts any of these formats to 16 kHz mono WAV.
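The contents of pronunciation-assess.sh are not shown here, so the following is only a rough approximation of what such a step can look like, which also makes concrete what the privacy note means by sending audio to Azure. It assumes ffmpeg does the conversion and that the Azure speech-to-text REST endpoint for short audio is used, with pronunciation-assessment parameters passed as base64-encoded JSON in a `Pronunciation-Assessment` header. All function names are hypothetical, and the language (en-US) and parameter set are assumptions to check against the Azure documentation; options such as miscue or prosody assessment are not included.

```shell
#!/bin/sh
# Hypothetical helpers approximating an assess step; the skill's real script
# may differ. Requires ffmpeg, curl, base64 and sed.

# Output path for the converted file: same directory and base name, ".16k.wav" suffix.
wav_path_for() {
  local base="${1##*/}"
  case "$base" in
    *.*) printf '%s.16k.wav\n' "${1%.*}" ;;  # replace the last extension
    *)   printf '%s.16k.wav\n' "$1" ;;       # no extension: just append
  esac
}

# Convert any ffmpeg-readable input to 16 kHz, mono, signed 16-bit PCM WAV.
to_wav16k_mono() {
  local src="$1" dst="${2:-$(wav_path_for "$1")}"
  ffmpeg -hide_banner -loglevel error -y -i "$src" \
    -ar 16000 -ac 1 -c:a pcm_s16le "$dst"
}

# Base64-encoded JSON for the Pronunciation-Assessment request header.
# Escapes backslashes and double quotes; assumes single-line reference text.
pron_assessment_header() {
  local ref
  ref=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"ReferenceText":"%s","GradingSystem":"HundredMark","Granularity":"Phoneme","Dimension":"Comprehensive"}' "$ref" |
    base64 | tr -d '\n'
}

# POST a 16 kHz mono WAV ($1) with its reference text ($2); prints the JSON response.
assess_wav() {
  curl -sS -X POST \
    "https://${AZURE_SPEECH_REGION}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&format=detailed" \
    -H "Ocp-Apim-Subscription-Key: ${AZURE_SPEECH_KEY}" \
    -H "Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000" \
    -H "Accept: application/json" \
    -H "Pronunciation-Assessment: $(pron_assessment_header "$2")" \
    --data-binary "@$1"
}
```

Usage: `to_wav16k_mono voice.ogg && assess_wav voice.16k.wav "I think so"`. Writing the converted file next to the original, under a distinct suffix, avoids overwriting an input that is already a .wav.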
Pipe the JSON output into the report generator:

  scripts/pronunciation-assess.sh audio.ogg "reference text" | node scripts/pronunciation-report.js

The report includes:
- Overall scores (Pronunciation, Accuracy, Fluency, Prosody, Completeness)
- Word-by-word breakdown with per-phoneme scores
- Highlighted problem sounds
- A verdict with actionable next steps
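In a plain pipeline the exit status is that of the last command (node), so a failed assessment can go unnoticed while the report generator starts on empty or partial input. One alternative, sketched below with hypothetical function names, stages the JSON in a file instead of piping directly: the report runs only if the assessment succeeded, and the saved JSON can be kept for comparing recordings later.

```shell
#!/bin/sh
# Hypothetical wrapper; run_assess, run_report and assess_and_report are
# illustrative names, not part of the skill. The two stages are functions so
# they can be swapped out (for example, stubbed in tests).
run_assess() { scripts/pronunciation-assess.sh "$@"; }
run_report() { node scripts/pronunciation-report.js; }

# Save the assessment JSON to $3, and render the report only if the
# assessment succeeded; on failure, return the assessment's exit status.
assess_and_report() {
  local audio="$1" ref="$2" json_out="$3"
  run_assess "$audio" "$ref" > "$json_out" || return
  run_report < "$json_out"
}
```

Usage: `assess_and_report voice.ogg "I think so" "results/$(date +%Y%m%d-%H%M%S).json"`, assuming a results/ directory already exists.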
After generating the report:
- Send the text report to the user (scores and word breakdown).
- Identify the top 3 problem sounds from the phoneme scores.
- Explain each problem: what the correct sound is and how to produce it. See references/phoneme-guide.md for phoneme descriptions and fixes.
- Send a voice message (via TTS) demonstrating the correct pronunciation of the problem words.
- Assign practice: give the user specific sentences to re-record that focus on the weak sounds.
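Picking the "top 3 problem sounds" amounts to ranking phonemes by score and taking the lowest three. The sketch below assumes the per-phoneme scores have already been flattened to one `phoneme score` pair per line (a hypothetical format; extracting them from the assessment JSON is not shown). Because a sound usually occurs in several words, it averages each phoneme's scores before ranking. weakest_phonemes is an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper. Input: one "phoneme score" pair per line on stdin
# (an assumed flattening of the per-phoneme scores, not the skill's format).
# Output: the N phonemes with the lowest mean score, weakest first.
weakest_phonemes() {
  local n="${1:-3}"
  awk 'NF >= 2 { sum[$1] += $2; cnt[$1]++ }
       END { for (p in sum) printf "%s %.1f\n", p, sum[p] / cnt[p] }' |
    LC_ALL=C sort -k2,2n |
    head -n "$n"
}
```

Usage: `printf 'th 40\nr 85\nth 50\n' | weakest_phonemes 3`. Averaging keeps one bad occurrence from outranking a sound that is consistently weak; ranking by the single worst score is a reasonable alternative if isolated errors matter more.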
Interpreting scores:
- Scores ≥ 90: excellent; minor polish only.
- Scores 70-89: good; targeted practice needed.
- Scores < 70: needs focused drill on that specific sound.
- "Omission" errors mean the word was not detected; the speaker may have been too quiet or mumbled.
- A prosody score below 85 suggests monotone delivery; coach on intonation rises and falls.
- Compare scores across multiple recordings to track improvement.
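The score thresholds above can be encoded directly so that feedback stays consistent between sessions. This is a hypothetical sketch (score_band and prosody_hint are illustrative names). It compares in awk because scores may be decimals and the shell's `[ ]` only compares integers. A score such as 89.5 falls below the 90 cutoff and is treated as "good".

```shell
#!/bin/sh
# Hypothetical helpers encoding the score thresholds above.
score_band() {
  awk -v s="$1" 'BEGIN {
    if (s >= 90) print "Excellent: minor polish"
    else if (s >= 70) print "Good: targeted practice needed"
    else print "Needs focused drill on that sound"
  }'
}

# Prints a coaching hint when prosody is below 85, otherwise nothing.
prosody_hint() {
  awk -v s="$1" 'BEGIN { if (s < 85) print "Likely monotone: coach intonation rises and falls" }'
}
```

Usage: `score_band 87.5` prints the "Good" band; `prosody_hint 80` prints the intonation hint.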