Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Local Voice Input/Output for Agents using the AI Voice Agent API.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
For a new install: "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
For an upgrade: "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
This skill allows you to speak and listen to the user using a local Voice Agent API. It is client-only and does not start containers or services. It uses local Whisper for Speech-to-Text transcription and AWS Polly for Text-to-Speech generation.
Requires a running backend API at http://localhost:8000. Backend setup instructions are in this repository: README.md, walkthrough.md, and DOCKER_README.md.
Audio First: When the user communicates via audio (files), your PRIMARY mode of response is an audio file.
Silent Delivery: When sending an audio response, DO NOT send a text explanation like "I sent an audio". Just send the audio file.
Workflow:
- User sends audio.
- Use transcribe to read it.
- You think of a response.
- Use synthesize to generate the audio file.
- You send the file.
- STOP. Do not add text commentary.
Failure Handling: If health fails or connection errors occur, do not attempt service management from this skill. Ask the user to start or fix the backend using the repository docs.
To transcribe an audio file with local Whisper STT, run the client script with the transcribe command:
    python3 {baseDir}/scripts/client.py transcribe "/path/to/audio/file.ogg"
To generate audio from text with AWS Polly TTS and save it to a file, run the client script with the synthesize command:
    python3 {baseDir}/scripts/client.py synthesize "Text to speak" --output "/path/to/output.mp3"
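The audio-first workflow described earlier chains these two commands. The following is a minimal Python sketch of that chain, not the skill's own implementation. It assumes that `client.py transcribe` prints the transcript to standard output (the package docs quoted here do not specify the output format) and that `base_dir` stands in for the `{baseDir}` placeholder. The names `run_client`, `reply_with_audio`, and `compose_reply` are hypothetical and introduced only for illustration.

```python
import subprocess
from pathlib import Path


def run_client(base_dir, *args):
    """Run scripts/client.py with the given arguments and return its stdout."""
    cmd = ["python3", str(Path(base_dir) / "scripts" / "client.py"), *args]
    # An argument list avoids shell quoting problems with spaces in text or paths.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def reply_with_audio(base_dir, audio_in, audio_out, compose_reply, run=run_client):
    """Transcribe the user's audio, compose a reply, and synthesize it to a file."""
    # Assumption: the transcribe command prints the transcript on stdout.
    transcript = run(base_dir, "transcribe", str(audio_in))
    reply_text = compose_reply(transcript)
    run(base_dir, "synthesize", reply_text, "--output", str(audio_out))
    # Return only the file path: the reply is the audio file, with no text commentary.
    return str(audio_out)
```

The `run` parameter is injectable so the flow can be exercised without a live backend. In real use, `compose_reply` would be whatever produces the agent's response text from the transcript.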
To check if the voice agent API is running and healthy:
    python3 {baseDir}/scripts/client.py health
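The failure-handling rule can be sketched as a gate that runs before any transcription or synthesis. This is an illustrative sketch only. It assumes the `health` command exits with a non-zero status when the backend is unreachable or unhealthy, which the docs quoted here do not state. `backend_is_healthy`, `check_backend`, and `BACKEND_HELP` are hypothetical names.

```python
import subprocess
from pathlib import Path

BACKEND_HELP = (
    "The voice backend at http://localhost:8000 is not responding. "
    "Please start or fix it using README.md, walkthrough.md, or DOCKER_README.md."
)


def backend_is_healthy(base_dir, runner=subprocess.run, timeout=10):
    """Return True if `client.py health` exits with status 0."""
    cmd = ["python3", str(Path(base_dir) / "scripts" / "client.py"), "health"]
    try:
        # Assumption: a non-zero exit status means the backend is down or unhealthy.
        result = runner(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_backend(base_dir, runner=subprocess.run):
    """Return None when healthy, otherwise a message asking the user to fix the backend."""
    # The skill is client-only, so this never tries to start containers or services.
    return None if backend_is_healthy(base_dir, runner=runner) else BACKEND_HELP
```

When `check_backend` returns a message, the agent would pass it to the user and stop, rather than retrying or attempting to manage the service itself.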