Requirements

- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, and transcription.
Hand the extracted package to your coding agent with a concrete install brief rather than working through the installation by hand.
> I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

> I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Build real-time voice AI applications with bidirectional WebSocket communication.
```bash
pip install azure-ai-voicelive aiohttp azure-identity
```
```bash
AZURE_COGNITIVE_SERVICES_ENDPOINT=https://<region>.api.cognitive.microsoft.com

# For API key auth (not recommended for production)
AZURE_COGNITIVE_SERVICES_KEY=<api-key>
```
DefaultAzureCredential (preferred):

```python
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="gpt-4o-realtime-preview",
    credential_scopes=["https://cognitiveservices.azure.com/.default"]
) as conn:
    ...
```

API Key:

```python
import os

from azure.ai.voicelive.aio import connect
from azure.core.credentials import AzureKeyCredential

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_COGNITIVE_SERVICES_KEY"]),
    model="gpt-4o-realtime-preview"
) as conn:
    ...
```
```python
import asyncio
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential


async def main():
    async with connect(
        endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
        credential=DefaultAzureCredential(),
        model="gpt-4o-realtime-preview",
        credential_scopes=["https://cognitiveservices.azure.com/.default"]
    ) as conn:
        # Update session with instructions
        await conn.session.update(session={
            "instructions": "You are a helpful assistant.",
            "modalities": ["text", "audio"],
            "voice": "alloy"
        })

        # Listen for events
        async for event in conn:
            print(f"Event: {event.type}")
            if event.type == "response.audio_transcript.done":
                print(f"Transcript: {event.transcript}")
            elif event.type == "response.done":
                break

asyncio.run(main())
```
The VoiceLiveConnection exposes these resources:

| Resource | Purpose | Key Methods |
|----------|---------|-------------|
| `conn.session` | Session configuration | `update(session=...)` |
| `conn.response` | Model responses | `create()`, `cancel()` |
| `conn.input_audio_buffer` | Audio input | `append()`, `commit()`, `clear()` |
| `conn.output_audio_buffer` | Audio output | `clear()` |
| `conn.conversation` | Conversation state | `item.create()`, `item.delete()`, `item.truncate()` |
| `conn.transcription_session` | Transcription config | `update(session=...)` |
```python
from azure.ai.voicelive.models import RequestSession, FunctionTool

await conn.session.update(session=RequestSession(
    instructions="You are a helpful voice assistant.",
    modalities=["text", "audio"],
    voice="alloy",  # or "echo", "shimmer", "sage", etc.
    input_audio_format="pcm16",
    output_audio_format="pcm16",
    turn_detection={
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500
    },
    tools=[
        FunctionTool(
            type="function",
            name="get_weather",
            description="Get current weather",
            parameters={
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        )
    ]
))
```
```python
import base64

# Read audio chunk (16-bit PCM, 24kHz mono)
audio_chunk = await read_audio_from_microphone()
b64_audio = base64.b64encode(audio_chunk).decode()

await conn.input_audio_buffer.append(audio=b64_audio)
```
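The `read_audio_from_microphone()` call above is a placeholder. One way to capture 16-bit PCM at 24 kHz is the third-party `sounddevice` library (an assumption; any source of raw PCM bytes works). A minimal sketch:

```python
import asyncio
import base64

import sounddevice as sd  # assumption: pip install sounddevice

SAMPLE_RATE = 24000   # matches the default pcm16 format
CHUNK_FRAMES = 2400   # 100 ms of audio per chunk


async def stream_microphone(conn):
    """Read raw 16-bit mono PCM from the default input device and
    forward each chunk to the Voice Live input buffer."""
    stream = sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    stream.start()
    try:
        while True:
            # read() blocks, so run it in a worker thread to keep the loop free
            data, _overflowed = await asyncio.to_thread(stream.read, CHUNK_FRAMES)
            b64_audio = base64.b64encode(bytes(data)).decode()
            await conn.input_audio_buffer.append(audio=b64_audio)
    finally:
        stream.stop()
        stream.close()
```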
async for event in conn: if event.type == "response.audio.delta": audio_bytes = base64.b64decode(event.delta) await play_audio(audio_bytes) elif event.type == "response.audio.done": print("Audio complete")
```python
import base64
import json

async for event in conn:
    match event.type:
        # Session events
        case "session.created":
            print(f"Session: {event.session}")
        case "session.updated":
            print("Session updated")

        # Audio input events
        case "input_audio_buffer.speech_started":
            print(f"Speech started at {event.audio_start_ms}ms")
        case "input_audio_buffer.speech_stopped":
            print(f"Speech stopped at {event.audio_end_ms}ms")

        # Transcription events
        case "conversation.item.input_audio_transcription.completed":
            print(f"User said: {event.transcript}")
        case "conversation.item.input_audio_transcription.delta":
            print(f"Partial: {event.delta}")

        # Response events
        case "response.created":
            print(f"Response started: {event.response.id}")
        case "response.audio_transcript.delta":
            print(event.delta, end="", flush=True)
        case "response.audio.delta":
            audio = base64.b64decode(event.delta)
        case "response.done":
            print(f"Response complete: {event.response.status}")

        # Function calls
        case "response.function_call_arguments.done":
            result = handle_function(event.name, event.arguments)
            await conn.conversation.item.create(item={
                "type": "function_call_output",
                "call_id": event.call_id,
                "output": json.dumps(result)
            })
            await conn.response.create()

        # Errors
        case "error":
            print(f"Error: {event.error.message}")
```
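`handle_function` above is not part of the SDK; a plain dispatch table works. A sketch wired to the `get_weather` tool from the session configuration (the weather lookup itself is a stand-in):

```python
import json


def get_weather(location: str) -> dict:
    # Stand-in: call a real weather API here
    return {"location": location, "forecast": "sunny", "temp_c": 22}


# Map tool names from the session config to Python callables
FUNCTIONS = {"get_weather": get_weather}


def handle_function(name: str, arguments: str) -> dict:
    """Decode the model's JSON arguments and invoke the matching tool."""
    if name not in FUNCTIONS:
        return {"error": f"unknown function: {name}"}
    return FUNCTIONS[name](**json.loads(arguments))
```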
```python
# Disable automatic turn detection
await conn.session.update(session={"turn_detection": None})

# Manually control turns
await conn.input_audio_buffer.append(audio=b64_audio)
await conn.input_audio_buffer.commit()   # End of user turn
await conn.response.create()             # Trigger response
```
async for event in conn: if event.type == "input_audio_buffer.speech_started": # User interrupted - cancel current response await conn.response.cancel() await conn.output_audio_buffer.clear()
```python
# Add system message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "system",
    "content": [{"type": "input_text", "text": "Be concise."}]
})

# Add user message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "user",
    "content": [{"type": "input_text", "text": "Hello!"}]
})

await conn.response.create()
```
| Voice | Description |
|-------|-------------|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| shimmer | Clear, professional |
| sage | Calm, authoritative |
| coral | Friendly, upbeat |
| ash | Deep, measured |
| ballad | Expressive |
| verse | Storytelling |

Azure voices: use the `AzureStandardVoice`, `AzureCustomVoice`, or `AzurePersonalVoice` models, as sketched below.
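For example, selecting an Azure neural voice instead of an OpenAI one might look like this. A sketch only: the voice name `en-US-AvaNeural` and the name-only constructor are assumptions; check references/models.md for the actual model definitions.

```python
from azure.ai.voicelive.models import AzureStandardVoice, RequestSession

await conn.session.update(session=RequestSession(
    # assumption: a standard Azure TTS voice name is accepted here
    voice=AzureStandardVoice(name="en-US-AvaNeural"),
    modalities=["text", "audio"],
))
```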
| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| pcm16 | 24kHz | Default, high quality |
| pcm16-8000hz | 8kHz | Telephony |
| pcm16-16000hz | 16kHz | Voice assistants |
| g711_ulaw | 8kHz | Telephony (US) |
| g711_alaw | 8kHz | Telephony (EU) |
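Formats are set per direction on the session. For instance, a telephony integration might negotiate G.711 µ-law in both directions (a sketch following the dict-based session-update pattern shown earlier):

```python
# Switch both directions to 8 kHz G.711 µ-law for a US telephony trunk
await conn.session.update(session={
    "input_audio_format": "g711_ulaw",
    "output_audio_format": "g711_ulaw",
})
```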
```python
# Server VAD (default)
{"type": "server_vad", "threshold": 0.5, "silence_duration_ms": 500}

# Azure Semantic VAD (smarter detection)
{"type": "azure_semantic_vad"}
{"type": "azure_semantic_vad_en"}            # English optimized
{"type": "azure_semantic_vad_multilingual"}
```
```python
from azure.ai.voicelive.aio import ConnectionError, ConnectionClosed

try:
    async with connect(...) as conn:
        async for event in conn:
            if event.type == "error":
                print(f"API Error: {event.error.code} - {event.error.message}")
except ConnectionClosed as e:
    print(f"Connection closed: {e.code} - {e.reason}")
except ConnectionError as e:
    print(f"Connection error: {e}")
```
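A `ConnectionClosed` usually warrants a retry. Below is a generic reconnect loop with exponential backoff: a sketch of a common pattern, not an SDK feature, and `run_conversation` is a hypothetical handler for the connected session.

```python
import asyncio
import os

from azure.ai.voicelive.aio import ConnectionClosed, connect
from azure.identity.aio import DefaultAzureCredential


async def connect_with_retry(max_attempts: int = 5) -> None:
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            async with connect(
                endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
                credential=DefaultAzureCredential(),
                model="gpt-4o-realtime-preview",
                credential_scopes=["https://cognitiveservices.azure.com/.default"],
            ) as conn:
                await run_conversation(conn)  # hypothetical session handler
                return
        except ConnectionClosed as e:
            print(f"Closed ({e.code}); retry {attempt}/{max_attempts} in {delay:.0f}s")
            await asyncio.sleep(delay)
            delay = min(delay * 2, 30.0)  # cap the backoff at 30 s
```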
- Detailed API Reference: see `references/api-reference.md`
- Complete Examples: see `references/examples.md`
- All Models & Types: see `references/models.md`