Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Generate and send video messages with a lip-syncing VRM avatar. Use when user asks for video message, avatar video, video reply, or when TTS should be delivered as video instead of audio.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Generate avatar video messages from text or audio. Outputs as Telegram video notes (circular format).
```
npm install -g openclaw-avatarcam
```
| Setting | Default | Description |
|---|---|---|
| avatar | default.vrm | VRM avatar file path |
| background | #00FF00 | Color (hex) or image path |
| Platform | Command |
|---|---|
| macOS | `brew install ffmpeg` |
| Linux | `sudo apt-get install -y xvfb xauth ffmpeg` |
| Windows | Install ffmpeg and add to PATH |
| Docker | See Docker section below |

Note: macOS and Windows don't need xvfb; they have native display support.
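Before installing, it can help to confirm the tools above are actually on PATH. A minimal POSIX-sh sketch; the `check_bins` helper is ours for illustration, not part of avatarcam:

```shell
# check_bins: print each named tool that is NOT found on PATH.
# Helper name and approach are illustrative, not part of avatarcam.
check_bins() {
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
  done
}

# Example: on Linux you would typically check ffmpeg plus the xvfb tools.
check_bins ffmpeg xvfb-run xauth
```

If the call prints nothing, everything it checked is installed.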
Add to `OPENCLAW_DOCKER_APT_PACKAGES`:

```
build-essential procps curl file git ca-certificates xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 libgtk-3-0 libasound2 libnss3 ffmpeg
```
```
# With color background
avatarcam --audio voice.mp3 --output video.mp4 --background "#00FF00"

# With image background
avatarcam --audio voice.mp3 --output video.mp4 --background "./bg.png"

# With custom avatar
avatarcam --audio voice.mp3 --output video.mp4 --avatar "./custom.vrm"
```
Use OpenClaw's message tool with `asVideoNote`:

```
message action=send filePath=/tmp/video.mp4 asVideoNote=true
```
1. Read config from TOOLS.md (avatar, background)
2. Generate TTS if given text: `tts text="..."` → audio path
3. Run avatarcam with audio + settings → MP4 output
4. Send as video note via `message action=send filePath=... asVideoNote=true`
5. Return NO_REPLY after sending
User: "Send me a video message saying hello"

```
# 1. TTS
tts text="Hello! How are you today?"
# → /tmp/voice.mp3

# 2. Generate video
avatarcam --audio /tmp/voice.mp3 --output /tmp/video.mp4 --background "#00FF00"

# 3. Send as video note
message action=send filePath=/tmp/video.mp4 asVideoNote=true

# 4. Reply
NO_REPLY
```
| Setting | Value |
|---|---|
| Resolution | 384x384 (square) |
| Frame rate | 30fps constant |
| Max duration | 60 seconds |
| Video codec | H.264 (libx264) |
| Audio codec | AAC |
| Quality | CRF 18 (high quality) |
| Container | MP4 |
1. Electron renders VRM avatar with lip sync at 1280x720
2. WebM captured via `canvas.captureStream(30)`
3. FFmpeg processes: crop → fps normalize → scale → encode
4. Message tool sends via Telegram `sendVideoNote` API
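To make the FFmpeg stage concrete, here is a hedged sketch of what the crop → fps → scale → encode chain could look like, combining the 1280x720 capture with the output settings listed in this doc (384x384, 30 fps, libx264 CRF 18, AAC, 60 s cap). The exact filters avatarcam uses may differ; `build_ffmpeg_cmd` only prints the command so the chain is easy to inspect:

```shell
# Hypothetical reconstruction of avatarcam's FFmpeg step, NOT its actual code.
# Crops the 1280x720 capture to a centered square (crop=ih:ih), normalizes to
# 30 fps, scales to 384x384, and encodes H.264 CRF 18 + AAC, capped at 60 s.
build_ffmpeg_cmd() {  # $1 = captured WebM, $2 = audio track, $3 = output MP4
  printf 'ffmpeg -i %s -i %s -vf "crop=ih:ih,fps=30,scale=384:384" -c:v libx264 -crf 18 -c:a aac -t 60 %s\n' "$1" "$2" "$3"
}

build_ffmpeg_cmd capture.webm voice.mp3 video.mp4
```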
| Platform | Display | Notes |
|---|---|---|
| macOS | Native Quartz | No extra deps |
| Linux | xvfb (headless) | `apt install xvfb` |
| Windows | Native | No extra deps |
Avatarcam auto-detects headless environments:

- Uses `xvfb-run` when `$DISPLAY` is not set (Linux only)
- macOS/Windows use native display
- GPU stall warnings are safe to ignore
- Generation time: ~1.5x realtime (20s audio ≈ 30s processing)
- Config is read from TOOLS.md
- Clean up temp files after sending: `rm /tmp/video*.mp4`
- For regular video (not circular), omit `asVideoNote=true`
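For the temp-file cleanup tip, an EXIT trap is sturdier than a trailing `rm`, since it fires even when a step fails midway. A small sketch under that assumption (the directory layout is illustrative, not avatarcam's behavior):

```shell
#!/bin/sh
# Generate into a throwaway directory and let an EXIT trap remove it,
# so /tmp does not accumulate videos when a step errors out.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# ... generate "$tmpdir/video.mp4" and send it here ...
echo "working in $tmpdir"
# the trap removes $tmpdir automatically when the script exits
```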