Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Use this skill when the user wants AssemblyAI rather than generic transcription, or when the job benefits from AssemblyAI-specific capabilities such as:
- model routing across universal-3-pro and universal-2
- language detection and code switching
- diarisation plus speaker name / role mapping
- translation, custom formatting, or AssemblyAI speaker identification
- subtitles, paragraphs, sentences, topic / entity / sentiment tasks
- transcript output that is easy for other agents to consume as Markdown or normalised JSON

The skill is designed for AI agents like OpenClaw, not just end users. It provides:
- A no-dependency Node CLI in scripts/assemblyai.mjs (and a compatibility wrapper at assemblyai.mjs)
- Bundled model/language knowledge via the models and languages commands
- Stable transcript output formats
  - agent-friendly Markdown
  - normalised agent JSON
  - bundle manifests for downstream automation
- Speaker mapping workflows
  - manual speaker/channel maps
  - AssemblyAI speaker identification
  - merged display names in both Markdown and JSON
- AssemblyAI LLM Gateway integration for structured extraction from transcripts
If they just want "a transcript", a generic solution may be enough. Reach for this skill when the user mentions AssemblyAI, wants a specific AssemblyAI feature, or needs the richer outputs and post-processing this skill provides.
- New transcription → transcribe
- Existing transcript id → get or wait
- Re-render existing saved JSON → format
- Post-process an existing transcript → understand
- Run transcript text through LLM Gateway → llm
- Need a quick capability lookup before deciding → models or languages
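The routing above can be sketched as a small lookup for an agent that picks a subcommand programmatically. This is a minimal illustration, not part of the skill: the task keys and the pickCommands helper are hypothetical names, and only the subcommand names come from the list above.

```javascript
// Hypothetical task keys mapped to the CLI subcommands listed above.
// Only the subcommand names come from the skill; the keys are illustrative.
const COMMAND_FOR_TASK = new Map([
  ["new-transcription", ["transcribe"]],
  ["existing-transcript-id", ["get", "wait"]],
  ["re-render-saved-json", ["format"]],
  ["post-process-transcript", ["understand"]],
  ["llm-over-transcript", ["llm"]],
  ["capability-lookup", ["models", "languages"]],
]);

// Returns the candidate subcommands for a task, or throws if the task is unknown.
function pickCommands(task) {
  const commands = COMMAND_FOR_TASK.get(task);
  if (!commands) {
    throw new Error(`Unknown task: ${task}`);
  }
  return commands;
}

// Prints "get or wait".
console.log(pickCommands("existing-transcript-id").join(" or "));
```

Returning every candidate rather than a single name keeps the get/wait and models/languages choices visible to the caller.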
For most unknown-language or mixed-language jobs, prefer:

    node {baseDir}/assemblyai.mjs transcribe INPUT --bundle-dir ./assemblyai-out --all-exports

Why:
- the CLI defaults to auto-best routing when models are not specified
- it writes a manifest + multiple files that agents can inspect without reparsing terminal output
- Markdown and agent JSON become available immediately for follow-on steps
Use this when the source language is unknown or could be outside the 6-language Universal-3-Pro set:

    node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --bundle-dir ./out --all-exports

This defaults to model routing plus language detection unless the request already specifies a model or language.
If the language is known and supported by Universal-3-Pro, prefer an explicit request:

    node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speech-model universal-3-pro --language-code en_us --bundle-dir ./out
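The choice between the default routing and an explicit Universal-3-Pro request can be sketched as an argument builder. This is a minimal sketch under stated assumptions: buildTranscribeArgs and its options are hypothetical names, the caller is assumed to have checked language support already (for example via the languages command), and falling back to the default routing for a known but unsupported language is an assumption rather than documented behaviour. Only flags shown in the commands above are used.

```javascript
// Hypothetical helper: builds the argument list for the transcribe subcommand.
// `supportedByUniversal3Pro` must be determined by the caller; this sketch
// does not know which languages belong to the Universal-3-Pro set.
function buildTranscribeArgs(input, options = {}) {
  const {
    languageCode,
    supportedByUniversal3Pro = false,
    bundleDir = "./out",
  } = options;

  const args = ["transcribe", input];
  if (languageCode && supportedByUniversal3Pro) {
    // Known, supported language: request the model and language explicitly.
    args.push("--speech-model", "universal-3-pro", "--language-code", languageCode);
    args.push("--bundle-dir", bundleDir);
  } else {
    // Unknown or unsupported language (assumed fallback): rely on the CLI's
    // default routing plus language detection, and write every export.
    args.push("--bundle-dir", bundleDir, "--all-exports");
  }
  return args;
}

console.log(buildTranscribeArgs("./meeting.mp3").join(" "));
console.log(
  buildTranscribeArgs("./meeting.mp3", {
    languageCode: "en_us",
    supportedByUniversal3Pro: true,
  }).join(" ")
);
```

Building an array instead of a single string means the result can be passed to a process spawner after the CLI path without shell quoting issues.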
node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --bundle-dir ./out
Manual mapping:

    node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --speaker-map @assets/speaker-map.example.json --bundle-dir ./out

AssemblyAI speaker identification:

    node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --speaker-type role --known-speakers "host,guest" --bundle-dir ./out

Or post-process an existing transcript:

    node {baseDir}/assemblyai.mjs understand TRANSCRIPT_ID --speaker-type name --speaker-profiles @assets/speaker-profiles-name.example.json --bundle-dir ./out
node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --translate-to de,fr --match-original-utterance --bundle-dir ./out
node {baseDir}/assemblyai.mjs llm TRANSCRIPT_ID --prompt @assets/example-prompt.txt --schema @assets/llm-json-schema.example.json --out ./summary.json
Use transcribe for local files or remote URLs. Local files are uploaded first; public URLs are sent directly to AssemblyAI. The command waits for completion by default, then renders output. Prefer --bundle-dir for anything longer than a trivial clip.
Use get or wait when you already have the transcript id. wait blocks until completion; get fetches immediately unless you add --wait.
Use format when you have already saved either:
- raw transcript JSON from AssemblyAI, or
- the normalised agent JSON produced by this skill

This is useful when you want to apply a new speaker map, re-render Markdown, or generate a fresh bundle without retranscribing.
Use understand when you need AssemblyAI Speech Understanding on an existing transcript:
- translation
- speaker identification
- custom formatting

This command fetches the transcript, merges in the returned understanding results, then renders updated Markdown / agent JSON / bundle outputs.
Use llm when the user wants:
- summaries
- extraction
- structured JSON
- downstream reasoning over the transcript

Prefer --schema when the next step is automated.
--bundle-dir writes a directory containing:
- Markdown transcript
- agent JSON
- raw JSON
- optional paragraphs / sentences / subtitles
- a machine-readable manifest

This is usually better than dumping everything to stdout.
Use --export to choose the main output:
- markdown (default)
- agent-json
- json / raw-json
- text
- paragraphs
- sentences
- srt
- vtt
- manifest
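An agent that assembles commands may want to reject an unsupported export name before invoking the CLI. The following is a minimal sketch: exportArgs is a hypothetical helper, and treating json and raw-json as two separately accepted values is an assumption based on how they are listed above.

```javascript
// Export names taken from the list above. Accepting both "json" and
// "raw-json" as distinct values is an assumption of this sketch.
const EXPORT_FORMATS = new Set([
  "markdown",
  "agent-json",
  "json",
  "raw-json",
  "text",
  "paragraphs",
  "sentences",
  "srt",
  "vtt",
  "manifest",
]);

// Hypothetical helper: returns the --export arguments for a format,
// defaulting to markdown, and throws on names not in the list.
function exportArgs(format = "markdown") {
  if (!EXPORT_FORMATS.has(format)) {
    throw new Error(`Unsupported export format: ${format}`);
  }
  return ["--export", format];
}

// Prints "--export srt".
console.log(exportArgs("srt").join(" "));
```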
You can request extra files directly with:
- --markdown-out
- --agent-json-out
- --raw-json-out
- --paragraphs-out
- --sentences-out
- --srt-out
- --vtt-out
- --understanding-json-out
Speaker display names are resolved in this order of precedence, with the first match winning:
- manual --speaker-map
- AssemblyAI speaker identification mapping
- fallback generic names like Speaker A or Channel 1

This means you can let AssemblyAI identify speakers first, then still override individual display names later.

Example manual map file: assets/speaker-map.example.json
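The precedence described above can be illustrated with a small resolver. This is a sketch of the ordering only, not the skill's implementation: resolveSpeakerName is a hypothetical function, and the flat label-to-name object shape is an assumption that may differ from the real format of assets/speaker-map.example.json.

```javascript
// Hypothetical resolver showing the precedence: manual map first, then
// AssemblyAI speaker identification, then a generic fallback name.
// The { label: displayName } shape is assumed for illustration.
function resolveSpeakerName(label, sources = {}, fallbackPrefix = "Speaker") {
  const { manualMap = {}, identified = {} } = sources;
  if (Object.hasOwn(manualMap, label)) {
    return manualMap[label];
  }
  if (Object.hasOwn(identified, label)) {
    return identified[label];
  }
  return `${fallbackPrefix} ${label}`;
}

const sources = {
  manualMap: { A: "Alice" },
  identified: { A: "host", B: "guest" },
};

console.log(resolveSpeakerName("A", sources)); // "Alice" (manual map overrides)
console.log(resolveSpeakerName("B", sources)); // "guest" (identification result)
console.log(resolveSpeakerName("C", sources)); // "Speaker C" (generic fallback)
```

Because the manual map is checked first, a single entry can override one identified speaker while the rest keep their identified names.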
Before choosing parameters, inspect the bundled reference data:

    node {baseDir}/assemblyai.mjs models
    node {baseDir}/assemblyai.mjs models --format json
    node {baseDir}/assemblyai.mjs languages --model universal-3-pro
    node {baseDir}/assemblyai.mjs languages --model universal-2 --codes --format json

The bundled data lives in:
- assets/model-capabilities.json
- assets/language-codes.json
- Keep API keys out of chat logs; use environment injection.
- Use the EU AssemblyAI base URL when the user explicitly needs EU processing.
- Uploads and transcript creation must use API keys from the same AssemblyAI project.
- Prefer --bundle-dir or --out for long outputs.
- The CLI is non-interactive and sends diagnostics to stderr, which makes it easier for agents to script reliably.
- Use raw --config or --request when you need a newly added AssemblyAI parameter that this skill has not exposed yet.
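The first two points can be sketched as a settings resolver that reads the key from the environment and selects a regional host. This is a minimal sketch under stated assumptions: resolveApiSettings is a hypothetical helper, the environment variable name ASSEMBLYAI_API_KEY is an assumption (check SKILL.md for the name the CLI actually reads), and the two hosts are the ones named in this document's region check.

```javascript
// Regional hosts as named in this document; https is assumed.
const BASE_URLS = {
  us: "https://api.assemblyai.com",
  eu: "https://api.eu.assemblyai.com",
};

// Hypothetical helper: reads the API key from an environment-like object
// (never from chat input) and picks the base URL for the region.
// ASSEMBLYAI_API_KEY is an assumed variable name.
function resolveApiSettings(env, options = {}) {
  const { region = "us" } = options;
  const apiKey = env.ASSEMBLYAI_API_KEY;
  if (!apiKey) {
    throw new Error("ASSEMBLYAI_API_KEY is not set in the environment");
  }
  if (!Object.hasOwn(BASE_URLS, region)) {
    throw new Error(`Unknown region: ${region}`);
  }
  return { apiKey, baseUrl: BASE_URLS[region] };
}

// Example with a placeholder key; in real use pass process.env.
const settings = resolveApiSettings(
  { ASSEMBLYAI_API_KEY: "placeholder-key" },
  { region: "eu" }
);
console.log(settings.baseUrl);
```

Taking the environment as a parameter keeps the key out of source and logs, and makes the function easy to exercise without touching the real process environment.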
Read these when you need more depth:
- Capabilities
- Workflows and recipes
- Output formats
- Speaker mapping
- LLM Gateway notes
- Troubleshooting
- assemblyai.mjs: root wrapper for compatibility with the original skill
- scripts/assemblyai.mjs: main CLI
- assets/speaker-map.example.json
- assets/speaker-profiles-name.example.json
- assets/speaker-profiles-role.example.json
- assets/custom-spelling.example.json
- assets/llm-json-schema.example.json
- assets/transcript-agent-json-schema.json
- Did you pick the right region (api.assemblyai.com vs api.eu.assemblyai.com)?
- Did you choose a model strategy that matches the language situation?
- If speaker naming matters, did you enable diarisation and/or provide a speaker map?
- If the result will feed another agent, did you produce Markdown and/or agent JSON rather than only raw stdout?
- If the transcript will be machine-consumed, did you keep the manifest or explicit output filenames?