{
  "schemaVersion": "1.0",
  "item": {
    "slug": "clack",
    "name": "Clack",
    "source": "tencent",
    "type": "skill",
    "category": "通讯协作",
    "sourceUrl": "https://clawhub.ai/fbn3799/clack",
    "canonicalUrl": "https://clawhub.ai/fbn3799/clack",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/clack",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=clack",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "server.py",
      "CHANGELOG.md",
      "README.md",
      "SKILL.md",
      "scripts/setup.sh",
      "scripts/clack.sh"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-23T16:43:11.935Z",
      "expiresAt": "2026-04-30T16:43:11.935Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=4claw-imageboard",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=4claw-imageboard",
        "contentDisposition": "attachment; filename=\"4claw-imageboard-1.0.1.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/clack"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/clack",
    "agentPageUrl": "https://openagent3.xyz/skills/clack/agent",
    "manifestUrl": "https://openagent3.xyz/skills/clack/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/clack/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Clack",
        "body": "WebSocket relay server that enables real-time voice conversations with an OpenClaw agent.\n\nFlow: Client audio (PCM 16kHz/16-bit/mono) → STT → OpenClaw Gateway → TTS → PCM audio back to client.\n\nPer-session provider selection: The client can independently choose STT and TTS providers per call — any combination of on-device (Apple speech frameworks) and server-side providers (ElevenLabs, OpenAI, Deepgram). The server auto-detects all available providers based on configured API keys and exposes them via /info."
      },
      {
        "title": "Prerequisites",
        "body": "Python 3.10+\nAPI key for at least one provider (ElevenLabs, OpenAI, or Deepgram) — not needed for local speech mode\nOpenClaw Gateway with chatCompletions endpoint enabled\nRoot/sudo access (for systemd)\nSecure connection: Domain with SSL (recommended) or Tailscale"
      },
      {
        "title": "Setup",
        "body": "Run the setup script. It creates a venv, installs deps, prompts for API keys, configures a systemd service, and optionally sets up SSL.\n\nsudo bash scripts/setup.sh\n\nThe script auto-detects your OpenClaw gateway config and interactively prompts for provider API keys (ElevenLabs, OpenAI, Deepgram — all optional). On re-runs, existing keys can be kept, updated, or deleted."
      },
      {
        "title": "Options",
        "body": "bash scripts/setup.sh [--port 9878] [--domain clack.example.com]\n\nFlagDefaultDescription--port9878Relay server port--domain(none)Domain for SSL setup (enables WSS)"
      },
      {
        "title": "Connection modes",
        "body": "All connections are encrypted. The app supports two modes:\n\nDomain with SSL (recommended):\n\nbash scripts/setup.sh --domain clack.yourdomain.com\n# → wss://clack.yourdomain.com/voice\n\nRequires a DNS A record pointing the domain to your server IP. The setup script auto-configures SSL via Caddy. You can use a free domain from DuckDNS or your own.\n\nTailscale:\n\n# Install Tailscale on your server, then connect from the app using your Tailscale IP\n# → ws://100.x.x.x:9878/voice (encrypted at network level)\n\nNo domain or SSL setup needed. Tailscale encrypts all traffic at the network layer. Install Tailscale on both your server and phone, then use the server's Tailscale IP in the app.\n\nSecurity note: Port 9878 should be firewalled from the public internet. Only allow access via localhost (for Caddy reverse proxy) and Tailscale. The app does not support unencrypted public connections."
      },
      {
        "title": "Enable OpenClaw Gateway endpoint",
        "body": "The gateway must have chatCompletions enabled. Apply this config patch:\n\n{\"http\": {\"endpoints\": {\"chatCompletions\": {\"enabled\": true}}}}"
      },
      {
        "title": "Management",
        "body": "clack status     # Check service status\nclack restart    # Restart the server\nclack logs       # Tail logs\nclack pair       # Generate a new pairing code\nclack update     # Pull latest code and restart\nclack setup      # Re-run interactive setup (add SSL later, update keys, etc.)\nclack uninstall  # Remove service and venv"
      },
      {
        "title": "Client App",
        "body": "📱 iOS — Available on the App Store (or build from source at github.com/fbn3799/clack-app)\n🤖 Android — Coming soon!"
      },
      {
        "title": "Authentication",
        "body": "All endpoints except GET /health and POST /pair require a valid auth token (RELAY_AUTH_TOKEN). Tokens are verified using constant-time HMAC comparison to prevent timing attacks."
      },
      {
        "title": "Pairing System",
        "body": "6-character alphanumeric one-time codes (~2.1 billion combinations)\nCodes expire after 5 minutes (TTL) and are single-use\nRate limited: 5 attempts per IP per 5 minutes — returns HTTP 429 after\n2-second delay on failed attempts to slow brute force\nGenerating a code requires the admin auth token (GET /pair)\nRedeeming a code is public but rate-limited (POST /pair)"
      },
      {
        "title": "Encrypted Connections",
        "body": "Domain mode: WSS (WebSocket Secure) via Caddy with automatic SSL certificates\nTailscale mode: WireGuard encryption at the network layer\nThe app enforces encrypted connections — no unencrypted public access\nPort 9878 should be firewalled; only accessible via localhost and Tailscale"
      },
      {
        "title": "Input Sanitization",
        "body": "All user-facing text inputs are sanitized before processing:\n\nVoice transcripts: Capped at 300 characters (CLACK_MAX_INPUT_CHARS), echo detection filters feedback loops, hallucination detection discards nonsense STT output\nUser context: Stripped to natural-language characters only (letters, numbers, common punctuation, whitespace). Control characters, escape sequences, and non-printable characters are removed. Capped at 1000 characters. Context is wrapped in explicit delimiters before injection into the system prompt.\nNo shell execution: All external communication uses structured HTTP/WebSocket APIs. No user input is ever passed to a shell."
      },
      {
        "title": "Data Privacy",
        "body": "No analytics, tracking, or telemetry\nVoice audio goes to your server and only to the providers you choose\nThe iOS app stores only settings locally (server address, token, preferences)\nThird-party API usage depends on your provider config (ElevenLabs, OpenAI, Deepgram)"
      },
      {
        "title": "Session Routing",
        "body": "Each voice call creates a clack:<uuid> session in OpenClaw. These are small, isolated sessions — one per call — so voice conversations don't pollute your main agent context."
      },
      {
        "title": "Session Picker",
        "body": "The session picker in the iOS app provides context injection only. When you select a session key, it is added as text context to the LLM prompt — it does not change routing. All voice calls still create their own clack:<uuid> session."
      },
      {
        "title": "User Context",
        "body": "Users can provide persistent context that gets injected into the system prompt for every voice call. This lets the AI know about the user's preferences, notes, or any background information."
      },
      {
        "title": "How to set context",
        "body": "App text field: In the Clack app under Settings → Context, enter free-form text\nSession picker: Select an OpenClaw session to inject its content as context\nWebSocket message: Send {\"type\": \"set_context\", \"text\": \"...\"} during a voice session\nHTTP API: PUT /context?token=...&text=... or POST /context with JSON body {\"text\": \"...\"}\n\nContext is sanitized before saving — only natural-language characters are kept (letters, numbers, common punctuation). IP addresses and domains are stripped. The server returns the sanitized text in the response so the app can show the user exactly what will be sent as context.\n\nContext persists across calls and server restarts. Clear it via DELETE /context or by sending an empty set_context message."
      },
      {
        "title": "Conversation History",
        "body": "The relay maintains a shared history file across calls for continuity. History is stored as JSON in CLACK_HISTORY_DIR (default: /var/lib/clack/history).\n\nMax messages: 50 (configurable via CLACK_MAX_HISTORY)\nHistory persists across calls and server restarts\nViewable via GET /history, clearable via DELETE /history"
      },
      {
        "title": "Echo Test Mode",
        "body": "For testing audio round-trips without using LLM credits:\n\nServer-wide: Set CLACK_ECHO_MODE=true environment variable\nPer-session: Send {\"type\":\"start\",\"config\":{\"echo\":true}} from the client\n\nIn echo mode, transcribed text is echoed back through TTS instead of being sent to the LLM. Audio is peak-normalized with capped gain to ensure consistent playback volume."
      },
      {
        "title": "Provider Selection",
        "body": "STT and TTS providers can be configured independently per session. The server auto-detects all available providers at startup based on which API keys are set (ELEVENLABS_API_KEY, OPENAI_API_KEY, DEEPGRAM_API_KEY)."
      },
      {
        "title": "Available modes per direction (STT / TTS):",
        "body": "On-device (local): Uses Apple's built-in speech frameworks. Zero API costs.\nServer provider: ElevenLabs, OpenAI, or Deepgram — whichever keys are configured."
      },
      {
        "title": "How it works:",
        "body": "App fetches GET /info to discover available providers\nUser picks STT and TTS providers independently in Settings → Voice\nOn call start, the app sends sttProvider and ttsProvider in the session config\nServer creates the appropriate provider instances per session"
      },
      {
        "title": "Example combinations:",
        "body": "STTTTSUse caseElevenLabsElevenLabsFull cloud — best qualityOn-deviceElevenLabsSave STT costs, keep premium voicesOn-deviceOn-deviceFully local — zero API usage, works offlineOpenAIDeepgramMix providers freely\n\nCost optimization: Use on-device STT (free, unlimited) with a premium cloud TTS voice — get great output quality while eliminating transcription costs entirely. Or go fully on-device for zero API spend."
      },
      {
        "title": "Text input mode",
        "body": "When STT is set to on-device, the client sends transcribed text instead of audio:\n\n{\"type\": \"text_input\", \"text\": \"What's the weather like?\"}\n\nWhen TTS is set to on-device, the server returns response_text only and skips audio synthesis."
      },
      {
        "title": "AI Response Rules",
        "body": "Responses are enforced to 1–3 sentences for natural voice conversation\nServer-side max_tokens: 150 to prevent runaway responses\nServer-side max input: 300 characters (CLACK_MAX_INPUT_CHARS) — transcripts exceeding this are truncated"
      },
      {
        "title": "HTTP Endpoints",
        "body": "EndpointMethodAuthDescriptionGET /healthGETNoHealth check — returns service statusPOST /pairPOSTNoRedeem pairing code → get auth token (rate-limited)GET /pairGETYesGenerate one-time pairing codeGET /infoGETYesServer info: agent name, available STT/TTS providersGET /voicesGETYesList available TTS voicesGET /sessionsGETYesList active sessionsGET /historyGETYesGet conversation historyDELETE /historyDELETEYesClear conversation historyGET /contextGETYesGet current user contextPUT /contextPUTYesSet user context (query param text)POST /contextPOSTYesSet user context (JSON body {\"text\": \"...\"})DELETE /contextDELETEYesClear user contextWebSocket /voiceWSYesVoice relay connection"
      },
      {
        "title": "WebSocket Protocol",
        "body": "Endpoint: ws://<host>:<port>/voice?token=<RELAY_AUTH_TOKEN>"
      },
      {
        "title": "Client → Server",
        "body": "MessageFormatDescription{\"type\":\"start\",\"config\":{...}}JSONStart session. Config: voice, systemPrompt, echo, sttProvider, ttsProviderBinary framesbytesRaw PCM audio (16kHz, 16-bit, mono){\"type\":\"text_input\",\"text\":\"...\"}JSONLocal speech mode — send text directly{\"type\":\"end_speech\"}JSONSignal end of speech, triggers processing{\"type\":\"interrupt\"}JSONCancel current TTS playback{\"type\":\"ping\"}JSONKeepalive{\"type\":\"set_context\",\"text\":\"...\"}JSONSet user context (sanitized before saving){\"type\":\"auth\",\"token\":\"...\"}JSONAuthenticate (alternative to query param)"
      },
      {
        "title": "Server → Client",
        "body": "MessageFormatDescription{\"type\":\"ready\"}JSONSession ready{\"type\":\"auth_ok\"} / {\"type\":\"auth_failed\"}JSONAuth result{\"type\":\"processing\",\"stage\":\"...\"}JSONStage: transcribing, thinking, speaking, filtered{\"type\":\"transcript\",\"text\":\"...\",\"final\":true}JSONSTT result{\"type\":\"response_text\",\"text\":\"...\"}JSONLLM text response{\"type\":\"response_start\",\"format\":\"pcm_16000\"}JSONAudio stream startingBinary framesbytesTTS audio (PCM 16kHz, 16-bit, mono){\"type\":\"response_end\"}JSONAudio stream done{\"type\":\"tts_cancelled\"}JSONTTS playback was interrupted{\"type\":\"context_updated\",\"text\":\"...\"}JSONContext saved — text contains the sanitized version{\"type\":\"context_cleared\"}JSONContext was cleared"
      },
      {
        "title": "Features",
        "body": "Multi-provider STT/TTS: ElevenLabs, OpenAI, and Deepgram support\nIndependent voice input/output configuration: Choose STT and TTS providers separately — full control over how your voice is transcribed and how the AI speaks back\nOn-device speech: Apple speech frameworks for STT and/or TTS — zero API costs, mix with cloud providers freely\nCost optimization: Use free on-device transcription with premium cloud voices, or go fully local for zero spend\nVoice response rules: AI responses enforced short (1-3 sentences, max_tokens 150)\nInput length limiting: Configurable max transcript length (default 300 chars)\nConfidence filtering: Low-confidence STT results are discarded\nEcho detection: Prevents feedback loops (TTS → mic → STT)\nEcho test mode: Test audio pipeline without LLM (server-wide or per-session)\nAudio normalization: Peak normalization with capped gain for echo mode playback\nAudio chunking: Long recordings auto-split for reliable transcription\nHallucination detection: Filters repetitive/nonsense STT output\nInterrupt/TTS cancellation: Cancel in-progress TTS for all providers\nPairing system: Rate-limited one-time codes for secure device pairing\nSession isolation: Each call gets its own clack:<uuid> session\nConversation history: Shared across calls, 50 messages max, persistent\nToken auth: Constant-time HMAC verification\nKeepalive pings: Prevents client timeout during long LLM responses\nSilence detection: Default threshold 220, configurable range 20–1000\nAuto-restart: systemd restarts on crash"
      },
      {
        "title": "Voice Configuration",
        "body": "20 built-in ElevenLabs voices available. Default: Will. Pass voice name or ID in session config:\n\n{\"type\": \"start\", \"config\": {\"voice\": \"aria\"}}\n\nAvailable aliases: will, aria, roger, sarah, laura, charlie, george, callum, river, liam, charlotte, alice, matilda, jessica, eric, chris, brian, daniel, lily, bill."
      },
      {
        "title": "Environment Variables",
        "body": "VariableDefaultDescriptionRELAY_AUTH_TOKEN—Required. Client auth token (32-char)OPENCLAW_GATEWAY_URLhttp://127.0.0.1:18789OpenClaw Gateway URLOPENCLAW_GATEWAY_TOKEN—Gateway bearer tokenSTT_PROVIDERelevenlabsSTT provider (elevenlabs, openai, deepgram)TTS_PROVIDERelevenlabsTTS provider (elevenlabs, openai, deepgram)TTS_VOICEWillDefault voice (name or ID)VOICE_RELAY_PORT9878Server portCLACK_ECHO_MODEfalseEnable echo test mode server-wideCLACK_MAX_INPUT_CHARS300Max transcript length (chars)CLACK_HISTORY_DIR/var/lib/clack/historyHistory file storage directoryCLACK_MAX_HISTORY50Max conversation history messagesCLACK_AGENT_NAMEStormAgent name shown in the iOS app\n\nProvider API keys (ELEVENLABS_API_KEY, OPENAI_API_KEY, DEEPGRAM_API_KEY) are stored in config.json with restricted file permissions, not as environment variables. The setup script manages these interactively."
      }
    ],
    "body": "Clack\n\nWebSocket relay server that enables real-time voice conversations with an OpenClaw agent.\n\nFlow: Client audio (PCM 16kHz/16-bit/mono) → STT → OpenClaw Gateway → TTS → PCM audio back to client.\n\nPer-session provider selection: The client can independently choose STT and TTS providers per call — any combination of on-device (Apple speech frameworks) and server-side providers (ElevenLabs, OpenAI, Deepgram). The server auto-detects all available providers based on configured API keys and exposes them via /info.\n\nPrerequisites\nPython 3.10+\nAPI key for at least one provider (ElevenLabs, OpenAI, or Deepgram) — not needed for local speech mode\nOpenClaw Gateway with chatCompletions endpoint enabled\nRoot/sudo access (for systemd)\nSecure connection: Domain with SSL (recommended) or Tailscale\nSetup\n\nRun the setup script. It creates a venv, installs deps, prompts for API keys, configures a systemd service, and optionally sets up SSL.\n\nsudo bash scripts/setup.sh\n\n\nThe script auto-detects your OpenClaw gateway config and interactively prompts for provider API keys (ElevenLabs, OpenAI, Deepgram — all optional). On re-runs, existing keys can be kept, updated, or deleted.\n\nOptions\nbash scripts/setup.sh [--port 9878] [--domain clack.example.com]\n\nFlag\tDefault\tDescription\n--port\t9878\tRelay server port\n--domain\t(none)\tDomain for SSL setup (enables WSS)\nConnection modes\n\nAll connections are encrypted. The app supports two modes:\n\nDomain with SSL (recommended):\n\nbash scripts/setup.sh --domain clack.yourdomain.com\n# → wss://clack.yourdomain.com/voice\n\n\nRequires a DNS A record pointing the domain to your server IP. The setup script auto-configures SSL via Caddy. You can use a free domain from DuckDNS or your own.\n\nTailscale:\n\n# Install Tailscale on your server, then connect from the app using your Tailscale IP\n# → ws://100.x.x.x:9878/voice (encrypted at network level)\n\n\nNo domain or SSL setup needed. Tailscale encrypts all traffic at the network layer. Install Tailscale on both your server and phone, then use the server's Tailscale IP in the app.\n\nSecurity note: Port 9878 should be firewalled from the public internet. Only allow access via localhost (for Caddy reverse proxy) and Tailscale. The app does not support unencrypted public connections.\n\nEnable OpenClaw Gateway endpoint\n\nThe gateway must have chatCompletions enabled. Apply this config patch:\n\n{\"http\": {\"endpoints\": {\"chatCompletions\": {\"enabled\": true}}}}\n\nManagement\nclack status     # Check service status\nclack restart    # Restart the server\nclack logs       # Tail logs\nclack pair       # Generate a new pairing code\nclack update     # Pull latest code and restart\nclack setup      # Re-run interactive setup (add SSL later, update keys, etc.)\nclack uninstall  # Remove service and venv\n\nClient App\n\n📱 iOS — Available on the App Store (or build from source at github.com/fbn3799/clack-app) 🤖 Android — Coming soon!\n\nSecurity\nAuthentication\n\nAll endpoints except GET /health and POST /pair require a valid auth token (RELAY_AUTH_TOKEN). Tokens are verified using constant-time HMAC comparison to prevent timing attacks.\n\nPairing System\n6-character alphanumeric one-time codes (~2.1 billion combinations)\nCodes expire after 5 minutes (TTL) and are single-use\nRate limited: 5 attempts per IP per 5 minutes — returns HTTP 429 after\n2-second delay on failed attempts to slow brute force\nGenerating a code requires the admin auth token (GET /pair)\nRedeeming a code is public but rate-limited (POST /pair)\nEncrypted Connections\nDomain mode: WSS (WebSocket Secure) via Caddy with automatic SSL certificates\nTailscale mode: WireGuard encryption at the network layer\nThe app enforces encrypted connections — no unencrypted public access\nPort 9878 should be firewalled; only accessible via localhost and Tailscale\nInput Sanitization\n\nAll user-facing text inputs are sanitized before processing:\n\nVoice transcripts: Capped at 300 characters (CLACK_MAX_INPUT_CHARS), echo detection filters feedback loops, hallucination detection discards nonsense STT output\nUser context: Stripped to natural-language characters only (letters, numbers, common punctuation, whitespace). Control characters, escape sequences, and non-printable characters are removed. Capped at 1000 characters. Context is wrapped in explicit delimiters before injection into the system prompt.\nNo shell execution: All external communication uses structured HTTP/WebSocket APIs. No user input is ever passed to a shell.\nData Privacy\nNo analytics, tracking, or telemetry\nVoice audio goes to your server and only to the providers you choose\nThe iOS app stores only settings locally (server address, token, preferences)\nThird-party API usage depends on your provider config (ElevenLabs, OpenAI, Deepgram)\nSession Routing\n\nEach voice call creates a clack:<uuid> session in OpenClaw. These are small, isolated sessions — one per call — so voice conversations don't pollute your main agent context.\n\nSession Picker\n\nThe session picker in the iOS app provides context injection only. When you select a session key, it is added as text context to the LLM prompt — it does not change routing. All voice calls still create their own clack:<uuid> session.\n\nUser Context\n\nUsers can provide persistent context that gets injected into the system prompt for every voice call. This lets the AI know about the user's preferences, notes, or any background information.\n\nHow to set context\nApp text field: In the Clack app under Settings → Context, enter free-form text\nSession picker: Select an OpenClaw session to inject its content as context\nWebSocket message: Send {\"type\": \"set_context\", \"text\": \"...\"} during a voice session\nHTTP API: PUT /context?token=...&text=... or POST /context with JSON body {\"text\": \"...\"}\n\nContext is sanitized before saving — only natural-language characters are kept (letters, numbers, common punctuation). IP addresses and domains are stripped. The server returns the sanitized text in the response so the app can show the user exactly what will be sent as context.\n\nContext persists across calls and server restarts. Clear it via DELETE /context or by sending an empty set_context message.\n\nConversation History\n\nThe relay maintains a shared history file across calls for continuity. History is stored as JSON in CLACK_HISTORY_DIR (default: /var/lib/clack/history).\n\nMax messages: 50 (configurable via CLACK_MAX_HISTORY)\nHistory persists across calls and server restarts\nViewable via GET /history, clearable via DELETE /history\nEcho Test Mode\n\nFor testing audio round-trips without using LLM credits:\n\nServer-wide: Set CLACK_ECHO_MODE=true environment variable\nPer-session: Send {\"type\":\"start\",\"config\":{\"echo\":true}} from the client\n\nIn echo mode, transcribed text is echoed back through TTS instead of being sent to the LLM. Audio is peak-normalized with capped gain to ensure consistent playback volume.\n\nProvider Selection\n\nSTT and TTS providers can be configured independently per session. The server auto-detects all available providers at startup based on which API keys are set (ELEVENLABS_API_KEY, OPENAI_API_KEY, DEEPGRAM_API_KEY).\n\nAvailable modes per direction (STT / TTS):\nOn-device (local): Uses Apple's built-in speech frameworks. Zero API costs.\nServer provider: ElevenLabs, OpenAI, or Deepgram — whichever keys are configured.\nHow it works:\nApp fetches GET /info to discover available providers\nUser picks STT and TTS providers independently in Settings → Voice\nOn call start, the app sends sttProvider and ttsProvider in the session config\nServer creates the appropriate provider instances per session\nExample combinations:\nSTT\tTTS\tUse case\nElevenLabs\tElevenLabs\tFull cloud — best quality\nOn-device\tElevenLabs\tSave STT costs, keep premium voices\nOn-device\tOn-device\tFully local — zero API usage, works offline\nOpenAI\tDeepgram\tMix providers freely\n\nCost optimization: Use on-device STT (free, unlimited) with a premium cloud TTS voice — get great output quality while eliminating transcription costs entirely. Or go fully on-device for zero API spend.\n\nText input mode\n\nWhen STT is set to on-device, the client sends transcribed text instead of audio:\n\n{\"type\": \"text_input\", \"text\": \"What's the weather like?\"}\n\n\nWhen TTS is set to on-device, the server returns response_text only and skips audio synthesis.\n\nAI Response Rules\nResponses are enforced to 1–3 sentences for natural voice conversation\nServer-side max_tokens: 150 to prevent runaway responses\nServer-side max input: 300 characters (CLACK_MAX_INPUT_CHARS) — transcripts exceeding this are truncated\nHTTP Endpoints\nEndpoint\tMethod\tAuth\tDescription\nGET /health\tGET\tNo\tHealth check — returns service status\nPOST /pair\tPOST\tNo\tRedeem pairing code → get auth token (rate-limited)\nGET /pair\tGET\tYes\tGenerate one-time pairing code\nGET /info\tGET\tYes\tServer info: agent name, available STT/TTS providers\nGET /voices\tGET\tYes\tList available TTS voices\nGET /sessions\tGET\tYes\tList active sessions\nGET /history\tGET\tYes\tGet conversation history\nDELETE /history\tDELETE\tYes\tClear conversation history\nGET /context\tGET\tYes\tGet current user context\nPUT /context\tPUT\tYes\tSet user context (query param text)\nPOST /context\tPOST\tYes\tSet user context (JSON body {\"text\": \"...\"})\nDELETE /context\tDELETE\tYes\tClear user context\nWebSocket /voice\tWS\tYes\tVoice relay connection\nWebSocket Protocol\n\nEndpoint: ws://<host>:<port>/voice?token=<RELAY_AUTH_TOKEN>\n\nClient → Server\nMessage\tFormat\tDescription\n{\"type\":\"start\",\"config\":{...}}\tJSON\tStart session. Config: voice, systemPrompt, echo, sttProvider, ttsProvider\nBinary frames\tbytes\tRaw PCM audio (16kHz, 16-bit, mono)\n{\"type\":\"text_input\",\"text\":\"...\"}\tJSON\tLocal speech mode — send text directly\n{\"type\":\"end_speech\"}\tJSON\tSignal end of speech, triggers processing\n{\"type\":\"interrupt\"}\tJSON\tCancel current TTS playback\n{\"type\":\"ping\"}\tJSON\tKeepalive\n{\"type\":\"set_context\",\"text\":\"...\"}\tJSON\tSet user context (sanitized before saving)\n{\"type\":\"auth\",\"token\":\"...\"}\tJSON\tAuthenticate (alternative to query param)\nServer → Client\nMessage\tFormat\tDescription\n{\"type\":\"ready\"}\tJSON\tSession ready\n{\"type\":\"auth_ok\"} / {\"type\":\"auth_failed\"}\tJSON\tAuth result\n{\"type\":\"processing\",\"stage\":\"...\"}\tJSON\tStage: transcribing, thinking, speaking, filtered\n{\"type\":\"transcript\",\"text\":\"...\",\"final\":true}\tJSON\tSTT result\n{\"type\":\"response_text\",\"text\":\"...\"}\tJSON\tLLM text response\n{\"type\":\"response_start\",\"format\":\"pcm_16000\"}\tJSON\tAudio stream starting\nBinary frames\tbytes\tTTS audio (PCM 16kHz, 16-bit, mono)\n{\"type\":\"response_end\"}\tJSON\tAudio stream done\n{\"type\":\"tts_cancelled\"}\tJSON\tTTS playback was interrupted\n{\"type\":\"context_updated\",\"text\":\"...\"}\tJSON\tContext saved — text contains the sanitized version\n{\"type\":\"context_cleared\"}\tJSON\tContext was cleared\nFeatures\nMulti-provider STT/TTS: ElevenLabs, OpenAI, and Deepgram support\nIndependent voice input/output configuration: Choose STT and TTS providers separately — full control over how your voice is transcribed and how the AI speaks back\nOn-device speech: Apple speech frameworks for STT and/or TTS — zero API costs, mix with cloud providers freely\nCost optimization: Use free on-device transcription with premium cloud voices, or go fully local for zero spend\nVoice response rules: AI responses enforced short (1-3 sentences, max_tokens 150)\nInput length limiting: Configurable max transcript length (default 300 chars)\nConfidence filtering: Low-confidence STT results are discarded\nEcho detection: Prevents feedback loops (TTS → mic → STT)\nEcho test mode: Test audio pipeline without LLM (server-wide or per-session)\nAudio normalization: Peak normalization with capped gain for echo mode playback\nAudio chunking: Long recordings auto-split for reliable transcription\nHallucination detection: Filters repetitive/nonsense STT output\nInterrupt/TTS cancellation: Cancel in-progress TTS for all providers\nPairing system: Rate-limited one-time codes for secure device pairing\nSession isolation: Each call gets its own clack:<uuid> session\nConversation history: Shared across calls, 50 messages max, persistent\nToken auth: Constant-time HMAC verification\nKeepalive pings: Prevents client timeout during long LLM responses\nSilence detection: Default threshold 220, configurable range 20–1000\nAuto-restart: systemd restarts on crash\nVoice Configuration\n\n20 built-in ElevenLabs voices available. Default: Will. Pass voice name or ID in session config:\n\n{\"type\": \"start\", \"config\": {\"voice\": \"aria\"}}\n\n\nAvailable aliases: will, aria, roger, sarah, laura, charlie, george, callum, river, liam, charlotte, alice, matilda, jessica, eric, chris, brian, daniel, lily, bill.\n\nEnvironment Variables\nVariable\tDefault\tDescription\nRELAY_AUTH_TOKEN\t—\tRequired. Client auth token (32-char)\nOPENCLAW_GATEWAY_URL\thttp://127.0.0.1:18789\tOpenClaw Gateway URL\nOPENCLAW_GATEWAY_TOKEN\t—\tGateway bearer token\nSTT_PROVIDER\televenlabs\tSTT provider (elevenlabs, openai, deepgram)\nTTS_PROVIDER\televenlabs\tTTS provider (elevenlabs, openai, deepgram)\nTTS_VOICE\tWill\tDefault voice (name or ID)\nVOICE_RELAY_PORT\t9878\tServer port\nCLACK_ECHO_MODE\tfalse\tEnable echo test mode server-wide\nCLACK_MAX_INPUT_CHARS\t300\tMax transcript length (chars)\nCLACK_HISTORY_DIR\t/var/lib/clack/history\tHistory file storage directory\nCLACK_MAX_HISTORY\t50\tMax conversation history messages\nCLACK_AGENT_NAME\tStorm\tAgent name shown in the iOS app\n\nProvider API keys (ELEVENLABS_API_KEY, OPENAI_API_KEY, DEEPGRAM_API_KEY) are stored in config.json with restricted file permissions, not as environment variables. The setup script manages these interactively."
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/fbn3799/clack",
    "publisherUrl": "https://clawhub.ai/fbn3799/clack",
    "owner": "fbn3799",
    "version": "1.5.3",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/clack",
    "downloadUrl": "https://openagent3.xyz/downloads/clack",
    "agentUrl": "https://openagent3.xyz/skills/clack/agent",
    "manifestUrl": "https://openagent3.xyz/skills/clack/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/clack/agent.md"
  }
}