{
  "schemaVersion": "1.0",
  "item": {
    "slug": "mlx-local-inference",
    "name": "MLX Local Inference Stack",
    "source": "tencent",
    "type": "skill",
    "category": "Communication & Collaboration",
    "sourceUrl": "https://clawhub.ai/bendusy/mlx-local-inference",
    "canonicalUrl": "https://clawhub.ai/bendusy/mlx-local-inference",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/mlx-local-inference",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=mlx-local-inference",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "README.md",
      "README_CN.md",
      "SKILL.md",
      "references/asr-qwen3.md",
      "references/asr-whisper.md",
      "references/embedding-qwen3.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-23T16:43:11.935Z",
      "expiresAt": "2026-04-30T16:43:11.935Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=mlx-local-inference",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=mlx-local-inference",
        "contentDisposition": "attachment; filename=\"mlx-local-inference-2.2.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/mlx-local-inference"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/mlx-local-inference",
    "agentPageUrl": "https://openagent3.xyz/skills/mlx-local-inference/agent",
    "manifestUrl": "https://openagent3.xyz/skills/mlx-local-inference/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/mlx-local-inference/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "MLX Local Inference Stack",
        "body": "Full local AI inference on Apple Silicon Macs. All services expose OpenAI-compatible APIs."
      },
      {
        "title": "Services Overview",
        "body": "Service\tPort\tAccess\tModels\nLLM + Whisper + Embedding\t8787\tLAN (0.0.0.0)\tqwen3-14b, gemma-3-12b, whisper-large-v3-turbo, qwen3-embedding-0.6b/4b\nASR (Qwen3-ASR)\t8788\tlocalhost only\tQwen3-ASR-1.7B-8bit\nTranscribe Daemon\t—\tfile-based\tUses ASR + LLM\n\nLaunchAgents: com.mlx-server (8787), com.mlx-audio-server (8788), com.mlx-transcribe-daemon"
      },
      {
        "title": "Models",
        "body": "Model ID\tParams\tBest For\nqwen3-14b\t14B 4bit\tChinese, deep reasoning (built-in think mode)\ngemma-3-12b\t12B 4bit\tEnglish, code generation"
      },
      {
        "title": "API",
        "body": "curl -X POST http://localhost:8787/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"qwen3-14b\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}],\n    \"temperature\": 0.7,\n    \"max_tokens\": 2048\n  }'\n\nAdd \"stream\": true for streaming."
      },
      {
        "title": "Python",
        "body": "from openai import OpenAI\nclient = OpenAI(base_url=\"http://localhost:8787/v1\", api_key=\"unused\")\nresponse = client.chat.completions.create(\n    model=\"qwen3-14b\",\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\n    temperature=0.7, max_tokens=2048\n)\nprint(response.choices[0].message.content)"
      },
      {
        "title": "Qwen3 Think Mode",
        "body": "Qwen3 may include <think>...</think> chain-of-thought tags. Strip them:\n\nimport re\ntext = re.sub(r'<think>.*?</think>\\s*', '', text, flags=re.DOTALL)"
      },
      {
        "title": "Model Selection Guide",
        "body": "Scenario\tRecommended\nChinese text\tqwen3-14b\nCantonese\tqwen3-14b\nEnglish writing\tgemma-3-12b\nCode generation\tEither\nDeep reasoning\tqwen3-14b (think mode)\nQuick Q&A\tgemma-3-12b"
      },
      {
        "title": "Qwen3-ASR (best for Chinese/Cantonese)",
        "body": "curl -X POST http://127.0.0.1:8788/v1/audio/transcriptions \\\n  -F \"file=@audio.wav\" \\\n  -F \"model=mlx-community/Qwen3-ASR-1.7B-8bit\" \\\n  -F \"language=zh\""
      },
      {
        "title": "Whisper (multilingual, 99 languages)",
        "body": "curl -X POST http://localhost:8787/v1/audio/transcriptions \\\n  -F \"file=@audio.wav\" \\\n  -F \"model=whisper-large-v3-turbo\""
      },
      {
        "title": "ASR Model Comparison",
        "body": "\tQwen3-ASR (port 8788)\tWhisper (port 8787)\nChinese/Cantonese\tStrong\tAverage\nMultilingual\tNo\tYes (99 langs)\nLAN access\tNo (localhost)\tYes\nLoading\tOn-demand\tAlways loaded"
      },
      {
        "title": "Supported audio formats",
        "body": "wav, mp3, m4a, flac, ogg, webm"
      },
      {
        "title": "Long audio",
        "body": "Split into 10-min chunks first:\n\nffmpeg -y -ss 0 -t 600 -i long.wav -ar 16000 -ac 1 chunk_000.wav"
      },
      {
        "title": "Models",
        "body": "Model ID\tSize\tUse Case\nqwen3-embedding-0.6b\t0.6B 4bit\tFast retrieval, low latency\nqwen3-embedding-4b\t4B 4bit\tHigh-accuracy semantic matching"
      },
      {
        "title": "API",
        "body": "curl -X POST http://localhost:8787/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"qwen3-embedding-0.6b\", \"input\": \"text to embed\"}'"
      },
      {
        "title": "Batch",
        "body": "curl -X POST http://localhost:8787/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"qwen3-embedding-4b\", \"input\": [\"text 1\", \"text 2\"]}'"
      },
      {
        "title": "Default Model: PaddleOCR-VL-1.5-6bit",
        "body": "Item\tValue\nModel ID\tpaddleocr-vl-6bit\nSpeed\t~185 t/s\nMemory\t~3.3 GB\nPrompt\tOCR:"
      },
      {
        "title": "CLI",
        "body": "cd ~/.mlx-server/venv\npython -m mlx_vlm.generate \\\n  --model mlx-community/PaddleOCR-VL-1.5-6bit \\\n  --image image.jpg \\\n  --prompt \"OCR:\" \\\n  --max-tokens 512 --temp 0.0"
      },
      {
        "title": "Python",
        "body": "from mlx_vlm import generate, load\nfrom mlx_vlm.prompt_utils import apply_chat_template\nfrom mlx_vlm.utils import load_config\n\nmodel, processor = load(\"mlx-community/PaddleOCR-VL-1.5-6bit\")\nconfig = load_config(\"mlx-community/PaddleOCR-VL-1.5-6bit\")\nprompt = apply_chat_template(processor, config, \"OCR:\", num_images=1)\nout = generate(model, processor, prompt, \"image.jpg\",\n               max_tokens=512, temperature=0.0, verbose=False)\nprint(out.text if hasattr(out, \"text\") else out)"
      },
      {
        "title": "Notes",
        "body": "Prompt must be exactly OCR: for PaddleOCR-VL\ntemperature=0.0 for deterministic output\nRGBA images must be converted to RGB first\nVenv: ~/.mlx-server/venv"
      },
      {
        "title": "Model: Qwen3-TTS (cached, not auto-served)",
        "body": "Item\tValue\nModel\tQwen3-TTS-12Hz-1.7B-CustomVoice-8bit\nMemory\t~2GB\nFeature\tCustom voice cloning"
      },
      {
        "title": "CLI",
        "body": "~/.mlx-server/venv/bin/mlx_audio.tts.generate \\\n  --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \\\n  --text \"你好，这是一段测试语音\""
      },
      {
        "title": "As API (via mlx_audio.server on port 8788)",
        "body": "curl -X POST http://127.0.0.1:8788/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit\",\n    \"input\": \"你好世界\"\n  }' --output speech.wav"
      },
      {
        "title": "Transcribe Daemon — Automatic Batch Transcription",
        "body": "Drop audio files into ~/transcribe/ for automatic processing:\n\nDaemon detects file (polls every 15s)\nPhase 1: Transcribe via Qwen3-ASR → filename_raw.md\nPhase 2: Correct via Qwen3-14B LLM → filename_corrected.md\nMove results to ~/transcribe/done/"
      },
      {
        "title": "LLM Correction Rules",
        "body": "Fix homophone errors (的/得/地, 在/再)\nPreserve Cantonese characters (嘅、唔、咁、喺、冇、佢)\nAdd punctuation and paragraphs\nRemove filler words"
      },
      {
        "title": "Supported formats",
        "body": "wav, mp3, m4a, flac, ogg, webm"
      },
      {
        "title": "Service Management",
        "body": "# LLM + Whisper + Embedding server (port 8787)\nlaunchctl kickstart -k gui/$(id -u)/com.mlx-server\n\n# ASR server (port 8788)\nlaunchctl kickstart -k gui/$(id -u)/com.mlx-audio-server\n\n# Transcribe daemon\nlaunchctl kickstart gui/$(id -u)/com.mlx-transcribe-daemon\n\n# Logs\ntail -f ~/.mlx-server/logs/server.log\ntail -f ~/.mlx-server/logs/mlx-audio-server.err.log\ntail -f ~/.mlx-server/logs/transcribe-daemon.err.log"
      },
      {
        "title": "Requirements",
        "body": "Apple Silicon Mac (M1/M2/M3/M4)\nPython 3.10+ with mlx, mlx-lm, mlx-audio, mlx-vlm\nRecommended: 32GB+ RAM for running multiple models"
      }
    ],
    "body": "MLX Local Inference Stack\n\nFull local AI inference on Apple Silicon Macs. All services expose OpenAI-compatible APIs.\n\nServices Overview\nService\tPort\tAccess\tModels\nLLM + Whisper + Embedding\t8787\tLAN (0.0.0.0)\tqwen3-14b, gemma-3-12b, whisper-large-v3-turbo, qwen3-embedding-0.6b/4b\nASR (Qwen3-ASR)\t8788\tlocalhost only\tQwen3-ASR-1.7B-8bit\nTranscribe Daemon\t—\tfile-based\tUses ASR + LLM\n\nLaunchAgents: com.mlx-server (8787), com.mlx-audio-server (8788), com.mlx-transcribe-daemon\n\n1. LLM — Local Chat Completions\nModels\nModel ID\tParams\tBest For\nqwen3-14b\t14B 4bit\tChinese, deep reasoning (built-in think mode)\ngemma-3-12b\t12B 4bit\tEnglish, code generation\nAPI\ncurl -X POST http://localhost:8787/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"qwen3-14b\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}],\n    \"temperature\": 0.7,\n    \"max_tokens\": 2048\n  }'\n\nAdd \"stream\": true for streaming.\n\nPython\nfrom openai import OpenAI\nclient = OpenAI(base_url=\"http://localhost:8787/v1\", api_key=\"unused\")\nresponse = client.chat.completions.create(\n    model=\"qwen3-14b\",\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\n    temperature=0.7, max_tokens=2048\n)\nprint(response.choices[0].message.content)\n\nQwen3 Think Mode\n\nQwen3 may include <think>...</think> chain-of-thought tags. Strip them:\n\nimport re\ntext = re.sub(r'<think>.*?</think>\\s*', '', text, flags=re.DOTALL)\n\nModel Selection Guide\nScenario\tRecommended\nChinese text\tqwen3-14b\nCantonese\tqwen3-14b\nEnglish writing\tgemma-3-12b\nCode generation\tEither\nDeep reasoning\tqwen3-14b (think mode)\nQuick Q&A\tgemma-3-12b\n\n2. ASR — Speech-to-Text\nQwen3-ASR (best for Chinese/Cantonese)\ncurl -X POST http://127.0.0.1:8788/v1/audio/transcriptions \\\n  -F \"file=@audio.wav\" \\\n  -F \"model=mlx-community/Qwen3-ASR-1.7B-8bit\" \\\n  -F \"language=zh\"\n\nWhisper (multilingual, 99 languages)\ncurl -X POST http://localhost:8787/v1/audio/transcriptions \\\n  -F \"file=@audio.wav\" \\\n  -F \"model=whisper-large-v3-turbo\"\n\nASR Model Comparison\n\tQwen3-ASR (port 8788)\tWhisper (port 8787)\nChinese/Cantonese\tStrong\tAverage\nMultilingual\tNo\tYes (99 langs)\nLAN access\tNo (localhost)\tYes\nLoading\tOn-demand\tAlways loaded\n\nSupported audio formats\n\nwav, mp3, m4a, flac, ogg, webm\n\nLong audio\n\nSplit into 10-min chunks first:\n\nffmpeg -y -ss 0 -t 600 -i long.wav -ar 16000 -ac 1 chunk_000.wav\n\n3. Embeddings — Text Vectorization\nModels\nModel ID\tSize\tUse Case\nqwen3-embedding-0.6b\t0.6B 4bit\tFast retrieval, low latency\nqwen3-embedding-4b\t4B 4bit\tHigh-accuracy semantic matching\nAPI\ncurl -X POST http://localhost:8787/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"qwen3-embedding-0.6b\", \"input\": \"text to embed\"}'\n\nBatch\ncurl -X POST http://localhost:8787/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"qwen3-embedding-4b\", \"input\": [\"text 1\", \"text 2\"]}'\n\n4. OCR — Image Text Extraction\nDefault Model: PaddleOCR-VL-1.5-6bit\nItem\tValue\nModel ID\tpaddleocr-vl-6bit\nSpeed\t~185 t/s\nMemory\t~3.3 GB\nPrompt\tOCR:\nCLI\ncd ~/.mlx-server/venv\npython -m mlx_vlm.generate \\\n  --model mlx-community/PaddleOCR-VL-1.5-6bit \\\n  --image image.jpg \\\n  --prompt \"OCR:\" \\\n  --max-tokens 512 --temp 0.0\n\nPython\nfrom mlx_vlm import generate, load\nfrom mlx_vlm.prompt_utils import apply_chat_template\nfrom mlx_vlm.utils import load_config\n\nmodel, processor = load(\"mlx-community/PaddleOCR-VL-1.5-6bit\")\nconfig = load_config(\"mlx-community/PaddleOCR-VL-1.5-6bit\")\nprompt = apply_chat_template(processor, config, \"OCR:\", num_images=1)\nout = generate(model, processor, prompt, \"image.jpg\",\n               max_tokens=512, temperature=0.0, verbose=False)\nprint(out.text if hasattr(out, \"text\") else out)\n\nNotes\nPrompt must be exactly OCR: for PaddleOCR-VL\ntemperature=0.0 for deterministic output\nRGBA images must be converted to RGB first\nVenv: ~/.mlx-server/venv\n\n5. TTS — Text-to-Speech\nModel: Qwen3-TTS (cached, not auto-served)\nItem\tValue\nModel\tQwen3-TTS-12Hz-1.7B-CustomVoice-8bit\nMemory\t~2GB\nFeature\tCustom voice cloning\nCLI\n~/.mlx-server/venv/bin/mlx_audio.tts.generate \\\n  --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \\\n  --text \"你好，这是一段测试语音\"\n\nAs API (via mlx_audio.server on port 8788)\ncurl -X POST http://127.0.0.1:8788/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit\",\n    \"input\": \"你好世界\"\n  }' --output speech.wav\n\n6. Transcribe Daemon — Automatic Batch Transcription\n\nDrop audio files into ~/transcribe/ for automatic processing:\n\nDaemon detects file (polls every 15s)\nPhase 1: Transcribe via Qwen3-ASR → filename_raw.md\nPhase 2: Correct via Qwen3-14B LLM → filename_corrected.md\nMove results to ~/transcribe/done/\n\nLLM Correction Rules\nFix homophone errors (的/得/地, 在/再)\nPreserve Cantonese characters (嘅、唔、咁、喺、冇、佢)\nAdd punctuation and paragraphs\nRemove filler words\n\nSupported formats\n\nwav, mp3, m4a, flac, ogg, webm\n\nService Management\n# LLM + Whisper + Embedding server (port 8787)\nlaunchctl kickstart -k gui/$(id -u)/com.mlx-server\n\n# ASR server (port 8788)\nlaunchctl kickstart -k gui/$(id -u)/com.mlx-audio-server\n\n# Transcribe daemon\nlaunchctl kickstart gui/$(id -u)/com.mlx-transcribe-daemon\n\n# Logs\ntail -f ~/.mlx-server/logs/server.log\ntail -f ~/.mlx-server/logs/mlx-audio-server.err.log\ntail -f ~/.mlx-server/logs/transcribe-daemon.err.log\n\nRequirements\nApple Silicon Mac (M1/M2/M3/M4)\nPython 3.10+ with mlx, mlx-lm, mlx-audio, mlx-vlm\nRecommended: 32GB+ RAM for running multiple models"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/bendusy/mlx-local-inference",
    "publisherUrl": "https://clawhub.ai/bendusy/mlx-local-inference",
    "owner": "bendusy",
    "version": "2.2.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/mlx-local-inference",
    "downloadUrl": "https://openagent3.xyz/downloads/mlx-local-inference",
    "agentUrl": "https://openagent3.xyz/skills/mlx-local-inference/agent",
    "manifestUrl": "https://openagent3.xyz/skills/mlx-local-inference/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/mlx-local-inference/agent.md"
  }
}