{
  "schemaVersion": "1.0",
  "item": {
    "slug": "mlx-swift-lm",
    "name": "MLX Swift LM Expert",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/ronaldmannak/mlx-swift-lm",
    "canonicalUrl": "https://clawhub.ai/ronaldmannak/mlx-swift-lm",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/mlx-swift-lm",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=mlx-swift-lm",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "references/embeddings.md",
      "references/training.md",
      "references/tool-calling.md",
      "references/lora-adapters.md",
      "references/model-container.md",
      "references/kv-cache.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/mlx-swift-lm"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/mlx-swift-lm",
    "agentPageUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent",
    "manifestUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "1. Overview & Triggers",
        "body": "mlx-swift-lm is a Swift package for running Large Language Models (LLMs) and Vision-Language Models (VLMs) on Apple Silicon using MLX. It supports local inference, fine-tuning via LoRA/DoRA, and embeddings generation."
      },
      {
        "title": "When to Use This Skill",
        "body": "Running LLM/VLM inference on macOS/iOS with Apple Silicon\nStreaming text generation from local models\nVision tasks with images/video (VLMs)\nTool calling / function calling with models\nLoRA adapter training and fine-tuning\nText embeddings for RAG/semantic search"
      },
      {
        "title": "Architecture Overview",
        "body": "MLXLMCommon     - Core infrastructure (ModelContainer, ChatSession, KVCache, etc.)\nMLXLLM          - Text-only LLM support (Llama, Qwen, Gemma, Phi, DeepSeek, etc. - examples, not exhaustive)\nMLXVLM          - Vision-Language Models (Qwen2-VL, PaliGemma, Gemma3, etc. - examples, not exhaustive)\nEmbedders       - Embedding models (BGE, Nomic, MiniLM)"
      },
      {
        "title": "2. Key File Reference",
        "body": "PurposeFile PathThread-safe model wrapperLibraries/MLXLMCommon/ModelContainer.swiftSimplified chat APILibraries/MLXLMCommon/ChatSession.swiftGeneration & streamingLibraries/MLXLMCommon/Evaluate.swiftKV cache typesLibraries/MLXLMCommon/KVCache.swiftModel configurationLibraries/MLXLMCommon/ModelConfiguration.swiftChat message typesLibraries/MLXLMCommon/Chat.swiftTool call processingLibraries/MLXLMCommon/Tool/ToolCallFormat.swiftConcurrency utilitiesLibraries/MLXLMCommon/Utilities/SerialAccessContainer.swiftLLM factory & registryLibraries/MLXLLM/LLMModelFactory.swiftVLM factory & registryLibraries/MLXVLM/VLMModelFactory.swiftLoRA configurationLibraries/MLXLMCommon/Adapters/LoRA/LoRAContainer.swiftLoRA trainingLibraries/MLXLLM/LoraTrain.swift"
      },
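      {
        "title": "Adding the Package via SwiftPM (sketch)",
        "body": "A minimal sketch of wiring the libraries into an app with Swift Package Manager. The repository URL, version, and product names below are assumptions for illustration; check the package's README for the canonical coordinates.\n\n// Package.swift (fragment; URL and version are placeholders)\ndependencies: [\n    .package(url: \"https://github.com/ronaldmannak/mlx-swift-lm\", from: \"1.0.0\")\n],\ntargets: [\n    .executableTarget(\n        name: \"MyApp\",\n        dependencies: [\n            .product(name: \"MLXLLM\", package: \"mlx-swift-lm\"),\n            .product(name: \"MLXVLM\", package: \"mlx-swift-lm\"),\n            .product(name: \"MLXLMCommon\", package: \"mlx-swift-lm\")\n        ]\n    )\n]"
      },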
      {
        "title": "LLM Chat (Simplest API)",
        "body": "import MLXLLM\nimport MLXLMCommon\n\n// Load model (downloads from HuggingFace automatically)\nlet modelContainer = try await LLMModelFactory.shared.loadContainer(\n    configuration: .init(id: \"mlx-community/Qwen3-4B-4bit\")\n)\n\n// Create chat session\nlet session = ChatSession(modelContainer)\n\n// Single response\nlet response = try await session.respond(to: \"What is Swift?\")\nprint(response)\n\n// Streaming response\nfor try await chunk in session.streamResponse(to: \"Explain concurrency\") {\n    print(chunk, terminator: \"\")\n}"
      },
      {
        "title": "VLM with Image",
        "body": "import MLXVLM\nimport MLXLMCommon\n\nlet modelContainer = try await VLMModelFactory.shared.loadContainer(\n    configuration: .init(id: \"mlx-community/Qwen2-VL-2B-Instruct-4bit\")\n)\n\nlet session = ChatSession(modelContainer)\n\n// With image (video is also an optional parameter)\nlet image = UserInput.Image.url(imageURL)\nlet response = try await session.respond(\n    to: \"Describe this image\",\n    image: image,\n    video: nil  // Optional video parameter\n)"
      },
      {
        "title": "Embeddings",
        "body": "import Embedders\n\n// Note: Embedders uses loadModelContainer() helper (not a factory pattern)\nlet container = try await loadModelContainer(\n    configuration: ModelConfiguration(id: \"mlx-community/bge-small-en-v1.5-mlx\")\n)\n\nlet embeddings = await container.perform { model, tokenizer, pooler in\n    let tokens = tokenizer.encode(text: \"Hello world\")\n    let input = MLXArray(tokens).expandedDimensions(axis: 0)\n    let output = model(input)\n    let pooled = pooler(output, normalize: true)\n    eval(pooled)\n    return pooled\n}"
      },
      {
        "title": "ChatSession API (Recommended)",
        "body": "ChatSession manages conversation history and KV cache automatically:\n\nlet session = ChatSession(\n    modelContainer,\n    instructions: \"You are a helpful assistant\",  // System prompt\n    generateParameters: GenerateParameters(\n        maxTokens: 500,\n        temperature: 0.7\n    )\n)\n\n// Multi-turn conversation (history preserved automatically)\nlet r1 = try await session.respond(to: \"What is 2+2?\")\nlet r2 = try await session.respond(to: \"And if you multiply that by 3?\")\n\n// Clear session to start fresh\nawait session.clear()"
      },
      {
        "title": "Streaming with generate()",
        "body": "For lower-level control, use generate() directly:\n\nlet input = try await modelContainer.prepare(input: UserInput(prompt: .text(\"Hello\")))\nlet stream = try await modelContainer.generate(input: input, parameters: GenerateParameters())\n\nfor await generation in stream {\n    switch generation {\n    case .chunk(let text):\n        print(text, terminator: \"\")\n    case .info(let info):\n        print(\"\\n\\(info.tokensPerSecond) tok/s\")\n    case .toolCall(let call):\n        // Handle tool call\n        break\n    }\n}"
      },
      {
        "title": "Tool Calling",
        "body": "// 1. Define tool\nstruct WeatherInput: Codable { let location: String }\nstruct WeatherOutput: Codable { let temperature: Double; let conditions: String }\n\nlet weatherTool = Tool<WeatherInput, WeatherOutput>(\n    name: \"get_weather\",\n    description: \"Get current weather\",\n    parameters: [.required(\"location\", type: .string, description: \"City name\")]\n) { input in\n    WeatherOutput(temperature: 22.0, conditions: \"Sunny\")\n}\n\n// 2. Include tool schema in request\nlet input = UserInput(\n    prompt: .text(\"What's the weather in Tokyo?\"),\n    tools: [weatherTool.schema]\n)\n\n// 3. Handle tool calls in generation stream\nfor await generation in try await modelContainer.generate(input: input, parameters: params) {\n    switch generation {\n    case .chunk(let text): print(text)\n    case .toolCall(let call):\n        let result = try await call.execute(with: weatherTool)\n        print(\"Weather: \\(result.conditions)\")\n    case .info: break\n    }\n}\n\nSee references/tool-calling.md for multi-turn and feeding results back."
      },
      {
        "title": "GenerateParameters",
        "body": "let params = GenerateParameters(\n    maxTokens: 1000,           // nil = unlimited\n    maxKVSize: 4096,           // Sliding window (uses RotatingKVCache)\n    kvBits: 4,                 // Quantized cache (4 or 8 bit)\n    temperature: 0.7,          // 0 = greedy/argmax\n    topP: 0.9,                 // Nucleus sampling\n    repetitionPenalty: 1.1,    // Penalize repeats\n    repetitionContextSize: 20  // Window for penalty\n)"
      },
      {
        "title": "Prompt Caching / History Re-hydration",
        "body": "Restore chat from persisted history:\n\nlet history: [Chat.Message] = [\n    .system(\"You are helpful\"),\n    .user(\"Hello\"),\n    .assistant(\"Hi there!\")\n]\n\nlet session = ChatSession(\n    modelContainer,\n    history: history\n)\n// Continues from this point"
      },
      {
        "title": "Image Input Types",
        "body": "// From URL (file or remote)\nlet image = UserInput.Image.url(fileURL)\n\n// From CIImage\nlet image = UserInput.Image.ciImage(ciImage)\n\n// From MLXArray directly\nlet image = UserInput.Image.array(mlxArray)"
      },
      {
        "title": "Video Input",
        "body": "// From URL (file or remote)\nlet video = UserInput.Video.url(videoURL)\n\n// From AVFoundation asset\nlet video = UserInput.Video.avAsset(avAsset)\n\n// From pre-extracted frames\nlet video = UserInput.Video.frames(videoFrames)\n\nlet response = try await session.respond(\n    to: \"What happens in this video?\",\n    video: video\n)"
      },
      {
        "title": "Multiple Images",
        "body": "let images: [UserInput.Image] = [\n    .url(url1),\n    .url(url2)\n]\n\nlet response = try await session.respond(\n    to: \"Compare these two images\",\n    images: images,\n    videos: []\n)"
      },
      {
        "title": "VLM-Specific Processing",
        "body": "let session = ChatSession(\n    modelContainer,\n    processing: UserInput.Processing(\n        resize: CGSize(width: 512, height: 512)  // Resize images\n    )\n)"
      },
      {
        "title": "DO",
        "body": "// DO: Use ChatSession for multi-turn conversations\nlet session = ChatSession(modelContainer)\n\n// DO: Use AsyncStream APIs (modern, Swift concurrency)\nfor try await chunk in session.streamResponse(to: prompt) { ... }\n\n// DO: Check Task.isCancelled in long-running loops\nfor try await generation in stream {\n    if Task.isCancelled { break }\n    // process generation\n}\n\n// DO: Use ModelContainer.perform() for thread-safe access\nawait modelContainer.perform { context in\n    // Access model, tokenizer safely\n    let tokens = try context.tokenizer.applyChatTemplate(messages: messages)\n    return tokens\n}\n\n// DO: When breaking early from generation, use generateTask() to get a task handle\n// This is the lower-level API used internally by ChatSession\nlet (stream, task) = generateTask(...)  // Returns (AsyncStream, Task)\n\nfor await item in stream {\n    if shouldStop { break }\n}\nawait task.value  // Ensures KV cache cleanup before next generation\n\ngenerateTask() is defined in Evaluate.swift. Most users should use ChatSession which handles this internally."
      },
      {
        "title": "DON'T",
        "body": "// DON'T: Share MLXArray across tasks (not Sendable)\nlet array = MLXArray(...)\nTask { array.sum() }  // Wrong!\n\n// DON'T: Use deprecated callback-based generation\n// Old:\ngenerate(input: input, parameters: params) { tokens in ... }  // Deprecated\n// New:\nfor await generation in try generate(input: input, parameters: params, context: context) { ... }\n\n// DON'T: Use old perform(model, tokenizer) signature\n// Old:\nmodelContainer.perform { model, tokenizer in ... }  // Deprecated\n// New:\nmodelContainer.perform { context in ... }\n\n// DON'T: Forget to eval() MLXArrays before returning from perform()\nawait modelContainer.perform { context in\n    let result = context.model(input)\n    eval(result)  // Required before returning\n    return result.item(Float.self)\n}"
      },
      {
        "title": "Thread Safety",
        "body": "ModelContainer is Sendable and thread-safe\nChatSession is NOT thread-safe (use from single task)\nMLXArray is NOT Sendable - don't pass across isolation boundaries\nUse SendableBox for transferring non-Sendable data in consuming contexts"
      },
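      {
        "title": "Thread Safety in Practice (sketch)",
        "body": "A minimal sketch of the safe pattern implied by the rules above: keep every MLXArray inside perform() and return only Sendable values (here a plain [Float]). The asArray(_:) conversion is assumed from the MLX Swift API; verify it against the mlx-swift version you build against.\n\nlet floats: [Float] = await modelContainer.perform { context in\n    // All MLXArray work stays inside the closure\n    let result = context.model(input)\n    eval(result)  // Materialize before converting\n    // Return a Sendable Swift array, not the MLXArray itself\n    return result.asArray(Float.self)\n}"
      },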
      {
        "title": "Memory Management",
        "body": "// For long contexts, use sliding window cache\nlet params = GenerateParameters(maxKVSize: 4096)\n\n// For memory efficiency, use quantized cache\nlet params = GenerateParameters(kvBits: 4)  // or 8\n\n// Clear session cache when done\nawait session.clear()"
      },
      {
        "title": "7. Reference Links",
        "body": "For detailed documentation on specific topics, see:\n\nReferenceWhen to Usereferences/model-container.mdLoading models, ModelContainer API, ModelConfigurationreferences/kv-cache.mdCache types, memory optimization, cache serializationreferences/concurrency.mdThread safety, SerialAccessContainer, async patternsreferences/tool-calling.mdFunction calling, tool formats, ToolCallProcessorreferences/tokenizer-chat.mdTokenizer, Chat.Message, EOS tokensreferences/supported-models.mdModel families, registries, model-specific configreferences/lora-adapters.mdLoRA/DoRA/QLoRA, loading adaptersreferences/training.mdLoRATrain API, fine-tuningreferences/embeddings.mdEmbeddingModel, pooling, use cases"
      },
      {
        "title": "8. Deprecated Patterns Summary",
        "body": "Most common migrations (see individual reference files for topic-specific deprecations):\n\nIf you see...Use instead...generate(... didGenerate:) callbackgenerate(...) -> AsyncStreamperform { model, tokenizer in }perform { context in }TokenIterator(prompt: MLXArray)TokenIterator(input: LMInput)ModelRegistry typealiasLLMRegistry or VLMRegistrycreateAttentionMask(h:cache:[KVCache]?)createAttentionMask(h:cache:KVCache?)\n\nEach reference file contains a \"Deprecated Patterns\" section with topic-specific migrations."
      },
      {
        "title": "Automatic Behaviors (NO developer action needed)",
        "body": "The framework handles these automatically:\n\nFeatureDetailsEOS token loadingLoaded from config.jsonEOS token overridePriority: generation_config.json > config.json > defaultsEOS token mergingAll sources merged at generation timeEOS token detectionStops generation automatically when EOS encounteredChat template applicationApplied automatically via applyChatTemplate()Tool call format detectionInferred from model_type in config.jsonCache type selectionBased on GenerateParameters (maxKVSize, kvBits)Tokenizer loadingLoaded from tokenizer.json automaticallyModel weights loadingDownloaded and loaded from HuggingFace"
      },
      {
        "title": "Optional Configuration (Developer MAY configure)",
        "body": "FeatureWhen to ConfigureextraEOSTokensOnly if model has unlisted stop tokenstoolCallFormatOnly to override auto-detectionmaxKVSizeTo enable sliding window cachekvBitsTo enable quantized cache (4 or 8 bit)maxTokensTo limit output length"
      }
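      {
        "title": "Example: Optional Overrides (sketch)",
        "body": "A minimal sketch of setting the optional knobs above in one place. The extraEOSTokens parameter on ModelConfiguration is assumed from the framework's configuration API, and the stop token string is a placeholder for whatever your model actually emits.\n\n// Stop tokens the model's config.json does not list (placeholder token)\nlet config = ModelConfiguration(\n    id: \"mlx-community/Qwen3-4B-4bit\",\n    extraEOSTokens: [\"<|im_end|>\"]\n)\n\n// Cache and length limits via GenerateParameters\nlet params = GenerateParameters(\n    maxTokens: 512,   // Limit output length\n    maxKVSize: 4096,  // Enable sliding window cache\n    kvBits: 8         // Enable quantized cache (4 or 8 bit)\n)"
      }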
    ]
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/ronaldmannak/mlx-swift-lm",
    "publisherUrl": "https://clawhub.ai/ronaldmannak/mlx-swift-lm",
    "owner": "ronaldmannak",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/mlx-swift-lm",
    "downloadUrl": "https://openagent3.xyz/downloads/mlx-swift-lm",
    "agentUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent",
    "manifestUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent.md"
  }
}