# Send MLX Swift LM Expert to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "mlx-swift-lm",
    "name": "MLX Swift LM Expert",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/ronaldmannak/mlx-swift-lm",
    "canonicalUrl": "https://clawhub.ai/ronaldmannak/mlx-swift-lm",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/mlx-swift-lm",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=mlx-swift-lm",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "references/embeddings.md",
      "references/training.md",
      "references/tool-calling.md",
      "references/lora-adapters.md",
      "references/model-container.md",
      "references/kv-cache.md"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/mlx-swift-lm"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/mlx-swift-lm",
    "downloadUrl": "https://openagent3.xyz/downloads/mlx-swift-lm",
    "agentUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent",
    "manifestUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/mlx-swift-lm/agent.md"
  }
}
```
## Documentation

### 1. Overview & Triggers

mlx-swift-lm is a Swift package for running Large Language Models (LLMs) and Vision-Language Models (VLMs) on Apple Silicon using MLX. It supports local inference, fine-tuning via LoRA/DoRA, and embedding generation.

### When to Use This Skill

- Running LLM/VLM inference on macOS/iOS with Apple Silicon
- Streaming text generation from local models
- Vision tasks with images/video (VLMs)
- Tool calling / function calling with models
- LoRA adapter training and fine-tuning
- Text embeddings for RAG/semantic search

### Architecture Overview

- MLXLMCommon: Core infrastructure (ModelContainer, ChatSession, KVCache, etc.)
- MLXLLM: Text-only LLM support (Llama, Qwen, Gemma, Phi, DeepSeek, etc.; examples, not exhaustive)
- MLXVLM: Vision-Language Models (Qwen2-VL, PaliGemma, Gemma3, etc.; examples, not exhaustive)
- Embedders: Embedding models (BGE, Nomic, MiniLM)

### 2. Key File Reference

| Purpose | File Path |
|---|---|
| Thread-safe model wrapper | Libraries/MLXLMCommon/ModelContainer.swift |
| Simplified chat API | Libraries/MLXLMCommon/ChatSession.swift |
| Generation & streaming | Libraries/MLXLMCommon/Evaluate.swift |
| KV cache types | Libraries/MLXLMCommon/KVCache.swift |
| Model configuration | Libraries/MLXLMCommon/ModelConfiguration.swift |
| Chat message types | Libraries/MLXLMCommon/Chat.swift |
| Tool call processing | Libraries/MLXLMCommon/Tool/ToolCallFormat.swift |
| Concurrency utilities | Libraries/MLXLMCommon/Utilities/SerialAccessContainer.swift |
| LLM factory & registry | Libraries/MLXLLM/LLMModelFactory.swift |
| VLM factory & registry | Libraries/MLXVLM/VLMModelFactory.swift |
| LoRA configuration | Libraries/MLXLMCommon/Adapters/LoRA/LoRAContainer.swift |
| LoRA training | Libraries/MLXLLM/LoraTrain.swift |

### LLM Chat (Simplest API)

```swift
import MLXLLM
import MLXLMCommon

// Load model (downloads from HuggingFace automatically)
let modelContainer = try await LLMModelFactory.shared.loadContainer(
    configuration: .init(id: "mlx-community/Qwen3-4B-4bit")
)

// Create chat session
let session = ChatSession(modelContainer)

// Single response
let response = try await session.respond(to: "What is Swift?")
print(response)

// Streaming response
for try await chunk in session.streamResponse(to: "Explain concurrency") {
    print(chunk, terminator: "")
}
```

### VLM with Image

```swift
import MLXVLM
import MLXLMCommon

let modelContainer = try await VLMModelFactory.shared.loadContainer(
    configuration: .init(id: "mlx-community/Qwen2-VL-2B-Instruct-4bit")
)

let session = ChatSession(modelContainer)

// With image (video is also an optional parameter)
let image = UserInput.Image.url(imageURL)
let response = try await session.respond(
    to: "Describe this image",
    image: image,
    video: nil  // Optional video parameter
)
```

### Embeddings

```swift
import Embedders

// Note: Embedders uses loadModelContainer() helper (not a factory pattern)
let container = try await loadModelContainer(
    configuration: ModelConfiguration(id: "mlx-community/bge-small-en-v1.5-mlx")
)

let embeddings = await container.perform { model, tokenizer, pooler in
    let tokens = tokenizer.encode(text: "Hello world")
    let input = MLXArray(tokens).expandedDimensions(axis: 0)
    let output = model(input)
    let pooled = pooler(output, normalize: true)
    eval(pooled)
    return pooled
}
```
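
For RAG or semantic search, pooled embeddings are typically compared with cosine similarity. The sketch below is plain Swift and does not use the package: it assumes each embedding has already been converted to a `[Float]` (that conversion step is not shown), and the helper names `cosineSimilarity` and `rank` as well as the toy vectors are hypothetical.

```swift
import Foundation

/// Cosine similarity between two equal-length vectors.
/// Returns 0 when the lengths differ, the vectors are empty, or either has zero magnitude.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// Returns document indices ordered from most to least similar to the query.
func rank(query: [Float], documents: [[Float]]) -> [Int] {
    documents.indices.sorted {
        cosineSimilarity(query, documents[$0]) > cosineSimilarity(query, documents[$1])
    }
}

// Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
let queryVector: [Float] = [1, 0, 0]
let documentVectors: [[Float]] = [[0, 1, 0], [0.9, 0.1, 0], [0.5, 0.5, 0]]
print(rank(query: queryVector, documents: documentVectors))  // [1, 2, 0]
```

When vectors are already normalized, as with `normalize: true` above, the cosine similarity reduces to a plain dot product.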

### ChatSession API (Recommended)

ChatSession manages conversation history and KV cache automatically:

```swift
let session = ChatSession(
    modelContainer,
    instructions: "You are a helpful assistant",  // System prompt
    generateParameters: GenerateParameters(
        maxTokens: 500,
        temperature: 0.7
    )
)

// Multi-turn conversation (history preserved automatically)
let r1 = try await session.respond(to: "What is 2+2?")
let r2 = try await session.respond(to: "And if you multiply that by 3?")

// Clear session to start fresh
await session.clear()
```

### Streaming with generate()

For lower-level control, use generate() directly:

```swift
let input = try await modelContainer.prepare(input: UserInput(prompt: .text("Hello")))
let stream = try await modelContainer.generate(input: input, parameters: GenerateParameters())

for await generation in stream {
    switch generation {
    case .chunk(let text):
        print(text, terminator: "")
    case .info(let info):
        print("\n\(info.tokensPerSecond) tok/s")
    case .toolCall(let call):
        // Handle tool call
        break
    }
}
```

### Tool Calling

```swift
// 1. Define tool
struct WeatherInput: Codable { let location: String }
struct WeatherOutput: Codable { let temperature: Double; let conditions: String }

let weatherTool = Tool<WeatherInput, WeatherOutput>(
    name: "get_weather",
    description: "Get current weather",
    parameters: [.required("location", type: .string, description: "City name")]
) { input in
    WeatherOutput(temperature: 22.0, conditions: "Sunny")
}

// 2. Include tool schema in request, then prepare it for the model
let userInput = UserInput(
    prompt: .text("What's the weather in Tokyo?"),
    tools: [weatherTool.schema]
)
let input = try await modelContainer.prepare(input: userInput)
let params = GenerateParameters()

// 3. Handle tool calls in generation stream
for await generation in try await modelContainer.generate(input: input, parameters: params) {
    switch generation {
    case .chunk(let text): print(text)
    case .toolCall(let call):
        let result = try await call.execute(with: weatherTool)
        print("Weather: \(result.conditions)")
    case .info: break
    }
}
```

See references/tool-calling.md for multi-turn and feeding results back.
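
When a model can call more than one tool, the `.toolCall` case needs some way to route each call by name. The sketch below is a plain-Swift illustration of that routing idea, assuming arguments arrive as a JSON string. `ToolRegistry`, `ToolHandler`, `makeHandler`, and `ToolError` are hypothetical names for this example and are not part of mlx-swift-lm; the package's own `Tool` type and `call.execute(with:)` cover the single-tool case shown above.

```swift
import Foundation

struct WeatherInput: Codable { let location: String }
struct WeatherOutput: Codable { let temperature: Double; let conditions: String }

/// Type-erased handler: JSON argument bytes in, JSON result bytes out.
typealias ToolHandler = (Data) throws -> Data

enum ToolError: Error { case unknownTool(String) }

/// Wraps a typed closure so it can be stored next to handlers of other types.
func makeHandler<Input: Decodable, Output: Encodable>(
    _ body: @escaping (Input) throws -> Output
) -> ToolHandler {
    { argumentData in
        let input = try JSONDecoder().decode(Input.self, from: argumentData)
        return try JSONEncoder().encode(try body(input))
    }
}

/// Hypothetical name-to-handler lookup table.
struct ToolRegistry {
    private var handlers: [String: ToolHandler] = [:]

    init() {}

    mutating func register(name: String, handler: @escaping ToolHandler) {
        handlers[name] = handler
    }

    /// Looks up the handler by name and runs it on the JSON arguments.
    func call(name: String, argumentsJSON: String) throws -> Data {
        guard let handler = handlers[name] else { throw ToolError.unknownTool(name) }
        return try handler(Data(argumentsJSON.utf8))
    }
}

var registry = ToolRegistry()
registry.register(name: "get_weather", handler: makeHandler { (input: WeatherInput) in
    WeatherOutput(temperature: 22.0, conditions: "Sunny in \(input.location)")
})
```

Keeping the handlers type-erased at the registry boundary lets each tool keep its own `Codable` input and output types.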

### GenerateParameters

```swift
let params = GenerateParameters(
    maxTokens: 1000,           // nil = unlimited
    maxKVSize: 4096,           // Sliding window (uses RotatingKVCache)
    kvBits: 4,                 // Quantized cache (4 or 8 bit)
    temperature: 0.7,          // 0 = greedy/argmax
    topP: 0.9,                 // Nucleus sampling
    repetitionPenalty: 1.1,    // Penalize repeats
    repetitionContextSize: 20  // Window for penalty
)
```
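
To make `temperature` and `topP` concrete, here is a conceptual sketch of temperature scaling followed by nucleus (top-p) sampling over a small logit vector. It is plain Swift written for illustration and is not the package's sampler; the function names `softmax`, `nucleus`, and `sample` are hypothetical.

```swift
import Foundation

/// Converts logits to probabilities after dividing by the temperature.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max() ?? 0          // subtract the max for numerical stability
    let exps = scaled.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Smallest set of highest-probability token indices whose cumulative mass reaches topP.
func nucleus(_ probs: [Double], topP: Double) -> [Int] {
    let order = probs.indices.sorted { probs[$0] > probs[$1] }
    var kept: [Int] = []
    var cumulative = 0.0
    for index in order {
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= topP { break }
    }
    return kept
}

/// Picks a token index; temperature 0 falls back to greedy argmax.
func sample(logits: [Double], temperature: Double, topP: Double) -> Int {
    precondition(!logits.isEmpty, "logits must not be empty")
    if temperature == 0 {
        return logits.indices.max { logits[$0] < logits[$1] } ?? 0
    }
    let probs = softmax(logits, temperature: temperature)
    let candidates = nucleus(probs, topP: topP)
    let total = candidates.reduce(0.0) { $0 + probs[$1] }
    var threshold = Double.random(in: 0..<total)
    for index in candidates {
        threshold -= probs[index]
        if threshold < 0 { return index }
    }
    return candidates.last ?? 0
}

print(sample(logits: [1.0, 3.0, 2.0], temperature: 0, topP: 0.9))  // 1 (greedy)
```

Lower temperatures sharpen the distribution toward the top token, while a smaller `topP` shrinks the candidate set before the random draw.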

### Prompt Caching / History Re-hydration

Restore chat from persisted history:

```swift
let history: [Chat.Message] = [
    .system("You are helpful"),
    .user("Hello"),
    .assistant("Hi there!")
]

let session = ChatSession(
    modelContainer,
    history: history
)
// Continues from this point
```
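
Restoring a session presupposes that the history was saved somewhere. One way to persist it, sketched under the assumption that the app keeps its own plain record type: `StoredMessage`, `encodeHistory`, and `decodeHistory` are hypothetical names, and mapping each record back to `.system`, `.user`, or `.assistant` is left to the caller.

```swift
import Foundation

/// Hypothetical app-side record for one chat turn.
struct StoredMessage: Codable, Equatable {
    enum Role: String, Codable { case system, user, assistant }
    var role: Role
    var content: String
}

/// Serializes the history to JSON, e.g. for writing to disk.
func encodeHistory(_ messages: [StoredMessage]) throws -> Data {
    try JSONEncoder().encode(messages)
}

/// Reads the history back from JSON.
func decodeHistory(_ data: Data) throws -> [StoredMessage] {
    try JSONDecoder().decode([StoredMessage].self, from: data)
}

let savedHistory = [
    StoredMessage(role: .system, content: "You are helpful"),
    StoredMessage(role: .user, content: "Hello"),
    StoredMessage(role: .assistant, content: "Hi there!")
]
let historyData = try encodeHistory(savedHistory)
let restoredHistory = try decodeHistory(historyData)
print(restoredHistory.count)  // 3
```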

### Image Input Types

```swift
// From URL (file or remote)
let image = UserInput.Image.url(fileURL)

// From CIImage
let image = UserInput.Image.ciImage(ciImage)

// From MLXArray directly
let image = UserInput.Image.array(mlxArray)
```

### Video Input

```swift
// From URL (file or remote)
let video = UserInput.Video.url(videoURL)

// From AVFoundation asset
let video = UserInput.Video.avAsset(avAsset)

// From pre-extracted frames
let video = UserInput.Video.frames(videoFrames)

let response = try await session.respond(
    to: "What happens in this video?",
    video: video
)
```

### Multiple Images

```swift
let images: [UserInput.Image] = [
    .url(url1),
    .url(url2)
]

let response = try await session.respond(
    to: "Compare these two images",
    images: images,
    videos: []
)
```

### VLM-Specific Processing

```swift
let session = ChatSession(
    modelContainer,
    processing: UserInput.Processing(
        resize: CGSize(width: 512, height: 512)  // Resize images
    )
)
```

### DO

```swift
// DO: Use ChatSession for multi-turn conversations
let session = ChatSession(modelContainer)

// DO: Use AsyncStream APIs (modern, Swift concurrency)
for try await chunk in session.streamResponse(to: prompt) { ... }

// DO: Check Task.isCancelled in long-running loops
for try await generation in stream {
    if Task.isCancelled { break }
    // process generation
}

// DO: Use ModelContainer.perform() for thread-safe access
// (the closure throws, so the call needs `try await`)
let tokens = try await modelContainer.perform { context in
    // Access model, tokenizer safely
    try context.tokenizer.applyChatTemplate(messages: messages)
}

// DO: When breaking early from generation, use generateTask() to get a task handle
// This is the lower-level API used internally by ChatSession
let (stream, task) = generateTask(...)  // Returns (AsyncStream, Task)

for await item in stream {
    if shouldStop { break }
}
await task.value  // Ensures KV cache cleanup before next generation
```

generateTask() is defined in Evaluate.swift. Most users should use ChatSession which handles this internally.
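
The break-then-await pattern can be tried without loading a model. The sketch below reproduces its shape with standard Swift concurrency only, assuming Swift 5.9 or later for `AsyncStream.makeStream()`. `makeNumberStream` and `firstValues` are hypothetical stand-ins, not package APIs, and the explicit `task.cancel()` is a choice made in this sketch so the producer stops promptly.

```swift
/// Hypothetical producer: returns a stream plus the task that feeds it.
func makeNumberStream(count: Int) -> (AsyncStream<Int>, Task<Void, Never>) {
    let (stream, continuation) = AsyncStream<Int>.makeStream()
    let task = Task {
        for value in 0..<count {
            if Task.isCancelled { break }   // cooperative cancellation check
            continuation.yield(value)
        }
        continuation.finish()
    }
    // Stop producing if the consumer goes away.
    continuation.onTermination = { _ in task.cancel() }
    return (stream, task)
}

/// Consumes up to `limit` values, then waits for the producer to wind down.
func firstValues(limit: Int) async -> [Int] {
    let (stream, task) = makeNumberStream(count: 1_000)
    var collected: [Int] = []
    for await value in stream {
        collected.append(value)
        if collected.count == limit { break }   // early exit
    }
    task.cancel()        // ask the producer to stop
    await task.value     // wait until it has actually finished
    return collected
}

let firstThree = await firstValues(limit: 3)
print(firstThree)  // [0, 1, 2]
```

Awaiting the task after the loop is what guarantees the producer is no longer running when the next piece of work starts.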

### DON'T

```swift
// DON'T: Share MLXArray across tasks (not Sendable)
let array = MLXArray(...)
Task { array.sum() }  // Wrong!

// DON'T: Use deprecated callback-based generation
// Old:
generate(input: input, parameters: params) { tokens in ... }  // Deprecated
// New:
for await generation in try generate(input: input, parameters: params, context: context) { ... }

// DON'T: Use old perform(model, tokenizer) signature
// Old:
modelContainer.perform { model, tokenizer in ... }  // Deprecated
// New:
modelContainer.perform { context in ... }

// DON'T: Forget to eval() MLXArrays before returning from perform()
await modelContainer.perform { context in
    let result = context.model(input)
    eval(result)  // Required before returning
    return result.item(Float.self)
}
```

### Thread Safety

- ModelContainer is Sendable and thread-safe
- ChatSession is NOT thread-safe (use from single task)
- MLXArray is NOT Sendable - don't pass across isolation boundaries
- Use SendableBox for transferring non-Sendable data in consuming contexts
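
The general idea behind a thread-safe container with a `perform` closure can be sketched with a plain Swift actor: non-Sendable state stays inside the actor, and only Sendable results leave it. `Counter`, `CounterContainer`, and `runConcurrentIncrements` are hypothetical names for illustration; this does not describe how `ModelContainer` is implemented.

```swift
/// Hypothetical non-Sendable resource (stands in for a model or array type).
final class Counter {
    var value = 0
    func increment() -> Int {
        value += 1
        return value
    }
}

/// Actor that owns the resource; callers only receive Sendable results.
actor CounterContainer {
    private let counter = Counter()

    /// Runs `body` with exclusive access to the wrapped resource.
    func perform<R: Sendable>(_ body: @Sendable (Counter) throws -> R) rethrows -> R {
        try body(counter)
    }
}

/// Increments the counter from many tasks at once and returns the final value.
func runConcurrentIncrements(tasks: Int) async -> Int {
    let container = CounterContainer()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<tasks {
            group.addTask {
                _ = await container.perform { $0.increment() }
            }
        }
    }
    return await container.perform { $0.value }
}

let finalCount = await runConcurrentIncrements(tasks: 100)
print(finalCount)  // 100
```

Because the actor serializes access, no increment is lost even though the tasks run concurrently.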

### Memory Management

```swift
// For long contexts, use sliding window cache
let params = GenerateParameters(maxKVSize: 4096)

// For memory efficiency, use quantized cache
let params = GenerateParameters(kvBits: 4)  // or 8

// Clear session cache when done
await session.clear()
```
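
A back-of-the-envelope estimate shows why a bounded window and fewer bits per element matter. The sketch assumes the cache stores one key and one value vector per layer, KV head, and token; the model shape (32 layers, 8 KV heads, head dimension 128) is hypothetical, and overhead such as quantization scales is ignored. `KVCacheShape` and `estimatedKVCacheBytes` are illustrative names, and the sketch makes no claim about which option combinations the package supports.

```swift
/// Hypothetical model dimensions that determine cache size.
struct KVCacheShape {
    var layers: Int
    var kvHeads: Int
    var headDim: Int
}

/// Rough cache size: 2 (keys and values) x layers x heads x headDim x cached tokens.
func estimatedKVCacheBytes(
    shape: KVCacheShape,
    tokens: Int,
    maxKVSize: Int? = nil,      // sliding window cap, if any
    bitsPerElement: Int = 16    // 16 for fp16, 8 or 4 for a quantized cache
) -> Int {
    let cachedTokens = maxKVSize.map { min(tokens, $0) } ?? tokens
    let elements = 2 * shape.layers * shape.kvHeads * shape.headDim * cachedTokens
    return elements * bitsPerElement / 8
}

let shape = KVCacheShape(layers: 32, kvHeads: 8, headDim: 128)
let mebibyte = 1_048_576

let full = estimatedKVCacheBytes(shape: shape, tokens: 32_768)
let windowed = estimatedKVCacheBytes(shape: shape, tokens: 32_768, maxKVSize: 4096)
let quantized = estimatedKVCacheBytes(shape: shape, tokens: 32_768, maxKVSize: 4096, bitsPerElement: 4)

print(full / mebibyte)       // 4096 MiB
print(windowed / mebibyte)   // 512 MiB
print(quantized / mebibyte)  // 128 MiB
```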

### 7. Reference Links

For detailed documentation on specific topics, see:

| Reference | When to Use |
|---|---|
| references/model-container.md | Loading models, ModelContainer API, ModelConfiguration |
| references/kv-cache.md | Cache types, memory optimization, cache serialization |
| references/concurrency.md | Thread safety, SerialAccessContainer, async patterns |
| references/tool-calling.md | Function calling, tool formats, ToolCallProcessor |
| references/tokenizer-chat.md | Tokenizer, Chat.Message, EOS tokens |
| references/supported-models.md | Model families, registries, model-specific config |
| references/lora-adapters.md | LoRA/DoRA/QLoRA, loading adapters |
| references/training.md | LoRATrain API, fine-tuning |
| references/embeddings.md | EmbeddingModel, pooling, use cases |

### 8. Deprecated Patterns Summary

Most common migrations (see individual reference files for topic-specific deprecations):

| If you see... | Use instead... |
|---|---|
| `generate(... didGenerate:)` callback | `generate(...) -> AsyncStream` |
| `perform { model, tokenizer in }` | `perform { context in }` |
| `TokenIterator(prompt: MLXArray)` | `TokenIterator(input: LMInput)` |
| `ModelRegistry` typealias | `LLMRegistry` or `VLMRegistry` |
| `createAttentionMask(h:cache:[KVCache]?)` | `createAttentionMask(h:cache:KVCache?)` |

Each reference file contains a "Deprecated Patterns" section with topic-specific migrations.
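
The first migration, from a callback to an `AsyncStream`, changes how results are consumed: the caller iterates with `for await` instead of supplying a closure. A minimal plain-Swift sketch of that shift follows; `legacyGenerate`, `streamGenerate`, and `collect` are hypothetical stand-ins and not the package's functions.

```swift
/// Hypothetical legacy API: delivers each token through a callback.
func legacyGenerate(tokens: [String], didGenerate: (String) -> Void) {
    for token in tokens {
        didGenerate(token)
    }
}

/// Adapter exposing the same tokens as an AsyncStream.
func streamGenerate(tokens: [String]) -> AsyncStream<String> {
    AsyncStream { continuation in
        legacyGenerate(tokens: tokens) { token in
            continuation.yield(token)
        }
        continuation.finish()   // signal that no more tokens will arrive
    }
}

/// Consumes the stream with `for await`, the style the newer APIs expect.
func collect(_ stream: AsyncStream<String>) async -> String {
    var text = ""
    for await piece in stream {
        text += piece
    }
    return text
}

let collected = await collect(streamGenerate(tokens: ["Hel", "lo"]))
print(collected)  // Hello
```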

### Automatic Behaviors (NO developer action needed)

The framework handles these automatically:

| Feature | Details |
|---|---|
| EOS token loading | Loaded from config.json |
| EOS token override | Priority: generation_config.json > config.json > defaults |
| EOS token merging | All sources merged at generation time |
| EOS token detection | Stops generation automatically when EOS encountered |
| Chat template application | Applied automatically via applyChatTemplate() |
| Tool call format detection | Inferred from model_type in config.json |
| Cache type selection | Based on GenerateParameters (maxKVSize, kvBits) |
| Tokenizer loading | Loaded from tokenizer.json automatically |
| Model weights loading | Downloaded and loaded from HuggingFace |

### Optional Configuration (Developer MAY configure)

| Feature | When to Configure |
|---|---|
| `extraEOSTokens` | Only if model has unlisted stop tokens |
| `toolCallFormat` | Only to override auto-detection |
| `maxKVSize` | To enable sliding window cache |
| `kvBits` | To enable quantized cache (4 or 8 bit) |
| `maxTokens` | To limit output length |
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: ronaldmannak
- Version: 1.0.0
## Source health
- Status: healthy
- Source download looks usable.
- Yavira can redirect you to the upstream package for this source.
- Health scope: source
- Reason: direct_download_ok
- Checked at: 2026-04-30T16:55:25.780Z
- Expires at: 2026-05-07T16:55:25.780Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/mlx-swift-lm)
- [Send to Agent page](https://openagent3.xyz/skills/mlx-swift-lm/agent)
- [JSON manifest](https://openagent3.xyz/skills/mlx-swift-lm/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/mlx-swift-lm/agent.md)
- [Download page](https://openagent3.xyz/downloads/mlx-swift-lm)