{
  "schemaVersion": "1.0",
  "item": {
    "slug": "homelab-cluster",
    "name": "Homelab Cluster Management",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/mlesnews/homelab-cluster",
    "canonicalUrl": "https://clawhub.ai/mlesnews/homelab-cluster",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/homelab-cluster",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=homelab-cluster",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/homelab-cluster"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/homelab-cluster",
    "agentPageUrl": "https://openagent3.xyz/skills/homelab-cluster/agent",
    "manifestUrl": "https://openagent3.xyz/skills/homelab-cluster/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/homelab-cluster/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Homelab Cluster Management",
        "body": "Manage a compound AI compute cluster spanning multiple tiers of GPU and CPU inference nodes.\nBuilt and battle-tested by Lumina Homelab."
      },
      {
        "title": "When to Use",
        "body": "Use this skill when your agent needs to:\n\nMonitor health of distributed model endpoints\nRoute inference requests to the best available model\nRecover downed nodes automatically\nPlan GPU memory allocation across models\nDeploy models across heterogeneous hardware"
      },
      {
        "title": "Architecture Pattern",
        "body": "A homelab cluster typically spans 2-3 tiers:\n\nTierTypical HardwareRuntimeRoleLocalPrimary GPU (RTX 4090/5090)OllamaFast inference, embeddingsRemoteSecondary GPU (RTX 3090/4090)llama.cpp or OllamaDistributed inferenceNAS/CPUSynology, RPi, any CPU nodeOllamaLightweight models, fallback\n\nA LiteLLM proxy sits in front, providing a unified OpenAI-compatible API across all tiers."
      },
      {
        "title": "Health Monitoring",
        "body": "Check all endpoints with configurable per-endpoint timeouts:\n\n# Define endpoints with tier labels\nENDPOINTS = {\n    \"local/ollama\": {\"url\": \"http://localhost:11434/api/tags\", \"tier\": \"LOCAL\"},\n    \"remote/mark-i\": {\"url\": \"http://REMOTE_IP:3009/v1/models\", \"tier\": \"REMOTE\", \"timeout\": 8},\n    \"gateway/litellm\": {\"url\": \"http://localhost:8080/health/liveliness\", \"tier\": \"GATEWAY\"},\n}\n\n# For each endpoint: GET with timeout, check HTTP 200\n# Classify: HEALTHY / DEGRADED / DOWN per tier\n# Overall prognosis based on tier health\n\nKey lesson: Use /health/liveliness for LiteLLM, not /health — the latter probes all model routes and hangs if any are unreachable."
      },
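      {
        "title": "Health Check Sketch",
        "body": "A minimal, illustrative implementation of the loop described above, assuming Python with the requests library; the endpoint map mirrors the ENDPOINTS example, and the HEALTHY/DEGRADED/DOWN labels follow the classification comments:\n\nimport requests\n\nENDPOINTS = {\n    'local/ollama': {'url': 'http://localhost:11434/api/tags', 'tier': 'LOCAL'},\n    'remote/mark-i': {'url': 'http://REMOTE_IP:3009/v1/models', 'tier': 'REMOTE', 'timeout': 8},\n    'gateway/litellm': {'url': 'http://localhost:8080/health/liveliness', 'tier': 'GATEWAY'},\n}\n\ndef check_endpoints(endpoints, default_timeout=5):\n    # GET each endpoint with its per-endpoint timeout; only HTTP 200 counts as healthy.\n    results = {}\n    for name, cfg in endpoints.items():\n        try:\n            resp = requests.get(cfg['url'], timeout=cfg.get('timeout', default_timeout))\n            results[name] = 'HEALTHY' if resp.status_code == 200 else 'DEGRADED'\n        except requests.RequestException:\n            results[name] = 'DOWN'\n    return results\n\nif __name__ == '__main__':\n    for name, status in check_endpoints(ENDPOINTS).items():\n        print(name, status)\n\nRolling the per-endpoint results up into per-tier and overall status is a straightforward reduction over the tier labels."
      },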
      {
        "title": "Expert MoE Routing",
        "body": "Route requests to the optimal model based on task classification:\n\nTask Categories:\n  code     → Coder model (Qwen2.5-Coder-7B or similar)\n  reason   → Reasoning model (DeepSeek-R1-Distill or similar)\n  chat     → General model (Qwen2.5-14B or similar)\n  vision   → Vision model (Qwen2.5-VL or similar)\n  fast     → Smallest available model for quick responses\n  embed    → Embedding model (nomic-embed-text or similar)\n\nRouter logic:\n  1. Classify task from prompt\n  2. Check health of preferred model\n  3. Fallback to next-best if unavailable\n  4. Return model endpoint + metadata"
      },
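      {
        "title": "Router Sketch",
        "body": "A hedged sketch of the four-step router logic above; the keyword classifier is a deliberately naive stand-in, and the model names in the preference lists are hypothetical, not part of the package:\n\n# Preferred models per task category, best first (names are illustrative).\nPREFERENCES = {\n    'code': ['remote/coder', 'nas/coder'],\n    'reason': ['remote/reasoning', 'local/chat'],\n    'chat': ['local/chat', 'nas/coder'],\n}\n\ndef classify(prompt):\n    # Step 1: classify the task; swap in something smarter in practice.\n    text = prompt.lower()\n    if 'def ' in text or 'function' in text:\n        return 'code'\n    if 'why' in text or 'prove' in text:\n        return 'reason'\n    return 'chat'\n\ndef route(prompt, health):\n    # Steps 2-4: walk the preference list, skip unhealthy models,\n    # and return the chosen endpoint plus metadata.\n    category = classify(prompt)\n    for model in PREFERENCES.get(category, []):\n        if health.get(model) == 'HEALTHY':\n            return {'model': model, 'category': category}\n    return {'model': None, 'category': category}"
      },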
      {
        "title": "Critical: Use Docker Volumes, Not Bind Mounts",
        "body": "For models larger than ~1.5GB on Windows Docker hosts:\n\n# Create a Docker volume for model storage\ndocker volume create models-vol\n\n# Copy models INTO the volume\ndocker run --rm -v models-vol:/models -v /host/path:/src alpine cp /src/model.gguf /models/\n\n# Run container FROM volume (not bind mount)\ndocker run -d --gpus all -v models-vol:/models -p 3009:8000 \\\n  -e MODEL_PATH=/models/model.gguf your-llamacpp-image\n\nWhy: Windows bind mounts use gRPC-FUSE/9P bridge which hangs during GPU tensor loading for large files. Docker volumes use native Linux ext4 and bypass this entirely."
      },
      {
        "title": "Sequential Container Startup",
        "body": "Never start multiple GPU containers simultaneously:\n\n# WRONG — causes CUDA initialization deadlock\ndocker start mark-i mark-iii mark-iv mark-vi &\n\n# RIGHT — sequential with health check between each\nfor container in mark-v mark-iii mark-iv mark-vi mark-i; do\n  docker restart $container\n  sleep 5\n  # Verify health before starting next\n  curl -s http://localhost:PORT/v1/models || echo \"Warning: $container slow to start\"\ndone"
      },
      {
        "title": "GPU Memory Planning",
        "body": "Plan your model lineup to fit within VRAM:\n\nExample for 24GB GPU:\n  14B model (Q4_K_M)  →  9.0 GB, 28 GPU layers\n  7B coder            →  4.4 GB, full GPU\n  8B reasoning        →  4.6 GB, full GPU\n  1.5B fast coder     →  1.1 GB, full GPU\n  1.7B fast chat      →  1.0 GB, full GPU\n  ─────────────────────────────\n  Total:               20.1 GB (~84% utilized)\n\n  Remaining: CPU-only containers for 32B+ models"
      },
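      {
        "title": "VRAM Budget Check",
        "body": "A few lines of Python reproduce the arithmetic in the example lineup; the sizes are copied from the table above, and the 24 GB budget is the example GPU, not a requirement:\n\nVRAM_GB = 24.0\nlineup = {\n    '14B Q4_K_M': 9.0,\n    '7B coder': 4.4,\n    '8B reasoning': 4.6,\n    '1.5B fast coder': 1.1,\n    '1.7B fast chat': 1.0,\n}\ntotal = sum(lineup.values())\n# Prints: 20.1 GB of 24.0 GB (84%)\nprint(f'{total:.1f} GB of {VRAM_GB} GB ({total / VRAM_GB:.0%})')\nassert total <= VRAM_GB, 'lineup exceeds VRAM; move a model to a CPU tier'"
      },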
      {
        "title": "Automatic Node Recovery",
        "body": "When a remote node goes down (Docker Desktop crash, reboot, etc.):\n\nRecovery sequence:\n  1. Health check fails for remote tier\n  2. Check if SSH is responsive (node is up but Docker is down)\n  3. If SSH works: restart Docker Desktop via SSH\n  4. If SSH fails: create RDP session to wake the machine\n  5. Wait for Docker + sequential container restart\n  6. Re-check health\n\nImportant: Never store recovery credentials in plaintext. Use a vault (Azure Key Vault, HashiCorp Vault, etc.) and pipe secrets through stdin, never as CLI arguments."
      },
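      {
        "title": "Recovery Sketch",
        "body": "An illustrative skeleton of steps 2-4 of the recovery sequence, assuming Python with OpenSSH on the path; the Docker restart command is host-specific and shown only as an assumption:\n\nimport subprocess\n\ndef ssh_responsive(host):\n    # Step 2: if SSH answers, the node is up and only Docker is down.\n    probe = subprocess.run(['ssh', '-o', 'ConnectTimeout=5', host, 'echo ok'],\n                           capture_output=True, text=True)\n    return probe.returncode == 0\n\ndef recover(host):\n    if ssh_responsive(host):\n        # Step 3: restart Docker over SSH (service name varies per host; illustrative).\n        subprocess.run(['ssh', host, 'powershell -Command Restart-Service com.docker.service'])\n    else:\n        # Step 4: SSH is dead; fall back to an out-of-band channel such as RDP.\n        print(host, 'unreachable over SSH; escalate to RDP recovery')\n\nCredentials should come from a vault and reach the session via stdin, as the section above stresses."
      },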
      {
        "title": "LiteLLM Gateway Configuration",
        "body": "Unified API across all tiers:\n\nmodel_list:\n  # Local Ollama models\n  - model_name: local/chat\n    litellm_params:\n      model: ollama/qwen2.5:32b\n      api_base: http://localhost:11434\n\n  # Remote llama.cpp models (need openai/ prefix)\n  - model_name: remote/mark-i\n    litellm_params:\n      model: openai/qwen2.5-14b-instruct\n      api_base: http://REMOTE_IP:3009/v1\n      api_key: \"not-needed\"\n\n  # NAS Ollama models\n  - model_name: nas/coder\n    litellm_params:\n      model: ollama/qwen2.5-coder:7b\n      api_base: http://NAS_IP:11434\n\nKey: llama.cpp endpoints need the openai/ prefix in model name and /v1 in api_base for LiteLLM compatibility."
      },
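      {
        "title": "Calling the Gateway",
        "body": "Once the proxy is up, every tier is reachable through one OpenAI-compatible endpoint; a quick smoke test, assuming port 8080 from the health-check example and no master key configured:\n\nimport requests\n\nresp = requests.post(\n    'http://localhost:8080/v1/chat/completions',\n    json={\n        'model': 'local/chat',\n        'messages': [{'role': 'user', 'content': 'Say hello from the cluster.'}],\n    },\n    timeout=60,\n)\nresp.raise_for_status()\nprint(resp.json()['choices'][0]['message']['content'])\n\nIf the proxy is started with a master key, add the matching Authorization: Bearer header."
      },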
      {
        "title": "Links",
        "body": "Lumina Homelab: luminahomelab.ai\nX/Twitter: @HK47LUMINA\nGitHub: mlesnews"
      }
    ],
    "body": "Homelab Cluster Management\n\nManage a compound AI compute cluster spanning multiple tiers of GPU and CPU inference nodes. Built and battle-tested by Lumina Homelab.\n\nWhen to Use\n\nUse this skill when your agent needs to:\n\nMonitor health of distributed model endpoints\nRoute inference requests to the best available model\nRecover downed nodes automatically\nPlan GPU memory allocation across models\nDeploy models across heterogeneous hardware\nArchitecture Pattern\n\nA homelab cluster typically spans 2-3 tiers:\n\nTier\tTypical Hardware\tRuntime\tRole\nLocal\tPrimary GPU (RTX 4090/5090)\tOllama\tFast inference, embeddings\nRemote\tSecondary GPU (RTX 3090/4090)\tllama.cpp or Ollama\tDistributed inference\nNAS/CPU\tSynology, RPi, any CPU node\tOllama\tLightweight models, fallback\n\nA LiteLLM proxy sits in front, providing a unified OpenAI-compatible API across all tiers.\n\nHealth Monitoring\n\nCheck all endpoints with configurable per-endpoint timeouts:\n\n# Define endpoints with tier labels\nENDPOINTS = {\n    \"local/ollama\": {\"url\": \"http://localhost:11434/api/tags\", \"tier\": \"LOCAL\"},\n    \"remote/mark-i\": {\"url\": \"http://REMOTE_IP:3009/v1/models\", \"tier\": \"REMOTE\", \"timeout\": 8},\n    \"gateway/litellm\": {\"url\": \"http://localhost:8080/health/liveliness\", \"tier\": \"GATEWAY\"},\n}\n\n# For each endpoint: GET with timeout, check HTTP 200\n# Classify: HEALTHY / DEGRADED / DOWN per tier\n# Overall prognosis based on tier health\n\n\nKey lesson: Use /health/liveliness for LiteLLM, not /health — the latter probes all model routes and hangs if any are unreachable.\n\nExpert MoE Routing\n\nRoute requests to the optimal model based on task classification:\n\nTask Categories:\n  code     → Coder model (Qwen2.5-Coder-7B or similar)\n  reason   → Reasoning model (DeepSeek-R1-Distill or similar)\n  chat     → General model (Qwen2.5-14B or similar)\n  vision   → Vision model (Qwen2.5-VL or similar)\n  fast     → Smallest available model for quick responses\n  embed    → Embedding model (nomic-embed-text or similar)\n\nRouter logic:\n  1. Classify task from prompt\n  2. Check health of preferred model\n  3. Fallback to next-best if unavailable\n  4. Return model endpoint + metadata\n\nDocker Deployment (llama.cpp on Remote Nodes)\nCritical: Use Docker Volumes, Not Bind Mounts\n\nFor models larger than ~1.5GB on Windows Docker hosts:\n\n# Create a Docker volume for model storage\ndocker volume create models-vol\n\n# Copy models INTO the volume\ndocker run --rm -v models-vol:/models -v /host/path:/src alpine cp /src/model.gguf /models/\n\n# Run container FROM volume (not bind mount)\ndocker run -d --gpus all -v models-vol:/models -p 3009:8000 \\\n  -e MODEL_PATH=/models/model.gguf your-llamacpp-image\n\n\nWhy: Windows bind mounts use gRPC-FUSE/9P bridge which hangs during GPU tensor loading for large files. 
Docker volumes use native Linux ext4 and bypass this entirely.\n\nSequential Container Startup\n\nNever start multiple GPU containers simultaneously:\n\n# WRONG — causes CUDA initialization deadlock\ndocker start mark-i mark-iii mark-iv mark-vi &\n\n# RIGHT — sequential with health check between each\nfor container in mark-v mark-iii mark-iv mark-vi mark-i; do\n  docker restart $container\n  sleep 5\n  # Verify health before starting next\n  curl -s http://localhost:PORT/v1/models || echo \"Warning: $container slow to start\"\ndone\n\nGPU Memory Planning\n\nPlan your model lineup to fit within VRAM:\n\nExample for 24GB GPU:\n  14B model (Q4_K_M)  →  9.0 GB, 28 GPU layers\n  7B coder            →  4.4 GB, full GPU\n  8B reasoning        →  4.6 GB, full GPU\n  1.5B fast coder     →  1.1 GB, full GPU\n  1.7B fast chat      →  1.0 GB, full GPU\n  ─────────────────────────────\n  Total:               20.1 GB (~84% utilized)\n\n  Remaining: CPU-only containers for 32B+ models\n\nAutomatic Node Recovery\n\nWhen a remote node goes down (Docker Desktop crash, reboot, etc.):\n\nRecovery sequence:\n  1. Health check fails for remote tier\n  2. Check if SSH is responsive (node is up but Docker is down)\n  3. If SSH works: restart Docker Desktop via SSH\n  4. If SSH fails: create RDP session to wake the machine\n  5. Wait for Docker + sequential container restart\n  6. Re-check health\n\n\nImportant: Never store recovery credentials in plaintext. Use a vault (Azure Key Vault, HashiCorp Vault, etc.) and pipe secrets through stdin, never as CLI arguments.\n\nLiteLLM Gateway Configuration\n\nUnified API across all tiers:\n\nmodel_list:\n  # Local Ollama models\n  - model_name: local/chat\n    litellm_params:\n      model: ollama/qwen2.5:32b\n      api_base: http://localhost:11434\n\n  # Remote llama.cpp models (need openai/ prefix)\n  - model_name: remote/mark-i\n    litellm_params:\n      model: openai/qwen2.5-14b-instruct\n      api_base: http://REMOTE_IP:3009/v1\n      api_key: \"not-needed\"\n\n  # NAS Ollama models\n  - model_name: nas/coder\n    litellm_params:\n      model: ollama/qwen2.5-coder:7b\n      api_base: http://NAS_IP:11434\n\n\nKey: llama.cpp endpoints need the openai/ prefix in model name and /v1 in api_base for LiteLLM compatibility.\n\nLinks\nLumina Homelab: luminahomelab.ai\nX/Twitter: @HK47LUMINA\nGitHub: mlesnews"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/mlesnews/homelab-cluster",
    "publisherUrl": "https://clawhub.ai/mlesnews/homelab-cluster",
    "owner": "mlesnews",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/homelab-cluster",
    "downloadUrl": "https://openagent3.xyz/downloads/homelab-cluster",
    "agentUrl": "https://openagent3.xyz/skills/homelab-cluster/agent",
    "manifestUrl": "https://openagent3.xyz/skills/homelab-cluster/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/homelab-cluster/agent.md"
  }
}