Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Manage multi-tier AI inference clusters for homelabs. Health monitoring, expert MoE routing, automatic node recovery, and model deployment across Ollama and llama.cpp nodes. Covers GPU memory planning, Docker volume strategies for large models, sequential startup patterns to avoid CUDA deadlocks, and unified API gateways via LiteLLM.
Hand the extracted package to your coding agent with a concrete install brief rather than walking through the installation by hand.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Manage a compound AI compute cluster spanning multiple tiers of GPU and CPU inference nodes. Built and battle-tested by Lumina Homelab.
Use this skill when your agent needs to:
- Monitor health of distributed model endpoints
- Route inference requests to the best available model
- Recover downed nodes automatically
- Plan GPU memory allocation across models
- Deploy models across heterogeneous hardware
A homelab cluster typically spans 2-3 tiers:

| Tier | Typical Hardware | Runtime | Role |
|------|------------------|---------|------|
| Local | Primary GPU (RTX 4090/5090) | Ollama | Fast inference, embeddings |
| Remote | Secondary GPU (RTX 3090/4090) | llama.cpp or Ollama | Distributed inference |
| NAS/CPU | Synology, RPi, any CPU node | Ollama | Lightweight models, fallback |

A LiteLLM proxy sits in front, providing a unified OpenAI-compatible API across all tiers.
Check all endpoints with configurable per-endpoint timeouts:

```python
# Define endpoints with tier labels
ENDPOINTS = {
    "local/ollama": {"url": "http://localhost:11434/api/tags", "tier": "LOCAL"},
    "remote/mark-i": {"url": "http://REMOTE_IP:3009/v1/models", "tier": "REMOTE", "timeout": 8},
    "gateway/litellm": {"url": "http://localhost:8080/health/liveliness", "tier": "GATEWAY"},
}

# For each endpoint: GET with timeout, check HTTP 200
# Classify: HEALTHY / DEGRADED / DOWN per tier
# Overall prognosis based on tier health
```

Key lesson: Use /health/liveliness for LiteLLM, not /health: the latter probes all model routes and hangs if any are unreachable.
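A minimal, runnable version of that loop could look like the sketch below. It reuses the ENDPOINTS layout from the snippet above and assumes the `requests` library; the 3-second default timeout and the HEALTHY/DEGRADED/DOWN roll-up rule are illustrative choices, not part of the packaged skill.

```python
import requests

ENDPOINTS = {
    "local/ollama": {"url": "http://localhost:11434/api/tags", "tier": "LOCAL"},
    "remote/mark-i": {"url": "http://REMOTE_IP:3009/v1/models", "tier": "REMOTE", "timeout": 8},
    "gateway/litellm": {"url": "http://localhost:8080/health/liveliness", "tier": "GATEWAY"},
}
DEFAULT_TIMEOUT = 3  # seconds; assumed default, tune per network

def check_endpoints(endpoints=ENDPOINTS):
    """Probe every endpoint and roll the results up per tier."""
    tier_results = {}
    for name, cfg in endpoints.items():
        timeout = cfg.get("timeout", DEFAULT_TIMEOUT)
        try:
            healthy = requests.get(cfg["url"], timeout=timeout).status_code == 200
        except requests.RequestException:
            healthy = False
        tier_results.setdefault(cfg["tier"], []).append(healthy)

    report = {}
    for tier, results in tier_results.items():
        if all(results):
            report[tier] = "HEALTHY"
        elif any(results):
            report[tier] = "DEGRADED"
        else:
            report[tier] = "DOWN"
    return report

if __name__ == "__main__":
    print(check_endpoints())
```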
Route requests to the optimal model based on task classification.

Task categories:
- code → Coder model (Qwen2.5-Coder-7B or similar)
- reason → Reasoning model (DeepSeek-R1-Distill or similar)
- chat → General model (Qwen2.5-14B or similar)
- vision → Vision model (Qwen2.5-VL or similar)
- fast → Smallest available model for quick responses
- embed → Embedding model (nomic-embed-text or similar)

Router logic:
1. Classify task from prompt
2. Check health of preferred model
3. Fall back to next-best if unavailable
4. Return model endpoint + metadata
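One way to sketch that router in Python is shown below. The model aliases, keyword lists, and the `health` dict shape (model alias → bool) are all hypothetical placeholders; a real classifier might call a small LLM instead of matching keywords.

```python
# Hypothetical routing table: task category -> model aliases in preference order.
ROUTES = {
    "code":   ["remote/coder-7b", "local/chat"],
    "reason": ["remote/reasoner-8b", "local/chat"],
    "chat":   ["local/chat", "nas/fast-chat"],
    "vision": ["local/vision"],
    "fast":   ["nas/fast-coder", "local/chat"],
    "embed":  ["local/embed"],
}

# Rough keyword hints per category; anything unmatched falls through to "chat".
KEYWORDS = {
    "code":   ("def ", "class ", "function", "bug", "refactor"),
    "reason": ("prove", "step by step", "why", "plan"),
    "vision": ("image", "screenshot", "photo"),
    "embed":  ("embed", "similarity", "vector"),
}

def classify(prompt: str) -> str:
    """Pick a task category from the prompt text."""
    lowered = prompt.lower()
    for category, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return category
    return "chat"

def route(prompt: str, health: dict) -> str | None:
    """Return the first healthy model alias for the classified task, else None."""
    for model in ROUTES[classify(prompt)]:
        if health.get(model, False):
            return model
    return None
```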
For models larger than ~1.5GB on Windows Docker hosts:

```bash
# Create a Docker volume for model storage
docker volume create models-vol

# Copy models INTO the volume
docker run --rm -v models-vol:/models -v /host/path:/src alpine cp /src/model.gguf /models/

# Run container FROM volume (not bind mount)
docker run -d --gpus all -v models-vol:/models -p 3009:8000 \
  -e MODEL_PATH=/models/model.gguf your-llamacpp-image
```

Why: Windows bind mounts go through a gRPC-FUSE/9P bridge, which hangs during GPU tensor loading for large files. Docker volumes use native Linux ext4 and bypass this entirely.
Never start multiple GPU containers simultaneously:

```bash
# WRONG - causes CUDA initialization deadlock
docker start mark-i mark-iii mark-iv mark-vi &

# RIGHT - sequential with health check between each
for container in mark-v mark-iii mark-iv mark-vi mark-i; do
  docker restart $container
  sleep 5
  # Verify health before starting next
  curl -s http://localhost:PORT/v1/models || echo "Warning: $container slow to start"
done
```
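If you prefer to drive the same pattern from Python, a sketch along these lines polls each container until it actually answers before moving on. The container names and health URLs are placeholders borrowed from the examples above, not a fixed part of the skill.

```python
import subprocess
import time
import requests

# Hypothetical container -> health URL map; adjust names and ports to your cluster.
CONTAINERS = {
    "mark-v": "http://localhost:3005/v1/models",
    "mark-iii": "http://localhost:3003/v1/models",
    "mark-i": "http://localhost:3009/v1/models",
}

def wait_healthy(url: str, attempts: int = 12, delay: float = 5.0) -> bool:
    """Poll an endpoint until it answers HTTP 200 or we give up."""
    for _ in range(attempts):
        try:
            if requests.get(url, timeout=3).status_code == 200:
                return True
        except requests.RequestException:
            pass
        time.sleep(delay)
    return False

def restart_sequentially(containers=CONTAINERS):
    """Restart GPU containers one at a time so CUDA init never overlaps."""
    for name, health_url in containers.items():
        subprocess.run(["docker", "restart", name], check=True)
        if not wait_healthy(health_url):
            print(f"Warning: {name} did not report healthy; stopping sequence")
            break
```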
Plan your model lineup to fit within VRAM. Example for a 24GB GPU:

- 14B model (Q4_K_M) → 9.0 GB, 28 GPU layers
- 7B coder → 4.4 GB, full GPU
- 8B reasoning → 4.6 GB, full GPU
- 1.5B fast coder → 1.1 GB, full GPU
- 1.7B fast chat → 1.0 GB, full GPU

Total: 20.1 GB (~84% utilized). Anything larger (32B+) goes into CPU-only containers.
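A tiny budget-check sketch makes the arithmetic explicit. The model sizes come from the example lineup above; the 24GB total and the 1.5GB safety margin for CUDA context and KV-cache spikes are assumptions to adjust for your card.

```python
# VRAM budget sketch using the example lineup above (sizes in GB).
LINEUP = {
    "14B chat (Q4_K_M)": 9.0,
    "7B coder": 4.4,
    "8B reasoning": 4.6,
    "1.5B fast coder": 1.1,
    "1.7B fast chat": 1.0,
}

VRAM_GB = 24.0          # total card memory
SAFETY_MARGIN_GB = 1.5  # assumed headroom for CUDA context + KV cache spikes

def check_budget(lineup=LINEUP, vram=VRAM_GB, margin=SAFETY_MARGIN_GB):
    used = sum(lineup.values())
    usable = vram - margin
    print(f"Planned: {used:.1f} GB of {usable:.1f} GB usable ({used / vram:.0%} of card)")
    if used > usable:
        print("Over budget: move the largest model to partial offload or a CPU tier")

check_budget()
```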
When a remote node goes down (Docker Desktop crash, reboot, etc.):

Recovery sequence:
1. Health check fails for the remote tier
2. Check whether SSH is responsive (node is up but Docker is down)
3. If SSH works: restart Docker Desktop via SSH
4. If SSH fails: create an RDP session to wake the machine
5. Wait for Docker, then restart containers sequentially
6. Re-check health

Important: Never store recovery credentials in plaintext. Use a vault (Azure Key Vault, HashiCorp Vault, etc.) and pipe secrets through stdin, never as CLI arguments.
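A rough orchestration of that ladder might look like the sketch below. The SSH alias and the Docker restart command are placeholders for whatever your remote host actually uses, and the RDP fallback is left as a manual step because it is environment-specific.

```python
import subprocess

REMOTE_HOST = "remote-node"            # hypothetical SSH alias for the remote GPU box
DOCKER_RESTART_CMD = "restart-docker"  # placeholder: whatever command restarts Docker on that host

def ssh_ok(cmd: str, timeout: int = 15) -> bool:
    """Run a command on the remote node over SSH; True if it exits cleanly."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", REMOTE_HOST, cmd],
            capture_output=True, timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def recover_remote_tier() -> None:
    """Walk the recovery ladder: probe SSH, restart Docker, fall back to manual RDP."""
    if not ssh_ok("echo ok"):
        # Node unreachable over SSH; needs out-of-band access (RDP / wake-on-LAN).
        print("SSH unreachable: open an RDP session to the node manually")
        return
    if ssh_ok(DOCKER_RESTART_CMD, timeout=120):
        print("Docker restarted; now restart containers sequentially and re-check health")
    else:
        print("Docker restart command failed; investigate over SSH")
```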
Unified API across all tiers:

```yaml
model_list:
  # Local Ollama models
  - model_name: local/chat
    litellm_params:
      model: ollama/qwen2.5:32b
      api_base: http://localhost:11434

  # Remote llama.cpp models (need openai/ prefix)
  - model_name: remote/mark-i
    litellm_params:
      model: openai/qwen2.5-14b-instruct
      api_base: http://REMOTE_IP:3009/v1
      api_key: "not-needed"

  # NAS Ollama models
  - model_name: nas/coder
    litellm_params:
      model: ollama/qwen2.5-coder:7b
      api_base: http://NAS_IP:11434
```

Key: llama.cpp endpoints need the openai/ prefix in the model name and /v1 in api_base for LiteLLM compatibility.
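Once the proxy is up, any OpenAI-compatible client can talk to every tier through one endpoint. A minimal sketch with the `openai` Python package, assuming the gateway listens on localhost:8080 as in the health-check example and using the `local/chat` alias from the config above:

```python
from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy (port assumed from the
# health-check example above; adjust to your gateway).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local/chat",  # alias from model_list above
    messages=[{"role": "user", "content": "Summarize the cluster status checks."}],
)
print(resp.choices[0].message.content)
```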
- Lumina Homelab: luminahomelab.ai
- X/Twitter: @HK47LUMINA
- GitHub: mlesnews