Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Detect local hardware (RAM, CPU, GPU/VRAM) and recommend the best-fit local LLM models with optimal quantization, speed estimates, and fit scoring.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Hardware-aware local LLM advisor. Detects your system specs (RAM, CPU, GPU/VRAM) and recommends models that actually fit, with optimal quantization and speed estimates.
Use this skill immediately when the user asks any of:
- "what local models can I run?"
- "which LLMs fit my hardware?"
- "recommend a local model"
- "what's the best model for my GPU?"
- "can I run Llama 70B locally?"
- "configure local models"
- "set up Ollama models"
- "what models fit my VRAM?"
- "help me pick a local model for coding"

Also use this skill when:
- The user wants to configure models.providers.ollama or models.providers.lmstudio
- The user mentions running models locally and you need to know what fits
- A model recommendation is needed and the user has local inference capability (Ollama, vLLM, LM Studio)
```bash
llmfit --json system
```

Returns JSON with CPU, RAM, GPU name, VRAM, multi-GPU info, and whether memory is unified (Apple Silicon).
```bash
llmfit recommend --json --limit 5
```

Returns the top 5 models ranked by a composite score (quality, speed, fit, context) with optimal quantization for the detected hardware.
```bash
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 3
llmfit recommend --json --use-case chat --limit 3
```

Valid use cases: general, coding, reasoning, chat, multimodal, embedding.
```bash
llmfit recommend --json --min-fit good --limit 10
```

Valid fit levels (best to worst): perfect, good, marginal.
{ "system": { "cpu_name": "Apple M2 Max", "cpu_cores": 12, "total_ram_gb": 32.0, "available_ram_gb": 24.5, "has_gpu": true, "gpu_name": "Apple M2 Max", "gpu_vram_gb": 32.0, "gpu_count": 1, "backend": "Metal", "unified_memory": true } }
Each model in the models array includes:

| Field | Meaning |
| --- | --- |
| name | HuggingFace model ID (e.g. meta-llama/Llama-3.1-8B-Instruct) |
| provider | Model provider (Meta, Alibaba, Google, etc.) |
| params_b | Parameter count in billions |
| score | Composite score 0–100 (higher is better) |
| score_components | Breakdown: quality, speed, fit, context (each 0–100) |
| fit_level | Perfect, Good, Marginal, or TooTight |
| run_mode | GPU, CPU+GPU Offload, or CPU Only |
| best_quant | Optimal quantization for the hardware (e.g. Q5_K_M, Q4_K_M) |
| estimated_tps | Estimated tokens per second |
| memory_required_gb | VRAM/RAM needed at this quantization |
| memory_available_gb | Available VRAM/RAM detected |
| utilization_pct | How much of available memory the model uses |
| use_case | What the model is designed for |
| context_length | Maximum context window |
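For illustration, a single entry in the models array could look like the sketch below. The values are invented for this example and will differ on real hardware; only the field names follow the table above.

```json
{
  "name": "Qwen/Qwen2.5-Coder-7B-Instruct",
  "provider": "Alibaba",
  "params_b": 7.6,
  "score": 84,
  "score_components": { "quality": 78, "speed": 90, "fit": 95, "context": 70 },
  "fit_level": "Perfect",
  "run_mode": "GPU",
  "best_quant": "Q5_K_M",
  "estimated_tps": 45.0,
  "memory_required_gb": 6.8,
  "memory_available_gb": 24.5,
  "utilization_pct": 28,
  "use_case": "coding",
  "context_length": 131072
}
```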
- Perfect: Model fits comfortably with room to spare. Ideal choice.
- Good: Model fits but uses most available memory. Will work well.
- Marginal: Model barely fits. May work but expect slower performance or reduced context.
- TooTight: Model does not fit. Do not recommend.
- GPU: Full GPU inference. Fastest. Model weights loaded entirely into VRAM.
- CPU+GPU Offload: Some layers on GPU, rest in system RAM. Slower than pure GPU.
- CPU Only: All inference on CPU using system RAM. Slowest but works without GPU.
After getting recommendations, configure the user's local model provider.
Map the HuggingFace model name to its Ollama tag. Common mappings:

| llmfit name | Ollama tag |
| --- | --- |
| meta-llama/Llama-3.1-8B-Instruct | llama3.1:8b |
| meta-llama/Llama-3.3-70B-Instruct | llama3.3:70b |
| Qwen/Qwen2.5-Coder-7B-Instruct | qwen2.5-coder:7b |
| Qwen/Qwen2.5-72B-Instruct | qwen2.5:72b |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | deepseek-coder-v2:16b |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | deepseek-r1:32b |
| google/gemma-2-9b-it | gemma2:9b |
| mistralai/Mistral-7B-Instruct-v0.3 | mistral:7b |
| microsoft/Phi-3-mini-4k-instruct | phi3:mini |
| microsoft/Phi-4-mini-instruct | phi4-mini |

Then update openclaw.json:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "models": ["ollama/<ollama-tag>"]
      }
    }
  }
}
```

And optionally set as default:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/<ollama-tag>"
      }
    }
  }
}
```
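For example, if llmfit recommends Qwen/Qwen2.5-Coder-7B-Instruct (an illustrative pick, not a fixed default), the install side is a single pull of the mapped tag before editing openclaw.json as shown above:

```bash
# Pull the mapped Ollama tag for the recommended model (illustrative choice)
ollama pull qwen2.5-coder:7b

# Verify the model is available locally before configuring OpenClaw
ollama list
```

Then add "ollama/qwen2.5-coder:7b" to models.providers.ollama.models in openclaw.json.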
Use the HuggingFace model name directly as the model identifier with the appropriate provider prefix (vllm/ or lmstudio/).
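A minimal sketch of the matching openclaw.json fragment, assuming the lmstudio provider takes the same models array shape as the ollama provider above (the provider key and structure are assumptions, and the model name is illustrative):

```json
{
  "models": {
    "providers": {
      "lmstudio": {
        "models": ["lmstudio/Qwen/Qwen2.5-Coder-7B-Instruct"]
      }
    }
  }
}
```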
When a user asks "what local models can I run?":
1. Run llmfit --json system to show a hardware summary
2. Run llmfit recommend --json --limit 5 to get top picks
3. Present the recommendations with scores and fit levels
4. If the user wants to configure one, map it to the appropriate Ollama/vLLM/LM Studio tag
5. Offer to update openclaw.json with the chosen model

When a user asks for a specific use case like "recommend a coding model":
1. Run llmfit recommend --json --use-case coding --limit 3
2. Present the coding-specific recommendations
3. Offer to pull via Ollama and configure (see the sketch after this list)
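A command-level sketch of that coding-model flow, assuming the user picks the top recommendation and it maps to an Ollama tag (the chosen tag is illustrative):

```bash
# 1. Summarize the detected hardware
llmfit --json system

# 2. Get coding-specific recommendations
llmfit recommend --json --use-case coding --limit 3

# 3. Pull the Ollama tag mapped from the chosen recommendation (illustrative)
ollama pull qwen2.5-coder:7b
```

Finish by updating openclaw.json as described in the configuration section above.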
- llmfit detects NVIDIA GPUs (via nvidia-smi), AMD GPUs (via rocm-smi), and Apple Silicon (unified memory). Multi-GPU setups aggregate VRAM across cards automatically.
- The best_quant field tells you the optimal quantization: higher quants (Q6_K, Q8_0) mean better quality if VRAM allows.
- Speed estimates (estimated_tps) are approximate and vary by hardware and quantization.
- Models with fit_level: "TooTight" should never be recommended to users.