Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Deploy vLLM model services on GPU servers. Supports multi-server configuration, automatically checks GPU status and port usage, and deploys popular open-source models with a single command.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Quickly deploy vLLM model services on GPU servers.
- 🖥️ Multi-server support - configure multiple GPU servers and choose between them flexibly
- 🔍 Automatic checks - check GPU status and port usage with one command
- 🤖 Model library - preset configurations for popular models
- ⚡ Fast deployment - start a service with a simple command
Create `~/.config/gpu-deploy/servers.json`:

```json
{
  "servers": {
    "gpu1": {
      "host": "gpu1",
      "user": "lnsoft",
      "gpu_count": 4,
      "model_path": "/data/models/llm"
    },
    "my-gpu": {
      "host": "192.168.1.100",
      "user": "ubuntu",
      "gpu_count": 2,
      "model_path": "/home/ubuntu/models"
    }
  },
  "default_server": "gpu1"
}
```
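Before running any commands, it can help to sanity-check this file. A minimal sketch, assuming `python3` is on PATH; `/tmp/servers.json` stands in for the real `~/.config/gpu-deploy/servers.json`:

```shell
# Write a sample config to a temporary path (stand-in for
# ~/.config/gpu-deploy/servers.json), then verify that
# default_server actually names a configured server.
cat > /tmp/servers.json <<'EOF'
{
  "servers": {
    "gpu1": {"host": "gpu1", "user": "lnsoft",
             "gpu_count": 4, "model_path": "/data/models/llm"}
  },
  "default_server": "gpu1"
}
EOF

python3 - <<'EOF'
import json
cfg = json.load(open("/tmp/servers.json"))
assert cfg["default_server"] in cfg["servers"], "default_server not defined"
print("config OK")
EOF
```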
```shell
# Use the default server
gpu-deploy check

# Target a specific server
gpu-deploy check --server gpu1
```
```shell
# Deploy a preset model
gpu-deploy deploy deepseek-r1-32b

# Specify a port
gpu-deploy deploy deepseek-r1-32b --port 8112
```
Start a vLLM model service.

```shell
gpu-deploy deploy <MODEL_NAME> [--server NAME] [--port PORT]
```

Supported models:

- deepseek-r1-32b - DeepSeek-R1-Distill-Qwen-32B-AWQ
- llama-3-8b - Llama 3 8B
- qwen-7b - Qwen 7B
- mistral-7b - Mistral 7B
```shell
gpu-deploy list                                # list available models
gpu-deploy ps [--server NAME]                  # show running services
gpu-deploy stop [--server NAME] [--port PORT]  # stop a service
```
If you prefer not to use the wrapper script, you can run the raw commands directly:
```shell
# Check GPU status
ssh <user>@<host> nvidia-smi

# Check whether a port is in use
ssh <user>@<host> "lsof -i :<port> 2>/dev/null || echo 'port available'"
```
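The port check above can also be wrapped in a small helper for scripting. A sketch under stated assumptions: the `port_free` name is ours, not part of gpu-deploy, and it runs `lsof` locally, so execute it on the GPU host itself (or wrap it in ssh):

```shell
# Hypothetical helper: succeeds when nothing is listening on the port.
# Mirrors the raw lsof check; run this on the GPU host, not the client.
port_free() {
  ! lsof -i :"$1" >/dev/null 2>&1
}

port_free 8111 && echo "port 8111 available" || echo "port 8111 in use"
```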
```shell
ssh <user>@<host> "tmux new-session -d -s vllm '
  source /data/miniconda3/etc/profile.d/conda.sh && \
  conda activate vllm && \
  cd /data/models/llm && \
  vllm serve /data/models/llm/deepseek/DeepSeek-R1-Distill-Qwen-32B-AWQ/ \
    --tensor-parallel-size 4 \
    --max-model-len 102400 \
    --dtype half \
    --port 8111 \
    --served-model-name gpt-4o-mini
'"
```
Add to `~/.config/gpu-deploy/models.json`:

```json
{
  "my-model": {
    "name": "My Awesome Model",
    "path": "/path/to/model",
    "tensor_parallel_size": 2,
    "max_model_len": 8192,
    "dtype": "half",
    "port": 8111,
    "served_model_name": "my-model"
  }
}
```
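Editing the JSON by hand works, but a script can merge the entry without disturbing existing models. A sketch, assuming `python3`; `/tmp/gpu-deploy` stands in for the real `~/.config/gpu-deploy` directory:

```shell
# Merge a new entry into models.json without clobbering existing ones.
# /tmp/gpu-deploy is a stand-in for ~/.config/gpu-deploy.
mkdir -p /tmp/gpu-deploy
[ -f /tmp/gpu-deploy/models.json ] || echo '{}' > /tmp/gpu-deploy/models.json

python3 - <<'EOF'
import json, pathlib

path = pathlib.Path("/tmp/gpu-deploy/models.json")
models = json.loads(path.read_text())
# Same fields as the example entry above; values are illustrative.
models["my-model"] = {
    "name": "My Awesome Model",
    "path": "/path/to/model",
    "tensor_parallel_size": 2,
    "max_model_len": 8192,
    "dtype": "half",
    "port": 8111,
    "served_model_name": "my-model",
}
path.write_text(json.dumps(models, indent=2))
print("added my-model")
EOF
```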
- Check before deploying - always run `check` first to confirm resources are available
- Run in the background - use tmux/screen to keep the service alive
- Port management - use a different port for each model
- VRAM estimation - a 7B model needs roughly 8-10 GB; a 32B model roughly 10-14 GB per GPU
- vLLM documentation: https://docs.vllm.ai
- Model downloads: https://huggingface.co/models
- Issue tracker: https://github.com/your-username/gpu-deploy-skill

Contributed by the OpenClaw community 🦞