Requirements

- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Production-ready vLLM deployment on AMD ROCm GPUs. Combines environment auto-check, model parameter detection, Docker Compose deployment, health verification...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Production-ready automation for deploying vLLM inference services on AMD ROCm GPUs using Docker Compose.
- Environment Auto-Check - detects and repairs missing dependencies
- Model Parameter Detection - auto-reads config.json for optimal settings
- VRAM Estimation - calculates memory requirements before deployment
- Secure Token Handling - never writes tokens to compose files
- Structured Output - all logs and test results saved per-model
- Deployment Reports - human-readable summary for each deployment
- Health Verification - automated health checks and functional tests
- Troubleshooting Guide - common issues and solutions
Recommended (for production) - add to `~/.bash_profile`:

```bash
# HuggingFace authentication token (required for gated models)
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Model cache directory (optional)
export HF_HOME="$HOME/models"

# Apply changes
source ~/.bash_profile
```

Not required for testing - the skill will proceed without these set:

- HF_TOKEN: Optional - public models work without it; gated models fail at download with a clear error
- HF_HOME: Optional - defaults to /root/.cache/huggingface/hub
Priority Order:

1. Explicit parameter (highest) - provided in the task/request (e.g., `hf_token: "xxx"`)
2. Environment variable - already set in the shell or inherited from the parent process
3. `~/.bash_profile` - sourced to load variables
4. Default value (lowest) - HF_HOME defaults to /root/.cache/huggingface/hub

| Variable | Required | If Missing |
|----------|----------|------------|
| HF_TOKEN | Conditional | Continue without token (public models work; gated models fail at download with a clear error) |
| HF_HOME | No | Warning + default to /root/.cache/huggingface/hub |

Philosophy: Fail fast for configuration errors; fail at download time for authentication errors.
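As a minimal sketch, the same resolution order in shell; `EXPLICIT_TOKEN` stands in for a hypothetical task parameter and is not part of the skill's documented interface:

```bash
# 1. Explicit parameter (hypothetical) wins over everything else.
[ -n "${EXPLICIT_TOKEN:-}" ] && HF_TOKEN="$EXPLICIT_TOKEN"

# 2./3. If still unset, fall back to the shell environment, then ~/.bash_profile.
if [ -z "${HF_TOKEN:-}" ] && [ -f ~/.bash_profile ]; then
  source ~/.bash_profile
fi

# 4. HF_HOME takes the documented default when nothing else set it.
HF_HOME="${HF_HOME:-/root/.cache/huggingface/hub}"
```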
Location: <skill-dir>/scripts/
Validate and load environment variables before deployment.

Usage:

```bash
# Basic check (HF_TOKEN optional, HF_HOME optional with default)
./scripts/check-env.sh

# Strict mode (HF_HOME required, fails if not set)
./scripts/check-env.sh --strict

# Quiet mode (minimal output, for automation)
./scripts/check-env.sh --quiet

# Test with environment variables
HF_TOKEN="hf_xxx" HF_HOME="/models" ./scripts/check-env.sh
```

Exit Codes:

| Code | Meaning |
|------|---------|
| 0 | Environment check completed (variables loaded or defaulted) |
| 2 | Critical error (e.g., cannot source ~/.bash_profile) |

Note: This script is optional. You can also directly run `source ~/.bash_profile`.
Generate a human-readable deployment report after a successful deployment.

Usage:

```bash
./scripts/generate-report.sh <model-id> <container-name> <port> <status> [model-load-time] [memory-used]

# Example:
./scripts/generate-report.sh \
  "Qwen-Qwen3-0.6B" \
  "vllm-qwen3-0-6b" \
  "8001" \
  "✅ Success" \
  "3.6" \
  "1.2"
```

Parameters:

| Parameter | Required | Description |
|-----------|----------|-------------|
| model-id | Yes | Model ID (with / replaced by -) |
| container-name | Yes | Docker container name |
| port | Yes | Host port for the API endpoint |
| status | Yes | Deployment status (e.g., "✅ Success") |
| model-load-time | No | Model loading time in seconds |
| memory-used | No | Memory consumption in GiB |

Output: `$HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md`

Exit Codes:

| Code | Meaning |
|------|---------|
| 0 | Report generated successfully |
| 1 | Missing required parameters |
| 2 | Output directory not found |

Integration: This script is automatically called in Phase 7 of the deployment workflow.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model_id | String | Yes | - | HuggingFace model ID |
| docker_image | String | No | rocm/vllm-dev:nightly | vLLM Docker image |
| tensor_parallel_size | Integer | No | 1 | Number of GPUs |
| port | Integer | No | 9999 | API server port |
| hf_home | String | No | ${HF_HOME} or /root/.cache/huggingface/hub | Model cache directory |
| hf_token | Secret | Conditional | ${HF_TOKEN} | HuggingFace token (optional for public models, required for gated models) |
| max_model_len | Integer | No | Auto-detect | Maximum sequence length |
| gpu_memory_utilization | Float | No | 0.85 | GPU memory utilization |
| auto_install | Boolean | No | true | Auto-install dependencies |
| log_level | String | No | INFO | Logging verbosity |
All deployment artifacts MUST be saved to: `$HOME/vllm-compose/<model-id-slash-to-dash>/`

Convert the model ID to a directory name by replacing `/` with `-`:

- `openai/gpt-oss-20b` → `$HOME/vllm-compose/openai-gpt-oss-20b/`
- `Qwen/Qwen3-Coder-Next-FP8` → `$HOME/vllm-compose/Qwen-Qwen3-Coder-Next-FP8/`

Per-model directory structure:

```
$HOME/vllm-compose/<model-id>/
├── deployment.log          # Full deployment logs (stdout + stderr)
├── test-results.json       # Functional test results (JSON format)├── docker-compose.yml      # Generated Docker Compose file
├── .env                    # HF_TOKEN environment (chmod 600, optional)
└── DEPLOYMENT_REPORT.md    # Human-readable deployment summary
```

File requirements:

- deployment.log - capture ALL container logs during deployment
- test-results.json - save the API response from the functional test request
- DEPLOYMENT_REPORT.md - generated in Phase 7
- All three files MUST exist before marking the deployment as complete
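A minimal shell sketch of the slash-to-dash conversion (variable names are illustrative):

```bash
MODEL_ID="openai/gpt-oss-20b"
MODEL_DIR="${MODEL_ID//\//-}"             # bash substitution: every "/" becomes "-"
OUTPUT_DIR="$HOME/vllm-compose/$MODEL_DIR"

mkdir -p "$OUTPUT_DIR"
echo "$OUTPUT_DIR"                         # e.g. ~/vllm-compose/openai-gpt-oss-20b
```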
**Step 0.1: Load Environment Variables**

```bash
# Source ~/.bash_profile to load HF_HOME and HF_TOKEN
source ~/.bash_profile
```

If HF_HOME is not defined in ~/.bash_profile, it defaults to /root/.cache/huggingface/hub.

**Step 0.2: Create Output Directory**

Create: `$HOME/vllm-compose/<model-id>/`

**Step 0.3: Initialize Logging**

All output → `$HOME/vllm-compose/<model-id>/deployment.log`

**Step 0.4: System Checks** (see the sketch after this list)

- Detect OS and package manager
- Check Python, pip, huggingface_hub
- Check Docker, docker compose
- Check ROCm tools (rocm-smi/amd-smi)
- Check GPU access (/dev/kfd, /dev/dri)
- Check disk space (20GB minimum)
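A hedged sketch of the Step 0.4 checks; the skill's exact commands and repair logic may differ, and the disk-space check assumes GNU coreutils `df`:

```bash
# Rough illustration of the Phase 0 system checks, not the skill's exact script.
set -u

command -v docker >/dev/null           || echo "MISSING: docker"
docker compose version >/dev/null 2>&1 || echo "MISSING: docker compose plugin"
command -v python3 >/dev/null          || echo "MISSING: python3"
python3 -c "import huggingface_hub" 2>/dev/null \
  || echo "MISSING: huggingface_hub (pip install huggingface_hub)"
command -v rocm-smi >/dev/null || command -v amd-smi >/dev/null \
  || echo "MISSING: ROCm tools (rocm-smi/amd-smi)"

# GPU device nodes required by ROCm containers
[ -e /dev/kfd ] || echo "MISSING: /dev/kfd (ROCm kernel driver)"
[ -d /dev/dri ] || echo "MISSING: /dev/dri (GPU render nodes)"

# Disk space: at least 20 GB free where models will be cached (GNU df assumed)
FREE_GB=$(df -BG --output=avail "${HF_HOME:-/root/.cache/huggingface/hub}" | tail -1 | tr -dc '0-9')
[ "${FREE_GB:-0}" -ge 20 ] || echo "WARNING: less than 20GB free"
```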
Use HF_HOME from Phase 0 (environment variable or default):

```bash
# Download model to HF_HOME
huggingface-cli download <model_id> --local-dir "$HF_HOME/hub/models--<org>--<model>"

# Or use snapshot_download via Python:
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='<model_id>', cache_dir='$HF_HOME')"
```

Authentication Handling:

| Scenario | Behavior |
|----------|----------|
| Public model + no token | ✅ Download succeeds |
| Public model + token provided | ✅ Download succeeds |
| Gated model + no token | ❌ Download fails with "authentication required" error |
| Gated model + invalid token | ❌ Download fails with "invalid token" error |
| Gated model + valid token | ✅ Download succeeds |

On Authentication Failure:

```bash
echo "ERROR: Model download failed - authentication required"
echo "This model requires a valid HF_TOKEN."
echo ""
echo "Please add to ~/.bash_profile:"
echo "  export HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""
echo "Then run: source ~/.bash_profile"
exit 1
```

- Locate the model path in the HF cache: `$HF_HOME/hub/models--<org>--<model-name>/`
- Log download progress to deployment.log
- Read config.json from the model
- Auto-detect: max_model_len, hidden_size, num_attention_heads, num_hidden_layers, vocab_size, dtype
- Validate that the TP size divides the attention head count
- Estimate the VRAM requirement (see the sketch after this list)
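A hedged sketch of this phase, assuming the standard HuggingFace config.json field names; the config path, TP value, and the crude weight-only VRAM formula are illustrative, not the skill's exact heuristics (the estimate ignores KV cache and activation overhead):

```bash
# Illustrative inputs; the real skill resolves these from its parameters.
CONFIG=$(ls "$HF_HOME"/hub/models--<org>--<model>/snapshots/*/config.json | head -1)
TP=1

INFO=$(python3 - "$CONFIG" <<'PY'
import json, sys
cfg = json.load(open(sys.argv[1]))
h, layers, vocab = cfg["hidden_size"], cfg["num_hidden_layers"], cfg["vocab_size"]
# Rough parameter count: 12*h^2 per transformer block plus embeddings, bf16 = 2 bytes
weight_gib = (12 * layers * h * h + vocab * h) * 2 / 2**30
print(cfg["num_attention_heads"],
      cfg.get("max_position_embeddings", 0),
      cfg.get("torch_dtype", "bfloat16"),
      round(weight_gib, 1))
PY
)
read -r HEADS MAXLEN DTYPE WEIGHT_GIB <<<"$INFO"

# tensor_parallel_size must divide the attention head count evenly
if (( HEADS % TP != 0 )); then
  echo "ERROR: tensor_parallel_size=$TP does not divide num_attention_heads=$HEADS"
  exit 1
fi
echo "max_model_len=$MAXLEN dtype=$DTYPE approx_weights=${WEIGHT_GIB}GiB"
```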
Generate files in the output directory:

- `docker-compose.yml` → `$HOME/vllm-compose/<model-id>/docker-compose.yml`
  - Mount HF_HOME as a volume (read-only for models)
  - NO hardcoded tokens in the compose file
- `.env` → `$HOME/vllm-compose/<model-id>/.env` (optional)
  - Contains: HF_TOKEN=<value>
  - Permissions: chmod 600
  - Only created if the user explicitly requests persistent token storage

Volume mount example:

```yaml
volumes:
  - ${HF_HOME}:/root/.cache/huggingface/hub:ro
  - /dev/kfd:/dev/kfd
  - /dev/dri:/dev/dri
```

Important: Docker Compose reads `${HF_HOME}` from the host environment at runtime. Before running docker compose, source ~/.bash_profile:

```bash
source ~/.bash_profile
```
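As one possible shape of the generated file, here is a hedged sketch that writes an example compose file via a heredoc. The image tag follows the parameter defaults above; the port mapping and serve command are assumptions the skill may generate differently, and real ROCm deployments often also need settings such as `ipc`, `shm_size`, and group membership:

```bash
# Sketch only: field values are illustrative, not the skill's exact output.
# The quoted heredoc keeps ${HF_TOKEN}/${HF_HOME} literal so Docker Compose
# interpolates them from the host environment at runtime.
cat > "$HOME/vllm-compose/<model-id>/docker-compose.yml" <<'EOF'
services:
  vllm:
    image: rocm/vllm-dev:nightly
    ports:
      - "9999:8000"            # host port 9999 -> vLLM's default server port 8000
    environment:
      - HF_TOKEN=${HF_TOKEN}   # read from the host env; never hardcoded here
    volumes:
      - ${HF_HOME}:/root/.cache/huggingface/hub:ro
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    command: >
      vllm serve <model_id>
      --tensor-parallel-size 1
      --gpu-memory-utilization 0.85
EOF
```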
Important: Before deploying, pull the latest image to ensure updates:

```bash
docker pull rocm/vllm-dev:nightly
```

Note: The default port is 9999. Before running docker compose, check that the port is available: `ss -tlnp | grep :<port>`. If the port is in use, specify a different port in docker-compose.yml.

Pass HF_TOKEN at runtime:

```bash
HF_TOKEN=$HF_TOKEN docker compose up -d
```

Wait for container initialization (see the wait-loop sketch below).
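A minimal wait-loop sketch; the skill's actual timeout, interval, and readiness criterion may differ (the port is an example):

```bash
PORT=9999

# Poll /health until the server answers, up to ~5 minutes (60 tries x 5s)
for i in $(seq 1 60); do
  if curl -sf "http://localhost:$PORT/health" >/dev/null; then
    echo "server ready after ~$((i * 5))s"
    break
  fi
  sleep 5
done
```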
- Check container status
- Test the /health endpoint
- Test the /v1/models endpoint

(See the example commands below.)
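A hedged sketch of these checks using the standard vLLM OpenAI-compatible endpoints; the port and container-name filter are examples:

```bash
PORT=9999

# Container status
docker ps --filter "name=vllm" --format "{{.Names}}: {{.Status}}"

# /health returns HTTP 200 when the server is ready
curl -sf "http://localhost:$PORT/health" && echo "health: OK"

# /v1/models should list the served model id
curl -s "http://localhost:$PORT/v1/models"
```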
- Run a completion test via the /v1/chat/completions API (see the sketch below)
- Save the response to: `$HOME/vllm-compose/<model-id>/test-results.json`
- Verify the response contains a valid completion
- Log deployment complete → append to deployment.log

Deployment is complete only when both files exist:

- deployment.log
- test-results.json
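A minimal functional-test sketch against the OpenAI-compatible chat endpoint; the port, prompt, and validation step are illustrative:

```bash
PORT=9999
OUT="$HOME/vllm-compose/<model-id>/test-results.json"

curl -s "http://localhost:$PORT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model_id>",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 32
      }' > "$OUT"

# Basic validation: the response must contain a non-empty assistant message
python3 -c "import json; r = json.load(open('$OUT')); assert r['choices'][0]['message']['content'].strip()" \
  && echo "functional test: OK"
```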
Generate the human-readable deployment report using the helper script.

**Step 7.1: Extract Deployment Metrics**

```bash
# Parse deployment.log for metrics
MODEL_LOAD_TIME=$(grep -o "model loading took [0-9.]* seconds" deployment.log | grep -o '[0-9.]*' || echo "N/A")
MEMORY_USED=$(grep -o "took [0-9.]* GiB memory" deployment.log | grep -o '[0-9.]*' || echo "N/A")
```

**Step 7.2: Generate Report**

```bash
# Execute the report generation script
<skill-dir>/scripts/generate-report.sh \
  "<model-id>" \
  "<container-name>" \
  "<port>" \
  "<status>" \
  "$MODEL_LOAD_TIME" \
  "$MEMORY_USED"

# Example:
./scripts/generate-report.sh \
  "Qwen-Qwen3-0.6B" \
  "vllm-qwen3-0-6b" \
  "8001" \
  "✅ Success" \
  "3.6" \
  "1.2"
```

Output: `$HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md`

Report Contents:

- Output structure verification (file checklist)
- Deployment summary table (health, test, metrics)
- Test results (request/response preview)
- Environment configuration
- Quick commands for operations

Completion Criteria:

- DEPLOYMENT_REPORT.md exists in the output directory
- The report contains all required sections
- All file checks show ✅
- Never commit tokens to version control - add .env to .gitignore
- Use .env files with chmod 600 - restrict access to the owner only
- Mask tokens in logs - show only the first 10 chars: `${TOKEN:0:10}...`
- Pass tokens at runtime - `HF_TOKEN=$HF_TOKEN docker compose up -d`
- Store tokens in ~/.bash_profile - for production environments, set HF_TOKEN in the user's shell config
- Set the token for gated models - HF_TOKEN is validated at download time; set it in ~/.bash_profile for production

(See the .env sketch after this list.)
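A short sketch of creating the optional .env file with owner-only permissions from the start (the path is an example):

```bash
cd "$HOME/vllm-compose/<model-id>"

# The subshell's umask makes the file 600 at creation, before the secret lands in it
( umask 177; printf 'HF_TOKEN=%s\n' "$HF_TOKEN" > .env )
chmod 600 .env   # belt and braces in case the file already existed

# Keep the secret out of version control
grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```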
| Issue | Solution |
|-------|----------|
| HF_TOKEN not set | Add `export HF_TOKEN="hf_xxx"` to ~/.bash_profile, then `source ~/.bash_profile`. Or provide it via parameter. |
| HF_HOME not set | Defaults to /root/.cache/huggingface/hub. For production, add `export HF_HOME="/path"` to ~/.bash_profile. |
| ~/.bash_profile not found | Create ~/.bash_profile and add the environment variables. |
| Changes not taking effect | Run `source ~/.bash_profile` or restart the terminal. |
| HF_TOKEN provided but download still fails | The token may be invalid or lack access to the model. Verify the token at https://huggingface.co/settings/tokens |
| Issue | Solution |
|-------|----------|
| Authentication required (gated model) | Set HF_TOKEN in ~/.bash_profile or provide it via parameter. Ensure the token has access to the model. |
| Model not found | Verify the model ID is correct (case-sensitive). Check that the model exists on HuggingFace. |
| Download timeout | Check the network connection. Large models may take time. |
| Issue | Solution |
|-------|----------|
| hf CLI not found | `pip install huggingface_hub` |
| Docker Compose fails | Use `docker compose` (no hyphen) |
| GPU access fails | Add the user to the render group: `sudo usermod -aG render $USER` |
| Port in use | Change the port parameter |
| OOM | Reduce gpu_memory_utilization |
```bash
cd $HOME/vllm-compose/<model-id>
docker compose down
```
Check deployment status and logs:

```bash
# View deployment directory
ls -la $HOME/vllm-compose/<model-id>/

# View live logs
tail -f $HOME/vllm-compose/<model-id>/deployment.log

# View test results
cat $HOME/vllm-compose/<model-id>/test-results.json

# Check container status
docker ps | grep <model-id>

# Verify environment variables
echo "HF_TOKEN: ${HF_TOKEN:0:10}..."
echo "HF_HOME: $HF_HOME"
```
**Step 1: Add environment variables to ~/.bash_profile**

```bash
# Required: HuggingFace token
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Recommended: Custom model storage path (production)
export HF_HOME="/data/models/huggingface"

# Apply changes
source ~/.bash_profile
```

**Step 2: Verify the environment is ready**

```bash
# Source ~/.bash_profile to load variables
source ~/.bash_profile

# Expected output:
# === Environment Ready ===
# Summary:
#   HF_TOKEN: hf_xxxxxx...
#   HF_HOME: /data/models/huggingface
```

**Step 3: Run deployment**

```bash
# The skill will automatically:
# 1. Source ~/.bash_profile to load HF_HOME and HF_TOKEN
# 2. Use HF_TOKEN and HF_HOME from the environment (or ~/.bash_profile, or defaults)
# 3. Proceed without a token for public models
# 4. Fail at download time with a clear error if a gated model requires a token
```
| Version | Changes |
|---------|---------|
| 1.0.0 | Initial release |