{
  "schemaVersion": "1.0",
  "item": {
    "slug": "rocm-vllm-deployment",
    "name": "ROCm vLLM Deployment",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/alexhegit/rocm-vllm-deployment",
    "canonicalUrl": "https://clawhub.ai/alexhegit/rocm-vllm-deployment",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/rocm-vllm-deployment",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=rocm-vllm-deployment",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md",
      "scripts/check-env.sh",
      "scripts/generate-report.sh"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/rocm-vllm-deployment"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/rocm-vllm-deployment",
    "agentPageUrl": "https://openagent3.xyz/skills/rocm-vllm-deployment/agent",
    "manifestUrl": "https://openagent3.xyz/skills/rocm-vllm-deployment/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/rocm-vllm-deployment/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "ROCm vLLM Deployment Skill",
        "body": "Production-ready automation for deploying vLLM inference services on AMD ROCm GPUs using Docker Compose."
      },
      {
        "title": "Features",
        "body": "Environment Auto-Check - Detects and repairs missing dependencies\nModel Parameter Detection - Auto-reads config.json for optimal settings\nVRAM Estimation - Calculates memory requirements before deployment\nSecure Token Handling - Never writes tokens to compose files\nStructured Output - All logs and test results saved per-model\nDeployment Reports - Human-readable summary for each deployment\nHealth Verification - Automated health checks and functional tests\nTroubleshooting Guide - Common issues and solutions"
      },
      {
        "title": "Environment Prerequisites",
        "body": "Recommended (for production): Add to ~/.bash_profile:\n\n# HuggingFace authentication token (required for gated models)\nexport HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\"\n\n# Model cache directory (optional)\nexport HF_HOME=\"$HOME/models\"\n\n# Apply changes\nsource ~/.bash_profile\n\nNot required for testing: The skill will proceed without these set:\n\nHF_TOKEN: Optional — public models work without it; gated models fail at download with clear error\nHF_HOME: Optional — defaults to /root/.cache/huggingface/hub"
      },
      {
        "title": "Environment Variable Detection",
        "body": "Priority Order:\n\nExplicit parameter (highest) — Provided in task/request (e.g., hf_token: \"xxx\")\nEnvironment variable — Already set in shell or from parent process\n~/.bash_profile — Source to load variables\nDefault value (lowest) — HF_HOME defaults to /root/.cache/huggingface/hub\n\nVariableRequiredIf MissingHF_TOKENConditionalContinue without token (public models work; gated models fail at download with clear error)HF_HOMENoWarning + Default — Use /root/.cache/huggingface/hub\n\nPhilosophy: Fail fast for configuration errors, fail at download time for authentication errors."
      },
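      {
        "title": "Example: Variable Resolution",
        "body": "A minimal shell sketch of the priority order above (illustrative; PARAM_HF_TOKEN stands in for an explicit hf_token task parameter):\n\n# 1. Explicit parameter wins when provided\nif [ -n \"$PARAM_HF_TOKEN\" ]; then\n  HF_TOKEN=\"$PARAM_HF_TOKEN\"\n# 2/3. Otherwise fall back to the environment, sourcing ~/.bash_profile if needed\nelif [ -z \"$HF_TOKEN\" ] && [ -f ~/.bash_profile ]; then\n  source ~/.bash_profile\nfi\n\n# 4. Lowest priority: default HF_HOME when still unset\nexport HF_HOME=\"${HF_HOME:-/root/.cache/huggingface/hub}\"\necho \"Using HF_HOME: $HF_HOME\""
      },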
      {
        "title": "Helper Scripts",
        "body": "Location: <skill-dir>/scripts/"
      },
      {
        "title": "check-env.sh",
        "body": "Validate and load environment variables before deployment.\n\nUsage:\n\n# Basic check (HF_TOKEN optional, HF_HOME optional with default)\n./scripts/check-env.sh\n\n# Strict mode (HF_HOME required, fails if not set)\n./scripts/check-env.sh --strict\n\n# Quiet mode (minimal output, for automation)\n./scripts/check-env.sh --quiet\n\n# Test with environment variables\nHF_TOKEN=\"hf_xxx\" HF_HOME=\"/models\" ./scripts/check-env.sh\n\nExit Codes:\n\nCodeMeaning0Environment check completed (variables loaded or defaulted)2Critical error (e.g., cannot source ~/.bash_profile)\n\nNote: This script is optional. You can also directly run source ~/.bash_profile."
      },
      {
        "title": "generate-report.sh",
        "body": "Generate human-readable deployment report after successful deployment.\n\nUsage:\n\n./scripts/generate-report.sh <model-id> <container-name> <port> <status> [model-load-time] [memory-used]\n\n# Example:\n./scripts/generate-report.sh \\\n  \"Qwen-Qwen3-0.6B\" \\\n  \"vllm-qwen3-0-6b\" \\\n  \"8001\" \\\n  \"✅ Success\" \\\n  \"3.6\" \\\n  \"1.2\"\n\nParameters:\n\nParameterRequiredDescriptionmodel-idYesModel ID (with / replaced by -)container-nameYesDocker container nameportYesHost port for API endpointstatusYesDeployment status (e.g., \"✅ Success\")model-load-timeNoModel loading time in secondsmemory-usedNoMemory consumption in GiB\n\nOutput: $HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md\n\nExit Codes:\n\nCodeMeaning0Report generated successfully1Missing required parameters2Output directory not found\n\nIntegration: This script is automatically called in Phase 7 of the deployment workflow."
      },
      {
        "title": "Input Schema",
        "body": "ParameterTypeRequiredDefaultDescriptionmodel_idStringYes-HuggingFace model IDdocker_imageStringNorocm/vllm-dev:nightlyvLLM Docker imagetensor_parallel_sizeIntegerNo1Number of GPUsportIntegerNo9999API server porthf_homeStringNo${HF_HOME} or /root/.cache/huggingface/hubModel cache directoryhf_tokenSecretConditional${HF_TOKEN}HuggingFace token (optional for public models, required for gated models)max_model_lenIntegerNoAuto-detectMaximum sequence lengthgpu_memory_utilizationFloatNo0.85GPU memory utilizationauto_installBooleanNotrueAuto-install dependencieslog_levelStringNoINFOLogging verbosity"
      },
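      {
        "title": "Example: Input Parameters",
        "body": "An illustrative parameter set for a single-GPU deployment (the exact request format depends on how your agent invokes the skill; the keys mirror the schema above):\n\nmodel_id: \"Qwen/Qwen3-0.6B\"\ndocker_image: \"rocm/vllm-dev:nightly\"\ntensor_parallel_size: 1\nport: 8001\ngpu_memory_utilization: 0.85\n# hf_token omitted: Qwen/Qwen3-0.6B is a public model"
      },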
      {
        "title": "Output Structure",
        "body": "All deployment artifacts MUST be saved to:\n\n$HOME/vllm-compose/<model-id-slash-to-dash>/\n\nConvert model ID to directory name by replacing / with -:\n\nopenai/gpt-oss-20b → $HOME/vllm-compose/openai-gpt-oss-20b/\nQwen/Qwen3-Coder-Next-FP8 → $HOME/vllm-compose/Qwen-Qwen3-Coder-Next-FP8/\n\nPer-model directory structure:\n\n$HOME/vllm-compose/<model-id>/\n├── deployment.log          # Full deployment logs (stdout + stderr)\n├── test-results.json       # Functional test results (JSON format)\n├── docker-compose.yml      # Generated Docker Compose file\n├── .env                    # HF_TOKEN environment (chmod 600, optional)\n└── DEPLOYMENT_REPORT.md    # Human-readable deployment summary\n\nFile requirements:\n\ndeployment.log — Capture ALL container logs during deployment\ntest-results.json — Save API response from functional test request\nDEPLOYMENT_REPORT.md — Generated in Phase 7\nAll three files MUST exist before marking deployment as complete"
      },
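      {
        "title": "Example: Directory Name Conversion",
        "body": "A small shell sketch of the slash-to-dash conversion described above:\n\nMODEL_ID=\"openai/gpt-oss-20b\"\nMODEL_DIR_NAME=$(echo \"$MODEL_ID\" | tr '/' '-')\nOUTPUT_DIR=\"$HOME/vllm-compose/$MODEL_DIR_NAME\"\nmkdir -p \"$OUTPUT_DIR\"\necho \"$OUTPUT_DIR\"   # e.g., /root/vllm-compose/openai-gpt-oss-20b when $HOME is /root"
      },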
      {
        "title": "Phase 0: Environment Check & Auto-Repair",
        "body": "Step 0.1: Load Environment Variables\n\n# Source ~/.bash_profile to load HF_HOME and HF_TOKEN\nsource ~/.bash_profile\n\n# If HF_HOME is not defined, it defaults to /root/.cache/huggingface/hub\n\nIf HF_HOME is not defined in ~/.bash_profile, it defaults to /root/.cache/huggingface/hub.\n\nStep 0.2: Create Output Directory\n\nCreate: $HOME/vllm-compose/<model-id>/\n\nStep 0.3: Initialize Logging\n\nAll output → $HOME/vllm-compose/<model-id>/deployment.log\n\nStep 0.4: System Checks\n\nDetect OS and package manager\nCheck Python, pip, huggingface_hub\nCheck Docker, docker compose\nCheck ROCm tools (rocm-smi/amd-smi)\nCheck GPU access (/dev/kfd, /dev/dri)\nCheck disk space (20GB minimum)"
      },
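      {
        "title": "Example: System Check Commands",
        "body": "One plausible set of commands behind the Phase 0 checks (a sketch; the skill's actual probes may differ):\n\ncommand -v python3 && python3 -c \"import huggingface_hub\" || pip install huggingface_hub\ncommand -v docker && docker compose version\ncommand -v rocm-smi || command -v amd-smi    # ROCm tooling\nls /dev/kfd /dev/dri                         # GPU device nodes\ndf -BG --output=avail \"$HOME\" | tail -1      # free space (need at least 20GB)"
      },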
      {
        "title": "Phase 1: Model Download",
        "body": "Use HF_HOME from Phase 0 (environment variable or default):\n\n# Download model to HF_HOME\nhuggingface-cli download <model_id> --local-dir \"$HF_HOME/hub/models--<org>--<model>\"\n\n# Or use snapshot_download via Python:\npython -c \"from huggingface_hub import snapshot_download; snapshot_download(repo_id='<model_id>', cache_dir='$HF_HOME')\"\n\nAuthentication Handling:\n\nScenarioBehaviorPublic model + no token✅ Download succeedsPublic model + token provided✅ Download succeedsGated model + no token❌ Download fails with \"authentication required\" errorGated model + invalid token❌ Download fails with \"invalid token\" errorGated model + valid token✅ Download succeeds\n\nOn Authentication Failure:\n\necho \"ERROR: Model download failed - authentication required\"\necho \"This model requires a valid HF_TOKEN.\"\necho \"\"\necho \"Please add to ~/.bash_profile:\"\necho \"  export HF_TOKEN=\\\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\\\"\"\necho \"Then run: source ~/.bash_profile\"\nexit 1\n\nLocate model path in HF cache: $HF_HOME/hub/models--<org>--<model-name>/\nLog download progress to deployment.log"
      },
      {
        "title": "Phase 2: Model Parameter Detection",
        "body": "Read config.json from model\nAuto-detect: max_model_len, hidden_size, num_attention_heads, num_hidden_layers, vocab_size, dtype\nValidate TP size divides attention heads\nEstimate VRAM requirement"
      },
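      {
        "title": "Example: Reading config.json",
        "body": "A sketch of the Phase 2 detection using jq (field names follow the standard HuggingFace config.json layout; the snapshot path placeholders are left as-is):\n\nCFG=\"$HF_HOME/hub/models--<org>--<model>/snapshots/<rev>/config.json\"\nHEADS=$(jq -r '.num_attention_heads' \"$CFG\")\nMAX_LEN=$(jq -r '.max_position_embeddings' \"$CFG\")\nDTYPE=$(jq -r '.torch_dtype' \"$CFG\")\n\n# Validate that tensor_parallel_size divides the attention heads\nTP=1\nif [ $((HEADS % TP)) -ne 0 ]; then\n  echo \"ERROR: tensor_parallel_size=$TP does not divide num_attention_heads=$HEADS\"\n  exit 1\nfi\necho \"max_model_len=$MAX_LEN dtype=$DTYPE heads=$HEADS\""
      },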
      {
        "title": "Phase 3: Docker Compose Configuration",
        "body": "Generate files in output directory:\n\ndocker-compose.yml → $HOME/vllm-compose/<model-id>/docker-compose.yml\n\nMount HF_HOME as volume (read-only for models)\nNO hardcoded tokens in compose file\n\n\n\n.env → $HOME/vllm-compose/<model-id>/.env (optional)\n\nContains: HF_TOKEN=<value>\nPermissions: chmod 600\nOnly created if user explicitly requests persistent token storage\n\nVolume mount example:\n\nvolumes:\n  - ${HF_HOME}:/root/.cache/huggingface/hub:ro\n  - /dev/kfd:/dev/kfd\n  - /dev/dri:/dev/dri\n\nImportant: Docker Compose reads ${HF_HOME} from the host environment at runtime. Before running docker compose, source ~/.bash_profile: source ~/.bash_profile"
      },
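      {
        "title": "Example: Generated docker-compose.yml",
        "body": "A minimal sketch of what the generated compose file might look like (the service name, command flags, and ipc setting are illustrative and depend on the image and vLLM version; the volumes follow the mount example above):\n\nservices:\n  vllm:\n    image: rocm/vllm-dev:nightly\n    command: vllm serve <model_id> --port 9999 --gpu-memory-utilization 0.85\n    ports:\n      - \"9999:9999\"\n    environment:\n      - HF_TOKEN=${HF_TOKEN}\n    volumes:\n      - ${HF_HOME}:/root/.cache/huggingface/hub:ro\n      - /dev/kfd:/dev/kfd\n      - /dev/dri:/dev/dri\n    ipc: host"
      },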
      {
        "title": "Phase 4: Container Launch",
        "body": "Important: Before deploying, pull the latest image to ensure updates:\n\ndocker pull rocm/vllm-dev:nightly\n\nNote: Default port is 9999. Before running docker compose, check if port is available: ss -tlnp | grep :<port>. If port is in use, specify a different port in docker-compose.yml.\n\nPass HF_TOKEN at runtime: HF_TOKEN=$HF_TOKEN docker compose up -d\nWait for container initialization"
      },
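      {
        "title": "Example: Waiting for Initialization",
        "body": "One way to wait for the container to come up (a sketch; adjust PORT and the timeout to your deployment):\n\nPORT=9999\nfor i in $(seq 1 60); do\n  if curl -sf \"http://localhost:$PORT/health\" > /dev/null; then\n    echo \"vLLM is ready\"\n    break\n  fi\n  sleep 10\ndone"
      },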
      {
        "title": "Phase 5: Health Verification",
        "body": "Check container status\nTest /health endpoint\nTest /v1/models endpoint"
      },
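      {
        "title": "Example: Health Check Commands",
        "body": "The checks above map to simple commands (PORT is the host port you deployed on; the container name filter is illustrative):\n\ndocker ps --filter \"name=vllm\" --format \"{{.Names}}: {{.Status}}\"\ncurl -sf \"http://localhost:$PORT/health\" && echo \"health: OK\"\ncurl -s \"http://localhost:$PORT/v1/models\" | jq '.data[].id'"
      },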
      {
        "title": "Phase 6: Functional Testing",
        "body": "Run completion test via /v1/chat/completions API\nSave response to: $HOME/vllm-compose/<model-id>/test-results.json\nVerify response contains valid completion\nLog deployment complete → Append to deployment.log\nDeployment is complete only when both files exist:\n\ndeployment.log\ntest-results.json"
      },
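      {
        "title": "Example: Functional Test Request",
        "body": "A sketch of the completion test (the model field must match the served model ID; the output path follows the structure above):\n\ncurl -s \"http://localhost:$PORT/v1/chat/completions\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"<model_id>\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in one sentence.\"}], \"max_tokens\": 64}' \\\n  | tee \"$HOME/vllm-compose/<model-id>/test-results.json\" | jq '.choices[0].message.content'"
      },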
      {
        "title": "Phase 7: Deployment Report",
        "body": "Generate human-readable deployment report using the helper script.\n\nStep 7.1: Extract Deployment Metrics\n\n# Parse deployment.log for metrics\nMODEL_LOAD_TIME=$(grep -o \"model loading took [0-9.]* seconds\" deployment.log | grep -o '[0-9.]*' || echo \"N/A\")\nMEMORY_USED=$(grep -o \"took [0-9.]* GiB memory\" deployment.log | grep -o '[0-9.]*' || echo \"N/A\")\n\nStep 7.2: Generate Report\n\n# Execute the report generation script\n<skill-dir>/scripts/generate-report.sh \\\n  \"<model-id>\" \\\n  \"<container-name>\" \\\n  \"<port>\" \\\n  \"<status>\" \\\n  \"$MODEL_LOAD_TIME\" \\\n  \"$MEMORY_USED\"\n\n# Example:\n./scripts/generate-report.sh \\\n  \"Qwen-Qwen3-0.6B\" \\\n  \"vllm-qwen3-0-6b\" \\\n  \"8001\" \\\n  \"✅ Success\" \\\n  \"3.6\" \\\n  \"1.2\"\n\nOutput: $HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md\n\nReport Contents:\n\nOutput structure verification (file checklist)\nDeployment summary table (health, test, metrics)\nTest results (request/response preview)\nEnvironment configuration\nQuick commands for operations\n\nCompletion Criteria:\n\nDEPLOYMENT_REPORT.md exists in output directory\nReport contains all required sections\nAll file checks show ✅"
      },
      {
        "title": "Security Best Practices",
        "body": "Never commit tokens to version control — Add .env to .gitignore\nUse .env files with chmod 600 — Restrict access to owner only\nMask tokens in logs — Show only first 10 chars: ${TOKEN:0:10}...\nPass tokens at runtime — HF_TOKEN=$HF_TOKEN docker compose up -d\nStore tokens in ~/.bash_profile — For production environments, set HF_TOKEN in user's shell config\nSet token for gated models — HF_TOKEN is validated at download time; set in ~/.bash_profile for production"
      },
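      {
        "title": "Example: Creating a Restricted .env File",
        "body": "If you do opt into persistent token storage, a safe way to create the .env file (the umask ensures the file is never world-readable, even briefly):\n\ncd \"$HOME/vllm-compose/<model-id>\"\n( umask 077; printf 'HF_TOKEN=%s\\n' \"$HF_TOKEN\" > .env )\nchmod 600 .env\necho \"HF_TOKEN: ${HF_TOKEN:0:10}...\"   # masked, per the practices above"
      },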
      {
        "title": "Environment Variables",
        "body": "IssueSolutionHF_TOKEN not setAdd export HF_TOKEN=\"hf_xxx\" to ~/.bash_profile, then source ~/.bash_profile. Or provide via parameter.HF_HOME not setdefaults to /root/.cache/huggingface/hub. For production, add export HF_HOME=\"/path\" to ~/.bash_profile.~/.bash_profile not foundCreate ~/.bash_profile and add environment variables.Changes not taking effectRun source ~/.bash_profile or restart terminal.HF_TOKEN provided but download still failsToken may be invalid or lack access to the model. Verify token at https://huggingface.co/settings/tokens"
      },
      {
        "title": "Model Download",
        "body": "IssueSolutionAuthentication required (gated model)Set HF_TOKEN in ~/.bash_profile or provide via parameter. Ensure token has access to the model.Model not foundVerify model ID is correct (case-sensitive). Check model exists on HuggingFace.Download timeoutCheck network connection. Large models may take time."
      },
      {
        "title": "Deployment",
        "body": "IssueSolutionhf CLI not foundpip install huggingface_hubDocker Compose failsUse docker compose (no hyphen)GPU access failsAdd user to render group: sudo usermod -aG render $USERPort in useChange port parameterOOMReduce gpu_memory_utilization"
      },
      {
        "title": "Cleanup",
        "body": "cd $HOME/vllm-compose/<model-id>\ndocker compose down"
      },
      {
        "title": "Status Check",
        "body": "Check deployment status and logs:\n\n# View deployment directory\nls -la $HOME/vllm-compose/<model-id>/\n\n# View live logs\ntail -f $HOME/vllm-compose/<model-id>/deployment.log\n\n# View test results\ncat $HOME/vllm-compose/<model-id>/test-results.json\n\n# Check container status\ndocker ps | grep <model-id>\n\n# Verify environment variables\necho \"HF_TOKEN: ${HF_TOKEN:0:10}...\"\necho \"HF_HOME: $HF_HOME\""
      },
      {
        "title": "Quick Start (Production)",
        "body": "Step 1: Add environment variables to ~/.bash_profile\n\n# Required: HuggingFace token\nexport HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\"\n\n# Recommended: Custom model storage path (production)\nexport HF_HOME=\"/data/models/huggingface\"\n\n# Apply changes\nsource ~/.bash_profile\n\nStep 2: Verify environment is ready\n\n# Source ~/.bash_profile to load variables\nsource ~/.bash_profile\n\n# Expected output:\n# === Environment Ready ===\n# Summary:\n#   HF_TOKEN: hf_xxxxxx...\n#   HF_HOME:  /data/models/huggingface\n\nStep 3: Run deployment\n\n# The skill will automatically:\n# 1. Source ~/.bash_profile to load HF_HOME and HF_TOKEN\n# 2. Use HF_TOKEN and HF_HOME from environment (or ~/.bash_profile, or defaults)\n# 3. Proceed without token for public models\n# 4. Fail at download time with clear error if gated model requires token"
      },
      {
        "title": "Version History",
        "body": "VersionChanges1.0.0Initial release"
      }
    ]
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/alexhegit/rocm-vllm-deployment",
    "publisherUrl": "https://clawhub.ai/alexhegit/rocm-vllm-deployment",
    "owner": "alexhegit",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/rocm-vllm-deployment",
    "downloadUrl": "https://openagent3.xyz/downloads/rocm-vllm-deployment",
    "agentUrl": "https://openagent3.xyz/skills/rocm-vllm-deployment/agent",
    "manifestUrl": "https://openagent3.xyz/skills/rocm-vllm-deployment/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/rocm-vllm-deployment/agent.md"
  }
}