Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Deploy and serve LLM models on GPU. Compare GPU pricing. Launch vLLM on Modal, RunPod, Cerebrium, Cloud Run, Baseten, or Azure with spot instance fallback. O...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Tuna is a hybrid GPU inference orchestrator. It lets you deploy, serve, and manage LLM models (Llama, Qwen, Mistral, DeepSeek, Gemma, and any HuggingFace model) on serverless GPUs from Modal, RunPod, Cerebrium, Google Cloud Run, Baseten, or Azure Container Apps, with optional spot instance fallback on AWS via SkyPilot. Every deployment gets an OpenAI-compatible /v1/chat/completions endpoint. The key idea: serverless GPUs handle requests immediately (fast cold start, pay-per-second) while a cheaper spot GPU boots in the background. Once spot is ready, traffic shifts there. If spot gets preempted, traffic falls back to serverless automatically. This gives you 3-5x cost savings over pure serverless with zero downtime.
# 1. Install tuna
uv pip install tandemn-tuna

# 2. Deploy a model (auto-picks cheapest serverless provider for the GPU)
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --service-name my-llm

# 3. Query your endpoint (shown in deploy output)
curl http://<router-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello!"}]}'

For serverless-only (no spot, no AWS needed):

tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --serverless-only
Deploy a model across serverless + spot infrastructure. This is the main command.

tuna deploy --model <HuggingFace-model-ID> --gpu <GPU> [options]

Required arguments:
- --model: HuggingFace model ID (e.g., Qwen/Qwen3-0.6B, meta-llama/Llama-3-70b)
- --gpu: GPU type (e.g., T4, L4, L40S, A100, H100, B200)

Common options:
- --service-name: Name for the deployment (auto-generated if omitted)
- --serverless-provider: Force a specific provider: modal, runpod, cloudrun, baseten, azure, cerebrium (default: cheapest available)
- --serverless-only: Serverless only, no spot backend or router (no AWS needed)
- --gpu-count: Number of GPUs (default: 1)
- --tp-size: Tensor parallel size (default: 1)
- --max-model-len: Max sequence length (default: 4096)
- --spots-cloud: Cloud for spot GPUs: aws or azure (default: aws)
- --region: Cloud region for spot instances
- --concurrency: Override serverless concurrency limit
- --no-scale-to-zero: Keep at least 1 spot replica running
- --public: Make endpoint publicly accessible (no auth)
- --scaling-policy: Path to YAML with scaling parameters

Provider-specific options:
- --gcp-project, --gcp-region: For Cloud Run
- --azure-subscription, --azure-resource-group, --azure-region, --azure-environment: For Azure

Examples:

# Deploy Llama 3 on Modal with hybrid spot
tuna deploy --model meta-llama/Llama-3-8b --gpu A100 --serverless-provider modal

# Deploy on RunPod, serverless-only
tuna deploy --model mistralai/Mistral-7B-Instruct-v0.3 --gpu L40S --serverless-provider runpod --serverless-only

# Deploy on Azure with an existing environment
tuna deploy --model Qwen/Qwen3-0.6B --gpu T4 --serverless-provider azure --azure-environment my-env

# Deploy a large model with tensor parallelism
tuna deploy --model meta-llama/Llama-3-70b --gpu H100 --gpu-count 4 --tp-size 4
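The examples above don't show Cloud Run's provider-specific flags. Here is a minimal sketch that combines the documented --gcp-project and --gcp-region options with a deploy; the project and region values are illustrative placeholders, not defaults:

# Deploy on Cloud Run, passing the GCP project and region explicitly
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --serverless-provider cloudrun \
  --gcp-project my-gcp-project --gcp-region us-central1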
Show GPU pricing from all serverless providers, optionally including spot prices.

tuna show-gpus [--gpu <GPU>] [--provider <provider>] [--spot]

Examples:

# Show all GPU prices across all providers
tuna show-gpus

# Show H100 pricing specifically
tuna show-gpus --gpu H100

# Show Modal's prices only
tuna show-gpus --provider modal

# Include AWS spot prices for comparison
tuna show-gpus --spot
Run preflight checks to verify credentials, CLIs, and quotas for a provider before deploying.

tuna check --provider <provider> [--gpu <GPU>]

Examples:

# Check Modal setup
tuna check --provider modal

# Check Azure with specific GPU
tuna check --provider azure --gpu T4 --azure-subscription <id> --azure-resource-group <rg>
Check the state of a deployment:

tuna status --service-name <name>

Show the cost of a deployment:

tuna cost --service-name <name>

List deployments, optionally filtered by status:

tuna list [--status active|destroyed|failed]
# Destroy a specific deployment
tuna destroy --service-name <name>

# Destroy all deployments
tuna destroy --all
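Putting the lifecycle commands together, a typical session might look like the sketch below; the service name demo-llm and the model choice are placeholders, and every flag is one documented above:

# Deploy a small model serverless-only for a quick test
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --service-name demo-llm --serverless-only

# Check deployment state and accumulated cost
tuna status --service-name demo-llm
tuna cost --service-name demo-llm

# See what is still active, then clean up
tuna list --status active
tuna destroy --service-name demo-llm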
Each serverless provider needs its own credentials. Run tuna check --provider <name> to verify setup.
pip install modal        # or: uv pip install tandemn-tuna[modal]
modal token new          # opens browser to authenticate

No environment variables needed; the token is stored in Modal's config.
export RUNPOD_API_KEY="your-api-key"

Get your API key from the RunPod console.
pip install google-cloud-run     # or: uv pip install tandemn-tuna[cloudrun]
gcloud auth login
gcloud auth application-default login

Optionally set GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_REGION, or pass --gcp-project and --gcp-region.
pip install truss        # or: uv pip install tandemn-tuna[baseten]
export BASETEN_API_KEY="your-api-key"
truss login --api-key $BASETEN_API_KEY
pip install azure-mgmt-appcontainers azure-identity    # or: uv pip install tandemn-tuna[azure]
az login
az provider register --namespace Microsoft.App
az provider register --namespace Microsoft.OperationalInsights

Pass --azure-subscription, --azure-resource-group, and --azure-region on deploy, or set the AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, and AZURE_REGION env vars. First deploy creates a GPU environment (~30 min); subsequent deploys reuse it (~2 min). Use --azure-environment to specify an existing environment.
pip install cerebrium    # or: uv pip install tandemn-tuna[cerebrium]
cerebrium login
export CEREBRIUM_API_KEY="your-api-key"

Note: the Hobby plan gives T4, A10, L4, and L40S. A100 and H100 require Enterprise.
Spot is included automatically in hybrid deploys. Just configure AWS:

aws configure    # set access key, secret key, region

Use --serverless-only to skip spot if you don't have AWS set up.
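Once AWS credentials are configured, hybrid deploys need no extra flags; the sketch below only makes the documented defaults explicit, and the region and service name are illustrative placeholders:

# Hybrid deploy with spot capacity on AWS in a specific region
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --spots-cloud aws --region us-east-1 --service-name hybrid-demo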
When the user wants to deploy a model for quick testing:
- Use --serverless-only to skip spot setup.
- Pick a small GPU like L4 or T4.
- Example: tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --serverless-only

When the user wants the cheapest deployment:
- First run tuna show-gpus --spot to compare serverless and spot prices.
- Then deploy with hybrid mode (the default) to get spot savings.
- The auto provider selector already picks the cheapest serverless option for the chosen GPU (see the end-to-end sketch after this list).

When the user wants to compare GPU prices:
- tuna show-gpus
- tuna show-gpus --gpu A100
- tuna show-gpus --spot    # includes AWS spot prices

When the user asks "which providers support H100?" or a specific GPU:
- tuna show-gpus --gpu H100

When the user wants to deploy on a specific provider:
- Use --serverless-provider <name>. Run tuna check --provider <name> first to verify credentials.

When the user wants to deploy a large model (70B+):
- Use multiple GPUs with tensor parallelism: tuna deploy --model meta-llama/Llama-3-70b --gpu H100 --gpu-count 4 --tp-size 4

When the user wants to check if their setup is ready:
- tuna check --provider modal
- tuna check --provider runpod

When the user wants to see what's currently deployed:
- tuna list
- tuna list --status active

When the user wants to tear down everything:
- tuna destroy --all
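A minimal end-to-end sketch of the cheapest-deployment workflow referenced above, assuming Modal turns out to be the cheapest L4 provider; the service name is a placeholder and every command is documented above:

# 1. Compare serverless and spot prices for the target GPU
tuna show-gpus --gpu L4 --spot

# 2. Verify credentials for the provider you expect to win on price
tuna check --provider modal

# 3. Hybrid deploy (the default): serverless serves first, spot takes over when ready
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --service-name cheap-llm

# 4. Confirm the deployment is healthy and watch its cost
tuna status --service-name cheap-llm
tuna cost --service-name cheap-llm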
All GPU types that tuna supports across its providers:

| GPU | VRAM | Architecture | Available On |
| --- | --- | --- | --- |
| T4 | 16 GB | Turing | Modal, RunPod, Baseten, Azure, Cerebrium, Spot |
| A10 | 24 GB | Ampere | Cerebrium |
| A10G | 24 GB | Ampere | Modal, Baseten, Spot |
| A4000 | 16 GB | Ampere | RunPod |
| A5000 | 24 GB | Ampere | RunPod |
| RTX 4090 | 24 GB | Ada | RunPod |
| L4 | 24 GB | Ada | Modal, RunPod, Cloud Run, Baseten, Cerebrium, Spot |
| A40 | 48 GB | Ampere | RunPod |
| A6000 | 48 GB | Ampere | RunPod |
| L40 | 48 GB | Ada | RunPod |
| L40S | 48 GB | Ada | Modal, RunPod, Cerebrium, Spot |
| A100 (40 GB) | 40 GB | Ampere | Modal, Cerebrium, Spot |
| A100 (80 GB) | 80 GB | Ampere | Modal, RunPod, Azure, Baseten, Cerebrium, Spot |
| H100 | 80 GB | Hopper | Modal, RunPod, Baseten, Cerebrium, Spot |
| H200 | 141 GB | Hopper | Spot |
| B200 | 192 GB | Blackwell | Modal, Baseten |
| RTX PRO 6000 | 32 GB | Blackwell | Cloud Run |

Use tuna show-gpus for current pricing across all providers.
Preflight check fails (tuna check):
- The output tells you exactly what's wrong: missing CLI tool, expired credentials, unregistered provider, insufficient quota. Fix the reported issue and re-run tuna check.

Deploy fails:
- Run tuna check --provider <provider> --gpu <gpu> to validate the environment
- Add -v for verbose logs: tuna deploy -v ...
- Check tuna status --service-name <name> for deployment state
- (see the sketch after this list)

Spot instance not available:
- Spot GPUs depend on cloud availability. If spot fails to launch, the serverless backend keeps serving, so there is no downtime. Try a different region with --region, or use --serverless-only.

"No provider supports GPU X":
- Run tuna show-gpus --gpu <GPU> to see which providers offer that GPU. Not all GPUs are available on all providers.

Azure environment takes too long:
- First Azure deploy creates a GPU environment (~30 min). Subsequent deploys reuse it (~2 min). Use --azure-environment to specify an existing one.
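As a concrete example of the deploy-failure steps, a sketch of the re-check sequence; the provider, GPU, model, and service name are placeholders:

# Re-validate credentials and quota for the failing provider and GPU
tuna check --provider modal --gpu L4

# Retry the deploy with verbose logs
tuna deploy -v --model Qwen/Qwen3-0.6B --gpu L4 --serverless-provider modal

# Inspect the resulting deployment state
tuna status --service-name my-llm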