Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Model-agnostic prompt benchmarking across 9 providers. Pass any model ID; the provider is auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.
Comparing LLM models across providers requires manual testing:
- No systematic way to measure performance across models
- Cost differences are significant but not easily comparable
- Quality varies by use case and provider
- Manual API testing is time-consuming and error-prone
Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.
For 10,000 requests/day with an average of 28 input + 115 output tokens:
- Claude Opus 4.6: ~$30.15/day ($903/month)
- Gemini 2.5 Flash-Lite: ~$0.05/day ($1.50/month)
- DeepSeek Chat: ~$0.14/day ($4.20/month)
- Monthly cost difference (Opus vs Flash-Lite): $901.50
Pass any model ID; the provider is auto-detected from the model name prefix. There is no hardcoded list, so new models work without code changes.

| Provider | Example Models | Prefix | Required Key |
| --- | --- | --- | --- |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | claude- | ANTHROPIC_API_KEY |
| OpenAI | gpt-5.2-pro, gpt-5.2, gpt-5.1 | gpt-, o1, o3 | OPENAI_API_KEY |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | gemini- | GOOGLE_API_KEY |
| Mistral | mistral-large-latest, mistral-small-latest | mistral-, mixtral- | MISTRAL_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | deepseek- | DEEPSEEK_API_KEY |
| xAI | grok-4-1-fast, grok-3-beta | grok- | XAI_API_KEY |
| MiniMax | MiniMax-M2.1 | MiniMax, minimax | MINIMAX_API_KEY |
| Qwen | qwen3.5-plus, qwen3-max-instruct | qwen | DASHSCOPE_API_KEY |
| Meta Llama | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | meta-llama/, llama- | OPENROUTER_API_KEY |
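The routing described above boils down to matching the model ID against a prefix map. Below is a minimal sketch of how that could look; PROVIDER_MAP is named in the implementation notes later in this document, but its exact contents and the detect_provider helper here are assumptions for illustration only.

```python
# Hypothetical sketch of prefix-based provider detection; not the skill's actual code.
PROVIDER_MAP = {
    "claude-": "anthropic",
    "gpt-": "openai", "o1": "openai", "o3": "openai",
    "gemini-": "google",
    "mistral-": "mistral", "mixtral-": "mistral",
    "deepseek-": "deepseek",
    "grok-": "xai",
    "minimax": "minimax",
    "qwen": "qwen",
    "meta-llama/": "openrouter", "llama-": "openrouter",
}

def detect_provider(model_id: str) -> str:
    """Return the provider for a model ID by matching its name prefix."""
    lowered = model_id.lower()
    for prefix, provider in PROVIDER_MAP.items():
        if lowered.startswith(prefix.lower()):
            return provider
    raise ValueError(f"No provider prefix matched model ID: {model_id}")

# Any new model with a known prefix routes correctly without code changes.
assert detect_provider("claude-opus-4-6") == "anthropic"
assert detect_provider("qwen3.5-plus") == "qwen"
```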
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.1 | $2.00 | $8.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| mistral-large-latest | $2.00 | $6.00 |
| mistral-small-latest | $0.10 | $0.30 |
| deepseek-chat | $0.27 | $1.10 |
| deepseek-reasoner | $0.55 | $2.19 |
| grok-4-1-fast | $5.00 | $25.00 |
| grok-3-beta | $3.00 | $15.00 |
| MiniMax-M2.1 | $0.40 | $1.60 |
| qwen3.5-plus | $0.57 | $2.29 |
| qwen3-max-instruct | $1.60 | $6.40 |
| meta-llama/llama-4-maverick | $0.20 | $0.60 |
| meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |

Note: Unlisted models still work; cost calculation returns $0.00 with a warning. The pricing table is for reference only, not a validation gate.
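The cost metric follows directly from this table: tokens used multiplied by the per-1M-token price, with unlisted models falling back to $0.00. Here is a minimal sketch of that arithmetic; the estimate_cost helper and the warning wording are assumptions for illustration, and only the fallback behavior and the excerpted prices come from the docs.

```python
import warnings

# (input, output) price per 1M tokens in USD -- a small excerpt of the table above.
PRICING = {
    "claude-opus-4-6": (15.00, 75.00),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "deepseek-chat": (0.27, 1.10),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's USD cost; unlisted models return $0.00 with a warning."""
    if model not in PRICING:
        warnings.warn(f"No pricing known for {model}; reporting $0.00")
        return 0.0
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example call: cost of a single request given its token counts.
print(f"${estimate_cost('deepseek-chat', 28, 115):.6f} per request")
```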
Every test measures:
- Latency: response time in milliseconds
- Cost: exact API cost per request (input + output tokens)
- Quality: response quality score (0–100)
- Token usage: input and output token counts
- Consistency: variance across multiple test runs
- Error tracking: API failures, timeouts, rate limits
Get instant answers to:
- Which model is fastest for your prompt?
- Which is most cost-effective?
- Which produces the best quality responses?
- How much can you save by switching providers?
Prompt: "Write a professional customer service response about a delayed shipment"

- Gemini 2.5 Flash-Lite (Google), most affordable: 523ms latency, $0.000025 cost, quality 65/100, 28 in / 87 out tokens
- DeepSeek Chat (DeepSeek), budget pick: 710ms latency, $0.000048 cost, quality 70/100, 28 in / 92 out tokens
- Claude Haiku 4.5 (Anthropic), balanced performer: 891ms latency, $0.000145 cost, quality 78/100, 28 in / 102 out tokens
- GPT-5.2 (OpenAI), excellent quality: 645ms latency, $0.000402 cost, quality 88/100, 28 in / 98 out tokens
- Claude Opus 4.6 (Anthropic), highest quality: 1,234ms latency, $0.001875 cost, quality 94/100, 28 in / 125 out tokens

Recommendations:
1. Most cost-effective: Gemini 2.5 Flash-Lite ($0.000025/request), 99.98% cheaper than Opus
2. Budget pick: DeepSeek Chat ($0.000048/request), strong quality at low cost
3. Best quality: Claude Opus 4.6 (94/100), state-of-the-art reasoning and analysis
4. Smart pick: Claude Haiku 4.5 ($0.000145/request), 81% cheaper with 83% of the quality
5. Speed + quality: GPT-5.2 ($0.000402/request), excellent quality at mid-range cost

Potential monthly savings (10,000 requests/day, 28 input + 115 output tokens on average):
- Gemini 2.5 Flash-Lite vs Opus: $903/month saved ($1.44 vs $904.50)
- DeepSeek Chat vs Opus: $900/month saved ($4.50 vs $904.50)
- Claude Haiku vs Opus: $731/month saved ($173.40 vs $904.50)
- Evaluate models before production selection, compare cost vs quality tradeoffs, and benchmark API latency across providers.
- Test prompt variations across models, measure quality scores consistently, and compare performance metrics.
- Analyze LLM API spending by model, compare provider pricing structures, and identify cost-efficient alternatives.
- Measure latency and response times, test consistency across multiple runs, and evaluate quality scores.
Click "Subscribe" on ClawHub to get access.
Add keys for the providers you want to test:

```bash
# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."

# Google (Gemini models)
export GOOGLE_API_KEY="AI..."

# DeepSeek
export DEEPSEEK_API_KEY="..."

# xAI (Grok models)
export XAI_API_KEY="..."

# MiniMax
export MINIMAX_API_KEY="..."

# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."

# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."

# Mistral
export MISTRAL_API_KEY="..."
```

You only need keys for the providers you plan to test.
```bash
# Install only what you need
pip install anthropic              # Claude
pip install openai                 # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai    # Gemini
pip install mistralai              # Mistral

# Or install everything
pip install anthropic openai google-generativeai mistralai
```
Option A: Python

```python
from prompt_performance_tester import PromptPerformanceTester

tester = PromptPerformanceTester()  # reads API keys from environment

results = tester.test_prompt(
    prompt_text="Write a professional email apologizing for a delayed shipment",
    models=[
        "claude-haiku-4-5-20251001",
        "gpt-5.2",
        "gemini-2.5-flash",
        "deepseek-chat",
    ],
    num_runs=3,
    max_tokens=500,
)

print(tester.format_results(results))
print(f"Best quality: {results.best_model}")
print(f"Cheapest: {results.cheapest_model}")
print(f"Fastest: {results.fastest_model}")
```

Option B: CLI

```bash
# Test across multiple models
prompt-tester test "Your prompt here" \
  --models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
  --runs 3

# Export results
prompt-tester test "Your prompt here" --export results.json
```
- Keys are stored in environment variables only, never hardcoded or logged
- Keys are never transmitted to UnisAI servers
- HTTPS encryption for all provider API calls
- Your prompts are sent only to the AI providers you select for testing
- Each provider has its own data retention policy (see their privacy pages)
- No data is stored on UnisAI infrastructure
- Python: 3.9+
- Dependencies: anthropic, openai, google-generativeai, mistralai (install only what you need)
- Platform: macOS, Linux, Windows
- Lazy client initialization: SDK clients are only loaded for providers actually tested
- Prefix-based routing: PROVIDER_MAP detects the provider from the model name; no hardcoded whitelist
- OpenAI-compat path: DeepSeek, xAI, MiniMax, Qwen, and OpenRouter all use the openai SDK with a custom base_url (see the sketch below)
- Pricing table: used for cost calculation only; unknown models get cost=0 with a warning
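The following is a rough sketch of what the lazy, OpenAI-compatible client path could look like. The base URLs, helper name, and dictionary layout are assumptions for illustration; only the facts that these providers reuse the openai SDK with a custom base_url and that clients are created lazily come from the notes above.

```python
import os
from openai import OpenAI  # OpenAI-compat providers reuse this SDK per the notes above

# Illustrative base URLs; check each provider's documentation before relying on them.
OPENAI_COMPAT = {
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com"),
    "openrouter": ("OPENROUTER_API_KEY", "https://openrouter.ai/api/v1"),
    "qwen": ("DASHSCOPE_API_KEY", "https://dashscope.aliyuncs.com/compatible-mode/v1"),
}

_clients: dict[str, OpenAI] = {}

def get_client(provider: str) -> OpenAI:
    """Lazily build one OpenAI-compatible client per provider, only when first used."""
    if provider not in _clients:
        env_var, base_url = OPENAI_COMPAT[provider]
        _clients[provider] = OpenAI(api_key=os.environ[env_var], base_url=base_url)
    return _clients[provider]

# Example: the DeepSeek client is only constructed if a deepseek-* model is tested.
# response = get_client("deepseek").chat.completions.create(
#     model="deepseek-chat",
#     messages=[{"role": "user", "content": "Hello"}],
#     max_tokens=100,
# )
```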
Every test captures:
- Latency: total response time (ms)
- Cost: input + output cost based on known pricing (USD)
- Quality: heuristic response score based on length and completeness (0–100)
- Tokens: exact input/output token counts per provider
- Consistency: standard deviation across multiple runs
- Errors: timeouts, rate limits, API failures
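To make the consistency metric concrete, here is a minimal sketch of a standard deviation over repeated runs; the function name and the choice of latency as the measured value are assumptions for illustration, not the skill's internal API.

```python
from statistics import mean, stdev

def consistency(latencies_ms: list[float]) -> tuple[float, float]:
    """Return (mean, standard deviation) of latency across repeated runs of one model."""
    if len(latencies_ms) < 2:
        return (latencies_ms[0] if latencies_ms else 0.0, 0.0)
    return (mean(latencies_ms), stdev(latencies_ms))

# Example: three runs of the same prompt against one model.
avg, spread = consistency([523.0, 548.0, 531.0])
print(f"avg {avg:.0f}ms, +/- {spread:.0f}ms across runs")
```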
Q: Do I need API keys for all 9 providers?
A: No. You only need keys for the providers you want to test. If you only test Claude models, you only need ANTHROPIC_API_KEY.

Q: Who pays for the API costs?
A: You do. You provide your own API keys and pay each provider directly. This skill has no per-request fees.

Q: How accurate are the cost calculations?
A: Costs are calculated from the known pricing table using actual token counts. Models not in the pricing table return $0.00; the model still runs, the cost just won't be shown.

Q: Can I test models not in the pricing table?
A: Yes. Any model whose name starts with a supported prefix will run. Cost will show as $0.00 for unlisted models.

Q: Can I test prompts in non-English languages?
A: Yes. All supported providers handle multiple languages.

Q: Can I use this in production/CI/CD?
A: Yes. Import PromptPerformanceTester directly from Python or call it via the CLI (a sketch follows after this FAQ).

Q: What if my prompt is very long?
A: Set max_tokens appropriately. The skill passes your prompt as-is to each provider's API.
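As a rough illustration of the CI/CD usage mentioned above, the snippet below benchmarks a fixed prompt and archives the formatted report as a build artifact. It uses only the entry points shown in the Quick Start; the prompt, model list, and output file name are arbitrary assumptions.

```python
# Hypothetical CI job step: benchmark a release prompt and archive the report.
from pathlib import Path
from prompt_performance_tester import PromptPerformanceTester

tester = PromptPerformanceTester()  # API keys come from the CI job's environment

results = tester.test_prompt(
    prompt_text="Summarize this support ticket for a handoff note",
    models=["claude-haiku-4-5-20251001", "gemini-2.5-flash", "deepseek-chat"],
    num_runs=3,
    max_tokens=300,
)

# Save the human-readable report as a build artifact and log the headline picks.
Path("prompt-benchmark.txt").write_text(tester.format_results(results))
print(f"best quality: {results.best_model}, cheapest: {results.cheapest_model}")
```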
- Model-agnostic architecture: any model ID works via prefix detection
- 9 providers, 20 known models with pricing
- DeepSeek, xAI Grok, MiniMax, Qwen, and Meta Llama as first-class providers
- Claude 4.6 series (opus-4-6, sonnet-4-6)
- Lazy client initialization: only loads SDKs for providers actually used
- Fixed UnisAI branding throughout
- Batch testing: test 100+ prompts simultaneously
- Historical tracking: track model performance over time
- Webhook integrations: Slack, Discord, email notifications
- A/B testing framework: scientific prompt experimentation
- Fine-tuning insights: which models to fine-tune for your use case
- Custom benchmarks: create your own evaluation criteria
- Auto-optimization: AI-powered prompt improvement suggestions
- Email: support@unisai.vercel.app
- Website: https://unisai.vercel.app
- Bug reports: support@unisai.vercel.app
This skill is distributed via ClawHub under the following terms.
- Use for your own business and projects
- Test prompts for internal applications
- Modify source code for personal use
- Redistribute outside the ClawHub registry
- Resell or sublicense
- Use the UnisAI trademark without permission

Full Terms: See LICENSE.md
Fixes & Polish
- Bumped version to 1.1.8
- SKILL.md fully rewritten: cleaned up formatting, removed stale content
- Removed old IP watermark reference (PROPRIETARY_SKILL_VEDANT_2024) from docs
- Corrected watermark to PROPRIETARY_SKILL_UNISAI_2026_MULTI_PROVIDER throughout
- Fixed all UnisAI branding (was UniAI in the v1.1.0 changelog)
- Updated pricing table to include all 20 known models
- Cleaned up FAQ, Quick Start, and Use Cases sections
Model-Agnostic Architecture
- Provider auto-detected from model name prefix; no hardcoded whitelist
- Any new model works automatically without code changes
- Added DeepSeek, xAI Grok, MiniMax, Qwen, and Meta Llama as first-class providers (9 total)
- Updated Claude to the 4.6 series (claude-opus-4-6, claude-sonnet-4-6)
- Lazy client initialization: only loads SDKs for providers actually tested
- Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, and OpenRouter
Latest Models Update
- GPT-5.2 Series: added Instant, Thinking, and Pro variants
- Gemini 2.5 Series: updated to 2.5 Pro, Flash, and Flash-Lite
- Claude 4.5 pricing updates
- 10 total models across 3 providers
Major Features
- Multi-provider support: Claude, GPT, Gemini
- Cross-provider cost comparison
- Enhanced recommendations engine
- Rebranded to UnisAI
Initial Release
- Claude-only prompt testing (Haiku, Sonnet, Opus)
- Performance metrics: latency, cost, quality, consistency
- Basic recommendations engine

Last Updated: February 27, 2026
Current Version: 1.1.8
Status: Active & Maintained

© 2026 UnisAI. All rights reserved.
Code helpers, APIs, CLIs, browser automation, testing, and developer operations.
Largest current source with strong distribution and engagement signals.