โ† All skills
Tencent SkillHub ยท Productivity

AI Intelligence Hub - Real-time Model Capability Tracking

Real-time AI model capability tracking via leaderboards (LMSYS Arena, HuggingFace, etc.) for intelligent compute routing and cost optimization



Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
README.md, SKILL.md, scripts/run.py, benchmarks/latest.json, benchmarks/2026-03-01.json, examples/daily-optimization.sh

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of working through the steps manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.0

Documentation

Primary doc: SKILL.md (21 sections)

🧠 Model Benchmarks - Global AI Intelligence Hub

"Know thy models, optimize thy costs" — Real-time AI capability tracking for intelligent compute routing

🎯 What It Does

Transform your OpenClaw deployment from guesswork to data-driven model selection:

🔍 Real-time Intelligence — Pulls the latest capability data from the LMSYS Arena, BigCode, and HuggingFace leaderboards
📊 Standardized Scoring — Unified 0-100 capability scores across coding, reasoning, and creative tasks
💰 Cost Efficiency — Calculates performance-per-dollar ratios to find hidden gems
🎯 Smart Recommendations — Suggests optimal models for specific task types
📈 Trend Analysis — Tracks model performance changes over time
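The unified 0-100 scale can be produced by min-max scaling each leaderboard's raw metric onto a common range. The sketch below illustrates the idea; `normalize_score` and the bounds shown are assumptions, not the skill's actual implementation:

```python
def normalize_score(value, lo, hi):
    """Min-max scale a raw leaderboard metric onto a common 0-100 range.

    `lo`/`hi` are the observed bounds for that metric (e.g. an Arena Elo
    range of 1000-1400, or HumanEval pass@1 in 0.0-1.0). Out-of-range
    values are clipped so the result stays in [0, 100].
    """
    if hi == lo:
        return 0.0
    clipped = max(lo, min(hi, value))
    return round(100.0 * (clipped - lo) / (hi - lo), 1)

# An Arena Elo of 1250 and a HumanEval pass@1 of 0.88 land on one scale:
normalize_score(1250, 1000, 1400)  # 62.5
normalize_score(0.88, 0.0, 1.0)    # 88.0
```

This is what lets scores from very different benchmarks be compared and blended per task type.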

🚀 Why You Need This

Problem: OpenClaw users often overpay by sending simple tasks to expensive models, or underperform by sending complex work to cheap ones.

Solution: This skill provides real-time model intelligence to route tasks optimally:

Translation tasks: Gemini 2.0 Flash (445x cost efficiency vs Claude)
Complex coding: Claude 3.5 Sonnet (92/100 coding score)
Simple Q&A: GPT-4o Mini (85x cheaper than GPT-4)

Result: Users report 60-95% cost reduction with maintained or improved quality.
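The routing idea above can be sketched as "cheapest adequate model wins": among models that clear a quality floor for the task, pick the best performance-per-dollar. Function names and the quality floor below are illustrative assumptions; the scores and prices come from the sample output later in this page, and the real tool may compute its efficiency figures differently:

```python
def cost_efficiency(task_score, price_per_mtok):
    """Capability points per dollar of 1M tokens (higher is better)."""
    return round(task_score / price_per_mtok, 2)

def route_task(task_scores, prices, min_score=60.0):
    """Among models clearing a quality floor, pick the most cost-efficient."""
    eligible = {m: s for m, s in task_scores.items() if s >= min_score}
    return max(eligible, key=lambda m: cost_efficiency(eligible[m], prices[m]))

scores = {"gemini-2.0-flash": 81.5, "claude-3.5-sonnet": 92.0}
prices = {"gemini-2.0-flash": 0.19, "claude-3.5-sonnet": 9.00}
route_task(scores, prices)  # 'gemini-2.0-flash'
```

Raising `min_score` is how "complex coding" ends up on Claude 3.5 Sonnet while simpler work flows to cheaper models.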

Install & First Run

# Fetch latest model intelligence
python3 skills/model-benchmarks/scripts/run.py fetch

# Find best model for your task
python3 skills/model-benchmarks/scripts/run.py recommend --task coding

# Check any model's capabilities
python3 skills/model-benchmarks/scripts/run.py query --model gpt-4o

Sample Output

๐Ÿ† Top 3 recommendations for coding: 1. gemini-2.0-flash Task Score: 81.5/100 Cost Efficiency: 445.33 Avg Price: $0.19/1M tokens 2. claude-3.5-sonnet Task Score: 92.0/100 Cost Efficiency: 10.28 Avg Price: $9.00/1M tokens

With OpenClaw Model Routing

# Get optimal model, then configure OpenClaw
BEST_MODEL=$(python3 skills/model-benchmarks/scripts/run.py recommend --task coding --json | jq -r '.models[0]')
openclaw config set agents.defaults.model.primary "$BEST_MODEL"

Daily Intelligence Updates

# Add to crontab for fresh data
0 8 * * * cd ~/.openclaw/workspace && python3 skills/model-benchmarks/scripts/run.py fetch

Cost Monitoring Dashboard

# Generate cost efficiency report
python3 skills/model-benchmarks/scripts/run.py analyze --export-csv > model_costs.csv
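A report like the one above boils down to sorting per-model stats by efficiency and serializing to CSV. This is a sketch of that shape; the column names are illustrative assumptions, and the real `--export-csv` schema is defined in scripts/run.py:

```python
import csv
import io

def export_cost_report(rows):
    """Render per-model cost stats as CSV text, best efficiency first.

    Column names are illustrative, not the skill's actual schema.
    """
    fieldnames = ["model", "task_score", "price_per_mtok", "efficiency"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["efficiency"], reverse=True):
        writer.writerow(row)
    return buf.getvalue()
```

Sorting best-first means the top of model_costs.csv is always the current cheapest-adequate candidate.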

📊 Supported Data Sources

| Platform | Coverage | Update Frequency | Capabilities Tracked |
| --- | --- | --- | --- |
| LMSYS Chatbot Arena | 100+ models | Daily | General, Reasoning, Creative |
| BigCode Leaderboard | 50+ models | Weekly | Coding (HumanEval, MBPP) |
| Open LLM Leaderboard | 200+ models | Daily | Knowledge, Comprehension |
| Alpaca Eval | 80+ models | Weekly | Instruction Following |

🎯 Task-to-Model Mapping

The skill intelligently maps your tasks to optimal models:

| Task Type | Primary Capability | Recommended Models |
| --- | --- | --- |
| coding | Coding + Reasoning | Gemini 2.0 Flash, Claude 3.5 Sonnet |
| writing | Creative + General | Claude 3.5 Sonnet, GPT-4o |
| analysis | Reasoning + Comprehension | GPT-4o, Claude 3.5 Sonnet |
| translation | General + Knowledge | Gemini 2.0 Flash, GPT-4o Mini |
| math | Reasoning + Knowledge | GPT-4o, Claude 3.5 Sonnet |
| simple | General | Gemini 2.0 Flash, GPT-4o Mini |

Cost Optimization Workflow

1. Profile your tasks — What do you do most often?
2. Get recommendations — Run the analysis for each task type
3. Configure routing — Set up model fallbacks
4. Monitor & adjust — Refresh the intelligence data weekly
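The first three steps can be sketched as a small helper that turns your task profile plus the skill's recommendations into a routing table. Everything here is illustrative: the function, the model names, and the fallback choice are assumptions, not OpenClaw's actual config format:

```python
def build_routing(task_profile, recommendations, fallback="gpt-4o-mini"):
    """Map each task type you actually use to its top recommended model,
    keeping a shared fallback for anything unprofiled."""
    routing = {task: recommendations[task][0] for task in task_profile}
    routing["fallback"] = fallback
    return routing

profile = ["coding", "translation"]
recs = {
    "coding": ["claude-3.5-sonnet", "gemini-2.0-flash"],
    "translation": ["gemini-2.0-flash", "gpt-4o-mini"],
}
build_routing(profile, recs)
# {'coding': 'claude-3.5-sonnet', 'translation': 'gemini-2.0-flash',
#  'fallback': 'gpt-4o-mini'}
```

Step 4 then amounts to re-running this after each weekly fetch and diffing the result.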

Finding Hidden Gems

# Discover undervalued models
python3 skills/model-benchmarks/scripts/run.py analyze --sort-by efficiency --limit 10

Trend Analysis

# Compare model performance over time
python3 skills/model-benchmarks/scripts/run.py trends --model gpt-4o --days 30

Custom Benchmark Sources

Edit BENCHMARK_SOURCES in scripts/run.py to add new evaluation platforms.
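A new entry might look like the sketch below. The schema shown is an assumption for illustration; check the actual `BENCHMARK_SOURCES` definition in scripts/run.py before copying it:

```python
# Hypothetical shape of a BENCHMARK_SOURCES entry; the real schema
# lives in scripts/run.py and may differ.
BENCHMARK_SOURCES = {
    "lmsys-arena": {
        "update_frequency": "daily",
        "capabilities": ["general", "reasoning", "creative"],
    },
}

def register_source(sources, name, update_frequency, capabilities):
    """Validate and add a new evaluation platform entry."""
    if update_frequency not in ("daily", "weekly"):
        raise ValueError("update_frequency must be 'daily' or 'weekly'")
    sources[name] = {
        "update_frequency": update_frequency,
        "capabilities": list(capabilities),
    }
    return sources
```

Validating entries up front keeps a typo in a source definition from silently skewing the fetched scores.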

Task-Specific Scoring

Customize TASK_CAPABILITY_MAP to weight capabilities for your specific use cases.
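Conceptually, a task score is a weighted blend of the model's per-capability scores. The weights below are illustrative assumptions, not the skill's shipped values; the real `TASK_CAPABILITY_MAP` lives in scripts/run.py:

```python
# Illustrative weights only; the shipped TASK_CAPABILITY_MAP may differ.
TASK_CAPABILITY_MAP = {
    "coding": {"coding": 0.7, "reasoning": 0.3},
    "translation": {"general": 0.6, "knowledge": 0.4},
}

def task_score(model_capabilities, task):
    """Blend a model's per-capability scores into one weighted task score."""
    weights = TASK_CAPABILITY_MAP[task]
    return round(sum(model_capabilities.get(cap, 0.0) * w
                     for cap, w in weights.items()), 1)

model = {"coding": 92.0, "reasoning": 88.0, "general": 90.0, "knowledge": 85.0}
task_score(model, "coding")  # 0.7*92.0 + 0.3*88.0 = 90.8
```

Shifting weight toward "reasoning" for your coding tasks, for example, changes which model tops the recommendation list without touching any benchmark data.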

Enterprise Integration

  • Slack alerts for model price changes
  • API endpoints for programmatic access
  • Custom dashboards with exported JSON data
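The price-change alerting amounts to diffing two price snapshots and flagging moves beyond a threshold. This is a sketch of that check under assumed names; wiring the result into Slack is left out:

```python
def price_changes(old_prices, new_prices, threshold=0.10):
    """Return models whose $/1M-token price moved by more than `threshold`
    (as a fraction of the old price), suitable for pushing as an alert."""
    changed = {}
    for model, new in new_prices.items():
        old = old_prices.get(model)
        if old and abs(new - old) / old > threshold:
            changed[model] = (old, new)
    return changed

price_changes({"gpt-4o": 5.00}, {"gpt-4o": 2.50})
# {'gpt-4o': (5.0, 2.5)}
```

A price cut like this is exactly the event that should trigger re-running the recommendation step.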

📈 Real-World Results

Startups using this skill report:

🏗️ Dev Teams: 78% cost reduction by routing simple tasks to Gemini 2.0 Flash
📝 Content Agencies: 65% savings using task-specific model routing
🔬 Research Labs: 45% efficiency gain with capability-driven model selection

🛡️ Privacy & Security

  • No personal data collected — only public benchmark results
  • Local processing — all analysis runs on your machine
  • Optional caching — benchmark data is cached locally for faster queries
  • No external dependencies — uses only the Python standard library
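The optional caching can be as simple as a timestamp check before hitting the network. This sketch shows the idea with assumed names and a 24-hour TTL; it is not the skill's actual cache code:

```python
def cache_is_fresh(fetched_at, now, ttl_seconds=24 * 3600):
    """True if locally cached benchmark data is recent enough to reuse.

    Both arguments are Unix timestamps; a fetched_at in the future
    (clock skew, restored backup) is treated as stale to force a refetch.
    """
    return 0 <= now - fetched_at < ttl_seconds
```

With a daily `fetch` cron job, queries between runs never touch the network at all.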

🔮 Roadmap

  • v1.1: Real-time price monitoring from OpenRouter/Anthropic APIs
  • v1.2: Custom benchmark suite for your specific tasks
  • v1.3: Multi-provider cost comparison (OpenRouter vs direct APIs)
  • v2.0: Predictive model performance based on task characteristics

๐Ÿค Contributing

Found a new benchmark platform? Want to improve the scoring algorithm? Fork the skill on GitHub Add your enhancement Submit a pull request Help the OpenClaw community optimize their AI costs!

📞 Support

  • Documentation: Full CLI reference via scripts/run.py --help
  • Issues: Report bugs or request features via GitHub
  • Community: Join discussions on the OpenClaw Discord
  • Examples: More integration examples in the examples/ directory

Make every token count — choose your models wisely! 🧠

Category context

Workflow acceleration for inboxes, docs, calendars, planning, and execution loops.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 Docs · 2 Scripts · 2 Config
  • SKILL.md Primary doc
  • README.md Docs
  • examples/daily-optimization.sh Scripts
  • scripts/run.py Scripts
  • benchmarks/2026-03-01.json Config
  • benchmarks/latest.json Config