Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Runs three AI agents in parallel to implement, cross-evaluate, score, and objectively select the best code solution for a given coding task.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Enable competitive code generation where three isolated AI agents implement the same functionality, evaluate each other objectively, and deliver the optimal solution through data-driven selection.
- task_description: String describing the coding task
- constraints: Optional constraints (time/space complexity, language, etc.)
- final_solution: Directory containing the winning implementation
- comparison_report: Markdown analysis of all three approaches
- decision_rationale: Explanation of why the winner was selected
```yaml
assertions:
  - final_solution/implementation exists and is runnable
  - comparison_report.md exists with objective metrics
  - decision_rationale.md explains selection logic
  - all three agent implementations are documented
  - evaluation scores are numeric and justified
```
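To make the contract concrete, here is a minimal sketch of the inputs and outputs above as Python dataclasses; the field names mirror the lists, while the types and class names are assumptions for illustration.

```python
# Minimal sketch of the task inputs and run outputs listed above.
# Field names follow the doc; the types and class names are assumptions.
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class TaskInput:
    task_description: str              # the coding task to solve
    constraints: Optional[str] = None  # e.g. "O(n log n), Python only, max 50 lines"

@dataclass
class RunOutput:
    final_solution: Path      # directory containing the winning implementation
    comparison_report: Path   # Markdown analysis of all three approaches
    decision_rationale: Path  # explanation of why the winner was selected
```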
```mermaid
graph TD
    A[User Task] --> B[Phase 1: Parallel Spawn]
    B --> C[Agent A: Simplicity]
    B --> D[Agent B: Speed]
    B --> E[Agent C: Robustness]
    C --> F[Phase 2: Cross-Evaluation]
    D --> F
    E --> F
    F --> G[6 Evaluation Reports]
    G --> H[Phase 3: Self-Scoring]
    H --> I[3 Scorecards]
    I --> J[Phase 4: Final Delivery]
    J --> K[Best Solution]
```
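The same flow in code form: a small, self-contained sketch of the four phases in the diagram above. The three helper functions are placeholder stubs, not part of the skill; in the real run each phase is carried out by an AI agent.

```python
# Toy sketch of the four-phase flow in the diagram above. The helper
# functions are placeholder stubs so the control flow is runnable end to end.
FOCUSES = {"a": "simplicity", "b": "speed", "c": "robustness"}

def spawn_agent(task, focus):            # Phase 1 stub: one isolated implementation
    return {"focus": focus, "solution": f"solve {task!r} favouring {focus}"}

def cross_evaluate(evaluator, target):   # Phase 2 stub: one of six evaluation reports
    return {"from": evaluator, "to": target, "notes": "objective comparison here"}

def self_score(name):                    # Phase 3 stub: one of three scorecards
    return {"agent": name, "total": 85 if name == "a" else 70}

def run_b3ehive(task):
    runs = {n: spawn_agent(task, f) for n, f in FOCUSES.items()}            # Phase 1
    reports = [cross_evaluate(a, b) for a in runs for b in runs if a != b]  # Phase 2: 6 reports
    cards = [self_score(n) for n in runs]                                   # Phase 3: 3 scorecards
    winner = max(cards, key=lambda c: c["total"])                           # Phase 4: pick best
    return winner["agent"], len(reports), len(cards)

print(run_b3ehive("a thread-safe rate limiter"))  # -> ('a', 6, 3)
```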
Agent Prompt Template:

```yaml
role: "Expert Software Engineer"
focus: "{{agent_focus}}"  # Simplicity / Speed / Robustness
task: "{{task_description}}"
constraints:
  - Complete runnable code in implementation/
  - Checklist.md with ALL items checked
  - SUMMARY.md with competitive advantages
  - Must differ from other agents' approaches
linter_rules:
  - code_compiles: true
  - tests_pass: true
  - no_todos: true
  - documented: true
assertions:
  - implementation/main.* exists
  - tests exist and pass
  - Checklist.md is complete
  - SUMMARY.md explains unique approach
```
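As a rough illustration of how this template fans out to the three agents, the snippet below fills a simplified version of it per focus; the prompt wording here is an assumption distilled from the template, not the skill's literal prompt.

```python
# Rough illustration: rendering a simplified agent prompt per focus.
# The prompt text is an assumption distilled from the template above.
AGENT_PROMPT = (
    "Role: Expert Software Engineer\n"
    "Focus: {focus}\n"
    "Task: {task}\n"
    "Deliverables: implementation/, Checklist.md (all items checked), SUMMARY.md\n"
    "Constraint: your approach must differ from the other agents'."
)

task = "Implement a thread-safe rate limiter"
for agent, focus in [("A", "Simplicity"), ("B", "Speed"), ("C", "Robustness")]:
    print(f"--- Agent {agent} ---")
    print(AGENT_PROMPT.format(focus=focus, task=task))
```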
Evaluation Prompt Template:

```yaml
evaluator: "Agent {{from}}"
target: "Agent {{to}}"
task: "Objectively prove your solution is superior"
dimensions:
  simplicity:
    weight: 20
    metrics:
      - lines_of_code: count
      - cyclomatic_complexity: calculate
      - readability_score: 1-10
  speed:
    weight: 25
    metrics:
      - time_complexity: big_o
      - space_complexity: big_o
      - benchmark_results: run_if_possible
  stability:
    weight: 25
    metrics:
      - error_handling_coverage: percentage
      - resource_cleanup: check
      - fault_tolerance: test
  corner_cases:
    weight: 20
    metrics:
      - input_validation: comprehensive
      - boundary_conditions: covered
      - edge_cases: tested
  maintainability:
    weight: 10
    metrics:
      - documentation_quality: 1-10
      - code_structure: logical
      - extensibility: easy/hard
assertions:
  - evaluation is objective with data
  - specific code snippets cited
  - numeric scores provided
  - persuasion argument is data-driven
```
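Because the weights above sum to 100, a weighted total can be read directly as a score out of 100. A small sketch of that aggregation, with made-up per-dimension scores:

```python
# Sketch: combining per-dimension scores (normalised to 0..1) with the
# weights above into a total out of 100. The example scores are made up.
WEIGHTS = {"simplicity": 20, "speed": 25, "stability": 25,
           "corner_cases": 20, "maintainability": 10}

def weighted_total(scores):
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

agent_a = {"simplicity": 0.9, "speed": 0.7, "stability": 0.8,
           "corner_cases": 0.75, "maintainability": 0.8}
print(weighted_total(agent_a))  # -> 78.5
```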
Scoring Prompt Template:

```yaml
agent: "Agent {{name}}"
task: "Fairly score yourself and competitors"
self_evaluation:
  - dimension: simplicity
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: speed
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: stability
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: corner_cases
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: maintainability
    max: 10
    score: "{{self_score}}"
    justification: "{{why}}"
peer_evaluation:
  - target: "Agent {{other}}"
    scores: "{{numeric_scores}}"
    comparison: "{{objective_comparison}}"
final_conclusion:
  best_implementation: "[A/B/C/Mixed]"
  reasoning: "{{data_driven_justification}}"
  recommendation: "{{delivery_strategy}}"
assertions:
  - all scores are numeric
  - justifications are specific
  - no inflation or bias
  - conclusion is evidence-based
```
Decision Logic:

```python
def select_winner(scores):
    """Select the final solution based on competitive scores."""
    margins = calculate_score_margins(scores)
    if margins.winner - margins.second > 15:
        # Clear winner
        return SingleWinner(scores.winner)
    elif margins.winner - margins.second > 5:
        # Close competition, consider hybrid
        return HybridSolution(scores.top_two)
    else:
        # Very close, pick simplest
        return SimplestImplementation(scores.all)
```

```yaml
assertions:
  - final_solution is runnable
  - comparison_report explains all approaches
  - decision_rationale is transparent
  - attribution is given to winning agent
```
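For a feel of the thresholds, here is the same margin logic applied to hypothetical totals out of 100; the helper classes above are left out and only the arithmetic is shown.

```python
# Hypothetical totals (out of 100) to illustrate the selection thresholds.
scores = {"A": 86, "B": 64, "C": 59}

ranked = sorted(scores.values(), reverse=True)
margin = ranked[0] - ranked[1]  # 86 - 64 = 22

if margin > 15:
    decision = "single winner"            # > 15 points ahead: deliver Agent A as-is
elif margin > 5:
    decision = "hybrid of the top two"    # 6-15 points: consider merging approaches
else:
    decision = "simplest implementation"  # within 5 points: prefer the simplest

print(decision)  # -> single winner
```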
```
workspace/
├── run_a/
│   ├── implementation/       # Agent A code
│   ├── Checklist.md          # Completion checklist
│   ├── SUMMARY.md            # Approach summary
│   ├── evaluation/           # Evaluations of B, C
│   └── SCORECARD.md          # Self-scoring
├── run_b/                    # Same structure
├── run_c/                    # Same structure
├── final/                    # Winning solution
├── COMPARISON_REPORT.md      # Full analysis
└── DECISION_RATIONALE.md     # Why winner selected
```
- Checklist.md: Markdown with `- [x]` checkboxes
- SUMMARY.md: Markdown with sections
- EVALUATION_*.md: Markdown with tables
- SCORECARD.md: Markdown with score tables
- Implementation: Runnable code files
```bash
#!/bin/bash
# scripts/lint.sh

lint_agent_output() {
    local agent_dir="$1"
    local errors=0

    # Check required files exist
    for file in Checklist.md SUMMARY.md; do
        if [[ ! -f "${agent_dir}/${file}" ]]; then
            echo "ERROR: Missing ${file}"
            ((errors++))
        fi
    done

    # Check the main implementation file exists (glob must match at least one file)
    if ! compgen -G "${agent_dir}/implementation/main.*" > /dev/null; then
        echo "ERROR: Missing implementation/main.*"
        ((errors++))
    fi

    # Check Checklist is complete
    if grep -q "\[ \]" "${agent_dir}/Checklist.md"; then
        echo "ERROR: Checklist has unchecked items"
        ((errors++))
    fi

    # Check code compiles (language-specific)
    # ... implementation-specific checks

    return "$errors"
}

# Run on all agents
for agent in a b c; do
    lint_agent_output "workspace/run_${agent}" || exit 1
done
```
```python
def assert_phase_complete(phase_name):
    """Assert that a phase has completed successfully."""
    assertions = {
        "phase1": [
            "workspace/run_a/implementation exists",
            "workspace/run_b/implementation exists",
            "workspace/run_c/implementation exists",
            "All Checklist.md are complete",
        ],
        "phase2": [
            "6 evaluation reports exist",
            "All evaluations have numeric scores",
        ],
        "phase3": [
            "3 scorecards exist",
            "All scores are numeric",
            "Conclusions are provided",
        ],
        "phase4": [
            "final/solution exists",
            "COMPARISON_REPORT.md exists",
            "DECISION_RATIONALE.md exists",
        ],
    }
    for assertion in assertions[phase_name]:
        assert evaluate(assertion), f"Assertion failed: {assertion}"
```
```yaml
b3ehive:
  # Agent configuration
  agents:
    count: 3
    model: openai-proxy/gpt-5.3-codex
    thinking: high
    focuses:
      - simplicity
      - speed
      - robustness

  # Evaluation weights (must sum to 100)
  evaluation:
    dimensions:
      simplicity: 20
      speed: 25
      stability: 25
      corner_cases: 20
      maintainability: 10

  # Delivery strategy
  delivery:
    strategy: auto   # auto / best / hybrid
    threshold: 15    # Point margin for clear winner

  # Quality gates
  quality:
    lint: true
    test: true
    coverage_threshold: 80
```
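A quick sanity check of that configuration can be scripted. The sketch below assumes the config is saved as a YAML file named .b3ehive.yml (the file name is an assumption) and uses PyYAML to load it.

```python
# Sketch: sanity-checking the b3ehive config. The file name .b3ehive.yml
# is an assumption; adjust it to wherever the config actually lives.
import yaml  # PyYAML

with open(".b3ehive.yml") as f:
    cfg = yaml.safe_load(f)["b3ehive"]

weights = cfg["evaluation"]["dimensions"]
assert sum(weights.values()) == 100, "evaluation weights must sum to 100"
assert cfg["agents"]["count"] == len(cfg["agents"]["focuses"]), "one focus per agent"
assert cfg["delivery"]["strategy"] in ("auto", "best", "hybrid"), "unknown delivery strategy"
print("config OK")
```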
```bash
# Basic usage
b3ehive "Implement a thread-safe rate limiter"

# With constraints
b3ehive "Implement quicksort" --lang python --max-lines 50

# Using OpenClaw CLI
openclaw skills run b3ehive --task "Your task"
```
MIT © Weiyang (@weiyangzen)