Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Expert guidance for systematic backtesting of trading strategies. Use when developing, testing, stress-testing, or validating quantitative trading strategies...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.
Goal: Find strategies that "break the least", not strategies that "profit the most" on paper. Principle: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.
Use this skill when:
- Developing or validating systematic trading strategies
- Evaluating whether a trading idea is robust enough for live implementation
- Troubleshooting why a backtest might be misleading
- Learning proper backtesting methodology
- Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
- Assessing parameter sensitivity and regime dependence
- Setting realistic expectations for slippage and execution costs
Define the edge in one sentence. Example: "Stocks that gap up >3% on earnings and pull back to previous day's close within first hour provide mean-reversion opportunity." If you can't articulate the edge clearly, don't proceed to testing.
Define with complete specificity:
- Entry: Exact conditions, timing, price type
- Exit: Stop loss, profit target, time-based exit
- Position sizing: Fixed dollar amount, % of portfolio, or volatility-adjusted
- Filters: Market cap, volume, sector, volatility conditions
- Universe: What instruments are eligible

Critical: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
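One way to enforce this discipline is to capture the rules as data before writing any backtest logic. A minimal sketch in Python; every field name and value here is illustrative, not part of the package:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StrategySpec:
    """Every field is an explicit, testable rule -- no discretion allowed."""
    entry: str                 # exact entry condition, timing, and price type
    stop_loss_pct: float       # exit: stop loss as % of entry price
    profit_target_pct: float   # exit: profit target as % of entry price
    max_hold_bars: int         # exit: time-based, in bars
    sizing: str                # position sizing rule
    min_dollar_volume: float   # liquidity filter
    universe: str              # eligible instruments

spec = StrategySpec(
    entry="gap up >3% on earnings, pullback to prior day's close in first hour",
    stop_loss_pct=2.0,
    profit_target_pct=3.0,
    max_hold_bars=390,  # one trading day of 1-minute bars
    sizing="volatility-adjusted, 0.5% portfolio risk per trade",
    min_dollar_volume=5e6,
    universe="US common stocks, price > $5",
)
```

If any field cannot be filled in without a judgment call, the strategy is not yet specific enough to test.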
Test over:
- Minimum 5 years (preferably 10+)
- Multiple market regimes (bull, bear, high/low volatility)
- Realistic costs: commissions + conservative slippage

Examine initial results for basic viability. If the strategy is fundamentally broken, iterate on the hypothesis.
This is where 80% of testing time should be spent.

Parameter sensitivity:
- Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
- Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
- Vary entry/exit timing by ±15-30 minutes
- Look for "plateaus" of stable performance, not narrow spikes

Execution friction:
- Increase slippage to 1.5-2x typical estimates
- Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
- Add realistic order rejection scenarios
- Test with pessimistic commission structures

Time robustness:
- Analyze year-by-year performance
- Require positive expectancy in the majority of years
- Ensure the strategy doesn't rely on 1-2 exceptional periods
- Test different market regimes separately

Sample size:
- Absolute minimum: 30 trades
- Preferred: 100+ trades
- High confidence: 200+ trades
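The parameter-sensitivity sweep above can be sketched as a loop over stop-loss multiples, with a simple plateau check. The `backtest` callable and the tolerance threshold are illustrative assumptions, not part of the package:

```python
def sweep_stop_loss(backtest, baseline_stop, multiples=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Rerun the backtest at each stop-loss multiple.

    `backtest(stop_loss)` is assumed to return expectancy per trade.
    Returns {multiple: expectancy}.
    """
    return {m: backtest(baseline_stop * m) for m in multiples}

def is_plateau(results, tolerance=0.5):
    """True if every setting keeps at least `tolerance` of the best expectancy.

    A plateau (all settings comparably profitable) suggests a genuine edge;
    a single narrow spike suggests curve-fitting.
    """
    values = list(results.values())
    best = max(values)
    return best > 0 and all(v >= tolerance * best for v in values)

# Toy stand-in backtest for illustration only: expectancy degrades gently
# as the stop moves away from 2.0%, so the sweep shows a plateau.
results = sweep_stop_loss(lambda stop: 0.10 - 0.01 * abs(stop - 2.0), baseline_stop=2.0)
print(is_plateau(results))  # -> True
```

The same pattern applies to profit targets and timing offsets: sweep, then ask whether the neighborhood is stable, not whether the peak is high.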
Walk-forward analysis:
- Optimize on a training period (e.g., years 1-3)
- Test on a validation period (year 4)
- Roll forward and repeat
- Compare in-sample vs. out-of-sample performance

Warning signs:
- Out-of-sample performance <50% of in-sample
- Need for frequent parameter re-optimization
- Parameters change dramatically between periods
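Generating the rolling train/validation windows is mechanical. A minimal sketch, assuming yearly periods and a 3-year train / 1-year test split (both numbers are illustrative defaults):

```python
def walk_forward_splits(years, train_len=3, test_len=1):
    """Yield (train_years, test_years) windows, rolled forward one test period at a time."""
    for start in range(0, len(years) - train_len - test_len + 1, test_len):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        yield train, test

# Ten years of data -> seven rolling splits.
for train, test in walk_forward_splits(list(range(2015, 2025))):
    print(train, "->", test)
# first split: [2015, 2016, 2017] -> [2018]
```

Optimize parameters only on each `train` window, record performance only on the matching `test` window, and compare the aggregated out-of-sample results against in-sample.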
Questions to answer:
- Does the edge survive pessimistic assumptions?
- Is performance stable across parameter variations?
- Does the strategy work in multiple market regimes?
- Is the sample size sufficient for statistical confidence?
- Are results realistic, not "too good to be true"?

Decision criteria:
- ✅ Deploy: Survives all stress tests with acceptable performance
- 🔄 Refine: Core logic sound but needs parameter adjustment
- ❌ Abandon: Fails stress tests or relies on fragile assumptions
Add friction everywhere:
- Commissions higher than reality
- Slippage 1.5-2x typical
- Worst-case fills
- Order rejections
- Partial fills

Rationale: Strategies that survive pessimistic assumptions often outperform in live trading.
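The fill and commission assumptions above can be sketched as two small helpers. The tick size, slippage numbers, and commission rate are illustrative placeholders, deliberately pessimistic:

```python
def pessimistic_fill(side, bid, ask, tick=0.01, slippage_mult=2.0, base_slippage=0.01):
    """Worst-case fill: cross the spread by one extra tick and add scaled slippage."""
    slip = slippage_mult * base_slippage
    if side == "buy":
        return ask + tick + slip   # pay above the ask
    return bid - tick - slip       # receive below the bid

def net_pnl(entry, exit_, shares, commission_per_share=0.01):
    """P&L after a deliberately high per-share commission on both legs."""
    return (exit_ - entry) * shares - 2 * commission_per_share * shares

# Example: buy against a 9.99/10.00 market, sell later against 10.49/10.50.
entry = pessimistic_fill("buy", 9.99, 10.00)    # -> 10.03
exit_ = pessimistic_fill("sell", 10.49, 10.50)  # -> 10.46
print(net_pnl(entry, exit_, shares=100))
```

If the edge disappears under these assumptions, it was likely paying for its own execution all along.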
Look for parameter ranges where performance is stable, not optimal values that create performance spikes.
- Good: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
- Bad: Strategy only works with stop loss at exactly 2.13%

Stable performance indicates a genuine edge; narrow optima suggest curve-fitting.
- Wrong approach: Study hand-picked "market leaders" that worked
- Right approach: Test every stock that met the criteria, including those that failed

Selective examples create survivorship bias and overestimate strategy quality.
- Intuition: Useful for generating hypotheses
- Validation: Must be purely data-driven

Never let attachment to an idea influence interpretation of test results.
Recognize these patterns early to save time:
- Parameter sensitivity: Only works with exact parameter values
- Regime-specific: Great in some years, terrible in others
- Slippage sensitivity: Unprofitable once realistic costs are added
- Small sample: Too few trades for statistical confidence
- Look-ahead bias: "Too good to be true" results
- Over-optimization: Many parameters, poor out-of-sample results

See references/failed_tests.md for detailed examples and a diagnostic framework.
File: references/methodology.md
When to read: For detailed guidance on specific testing techniques.
Contents:
- Stress testing methods
- Parameter sensitivity analysis
- Slippage and friction modeling
- Sample size requirements
- Market regime classification
- Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)
File: references/failed_tests.md
When to read: When a strategy fails tests, or when learning from past mistakes.
Contents:
- Why failures are valuable
- Common failure patterns with examples
- Case study documentation framework
- Red flags checklist for evaluating backtests
- Time allocation: Spend 20% of your time generating ideas and 80% trying to break them.
- Context-free requirement: If a strategy requires "perfect context" to work, it's not robust enough for systematic trading.
- Red flag: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.
- Tool limitations: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).
- Statistical significance: Small edges require large sample sizes to prove. A 5% edge per trade needs 100+ trades to distinguish from luck.
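The statistical-significance point can be made concrete with a standard rough rule: if per-trade returns have mean edge `mu` and standard deviation `sigma`, the t-statistic after `n` trades is roughly `mu * sqrt(n) / sigma`, so requiring `t >= 2` gives a minimum sample size. The 25% per-trade volatility below is an illustrative assumption, not a figure from the package:

```python
import math

def min_trades_for_significance(edge, std, t_required=2.0):
    """Smallest n such that t = edge * sqrt(n) / std >= t_required."""
    return math.ceil((t_required * std / edge) ** 2)

# A 5% average edge against 25% per-trade volatility (illustrative numbers):
print(min_trades_for_significance(0.05, 0.25))  # -> 100
```

Under these assumptions, the "100+ trades" guideline falls directly out of the arithmetic; noisier strategies or smaller edges need correspondingly more.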
This skill focuses on systematic/quantitative backtesting where:
- All rules are codified in advance
- No discretion or "feel" in execution
- Testing happens on all historical examples, not cherry-picked cases
- Context (news, macro) is deliberately stripped out

Discretionary traders study differently; this skill may not apply to setups requiring subjective judgment.