time-sereis-analysis

Comprehensive time series data science skill covering feature engineering, model training, and competition-winning strategies for forecasting and prediction problems.


Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILLS.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.0

Documentation


Time Series Data Science - Complete Guide

Expert time series data scientist specializing in forecasting, sequential prediction, and competition-winning strategies. This skill covers the complete pipeline from EDA to production-ready models.

Key Lessons from Winning Solutions

Feature Engineering > Model Complexity
  • Focus on the 5-10 most predictive features rather than every available one.
  • Lag, rolling, and EWM features are often more valuable than the raw data.
  • Interaction features between top predictors can be game-changers.

Time-Based Validation is Critical
  • NEVER use random splits for time series.
  • Train on the past, validate on the future (e.g., ts_index <= threshold for training).
  • Leakage from future data will destroy real-world performance.

Weights Matter in Scoring
  • If weights are provided, use them directly in training.
  • High-weight samples disproportionately affect the score.
  • Sample weighting in model.fit() is generally preferable to a custom loss.

Multi-Seed Ensemble for Robustness
  • Train the same model with different random seeds.
  • Averaging the predictions reduces variance.
  • Common seeds: 42, 2024, or any fixed set.

1. Lag Features

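# Columns that identify one series; `col` is the name of the feature being lagged
GROUP_COLS = ['entity_id', 'category', 'horizon']
# shift() follows row order, so sort by time within each group first
df = df.sort_values(GROUP_COLS + ['ts_index'])
for lag in [1, 3, 5, 10]:
    df[f'{col}_lag{lag}'] = df.groupby(GROUP_COLS)[col].shift(lag)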
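A minimal sketch of the lag step on a toy frame, assuming a single grouping column and a hypothetical feature named `value` (neither the data nor the column name comes from the skill itself). It illustrates why rows should be sorted by `ts_index` before `shift`: the lag is taken over row order, not over time.

```python
import pandas as pd

# Toy data: two entities, deliberately out of time order.
df = pd.DataFrame({
    "entity_id": ["a", "a", "a", "b", "b", "b"],
    "ts_index":  [2, 0, 1, 1, 0, 2],
    "value":     [30.0, 10.0, 20.0, 5.0, 4.0, 6.0],
})

# shift() works on row order, so sort by time within each group first.
df = df.sort_values(["entity_id", "ts_index"]).reset_index(drop=True)

for lag in [1, 2]:
    df[f"value_lag{lag}"] = df.groupby("entity_id")["value"].shift(lag)

print(df)
```

The first `lag` rows of every group come out as NaN, because no earlier observation exists for them.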

2. Rolling Statistics

for window in [5, 10, 20]:
    df[f'{col}_roll_mean{window}'] = df.groupby(GROUP_COLS)[col].transform(
        lambda x: x.rolling(window, min_periods=1).mean()
    )
    df[f'{col}_roll_std{window}'] = df.groupby(GROUP_COLS)[col].transform(
        lambda x: x.rolling(window, min_periods=1).std()
    )
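One detail worth checking: a pandas rolling window includes the current row. That is fine for features known at prediction time. If the column is the target or anything observed only afterwards, shifting by one step before rolling keeps the statistic strictly historical. The sketch below compares both variants on toy data; the column `y` and the `_past` suffix are illustrative assumptions, not names from the skill.

```python
import pandas as pd

df = pd.DataFrame({
    "entity_id": ["a"] * 4 + ["b"] * 4,
    "ts_index":  [0, 1, 2, 3] * 2,
    "y":         [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
}).sort_values(["entity_id", "ts_index"]).reset_index(drop=True)

# Window includes the current row.
df["y_roll_mean2"] = df.groupby("entity_id")["y"].transform(
    lambda x: x.rolling(2, min_periods=1).mean()
)

# Shift first so the window only sees strictly earlier rows.
df["y_roll_mean2_past"] = df.groupby("entity_id")["y"].transform(
    lambda x: x.shift(1).rolling(2, min_periods=1).mean()
)

print(df)
```

The shifted variant is NaN on the first row of each group, since no past value is available there.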

3. Exponential Weighted Mean (EWM)

for span in [5, 10]:
    df[f'{col}_ewm{span}'] = df.groupby(GROUP_COLS)[col].transform(
        lambda x: x.ewm(span=span, adjust=False).mean()
    )
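For intuition, with `adjust=False` pandas computes the EWM recursively: the first output equals the first input, and each later output is `(1 - alpha) * previous_output + alpha * current_input`, where `alpha = 2 / (span + 1)`. The following sketch checks that recursion against pandas on a made-up series.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 4.0, 8.0])
span = 3
alpha = 2.0 / (span + 1)  # 0.5 for span=3

# Manual recursion: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t
manual = [x.iloc[0]]
for value in x.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * value)

pandas_ewm = x.ewm(span=span, adjust=False).mean()

print(manual)
print(pandas_ewm.tolist())
```

A smaller span gives a larger `alpha`, so the feature reacts faster to recent values.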

4. Difference Features

df[f'{col}_diff1'] = df.groupby(GROUP_COLS)[col].diff(1)
df[f'{col}_diff_pct'] = df.groupby(GROUP_COLS)[col].pct_change(1)
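A caveat with percentage changes: when the previous value is zero, the result is infinite, which many models and scalers handle poorly. The sketch below, on toy data with a hypothetical `value` column, maps those infinities to NaN. Whether NaN, a clip, or a fill value is appropriate depends on the model, so treat this as one possible choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "entity_id": ["a", "a", "a", "b", "b"],
    "value":     [0.0, 5.0, 10.0, 2.0, 3.0],
})

df["value_diff1"] = df.groupby("entity_id")["value"].diff(1)
df["value_diff_pct"] = df.groupby("entity_id")["value"].pct_change(1)

# 0 -> 5 divides by zero, which pandas reports as inf; map it to NaN.
df["value_diff_pct"] = df["value_diff_pct"].replace([np.inf, -np.inf], np.nan)

print(df)
```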

5. Interaction Features

# Difference between related features
df['feat_diff'] = df['feature_a'] - df['feature_b']
# Ratio between features (small epsilon avoids division by zero)
df['feat_ratio'] = df['feature_a'] / (df['feature_b'] + 1e-7)
# Product interactions
df['feat_product'] = df['feature_a'] * df['feature_b']

6. Target Encoding (for categories)

# Compute on training data only (ts_index <= threshold)
train_only = df[df.ts_index <= VAL_THRESHOLD]
enc_stats = {
    'category': train_only.groupby('category')['target'].mean().to_dict(),
    'global_mean': train_only['target'].mean()
}
# Apply to all data; categories unseen in training fall back to the global mean
df['category_enc'] = df['category'].map(enc_stats['category']).fillna(enc_stats['global_mean'])
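To make the train-only rule concrete, here is a small worked example on invented data. The validation rows carry very different targets and a category (`z`) that never appears in training. Neither affects the encoding, and the unseen category falls back to the training global mean.

```python
import pandas as pd

df = pd.DataFrame({
    "ts_index": [0, 1, 2, 3, 4, 5],
    "category": ["x", "y", "x", "y", "x", "z"],
    "target":   [1.0, 3.0, 3.0, 5.0, 100.0, 200.0],
})
VAL_THRESHOLD = 3  # rows with ts_index > 3 are treated as validation

# Statistics come from the training period only.
train_only = df[df["ts_index"] <= VAL_THRESHOLD]
category_means = train_only.groupby("category")["target"].mean()
global_mean = train_only["target"].mean()

# Apply to every row; unseen categories get the training global mean.
df["category_enc"] = df["category"].map(category_means).fillna(global_mean)

print(df)
```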

7. Temporal Signals

import numpy as np
import pandas as pd

# Cyclical encoding for periodicity (`period` is the cycle length in ts_index units)
df['t_cycle'] = np.sin(2 * np.pi * df['ts_index'] / period)
df['t_cycle_cos'] = np.cos(2 * np.pi * df['ts_index'] / period)
# Normalized time position
df['ts_normalized'] = df['ts_index'] / df['ts_index'].max()
# Time bins
df['ts_bin'] = pd.cut(df['ts_index'], bins=10, labels=False)
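The reason for using a sine/cosine pair rather than the raw position in the cycle is that the pair places the end of one cycle next to the start of the next. A short check, assuming a hypothetical period of 7 steps:

```python
import numpy as np

period = 7  # hypothetical cycle length
t = np.arange(period)
sin_t = np.sin(2 * np.pi * t / period)
cos_t = np.cos(2 * np.pi * t / period)

def cyc_dist(i, j):
    """Euclidean distance between two time steps in (sin, cos) space."""
    return float(np.hypot(sin_t[i] - sin_t[j], cos_t[i] - cos_t[j]))

wrap = cyc_dist(0, period - 1)  # last step of the cycle vs. the first
step = cyc_dist(0, 1)           # two ordinary neighbours
far = cyc_dist(0, 3)            # roughly half a cycle apart

print(wrap, step, far)
```

In the encoded space, steps 0 and 6 are exactly as close as steps 0 and 1, whereas on the raw index they would be six units apart.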

LightGBM Configuration (Competition-Tested)

lgb_cfg = {
    'objective': 'regression',
    'metric': 'rmse',
    'learning_rate': 0.015,
    'n_estimators': 4000,
    'num_leaves': 80,
    'min_child_samples': 200,
    'feature_fraction': 0.6,
    'bagging_fraction': 0.7,
    'bagging_freq': 5,
    'lambda_l1': 0.1,
    'lambda_l2': 10.0,
    'verbosity': -1
}

Multi-Seed Ensemble Training

import numpy as np
import lightgbm as lgb

SEEDS = [42, 2024]
val_pred = np.zeros(len(y_val))
test_pred = np.zeros(len(X_test))
for seed in SEEDS:
    model = lgb.LGBMRegressor(**lgb_cfg, random_state=seed)
    model.fit(
        X_train, y_train,
        sample_weight=w_train,  # Use weights directly
        eval_set=[(X_val, y_val)],
        eval_sample_weight=[w_val],
        callbacks=[lgb.early_stopping(200, verbose=False)]
    )
    # Divide by the number of seeds so the running sum is a plain average
    val_pred += model.predict(X_val) / len(SEEDS)
    test_pred += model.predict(X_test) / len(SEEDS)
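The averaging pattern can be isolated from any particular library. The sketch below uses a hypothetical helper, `seed_average`, and a stand-in "model" that returns the true signal plus seed-dependent noise. Real models trained on the same data are correlated, so the variance reduction in practice is smaller than in this idealized setup.

```python
import numpy as np

def seed_average(fit_predict, seeds):
    """Average the predictions of one model recipe run under several seeds."""
    preds = [np.asarray(fit_predict(seed), dtype=float) for seed in seeds]
    return np.mean(preds, axis=0)

# Stand-in for "train with this seed, then predict": signal + independent noise.
signal = np.linspace(0.0, 1.0, 200)

def noisy_model(seed):
    rng = np.random.default_rng(seed)
    return signal + rng.normal(0.0, 0.5, size=signal.shape)

single = noisy_model(42)
ensemble = seed_average(noisy_model, seeds=range(20))

rmse_single = float(np.sqrt(np.mean((single - signal) ** 2)))
rmse_ensemble = float(np.sqrt(np.mean((ensemble - signal) ** 2)))
print(rmse_single, rmse_ensemble)
```

Dividing by the number of seeds inside the helper avoids a hard-coded divisor that silently breaks when the seed list changes.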

Horizon-Specific Models

# Train a separate model per forecast horizon
predictions = {}
for horizon in [1, 3, 10, 25]:
    train_h = df[df.horizon == horizon]
    test_h = test_df[test_df.horizon == horizon]
    # Build features, train model (train_model is a placeholder for your own routine)
    model = train_model(train_h, test_h)
    predictions[horizon] = model.predict(test_h)
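Per-horizon predictions eventually have to be stitched back into the original test row order. One way to do that, sketched here under the assumption that the test frame has a unique index, is to write each horizon's output into a single Series aligned on that index. The "model" is a deliberately trivial stand-in (the training mean of the matching horizon), not the skill's training routine.

```python
import pandas as pd

train_df = pd.DataFrame({
    "horizon": [1, 1, 3, 3],
    "target":  [1.0, 3.0, 10.0, 30.0],
})
test_df = pd.DataFrame({"horizon": [3, 1, 3]}, index=[100, 101, 102])

# One prediction slot per test row, filled horizon by horizon.
pred = pd.Series(index=test_df.index, dtype=float)
for horizon in sorted(test_df["horizon"].unique()):
    train_h = train_df[train_df["horizon"] == horizon]
    test_h = test_df[test_df["horizon"] == horizon]
    # Stand-in model: predict the training mean for this horizon.
    pred.loc[test_h.index] = train_h["target"].mean()

test_df["prediction"] = pred
print(test_df)
```

Checking that no prediction is left as NaN afterwards is a cheap guard against a horizon that appears in the test set but was never trained.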

Time-Based Split

VAL_THRESHOLD = int(df['ts_index'].max() * 0.85)
train_mask = df['ts_index'] <= VAL_THRESHOLD
val_mask = df['ts_index'] > VAL_THRESHOLD
X_train = df.loc[train_mask, feature_cols]
X_val = df.loc[val_mask, feature_cols]
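The same split can be wrapped in a small helper and sanity-checked. `time_split` is a hypothetical name, and the threshold rule (85% of the maximum `ts_index`) assumes, as the snippet above does, that the index starts near zero.

```python
import pandas as pd

def time_split(df, time_col="ts_index", train_frac=0.85):
    """Return boolean train/validation masks split at a time threshold."""
    threshold = int(df[time_col].max() * train_frac)
    train_mask = df[time_col] <= threshold
    return train_mask, ~train_mask

# Two series observed at the same 100 time steps.
df = pd.DataFrame({"ts_index": list(range(100)) * 2})
train_mask, val_mask = time_split(df)

# Every validation row must be strictly later than every training row.
latest_train = df.loc[train_mask, "ts_index"].max()
earliest_val = df.loc[val_mask, "ts_index"].min()
print(latest_train, earliest_val, train_mask.sum(), val_mask.sum())
```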

Expanding Window Cross-Validation

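from sklearn.model_selection import TimeSeriesSplit

# TimeSeriesSplit splits by row position, so df must already be sorted by ts_index
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(df):
    # Train on the expanding window df.iloc[train_idx], validate on df.iloc[val_idx]
    pass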
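When several entities share the same time steps, a row-position split can cut through the middle of a time step. A hand-rolled alternative (not scikit-learn's implementation) is to build the folds over unique `ts_index` values instead, as in this sketch. The function name and fold sizes are illustrative assumptions.

```python
import numpy as np

def expanding_time_folds(ts_index, n_splits=3):
    """Yield (train_mask, val_mask) pairs that split on unique time steps."""
    ts_index = np.asarray(ts_index)
    # np.unique returns sorted values; cut them into n_splits + 1 blocks.
    chunks = np.array_split(np.unique(ts_index), n_splits + 1)
    for k in range(1, n_splits + 1):
        train_times = np.concatenate(chunks[:k])  # everything before block k
        val_times = chunks[k]                     # block k itself
        yield np.isin(ts_index, train_times), np.isin(ts_index, val_times)

# Two entities observed at the same 8 time steps.
ts = np.tile(np.arange(8), 2)
folds = list(expanding_time_folds(ts, n_splits=3))

for train_mask, val_mask in folds:
    print(sorted(set(ts[train_mask])), "->", sorted(set(ts[val_mask])))
```

Each fold trains on a longer history than the previous one and validates on the block of time steps that immediately follows it.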

Custom Metrics

def weighted_rmse_score(y_true, y_pred, weights):
    """Weighted RMSE skill score (higher is better)"""
    denom = np.sum(weights * y_true**2)
    if denom <= 0:
        return 0.0
    numer = np.sum(weights * (y_true - y_pred)**2)
    ratio = numer / denom
    return float(np.sqrt(1.0 - np.clip(ratio, 0.0, 1.0)))
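Read as a formula, the score is `sqrt(1 - clip(sum(w * (y - y_hat)^2) / sum(w * y^2), 0, 1))`. A perfect prediction scores 1, and predicting all zeros (or anything worse) scores 0. The worked example below reproduces the function so it runs on its own; the input values are made up.

```python
import numpy as np

def weighted_rmse_score(y_true, y_pred, weights):
    """Weighted RMSE skill score (higher is better)."""
    denom = np.sum(weights * y_true**2)
    if denom <= 0:
        return 0.0
    numer = np.sum(weights * (y_true - y_pred)**2)
    return float(np.sqrt(1.0 - np.clip(numer / denom, 0.0, 1.0)))

y_true = np.array([1.0, 2.0, 2.0])
weights = np.array([1.0, 1.0, 2.0])

perfect = weighted_rmse_score(y_true, y_true, weights)
all_zero = weighted_rmse_score(y_true, np.zeros_like(y_true), weights)
half = weighted_rmse_score(y_true, 0.5 * y_true, weights)

print(perfect, all_zero, half)
```

Predicting half of every true value leaves a quarter of the weighted squared signal unexplained, so the score is `sqrt(0.75)`, regardless of the weights.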

EDA Checklist

Target Analysis
  • Distribution by time period
  • Distribution by category/horizon
  • Trend and seasonality detection

Missing Values
  • Pattern analysis (random vs systematic)
  • Group-based imputation strategy

Weight Distribution
  • Concentration analysis
  • Impact on scoring metric

Feature Correlations
  • Correlation with target
  • Multicollinearity between features

Temporal Patterns
  • Stationarity tests
  • Rolling statistics visualization
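As one possible way to carry out the weight concentration check, the sketch below measures what share of the total weight is held by the heaviest fraction of samples. The helper name and the 1% cut-off are assumptions chosen for illustration.

```python
import numpy as np

def weight_share_of_top(weights, top_frac=0.01):
    """Fraction of total weight held by the heaviest `top_frac` of samples."""
    w = np.sort(np.asarray(weights, dtype=float))[::-1]  # heaviest first
    k = max(1, int(np.ceil(top_frac * len(w))))
    return float(w[:k].sum() / w.sum())

# Toy weights: 99 light samples and one very heavy one.
weights = np.array([1.0] * 99 + [101.0])
share = weight_share_of_top(weights, top_frac=0.01)
uniform_share = weight_share_of_top(np.ones(10), top_frac=0.5)

print(share, uniform_share)
```

A share far above `top_frac` means a handful of samples dominate the weighted metric, which is a signal to inspect those rows closely.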

Common Pitfalls to Avoid

Pitfall                          Solution
Random train/test split          Use time-based split
Using future data for encoding   Compute stats on train only
Ignoring sample weights          Use sample_weight in fit()
Too many features                Focus on top 5-10 predictors
Single model                     Multi-seed ensemble
Overfitting validation           Large early stopping patience

Competition Workflow

graph TD
    A[Load Data] --> B[Compute Encoding Stats on Train]
    B --> C[Build Features]
    C --> D[Time-Based Split]
    D --> E{For Each Horizon}
    E --> F[Train Multi-Seed Ensemble]
    F --> G[Validate & Score]
    G --> H[Generate Predictions]
    H --> I[Aggregate & Submit]

Quick Reference Commands

# Run complete pipeline
python train_winning.py

# Generate submission
python generate_submission.py

# Validate submission format
python -c "
import pandas as pd
sub = pd.read_csv('submission.csv')
print(f'Rows: {len(sub)}, Cols: {list(sub.columns)}')
print(sub.head())
"

Integration with Other Workflows

  • Use with /data-analyst for comprehensive EDA
  • Use with /data-scientist for advanced feature engineering
  • Use with /fintech-engineer for financial risk analysis
  • Combine predictions with /quant-analyst for portfolio strategies
