Tencent SkillHub · Developer Tools

Data Cleaning & Annotation Workflow

Complete workflow for time series datasets (Energy, Manufacturing, Climate) on Kaggle to Data Annotation platform (data.smlcrm.com). Includes downloading, cl...

0 Downloads
0 Stars
0 Installs
0 Score
High Signal


⬇ 0 downloads · ★ 0 stars · Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILL.md, references/platform_guide.md, scripts/clean_dataset.py, scripts/download_kaggle.sh

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.0

Documentation

Primary doc: SKILL.md (15 sections)

Simulacrum Data Annotation Workflow

Complete end-to-end workflow for time series dataset preparation and annotation on the Data Annotation platform (data.smlcrm.com).

What This Skill Does

This skill captures the precise workflow for processing time series datasets (Energy, Manufacturing, Climate) from discovery to CLEAN status:
  1. Find Dataset: Search Kaggle for Energy/Manufacturing/Climate time series data
  2. Download: Get CSV files via browser or Kaggle CLI
  3. Clean: Run the Python/pandas script to handle missing values, duplicates, and formatting
  4. Upload RAW: Upload the original CSV with metadata (name, domain, source URL, description)
  5. Configure Headers: Set column types (Time, Target, Covariate, Group) and units
  6. Assign Groups: Select ALL variables (target + covariates), apply ALL group tags
  7. Upload Cleaned: Final upload → CLEAN status

Supported Domains

  • Energy: Power consumption, utilities, renewable energy, grid data
  • Manufacturing: Industrial processes, steel production, emissions, equipment data
  • Climate: CO2 emissions, environmental monitoring, weather correlation data

Quick Start

For the full pipeline from Kaggle to annotated dataset:
  1. Find the dataset on Kaggle
  2. Download it (browser or Kaggle CLI)
  3. Clean with scripts/clean_dataset.py
  4. Upload the RAW dataset to data.smlcrm.com (with metadata)
  5. Click "Clean" and upload the cleaned file
  6. Configure column metadata (types, units)
  7. Assign groups to variables
  8. Upload the cleaned dataset → CLEAN status

Step 1: Find and Download Dataset

From Kaggle (Browser Method):
  1. Navigate to kaggle.com/datasets
  2. Search for a relevant dataset (e.g., "steel industry energy consumption", "manufacturing emissions", "climate CO2")
  3. Review the data description, file list, and preview
  4. Click the "Download" button
  5. Extract the CSV file from the downloaded zip

Alternative: Kaggle CLI

  # Install if needed: pip install kaggle
  # Verify your API credentials are configured: kaggle competitions list
  scripts/download_kaggle.sh <dataset-name> [output-dir]
  # Example: scripts/download_kaggle.sh csafrit2/steel-industry-energy-consumption
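The CLI path above can be sketched in Python. This is a hypothetical helper (build_download_command is not part of the package) showing the `kaggle datasets download` invocation that a wrapper like scripts/download_kaggle.sh would typically issue:

```python
import shlex


def build_download_command(dataset: str, output_dir: str = ".") -> list[str]:
    """Build the Kaggle CLI command to download and unzip a dataset.

    `dataset` is the Kaggle slug, e.g. "csafrit2/steel-industry-energy-consumption".
    """
    return [
        "kaggle", "datasets", "download",
        "-d", dataset,      # dataset slug
        "-p", output_dir,   # destination directory
        "--unzip",          # extract the CSV(s) after download
    ]


cmd = build_download_command("csafrit2/steel-industry-energy-consumption", "data")
print(shlex.join(cmd))
# kaggle datasets download -d csafrit2/steel-industry-energy-consumption -p data --unzip
```

Running the command still requires a configured ~/.kaggle/kaggle.json API token.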

Step 2: Clean the Dataset

Always run the cleaning script before upload:

  python3 scripts/clean_dataset.py <input.csv> [-o <output.csv>]

What the script does:
  • Strips whitespace from column names
  • Removes duplicate rows
  • Fills missing numeric values with the median
  • Fills missing categorical values with the mode or 'Unknown'
  • Converts timestamp columns to datetime format
  • Outputs a column summary for metadata configuration

Output:
  • Cleaned CSV file ready for upload
  • Column summary printed to the console (save this for metadata config)
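A minimal pandas sketch of the cleaning steps listed above (whitespace stripping, de-duplication, median/mode imputation). The real logic lives in scripts/clean_dataset.py, so treat this as an illustration of the described behavior, not the packaged script:

```python
import pandas as pd


def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Strip whitespace from column names
    df.columns = df.columns.str.strip()
    # Remove duplicate rows
    df = df.drop_duplicates()
    # Fill missing values: median for numeric, mode (or 'Unknown') for categorical
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            mode = df[col].mode()
            df[col] = df[col].fillna(mode.iloc[0] if not mode.empty else "Unknown")
    return df


raw = pd.DataFrame({" Usage_kWh ": [3.0, None, 3.0], "Load_Type": ["Light", None, "Light"]})
cleaned = clean_dataframe(raw)
print(cleaned.columns.tolist())  # ['Usage_kWh', 'Load_Type']
```

The packaged script additionally parses timestamp columns and prints a column summary.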

Step 3: Upload Raw Dataset to Platform

  1. Navigate to data.smlcrm.com/dashboard
  2. Click the "Upload Dataset" button
  3. Fill in metadata for the RAW dataset:
     • Name: Descriptive dataset name
     • Domain: Category (Energy, Manufacturing, Climate, etc.)
     • Source URL: Kaggle or original source URL
     • Description: Brief summary of the dataset
  4. Upload the original/raw CSV file (not cleaned yet)
  5. Click Upload

Result: Dataset appears in the list with RAW status

Step 4: Upload Cleaned File & Configure Metadata

  1. Find the RAW dataset in the list
  2. Click the "Clean" button
  3. Upload the cleaned CSV file (from Step 2)
  4. Configure headers for each column:

Setting | Description
Name    | Column name (editable)
Units   | Measurement units (kWh, °C, %, ratio, tCO2, etc.)
Type    | Time / Target / Covariate / Group

Column Type Guide:
  • Time: Timestamp/datetime columns (usually required)
  • Target: Variable to predict (at least one required)
  • Covariate: Input features/independent variables
  • Group: Categorical segment variables (WeekStatus, Day_of_week, Load_Type, etc.)

Bulk Configuration:
  • Select multiple rows via checkboxes
  • Use the "Apply" dropdown to set the type for the selected columns
  • Set units individually or in bulk

Common Unit Patterns:
  • Energy: kWh, MWh, MW
  • Power: kVarh, kW
  • Emissions: tCO2, kgCO2
  • Ratios: ratio, %
  • Time: seconds, minutes, hours

Step 5: Assign Groups to Variables

Purpose: Group variables define how data is segmented for analysis.

Exact Workflow:
  1. Select ALL variables by checking their checkboxes:
     • Target variable(s)
     • ALL covariate variables
  2. Apply ALL group tags to the selected variables:
     • Click the first group tag (e.g., WeekStatus) → all selected get this group
     • Click the second group tag (e.g., Day_of_week) → all selected get this group
     • Click the third group tag (e.g., Load_Type) → all selected get this group
     • Continue for all available group tags

Result: All variables have all groups assigned (e.g., "WeekStatus × Day_of_week × Load_Type")

Important: Assign groups to BOTH target variables AND all covariates.
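The select-then-tag workflow above amounts to attaching every group tag to every selected variable. A hypothetical sketch of that bookkeeping (assign_groups is illustrative, not platform code):

```python
def assign_groups(variables: list[str], group_tags: list[str]) -> dict[str, str]:
    """Attach every group tag to every variable and render the combined label."""
    label = " × ".join(group_tags)
    return {var: label for var in variables}


assignments = assign_groups(
    ["Usage_kWh", "NSM"],                          # target + covariates
    ["WeekStatus", "Day_of_week", "Load_Type"],    # all available group tags
)
print(assignments["Usage_kWh"])  # WeekStatus × Day_of_week × Load_Type
```

Note that every variable ends up with the same full cross of group tags, which is exactly the "Important" rule above.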

Step 6: Final Upload

  1. Click the "Upload Cleaned Dataset" button
  2. Wait for processing
  3. Dataset status changes from RAW → CLEAN
  4. Verify the data points count is correct

Example: Steel Industry Energy Dataset

Source: https://www.kaggle.com/datasets/csafrit2/steel-industry-energy-consumption

Metadata:
  • Name: Steel Industry Energy Consumption (South Korea)
  • Domain: Energy
  • Data Points: 350,400

Column Configuration:

Column                               | Type      | Units
Timestamps                           | Time      | -
Usage_kWh                            | Target    | kWh
Lagging_Current_Reactive.Power_kVarh | Covariate | kVarh
Leading_Current_Reactive_Power_kVarh | Covariate | kVarh
CO2(tCO2)                            | Covariate | tCO2
Lagging_Current_Power_Factor         | Covariate | ratio
Leading_Current_Power_Factor         | Covariate | ratio
NSM                                  | Covariate | seconds
WeekStatus                           | Group     | -
Day_of_week                          | Group     | -
Load_Type                            | Group     | -

Group Assignment:
  1. Select: Usage_kWh, Lagging_Current_Reactive.Power_kVarh, Leading_Current_Reactive_Power_kVarh, CO2(tCO2), Lagging_Current_Power_Factor, Leading_Current_Power_Factor, NSM
  2. Click: WeekStatus → all selected get WeekStatus
  3. Click: Day_of_week → all selected get Day_of_week
  4. Click: Load_Type → all selected get Load_Type
  5. Final: All variables show "WeekStatus × Day_of_week × Load_Type"

Reference Materials

For detailed platform configuration guidance, see references/platform_guide.md.

Troubleshooting

"Next" button disabled:
  • Check that at least one Time column is set
  • Check that at least one Target column is set
  • Verify all columns have types assigned

Groups not appearing:
  • Columns must be marked as "Group" type first
  • Proceed to the next step after setting Group types

Upload fails:
  • Re-run the cleaning script
  • Check the CSV format (comma-delimited)
  • Verify there are no empty column names
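The checks in the troubleshooting list can be run locally before upload. This is a hypothetical helper, not platform code: it flags missing Time/Target types, untyped columns, and empty column names against a {column_name: type} mapping:

```python
ALLOWED_TYPES = {"Time", "Target", "Covariate", "Group"}


def preflight(columns: dict[str, str]) -> list[str]:
    """Return a list of problems that would block the 'Next' step or the upload."""
    problems = []
    types = set(columns.values())
    if "Time" not in types:
        problems.append("no Time column set")
    if "Target" not in types:
        problems.append("no Target column set")
    for name, ctype in columns.items():
        if not name.strip():
            problems.append("empty column name")
        if ctype not in ALLOWED_TYPES:
            problems.append(f"{name!r}: missing or unknown type {ctype!r}")
    return problems


print(preflight({"Timestamps": "Time", "Usage_kWh": "Target"}))  # []
```

An empty result means the column metadata satisfies the rules listed above; each string in a non-empty result names one blocking issue.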

Scripts

Script                     | Purpose
scripts/clean_dataset.py   | Clean and prepare a CSV for upload
scripts/download_kaggle.sh | Download datasets via the Kaggle CLI

Platform URL

Data Annotation Platform: https://data.smlcrm.com

Category context

Code helpers, APIs, CLIs, browser automation, testing, and developer operations.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 Docs · 2 Scripts
  • SKILL.md Primary doc
  • references/platform_guide.md Docs
  • scripts/clean_dataset.py Scripts
  • scripts/download_kaggle.sh Scripts