# Send Data Cleaning & Annotation Workflow to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of working through the setup manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "data-cleaning-annotation-workflow",
    "name": "Data Cleaning & Annotation Workflow",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/Deyashmukh/data-cleaning-annotation-workflow",
    "canonicalUrl": "https://clawhub.ai/Deyashmukh/data-cleaning-annotation-workflow",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/data-cleaning-annotation-workflow",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=data-cleaning-annotation-workflow",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "SKILL.md",
      "references/platform_guide.md",
      "scripts/clean_dataset.py",
      "scripts/download_kaggle.sh"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "data-cleaning-annotation-workflow",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-02T08:22:27.726Z",
      "expiresAt": "2026-05-09T08:22:27.726Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=data-cleaning-annotation-workflow",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=data-cleaning-annotation-workflow",
        "contentDisposition": "attachment; filename=\"data-cleaning-annotation-workflow-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "data-cleaning-annotation-workflow"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/data-cleaning-annotation-workflow"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/data-cleaning-annotation-workflow",
    "downloadUrl": "https://openagent3.xyz/downloads/data-cleaning-annotation-workflow",
    "agentUrl": "https://openagent3.xyz/skills/data-cleaning-annotation-workflow/agent",
    "manifestUrl": "https://openagent3.xyz/skills/data-cleaning-annotation-workflow/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/data-cleaning-annotation-workflow/agent.md"
  }
}
```
## Documentation

### Simulacrum Data Annotation Workflow

Complete end-to-end workflow for time series dataset preparation and annotation on the Data Annotation platform (data.smlcrm.com).

### What This Skill Does

This skill captures the precise workflow for processing time series datasets (Energy, Manufacturing, Climate) from discovery to CLEAN status:

1. Find Dataset: Search Kaggle for Energy/Manufacturing/Climate time series data
2. Download: Get CSV files via browser or Kaggle CLI
3. Clean: Run the Python/pandas script to handle missing values, duplicates, and formatting
4. Upload RAW: Upload the original CSV with metadata (name, domain, source URL, description)
5. Configure Headers: Set column types (Time, Target, Covariate, Group) and units
6. Assign Groups: Select ALL variables (target + covariates) and apply ALL group tags
7. Upload Cleaned: Final upload → CLEAN status

### Supported Domains

- Energy: Power consumption, utilities, renewable energy, grid data
- Manufacturing: Industrial processes, steel production, emissions, equipment data
- Climate: CO2 emissions, environmental monitoring, weather correlation data

### Quick Start

For the full pipeline from Kaggle to annotated dataset:

1. Find dataset on Kaggle
2. Download (browser or kaggle CLI)
3. Clean with scripts/clean_dataset.py
4. Upload RAW dataset to data.smlcrm.com (with metadata)
5. Click "Clean" and upload cleaned file
6. Configure column metadata (types, units)
7. Assign groups to variables
8. Upload cleaned dataset → CLEAN status

### Step 1: Find and Download Dataset

From Kaggle (Browser Method):

1. Navigate to kaggle.com/datasets
2. Search for a relevant dataset (e.g., "steel industry energy consumption", "manufacturing emissions", "climate CO2")
3. Review the data description, file list, and preview
4. Click the "Download" button
5. Extract the CSV file from the downloaded zip

Alternative: Kaggle CLI

```text
# Install if needed: pip install kaggle
# Verify credentials are configured: kaggle competitions list

scripts/download_kaggle.sh <dataset-name> [output-dir]
# Example: scripts/download_kaggle.sh csafrit2/steel-industry-energy-consumption
```

### Step 2: Clean the Dataset

Always run the cleaning script before upload:

```text
python3 scripts/clean_dataset.py <input.csv> [-o <output.csv>]
```

What the script does:

- Strips whitespace from column names
- Removes duplicate rows
- Fills missing numeric values with the column median
- Fills missing categorical values with the mode, or 'Unknown'
- Converts timestamp columns to datetime format
- Outputs a column summary for metadata configuration

Output:

- Cleaned CSV file ready for upload
- Column summary printed to the console (save this for metadata config)
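The cleaning steps above can be sketched in pandas. This is a simplified illustration of the same sequence, not the bundled `scripts/clean_dataset.py`; the timestamp heuristic (column name contains "date" or "time") is an assumption:

```python
import pandas as pd

def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning sequence described above."""
    # Strip whitespace from column names
    df = df.rename(columns=lambda c: c.strip())
    # Remove duplicate rows
    df = df.drop_duplicates().copy()
    # Convert likely timestamp columns to datetime (heuristic: name-based)
    for col in df.columns:
        if "date" in col.lower() or "time" in col.lower():
            df[col] = pd.to_datetime(df[col], errors="coerce")
    # Fill missing numeric values with the column median
    for col in df.select_dtypes(include="number"):
        df[col] = df[col].fillna(df[col].median())
    # Fill missing categorical values with the mode, falling back to 'Unknown'
    for col in df.select_dtypes(include="object"):
        mode = df[col].mode()
        df[col] = df[col].fillna(mode.iloc[0] if not mode.empty else "Unknown")
    return df
```

The order matters: deduplicate before imputing, so fill values are not skewed by repeated rows.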

### Step 3: Upload Raw Dataset to Platform

1. Navigate to data.smlcrm.com/dashboard
2. Click the "Upload Dataset" button
3. Fill in metadata for the RAW dataset:
   - Name: Descriptive dataset name
   - Domain: Category (Energy, Manufacturing, Climate, etc.)
   - Source URL: Kaggle or original source URL
   - Description: Brief summary of the dataset
4. Upload the original/raw CSV file (not cleaned yet)
5. Click Upload

Result: the dataset appears in the list with RAW status

### Step 4: Upload Cleaned File & Configure Metadata

1. Find the RAW dataset in the list
2. Click the "Clean" button
3. Upload the cleaned CSV file (from Step 2)
4. Configure headers for each column:

| Setting | Description |
| --- | --- |
| Name | Column name (editable) |
| Units | Measurement units (kWh, °C, %, ratio, tCO2, etc.) |
| Type | Time / Target / Covariate / Group |

Column Type Guide:

- Time: Timestamp/datetime columns (usually required)
- Target: Variable to predict (at least one required)
- Covariate: Input features/independent variables
- Group: Categorical segment variables (WeekStatus, Day_of_week, Load_Type, etc.)

Bulk Configuration:

- Select multiple rows via checkboxes
- Use the "Apply" dropdown to set the type for the selected columns
- Set units individually or in bulk

Common Unit Patterns:

- Energy: kWh, MWh, MW
- Power: kVarh, kW
- Emissions: tCO2, kgCO2
- Ratios: ratio, %
- Time: seconds, minutes, hours

### Step 5: Assign Groups to Variables

Purpose: Group variables define how data is segmented for analysis.

Exact Workflow:

1. Select ALL variables by checking their checkboxes:
   - Target variable(s)
   - ALL covariate variables
2. Apply ALL group tags to the selected variables:
   - Click the first group tag (e.g., WeekStatus) → all selected variables get this group
   - Click the second group tag (e.g., Day_of_week) → all selected variables get this group
   - Click the third group tag (e.g., Load_Type) → all selected variables get this group
   - Continue for all available group tags

Result: all variables have all groups assigned (e.g., "WeekStatus × Day_of_week × Load_Type")

Important: Assign groups to BOTH target variables AND all covariates.
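Conceptually, a variable tagged with every group is segmented by the cross-product of the group values. A sketch of that idea, with illustrative (not actual) group values:

```python
from itertools import product

# Hypothetical values for the three group tags in the example
week_status = ["Weekday", "Weekend"]
day_of_week = ["Monday", "Sunday"]
load_type = ["Light_Load", "Maximum_Load"]

# A variable assigned all three groups is analyzed once per combination
segments = list(product(week_status, day_of_week, load_type))
print(len(segments))  # 2 * 2 * 2 = 8 segments
```

This is why skipping a group tag on one variable matters: that variable would be segmented differently from the rest of the dataset.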

### Step 6: Final Upload

Click "Upload Cleaned Dataset" button
Wait for processing
Dataset status changes from RAW → CLEAN
Verify data points count is correct

### Example: Steel Industry Energy Dataset

Source: https://www.kaggle.com/datasets/csafrit2/steel-industry-energy-consumption

Metadata:

- Name: Steel Industry Energy Consumption (South Korea)
- Domain: Energy
- Data Points: 350,400

Column Configuration:

| Column | Type | Units |
| --- | --- | --- |
| Timestamps | Time | - |
| Usage_kWh | Target | kWh |
| Lagging_Current_Reactive.Power_kVarh | Covariate | kVarh |
| Leading_Current_Reactive_Power_kVarh | Covariate | kVarh |
| CO2(tCO2) | Covariate | tCO2 |
| Lagging_Current_Power_Factor | Covariate | ratio |
| Leading_Current_Power_Factor | Covariate | ratio |
| NSM | Covariate | seconds |
| WeekStatus | Group | - |
| Day_of_week | Group | - |
| Load_Type | Group | - |

Group Assignment:

1. Select: Usage_kWh, Lagging_Current_Reactive.Power_kVarh, Leading_Current_Reactive_Power_kVarh, CO2(tCO2), Lagging_Current_Power_Factor, Leading_Current_Power_Factor, NSM
2. Click WeekStatus → all selected variables get WeekStatus
3. Click Day_of_week → all selected variables get Day_of_week
4. Click Load_Type → all selected variables get Load_Type
5. Final: all variables show "WeekStatus × Day_of_week × Load_Type"

### Reference Materials

For detailed platform configuration guidance, see references/platform_guide.md.

### Troubleshooting

"Next" button disabled:

- Check that at least one Time column is set
- Check that at least one Target column is set
- Verify all columns have types assigned

Groups not appearing:

- Columns must be marked as "Group" type first
- Proceed to the next step after setting Group types

Upload fails:

- Re-run the cleaning script
- Check the CSV format (comma-delimited)
- Verify there are no empty column names
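The last two upload checks can be scripted before retrying. A minimal pre-flight sketch using the standard library (the `preflight_csv` helper is hypothetical, not part of the package):

```python
import csv

def preflight_csv(path: str) -> list[str]:
    """Return problems that commonly cause the upload to fail."""
    problems = []
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
        # A single-field header usually means the file is not comma-delimited
        if len(header) < 2:
            problems.append("file does not look comma-delimited")
        # Empty column names break header configuration on the platform
        if any(not name.strip() for name in header):
            problems.append("header contains empty column names")
    return problems
```

If this returns an empty list, re-run the cleaning script and try the upload again.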

### Scripts

| Script | Purpose |
| --- | --- |
| scripts/clean_dataset.py | Clean and prepare CSV for upload |
| scripts/download_kaggle.sh | Download datasets via Kaggle CLI |

### Platform URL

Data Annotation Platform: https://data.smlcrm.com
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: Deyashmukh
- Version: 1.0.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-02T08:22:27.726Z
- Expires at: 2026-05-09T08:22:27.726Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/data-cleaning-annotation-workflow)
- [Send to Agent page](https://openagent3.xyz/skills/data-cleaning-annotation-workflow/agent)
- [JSON manifest](https://openagent3.xyz/skills/data-cleaning-annotation-workflow/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/data-cleaning-annotation-workflow/agent.md)
- [Download page](https://openagent3.xyz/downloads/data-cleaning-annotation-workflow)