
Senior Data Engineer

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.



0 downloads · 0 stars · Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILL.md, references/data_modeling_patterns.md, references/data_pipeline_architecture.md, references/dataops_best_practices.md, references/troubleshooting.md, references/workflows.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of walking through the steps manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
2.1.1

Documentation

Primary doc: SKILL.md (14 sections)

Senior Data Engineer

Production-grade data engineering skill for building scalable, reliable data systems.

Table of Contents

  • Trigger Phrases
  • Quick Start
  • Workflows
    • Building a Batch ETL Pipeline
    • Implementing Real-Time Streaming
    • Data Quality Framework Setup
  • Architecture Decision Framework
  • Tech Stack
  • Reference Documentation
  • Troubleshooting

Trigger Phrases

Activate this skill when you see:

Pipeline Design:
  • "Design a data pipeline for..."
  • "Build an ETL/ELT process..."
  • "How should I ingest data from..."
  • "Set up data extraction from..."

Architecture:
  • "Should I use batch or streaming?"
  • "Lambda vs Kappa architecture"
  • "How to handle late-arriving data"
  • "Design a data lakehouse"

Data Modeling:
  • "Create a dimensional model..."
  • "Star schema vs snowflake"
  • "Implement slowly changing dimensions"
  • "Design a data vault"

Data Quality:
  • "Add data validation to..."
  • "Set up data quality checks"
  • "Monitor data freshness"
  • "Implement data contracts"

Performance:
  • "Optimize this Spark job"
  • "Query is running slow"
  • "Reduce pipeline execution time"
  • "Tune Airflow DAG"

Core Tools

# Generate pipeline orchestration config
python scripts/pipeline_orchestrator.py generate \
  --type airflow \
  --source postgres \
  --destination snowflake \
  --schedule "0 5 * * *"

# Validate data quality
python scripts/data_quality_validator.py validate \
  --input data/sales.parquet \
  --schema schemas/sales.json \
  --checks freshness,completeness,uniqueness

# Optimize ETL performance
python scripts/etl_performance_optimizer.py analyze \
  --query queries/daily_aggregation.sql \
  --engine spark \
  --recommend
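The validator scripts above ship inside the package, so their internals are not shown here. As an illustrative sketch only (the function name and row shape are assumptions, not the package's API), the three named checks — freshness, completeness, uniqueness — can be expressed over a list of dict rows like this:

```python
from datetime import datetime, timedelta

def quality_checks(rows, key="id", ts_field="updated_at",
                   max_age=timedelta(hours=24), now=None):
    """Illustrative freshness/completeness/uniqueness checks over dict rows."""
    now = now or datetime.utcnow()
    keys = [r.get(key) for r in rows]
    # Freshness: the newest record must be within the allowed age window.
    newest = max((r[ts_field] for r in rows if ts_field in r), default=None)
    return {
        "freshness": newest is not None and now - newest <= max_age,
        "completeness": all(k is not None for k in keys),  # no missing keys
        "uniqueness": len(keys) == len(set(keys)),         # no duplicate keys
    }
```

A real pipeline would run checks like these via Great Expectations or dbt tests rather than hand-rolled code; the sketch only shows what each check asserts.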

Workflows

→ See references/workflows.md for details

Architecture Decision Framework

Use this framework to choose the right approach for your data pipeline.

Batch vs Streaming

| Criteria | Batch | Streaming |
| --- | --- | --- |
| Latency requirement | Hours to days | Seconds to minutes |
| Data volume | Large historical datasets | Continuous event streams |
| Processing complexity | Complex transformations, ML | Simple aggregations, filtering |
| Cost sensitivity | More cost-effective | Higher infrastructure cost |
| Error handling | Easier to reprocess | Requires careful design |

Decision Tree:

Is real-time insight required?
├── Yes → Use streaming
│   └── Is exactly-once semantics needed?
│       ├── Yes → Kafka + Flink/Spark Structured Streaming
│       └── No → Kafka + consumer groups
└── No → Use batch
    └── Is data volume > 1TB daily?
        ├── Yes → Spark/Databricks
        └── No → dbt + warehouse compute
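The decision tree above can be encoded as a small helper so the choice is reproducible in code reviews. This is a sketch of the tree as written; the threshold and tool names come straight from it and are guidance, not hard rules:

```python
def recommend_pipeline(realtime_required: bool,
                       exactly_once: bool = False,
                       daily_volume_tb: float = 0.0) -> str:
    """Walk the batch-vs-streaming decision tree and return a stack."""
    if realtime_required:
        # Streaming branch: delivery semantics drive the engine choice.
        if exactly_once:
            return "Kafka + Flink/Spark Structured Streaming"
        return "Kafka + consumer groups"
    # Batch branch: daily volume drives the compute choice.
    if daily_volume_tb > 1.0:
        return "Spark/Databricks"
    return "dbt + warehouse compute"
```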

Lambda vs Kappa Architecture

| Aspect | Lambda | Kappa |
| --- | --- | --- |
| Complexity | Two codebases (batch + stream) | Single codebase |
| Maintenance | Higher (sync batch/stream logic) | Lower |
| Reprocessing | Native batch layer | Replay from source |
| Use case | ML training + real-time serving | Pure event-driven |

When to choose Lambda:
  • Need to train ML models on historical data
  • Complex batch transformations not feasible in streaming
  • Existing batch infrastructure

When to choose Kappa:
  • Event-sourced architecture
  • All processing can be expressed as stream operations
  • Starting fresh without legacy systems

Data Warehouse vs Data Lakehouse

| Feature | Warehouse (Snowflake/BigQuery) | Lakehouse (Delta/Iceberg) |
| --- | --- | --- |
| Best for | BI, SQL analytics | ML, unstructured data |
| Storage cost | Higher (proprietary format) | Lower (open formats) |
| Flexibility | Schema-on-write | Schema-on-read |
| Performance | Excellent for SQL | Good, improving |
| Ecosystem | Mature BI tools | Growing ML tooling |

Tech Stack

| Category | Technologies |
| --- | --- |
| Languages | Python, SQL, Scala |
| Orchestration | Airflow, Prefect, Dagster |
| Transformation | dbt, Spark, Flink |
| Streaming | Kafka, Kinesis, Pub/Sub |
| Storage | S3, GCS, Delta Lake, Iceberg |
| Warehouses | Snowflake, BigQuery, Redshift, Databricks |
| Quality | Great Expectations, dbt tests, Monte Carlo |
| Monitoring | Prometheus, Grafana, Datadog |

1. Data Pipeline Architecture

See references/data_pipeline_architecture.md for:
  • Lambda vs Kappa architecture patterns
  • Batch processing with Spark and Airflow
  • Stream processing with Kafka and Flink
  • Exactly-once semantics implementation
  • Error handling and dead letter queues
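The dead-letter-queue pattern listed above is easy to sketch independently of any broker: failed records are routed to a side channel with their error attached, so the healthy path keeps moving and failures can be triaged and replayed. This in-memory version is illustrative only (the function and variable names are assumptions); in production the dead letter sink would be a Kafka topic or an object-store prefix:

```python
import json

def process_batch(records, transform, dead_letter):
    """Apply `transform` to each record; route failures to `dead_letter`
    with the error message attached for later triage and replay."""
    processed = []
    for record in records:
        try:
            processed.append(transform(record))
        except Exception as exc:
            # Keep the raw payload plus the failure reason.
            dead_letter.append({"record": record, "error": str(exc)})
    return processed

dlq = []
raw = ['{"id": 1, "amount": 10}', '{bad json}', '{"id": 2, "amount": 5}']
clean = process_batch(raw, json.loads, dlq)
# clean holds the two parsable records; dlq holds the malformed one.
```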

2. Data Modeling Patterns

See references/data_modeling_patterns.md for:
  • Dimensional modeling (Star/Snowflake)
  • Slowly Changing Dimensions (SCD Types 1-6)
  • Data Vault modeling
  • dbt best practices
  • Partitioning and clustering
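Of the SCD types listed above, Type 2 (keep full history with validity windows) is the most common. A minimal in-memory sketch, assuming a dimension stored as a list of dicts — real implementations use dbt snapshots or a warehouse MERGE statement, and the field names here are illustrative:

```python
from datetime import date

def scd2_upsert(dimension, key, new_attrs, today=None):
    """SCD Type 2: close out the current row for `key` and append a new
    current row whenever any tracked attribute changed."""
    today = today or date.today().isoformat()
    current = next((r for r in dimension
                    if r["key"] == key and r["is_current"]), None)
    if current and all(current[k] == v for k, v in new_attrs.items()):
        return dimension  # nothing changed; history stays as-is
    if current:
        # Close the old version's validity window.
        current["is_current"] = False
        current["valid_to"] = today
    dimension.append({"key": key, **new_attrs,
                      "valid_from": today, "valid_to": None,
                      "is_current": True})
    return dimension

dim = []
scd2_upsert(dim, "cust-1", {"city": "Austin"}, today="2024-01-01")
scd2_upsert(dim, "cust-1", {"city": "Denver"}, today="2024-06-01")
# dim now holds both versions: Austin (closed) and Denver (current).
```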

3. DataOps Best Practices

See references/dataops_best_practices.md for:
  • Data testing frameworks
  • Data contracts and schema validation
  • CI/CD for data pipelines
  • Observability and lineage
  • Incident response
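A data contract, at its simplest, pins down the columns and types a producer promises. A minimal sketch with a hypothetical `sales` contract — production setups would express this with jsonschema, pydantic, or Great Expectations instead of hand-written checks:

```python
# Hypothetical contract for a sales feed: required columns and types.
CONTRACT = {"order_id": int, "amount": float, "currency": str}

def violations(row, contract=CONTRACT):
    """Return a list of contract violations for one record."""
    problems = []
    for column, expected in contract.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(f"{column}: expected {expected.__name__}, "
                            f"got {type(row[column]).__name__}")
    return problems

good = {"order_id": 7, "amount": 19.99, "currency": "USD"}
bad = {"order_id": "7", "amount": 19.99}
# violations(good) is empty; bad fails on order_id type and missing currency.
```

Running checks like these at the producer boundary (in CI or on ingest) is what turns a schema into an enforced contract rather than documentation.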

Troubleshooting

→ See references/troubleshooting.md for details

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
6 Docs
  • SKILL.md Primary doc
  • references/data_modeling_patterns.md Docs
  • references/data_pipeline_architecture.md Docs
  • references/dataops_best_practices.md Docs
  • references/troubleshooting.md Docs
  • references/workflows.md Docs