# Send Senior Data Engineer to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "senior-data-engineer",
    "name": "Senior Data Engineer",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/alirezarezvani/senior-data-engineer",
    "canonicalUrl": "https://clawhub.ai/alirezarezvani/senior-data-engineer",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/senior-data-engineer",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=senior-data-engineer",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "SKILL.md",
      "references/data_modeling_patterns.md",
      "references/data_pipeline_architecture.md",
      "references/dataops_best_practices.md",
      "references/troubleshooting.md",
      "references/workflows.md"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "senior-data-engineer",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-03T15:23:43.560Z",
      "expiresAt": "2026-05-10T15:23:43.560Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=senior-data-engineer",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=senior-data-engineer",
        "contentDisposition": "attachment; filename=\"senior-data-engineer-2.1.1.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "senior-data-engineer"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/senior-data-engineer"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/senior-data-engineer",
    "downloadUrl": "https://openagent3.xyz/downloads/senior-data-engineer",
    "agentUrl": "https://openagent3.xyz/skills/senior-data-engineer/agent",
    "manifestUrl": "https://openagent3.xyz/skills/senior-data-engineer/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/senior-data-engineer/agent.md"
  }
}
```
## Documentation

### Senior Data Engineer

Production-grade data engineering skill for building scalable, reliable data systems.

### Table of Contents

- Trigger Phrases
- Quick Start
- Workflows
  - Building a Batch ETL Pipeline
  - Implementing Real-Time Streaming
  - Data Quality Framework Setup
- Architecture Decision Framework
- Tech Stack
- Reference Documentation
- Troubleshooting

### Trigger Phrases

Activate this skill when you see:

**Pipeline Design:**

- "Design a data pipeline for..."
- "Build an ETL/ELT process..."
- "How should I ingest data from..."
- "Set up data extraction from..."

**Architecture:**

- "Should I use batch or streaming?"
- "Lambda vs Kappa architecture"
- "How to handle late-arriving data"
- "Design a data lakehouse"

**Data Modeling:**

- "Create a dimensional model..."
- "Star schema vs snowflake"
- "Implement slowly changing dimensions"
- "Design a data vault"

**Data Quality:**

- "Add data validation to..."
- "Set up data quality checks"
- "Monitor data freshness"
- "Implement data contracts"

**Performance:**

- "Optimize this Spark job"
- "Query is running slow"
- "Reduce pipeline execution time"
- "Tune Airflow DAG"

### Core Tools

```bash
# Generate pipeline orchestration config
python scripts/pipeline_orchestrator.py generate \
  --type airflow \
  --source postgres \
  --destination snowflake \
  --schedule "0 5 * * *"

# Validate data quality
python scripts/data_quality_validator.py validate \
  --input data/sales.parquet \
  --schema schemas/sales.json \
  --checks freshness,completeness,uniqueness

# Optimize ETL performance
python scripts/etl_performance_optimizer.py analyze \
  --query queries/daily_aggregation.sql \
  --engine spark \
  --recommend
```
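The three checks the validator command names (freshness, completeness, uniqueness) can be sketched in plain Python. This is an illustration of what each check means, not the packaged validator's implementation; the row shape, column names, and one-hour threshold are assumptions.

```python
from datetime import datetime, timedelta, timezone

def check_freshness(rows, ts_field, max_age):
    """Newest record must be younger than max_age."""
    newest = max(r[ts_field] for r in rows)
    return datetime.now(timezone.utc) - newest <= max_age

def check_completeness(rows, required_fields):
    """Every required field must be non-null in every row."""
    return all(r.get(f) is not None for r in rows for f in required_fields)

def check_uniqueness(rows, key_field):
    """Key field values must not repeat."""
    keys = [r[key_field] for r in rows]
    return len(keys) == len(set(keys))

rows = [
    {"order_id": 1, "amount": 9.99, "loaded_at": datetime.now(timezone.utc)},
    {"order_id": 2, "amount": 4.50, "loaded_at": datetime.now(timezone.utc)},
]
assert check_freshness(rows, "loaded_at", timedelta(hours=1))
assert check_completeness(rows, ["order_id", "amount"])
assert check_uniqueness(rows, "order_id")
```

In production these checks would typically run inside a framework such as Great Expectations or dbt tests rather than hand-rolled helpers.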

### Workflows

→ See references/workflows.md for details

### Architecture Decision Framework

Use this framework to choose the right approach for your data pipeline.

### Batch vs Streaming

| Criteria | Batch | Streaming |
| --- | --- | --- |
| Latency requirement | Hours to days | Seconds to minutes |
| Data volume | Large historical datasets | Continuous event streams |
| Processing complexity | Complex transformations, ML | Simple aggregations, filtering |
| Cost sensitivity | More cost-effective | Higher infrastructure cost |
| Error handling | Easier to reprocess | Requires careful design |

Decision Tree:

```text
Is real-time insight required?
├── Yes → Use streaming
│   └── Is exactly-once semantics needed?
│       ├── Yes → Kafka + Flink/Spark Structured Streaming
│       └── No → Kafka + consumer groups
└── No → Use batch
    └── Is data volume > 1TB daily?
        ├── Yes → Spark/Databricks
        └── No → dbt + warehouse compute
```
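The decision tree above can be encoded as a small helper function. The branch labels mirror the tree exactly; the function name and signature are illustrative, not part of the package.

```python
def recommend_stack(real_time: bool, exactly_once: bool = False,
                    daily_volume_tb: float = 0.0) -> str:
    """Walk the batch-vs-streaming decision tree."""
    if real_time:
        if exactly_once:
            return "Kafka + Flink/Spark Structured Streaming"
        return "Kafka + consumer groups"
    if daily_volume_tb > 1.0:
        return "Spark/Databricks"
    return "dbt + warehouse compute"

print(recommend_stack(real_time=False, daily_volume_tb=5))  # Spark/Databricks
```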

### Lambda vs Kappa Architecture

| Aspect | Lambda | Kappa |
| --- | --- | --- |
| Complexity | Two codebases (batch + stream) | Single codebase |
| Maintenance | Higher (sync batch/stream logic) | Lower |
| Reprocessing | Native batch layer | Replay from source |
| Use case | ML training + real-time serving | Pure event-driven |

When to choose Lambda:

- Need to train ML models on historical data
- Complex batch transformations not feasible in streaming
- Existing batch infrastructure

When to choose Kappa:

- Event-sourced architecture
- All processing can be expressed as stream operations
- Starting fresh without legacy systems
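Kappa's "replay from source" row is worth making concrete: because there is a single stream codebase, reprocessing historical data means replaying the event log through the same function that handles live events. The event shape and aggregation below are invented for illustration.

```python
from collections import defaultdict

def process(events):
    """Single stream function: running revenue total per customer."""
    totals = defaultdict(float)
    for event in events:
        totals[event["customer"]] += event["amount"]
    return dict(totals)

event_log = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "a", "amount": 2.5},
]

# Live consumption and historical reprocessing share one code path:
live_state = process(iter(event_log))      # consume as a live stream
replayed_state = process(event_log)        # replay from the source log
assert live_state == replayed_state == {"a": 12.5, "b": 5.0}
```

In a Lambda architecture the same guarantee would require keeping a separate batch implementation in sync with this stream logic.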

### Data Warehouse vs Data Lakehouse

| Feature | Warehouse (Snowflake/BigQuery) | Lakehouse (Delta/Iceberg) |
| --- | --- | --- |
| Best for | BI, SQL analytics | ML, unstructured data |
| Storage cost | Higher (proprietary format) | Lower (open formats) |
| Flexibility | Schema-on-write | Schema-on-read |
| Performance | Excellent for SQL | Good, improving |
| Ecosystem | Mature BI tools | Growing ML tooling |

### Tech Stack

| Category | Technologies |
| --- | --- |
| Languages | Python, SQL, Scala |
| Orchestration | Airflow, Prefect, Dagster |
| Transformation | dbt, Spark, Flink |
| Streaming | Kafka, Kinesis, Pub/Sub |
| Storage | S3, GCS, Delta Lake, Iceberg |
| Warehouses | Snowflake, BigQuery, Redshift, Databricks |
| Quality | Great Expectations, dbt tests, Monte Carlo |
| Monitoring | Prometheus, Grafana, Datadog |

### 1. Data Pipeline Architecture

See references/data_pipeline_architecture.md for:

- Lambda vs Kappa architecture patterns
- Batch processing with Spark and Airflow
- Stream processing with Kafka and Flink
- Exactly-once semantics implementation
- Error handling and dead letter queues

### 2. Data Modeling Patterns

See references/data_modeling_patterns.md for:

- Dimensional modeling (Star/Snowflake)
- Slowly Changing Dimensions (SCD Types 1-6)
- Data Vault modeling
- dbt best practices
- Partitioning and clustering
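As one concrete pattern from that list, a Type 2 slowly changing dimension preserves history by expiring the current row and appending a new version. This in-memory sketch shows the mechanics only; the column names (`is_current`, `valid_from`, `valid_to`) follow common convention but are assumptions, not the package's schema, and real pipelines would do this with a warehouse `MERGE` or dbt snapshots.

```python
from datetime import date

def scd2_upsert(dim_rows, key, new_row, today):
    """Type 2 SCD: expire the current row for `key`, append the new version."""
    for row in dim_rows:
        if row[key] == new_row[key] and row["is_current"]:
            if all(row.get(k) == v for k, v in new_row.items()):
                return dim_rows  # attributes unchanged, nothing to do
            row["is_current"] = False
            row["valid_to"] = today
    dim_rows.append({**new_row, "is_current": True,
                     "valid_from": today, "valid_to": None})
    return dim_rows

dim = [{"customer_id": 1, "city": "Berlin", "is_current": True,
        "valid_from": date(2024, 1, 1), "valid_to": None}]
scd2_upsert(dim, "customer_id", {"customer_id": 1, "city": "Munich"},
            date(2025, 6, 1))
# dim now holds the expired Berlin row plus a current Munich row
assert [r["city"] for r in dim if r["is_current"]] == ["Munich"]
```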

### 3. DataOps Best Practices

See references/dataops_best_practices.md for:

- Data testing frameworks
- Data contracts and schema validation
- CI/CD for data pipelines
- Observability and lineage
- Incident response
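A data contract from that list can be as simple as a typed schema enforced at the pipeline boundary. This pure-Python sketch shows the idea; the `CONTRACT` fields are invented for illustration, and real setups usually lean on tools named above such as Great Expectations or dbt tests.

```python
CONTRACT = {  # field name -> required Python type (illustrative)
    "order_id": int,
    "amount": float,
    "currency": str,
}

def validate_record(record, contract):
    """Return a list of contract violations for one record."""
    errors = []
    for field, expected in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

assert validate_record({"order_id": 7, "amount": 19.9, "currency": "EUR"},
                       CONTRACT) == []
assert validate_record({"order_id": "7", "amount": 19.9}, CONTRACT) == [
    "order_id: expected int, got str",
    "missing field: currency",
]
```

Rejecting (or quarantining) violating records at ingest keeps schema drift from propagating downstream.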

### Troubleshooting

→ See references/troubleshooting.md for details
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: alirezarezvani
- Version: 2.1.1
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-03T15:23:43.560Z
- Expires at: 2026-05-10T15:23:43.560Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/senior-data-engineer)
- [Send to Agent page](https://openagent3.xyz/skills/senior-data-engineer/agent)
- [JSON manifest](https://openagent3.xyz/skills/senior-data-engineer/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/senior-data-engineer/agent.md)
- [Download page](https://openagent3.xyz/downloads/senior-data-engineer)