Tencent SkillHub · Developer Tools

Hadoop

Manage Hadoop clusters with HDFS operations, YARN job tuning, and distributed processing diagnostics.




Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILL.md, hdfs.md, memory-template.md, setup.md, troubleshooting.md, yarn.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of working through the steps manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.0

Documentation

Primary doc: SKILL.md (19 sections)

Setup

If ~/hadoop/ doesn't exist or is empty, read setup.md and start the conversation naturally.

When to Use

User works with Hadoop ecosystem (HDFS, YARN, MapReduce, Hive). Agent handles cluster diagnostics, job optimization, storage management, and troubleshooting distributed processing failures.

Architecture

Memory lives in ~/hadoop/. See memory-template.md for structure.

~/hadoop/
├── memory.md      # Cluster configs, common issues, preferences
├── clusters/      # Per-cluster notes and configs
│   └── {name}.md  # Specific cluster context
└── scripts/       # Custom diagnostic scripts
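A minimal bootstrap for that layout can be sketched as a shell function. The seed headings written into memory.md are illustrative, not mandated by the package:

```shell
# Sketch: create the ~/hadoop/ memory layout described above.
# Takes an optional target directory so it can be tried safely.
init_hadoop_memory() {
    dir="${1:-$HOME/hadoop}"
    mkdir -p "$dir/clusters" "$dir/scripts"
    # Seed memory.md only on first run; never clobber existing notes.
    if [ ! -f "$dir/memory.md" ]; then
        printf '# Hadoop memory\n\n## Cluster configs\n\n## Common issues\n\n## Preferences\n' \
            > "$dir/memory.md"
    fi
}
```

Re-running it is safe: existing notes are left untouched.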

Quick Reference

Topic              File
Setup process      setup.md
Memory template    memory-template.md
HDFS operations    hdfs.md
YARN tuning        yarn.md
Troubleshooting    troubleshooting.md

1. Verify Cluster State First

Before any operation, check cluster health:

  hdfs dfsadmin -report
  yarn node -list

Never assume the cluster is healthy. A single dead DataNode changes everything.
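That check can be scripted as a gate. A hedged sketch, assuming the report contains a `Dead datanodes (N):` section header (recent Hadoop versions print one when any DataNode is down); pipe the report into it:

```shell
# Sketch: refuse to proceed when the dfsadmin report lists dead DataNodes.
# Usage: hdfs dfsadmin -report | check_datanodes
check_datanodes() {
    # Pull N out of a "Dead datanodes (N):" line; an absent line means none dead.
    dead="$(grep -o 'Dead datanodes ([0-9][0-9]*)' | grep -o '[0-9][0-9]*' || true)"
    dead="${dead:-0}"
    if [ "$dead" -gt 0 ]; then
        echo "WARNING: $dead dead DataNode(s); investigate before any operation" >&2
        return 1
    fi
    echo "All DataNodes reported live"
}
```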

2. Storage Before Compute

HDFS issues cascade into job failures. Always check:

  hdfs dfs -df -h               # Capacity
  hdfs fsck / -files -blocks    # Block health

A job failing with "No space left" is a storage problem, not a code problem.
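The capacity check can be turned into an alert. A sketch assuming the tabular output of `hdfs dfs -df` (a header row, then filesystem rows ending in a `Use%` column); the 90% threshold is an arbitrary example:

```shell
# Sketch: warn when any filesystem in `hdfs dfs -df` output exceeds a use% threshold.
# Usage: hdfs dfs -df | check_capacity 90
check_capacity() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = substr($NF, 1, length($NF) - 1)   # strip the trailing % sign
        if (pct + 0 >= t) { printf "WARNING: %s at %s\n", $1, $NF; bad = 1 }
    } END { exit bad + 0 }'
}
```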

3. Resource Calculator Awareness

YARN allocates resources according to the configured scheduler. Know which one is active:

  yarn rmadmin -getServiceState rm1
  grep scheduler /etc/hadoop/conf/yarn-site.xml

The default Capacity scheduler and the Fair scheduler behave very differently.
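Scheduler detection can be sketched by inspecting the `yarn.resourcemanager.scheduler.class` value in yarn-site.xml; pipe the file in. The fallback branch assumes the stock Capacity default when the property is unset:

```shell
# Sketch: report which YARN scheduler yarn-site.xml configures.
# Usage: detect_scheduler < /etc/hadoop/conf/yarn-site.xml
detect_scheduler() {
    conf="$(cat)"
    case "$conf" in
        *FairScheduler*)     echo "Fair" ;;
        *CapacityScheduler*) echo "Capacity" ;;
        *)                   echo "Capacity (implicit default; not set in file)" ;;
    esac
}
```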

4. Replication Factor Context

The default replication factor is 3. For temporary data, suggest 1-2 to save space:

  hdfs dfs -setrep -w 1 /tmp/scratch/

For critical data, verify replication is honored:

  hdfs fsck /data/critical -files -blocks -replicaDetails
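The space impact is easy to inspect. On Hadoop versions where `hdfs dfs -du` prints both logical size and raw disk consumed, the ratio of the two columns is the effective replication; a sketch under that assumption:

```shell
# Sketch: derive effective replication from `hdfs dfs -du` output
# (columns assumed: <logical bytes> <raw bytes incl. replication> <path>).
# Usage: hdfs dfs -du /data | effective_replication
effective_replication() {
    awk '$1 > 0 { printf "%s x%.1f\n", $3, $2 / $1 }'
}
```

A path showing x3.0 is fully replicated; x1.0 marks single-copy data.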

5. Log Location Awareness

Hadoop logs are scattered across machines. Key locations:

Component          Log path
NameNode           /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log
DataNode           /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log
ResourceManager    /var/log/hadoop-yarn/yarn-yarn-resourcemanager-*.log
NodeManager        /var/log/hadoop-yarn/yarn-yarn-nodemanager-*.log
Application        yarn logs -applicationId <app_id>
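A small helper can encode that mapping so you stop hunting for paths; the globs come from the table above, but actual locations vary by distro and install method, so treat them as defaults to verify:

```shell
# Sketch: map a component name to its usual log glob.
# Paths are common packaged-install defaults; adjust per deployment.
log_glob() {
    case "$1" in
        namenode)        echo '/var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log' ;;
        datanode)        echo '/var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log' ;;
        resourcemanager) echo '/var/log/hadoop-yarn/yarn-yarn-resourcemanager-*.log' ;;
        nodemanager)     echo '/var/log/hadoop-yarn/yarn-yarn-nodemanager-*.log' ;;
        *)               echo "unknown component: $1" >&2; return 1 ;;
    esac
}
# Example: tail -F $(log_glob namenode)
```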

6. Safe Mode Handling

The NameNode enters safe mode on startup or when too few blocks are reported:

  hdfs dfsadmin -safemode get      # Check status
  hdfs dfsadmin -safemode leave    # Exit (if blocks OK)

Never force-leave if blocks are actually missing.
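The safe-mode decision can be sketched around the literal `Safe mode is ON` / `Safe mode is OFF` line that `-safemode get` prints:

```shell
# Sketch: advise on safe mode from `hdfs dfsadmin -safemode get` output.
# Usage: hdfs dfsadmin -safemode get | safemode_advice
safemode_advice() {
    if grep -q 'Safe mode is ON'; then
        echo "Safe mode ON: run 'hdfs fsck /' first; only leave if no blocks are missing"
        return 1
    fi
    echo "Safe mode OFF: normal operations OK"
}
```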

7. Memory Settings Matter

90% of "job killed" issues are memory-related. Container settings to check:

  yarn.nodemanager.resource.memory-mb     # Total per node
  yarn.scheduler.minimum-allocation-mb    # Min container
  mapreduce.map.memory.mb                 # Map task
  mapreduce.reduce.memory.mb              # Reduce task

Check these before assuming the code is wrong.
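The arithmetic behind those settings is worth making explicit. The 80% heap-to-container ratio below is a common rule of thumb for leaving JVM overhead room, not a Hadoop mandate:

```shell
# Sketch: container-count and heap math behind the YARN memory settings above.
containers_per_node() {
    # yarn.nodemanager.resource.memory-mb / per-container MB, integer floor
    echo $(( $1 / $2 ))
}
heap_for_container() {
    # Rule of thumb: -Xmx at ~80% of mapreduce.*.memory.mb
    echo $(( $1 * 80 / 100 ))
}
```

For example, a 65536 MB NodeManager fits 16 containers of 4096 MB, and a 4096 MB map container pairs with roughly -Xmx3276m.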

Essential Commands

  # Navigation
  hdfs dfs -ls /path
  hdfs dfs -du -h /path              # Size with human units
  hdfs dfs -count -q /path           # Quota info

  # Data movement
  hdfs dfs -put local.txt /hdfs/     # Upload
  hdfs dfs -get /hdfs/file.txt .     # Download
  hdfs dfs -cp /src /dst             # Copy within HDFS
  hdfs dfs -mv /src /dst             # Move within HDFS

  # Maintenance
  hdfs dfs -rm -r /path              # Delete (to trash)
  hdfs dfs -rm -r -skipTrash /path   # Delete (permanent)
  hdfs dfs -expunge                  # Empty trash

Block Management

  # Find corrupt blocks
  hdfs fsck / -list-corruptfileblocks

  # Delete a corrupt file (after confirming it is unrecoverable)
  hdfs fsck /path/file -delete

  # Force replication
  hdfs dfs -setrep -w 3 /important/data/

Application Lifecycle

  # List applications
  yarn application -list                   # Running
  yarn application -list -appStates ALL    # All states

  # Application details
  yarn application -status <app_id>

  # Kill a stuck application
  yarn application -kill <app_id>

  # Get logs (after completion)
  yarn logs -applicationId <app_id>
  yarn logs -applicationId <app_id> -containerId <container_id>
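Bulk cleanup of stuck applications can be sketched by parsing the `yarn application -list` table, whose data rows start with an `application_` ID; review the extracted IDs before piping anything into `-kill`:

```shell
# Sketch: extract application IDs from `yarn application -list` output.
# Usage: yarn application -list -appStates ACCEPTED | app_ids
app_ids() {
    awk '/^application_/ { print $1 }'
}
# Review the list first, then: ... | app_ids | xargs -n1 yarn application -kill
```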

Queue Management

  # List queues
  yarn queue -list

  # Queue status
  yarn queue -status <queue_name>

  # Move an application between queues
  yarn application -movetoqueue <app_id> -queue <target_queue>

Common Traps

  • Deleting without -skipTrash on a full cluster → trash still uses space, the cluster stays full
  • Setting container memory below the JVM heap → instant container kill, confusing errors
  • Ignoring speculative execution on slow jobs → wastes resources on duplicated tasks
  • Running fsck on a busy cluster → performance impact; run during maintenance windows
  • Assuming HDFS = POSIX semantics → no append-in-place, no random writes
  • Forgetting the timezone in scheduling → Oozie/Airflow jobs fire at the wrong times
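The first trap is measurable: trash still counts against capacity. A sketch that totals the byte column of `hdfs dfs -du -s '/user/*/.Trash'`-style output (first field is bytes on both older and newer column layouts):

```shell
# Sketch: sum the byte column of `hdfs dfs -du -s` output for trash dirs.
# Usage: hdfs dfs -du -s '/user/*/.Trash' | trash_bytes
trash_bytes() {
    awk '{ sum += $1 } END { print sum + 0 }'
}
```

A large total explains why a "full" cluster stays full after plain deletes; `hdfs dfs -expunge` reclaims it.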

Security & Privacy

Data that stays local:
  • Cluster notes saved in ~/hadoop/clusters/
  • Preferences and environment context

What commands access:
  • hdfs/yarn commands connect to your Hadoop cluster
  • Some commands read system paths (/var/log, /etc/hadoop/conf)
  • Destructive commands require explicit user confirmation

This skill does NOT:
  • Store credentials (use kinit/keytab separately)
  • Make external API calls beyond your cluster
  • Run destructive commands without asking first
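The confirm-before-destructive rule can be sketched as a wrapper; `confirm_rm` is a hypothetical helper name and the prompt wording is illustrative:

```shell
# Sketch: demand a literal "yes" before a permanent HDFS delete.
confirm_rm() {
    path="$1"
    printf 'Permanently delete %s (no trash)? Type yes to continue: ' "$path"
    read -r answer
    if [ "$answer" = "yes" ]; then
        hdfs dfs -rm -r -skipTrash "$path"
    else
        echo "Aborted; nothing deleted" >&2
        return 1
    fi
}
```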

Related Skills

Install with clawhub install <slug> if the user confirms:
  • linux — system administration
  • docker — containerized deployments
  • bash — shell scripting

Feedback

If useful: clawhub star hadoop
Stay updated: clawhub sync

Category context

Code helpers, APIs, CLIs, browser automation, testing, and developer operations.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
6 Docs
  • SKILL.md Primary doc
  • hdfs.md Docs
  • memory-template.md Docs
  • setup.md Docs
  • troubleshooting.md Docs
  • yarn.md Docs