Tencent SkillHub · Developer Tools

File Deduplicator

Find and remove duplicate files intelligently. Save storage space, keep your system clean. Perfect for digital hoarders and document management.



⬇ 0 downloads · ★ 0 stars · Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
README.md, SKILL.md, config.json, index.js, package.json, test.js

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.0.0

Documentation

Primary doc: SKILL.md (43 sections)

File-Deduplicator - Find and Remove Duplicates

Vernox Utility Skill - Clean up your digital hoard.

Overview

File-Deduplicator is an intelligent duplicate finder and remover. It uses content hashing to identify identical files across directories, then offers options to remove the duplicates safely.

✅ Duplicate Detection

  • Content-based hashing (MD5) for fast comparison
  • Size-based detection (exact match, near match)
  • Name-based detection (similar filenames)
  • Directory scanning (recursive)
  • Exclude patterns (.git, node_modules, etc.)

✅ Removal Options

  • Auto-delete duplicates (keep newest/oldest)
  • Interactive review before deletion
  • Move to archive instead of delete
  • Preserve permissions and metadata
  • Dry-run mode (preview changes)

✅ Analysis Tools

  • Duplicate count summary
  • Space savings estimation
  • Largest duplicate files
  • Most common duplicate patterns
  • Detailed report generation
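The space-savings estimate follows from simple arithmetic: keeping one copy per duplicate group frees (n − 1) × size bytes. A sketch under an assumed group shape (the `{ size, files }` layout is not the skill's documented return format):

```javascript
// Sketch: estimate reclaimable bytes from duplicate groups.
// The { size, files } group shape is an assumption for illustration.
function estimateSavings(groups) {
  return groups.reduce((sum, g) => sum + (g.files.length - 1) * g.size, 0);
}

// Three identical 100-byte files: keep one copy, reclaim 200 bytes.
console.log(estimateSavings([{ size: 100, files: ['a', 'b', 'c'] }])); // 200
```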

✅ Safety Features

  • Confirmation prompts before deletion
  • Backup to archive folder
  • Size threshold (don't remove huge files by mistake)
  • Whitelist important directories
  • Undo functionality (log for recovery)

Installation

clawhub install file-deduplicator

Find Duplicates in Directory

const result = await findDuplicates({
  directories: ['./documents', './downloads', './projects'],
  options: {
    method: 'content', // content-based comparison
    includeSubdirs: true
  }
});
console.log(`Found ${result.duplicateCount} duplicate groups`);
console.log(`Potential space savings: ${result.spaceSaved}`);

Remove Duplicates Automatically

const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest', // keep newest, delete oldest
    action: 'delete', // or 'move' to archive
    autoConfirm: false // show confirmation for each
  }
});
console.log(`Removed ${result.filesRemoved} duplicates`);
console.log(`Space saved: ${result.spaceSaved}`);

Dry-Run Preview

const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest',
    action: 'delete',
    dryRun: true // preview without actual deletion
  }
});
console.log('Would remove:');
result.duplicates.forEach((dup, i) => {
  console.log(`${i + 1}. ${dup.file}`);
});

findDuplicates

Find duplicate files across directories.

Parameters:
  • directories (array|string, required): Directory paths to scan
  • options (object, optional):
      • method (string): 'content' | 'size' | 'name' - comparison method
      • includeSubdirs (boolean): Scan recursively (default: true)
      • minSize (number): Minimum size in bytes (default: 0)
      • maxSize (number): Maximum size in bytes (default: 0)
      • excludePatterns (array): Glob patterns to exclude (default: ['.git', 'node_modules'])
      • whitelist (array): Directories to never scan (default: [])

Returns:
  • duplicates (array): Array of duplicate groups
  • duplicateCount (number): Number of duplicate groups found
  • totalFiles (number): Total files scanned
  • scanDuration (number): Time taken to scan (ms)
  • spaceWasted (number): Total bytes wasted by duplicates
  • spaceSaved (number): Potential savings if duplicates removed

removeDuplicates

Remove duplicate files based on findings.

Parameters:
  • directories (array|string, required): Same as findDuplicates
  • options (object, optional):
      • keep (string): 'newest' | 'oldest' | 'smallest' | 'largest' - which copy to keep
      • action (string): 'delete' | 'move' | 'archive'
      • archivePath (string): Where to move files when action='move'
      • dryRun (boolean): Preview without actual action
      • autoConfirm (boolean): Auto-confirm deletions
      • sizeThreshold (number): Don't remove files larger than this

Returns:
  • filesRemoved (number): Number of files removed/moved
  • spaceSaved (number): Bytes saved
  • groupsProcessed (number): Number of duplicate groups handled
  • logPath (string): Path to action log
  • errors (array): Any errors encountered

analyzeDirectory

Analyze a single directory for duplicates.

Parameters:
  • directory (string, required): Path to directory
  • options (object, optional): Same as findDuplicates options

Returns:
  • fileCount (number): Total files in directory
  • totalSize (number): Total bytes in directory
  • duplicateSize (number): Bytes in duplicate files
  • duplicateRatio (number): Percentage of files that are duplicates

Digital Hoarder Cleanup

  • Find duplicate photos/videos
  • Identify wasted storage space
  • Remove old duplicates, keep the newest
  • Clean up download folders

Document Management

  • Find duplicate PDFs, docs, reports
  • Keep the latest version, archive old versions
  • Prevent version confusion
  • Reduce backup bloat

Project Cleanup

  • Find duplicate source files
  • Remove duplicate build artifacts
  • Clean up node_modules duplicates
  • Save storage on SSD/HDD

Backup Optimization

  • Find duplicate backup files
  • Remove redundant backups
  • Identify what's actually duplicated
  • Save space on backup drives

Edit config.json:

{
  "detection": {
    "defaultMethod": "content",
    "sizeTolerancePercent": 0,   // exact match only
    "nameSimilarity": 0.7,       // 0-1, lower = more similar
    "includeSubdirs": true
  },
  "removal": {
    "defaultAction": "delete",
    "defaultKeep": "newest",
    "archivePath": "./archive",
    "sizeThreshold": 10485760,   // 10MB threshold
    "autoConfirm": false,
    "dryRunDefault": false
  },
  "exclude": {
    "patterns": [".git", "node_modules", ".vscode", ".idea"],
    "whitelist": ["important", "work", "projects"]
  }
}

Content-Based (Recommended)

  • Fast MD5 hashing
  • Detects exact duplicates regardless of filename
  • Works across renamed files
  • Perfect for documents, code, archives

Size-Based

  • Compares file sizes
  • Faster than content hashing
  • Good for media files where content hashing is slow
  • Finds near-duplicates (similar but not exact)

Name-Based

  • Compares filenames
  • Detects similarly named files
  • Good for finding version duplicates (file_v1, file_v2)

Find Duplicates in Documents

const result = await findDuplicates({
  directories: '~/Documents',
  options: {
    method: 'content',
    includeSubdirs: true
  }
});
console.log(`Found ${result.duplicateCount} duplicate sets`);
result.duplicates.slice(0, 5).forEach((set, i) => {
  console.log(`Set ${i + 1}: ${set.files.length} files`);
  console.log(`  Total size: ${set.totalSize} bytes`);
});

Remove Duplicates, Keep Newest

const result = await removeDuplicates({
  directories: '~/Documents',
  options: {
    keep: 'newest',
    action: 'delete'
  }
});
console.log(`Removed ${result.filesRemoved} files`);
console.log(`Saved ${result.spaceSaved} bytes`);

Move to Archive Instead of Delete

const result = await removeDuplicates({
  directories: '~/Downloads',
  options: {
    keep: 'newest',
    action: 'move',
    archivePath: '~/Documents/Archive'
  }
});
console.log(`Archived ${result.filesRemoved} files`);
console.log(`Safe in: ~/Documents/Archive`);

Dry-Run Preview Changes

const result = await removeDuplicates({
  directories: '~/Documents',
  options: {
    dryRun: true // just show what would happen
  }
});
console.log('=== Dry Run Preview ===');
result.duplicates.forEach((set, i) => {
  console.log(`Would delete: ${set.toDelete.join(', ')}`);
});

Scanning Speed

  • Small directories (<1000 files): <1s
  • Medium directories (1000-10000 files): 1-5s
  • Large directories (10000+ files): 5-20s

Detection Accuracy

  • Content-based: 100% (exact duplicates)
  • Size-based: fast but may miss renamed files
  • Name-based: detects naming patterns only

Memory Usage

  • Hash cache: ~1MB per 100,000 files
  • Batch processing: 1000 files at a time
  • Peak memory: ~200MB for 1M files

Size Thresholding

Won't remove files larger than a configurable threshold (default: 10MB). This prevents accidental deletion of important large files.

Archive Mode

Move files to archive directory instead of deleting. No data loss, full recoverability.

Action Logging

All deletions and moves are logged to a file for recovery and auditing.

Undo Functionality

Log file can be used to restore accidentally deleted files (limited undo window).

Permission Errors

  • Clear error message
  • Suggest running with sudo
  • Skip files that can't be accessed

File Lock Errors

  • Detect locked files
  • Skip and report
  • Suggest closing applications using the files

Space Errors

  • Check available disk space before deletion
  • Warn if space is critically low
  • Prevent disk-full scenarios

Not Finding Expected Duplicates

  • Check the detection method (content vs size vs name)
  • Verify exclude patterns aren't too broad
  • Check if files are in whitelisted directories
  • Try with includeSubdirs: false

Deletion Not Working

  • Check write permissions on directories
  • Verify action isn't 'delete' with autoConfirm: true
  • Check the size threshold isn't blocking all deletions
  • Check file locks (is another program using the files?)

Slow Scanning

  • Reduce includeSubdirs scope
  • Use size-based detection (faster)
  • Exclude large directories (node_modules, .git)
  • Process directories individually instead of in batch

Best Results

  • Use content-based detection for documents (100% accurate)
  • Run a dry-run first to preview changes
  • Archive instead of delete for important files
  • Check the logs if anything unexpected was deleted

Performance Optimization

  • Process frequently used directories first
  • Use a size threshold to skip large media files
  • Exclude hidden directories from the scan
  • Process directories in parallel when possible

Space Management

  • Regular duplicate cleanup prevents storage bloat
  • Delete temp directories regularly
  • Clear download folders of installers
  • Empty the trash before large scans

Roadmap

  • Duplicate detection by image similarity
  • Near-duplicate detection (similar but not exact)
  • Duplicate detection across network drives
  • Cloud storage integration (S3, Google Drive)
  • Automatic scheduling of scans
  • Heuristic duplicate detection (ML-based)
  • Recover deleted files from backup
  • Duplicate detection by file content similarity (not just hash)

License

MIT

Find duplicates. Save space. Keep your system clean. 🔮

Category context

Code helpers, APIs, CLIs, browser automation, testing, and developer operations.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 docs · 2 scripts · 2 config
  • SKILL.md Primary doc
  • README.md Docs
  • index.js Scripts
  • test.js Scripts
  • config.json Config
  • package.json Config