Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Find and remove duplicate files intelligently. Save storage space, keep your system clean. Perfect for digital hoarders and document management.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Vernox Utility Skill - Clean up your digital hoard.
File-Deduplicator is an intelligent duplicate file finder and remover. It uses content hashing to identify identical files across directories, then offers options for removing the duplicates safely.
Detection:
- Content-based hashing (MD5) for fast comparison
- Size-based detection (exact match, near match)
- Name-based detection (similar filenames)
- Directory scanning (recursive)
- Exclude patterns (.git, node_modules, etc.)

Removal:
- Auto-delete duplicates (keep newest/oldest)
- Interactive review before deletion
- Move to archive instead of delete
- Preserve permissions and metadata
- Dry-run mode (preview changes)

Reporting:
- Duplicate count summary
- Space savings estimation
- Largest duplicate files
- Most common duplicate patterns
- Detailed report generation

Safety (combined in the sketch after this list):
- Confirmation prompts before deletion
- Backup to archive folder
- Size threshold (don't remove huge files by mistake)
- Whitelist important directories
- Undo functionality (log for recovery)
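The safety options above map directly onto removeDuplicates parameters documented in the API reference below. A minimal sketch combining them; the exact behavior (especially how dryRun and sizeThreshold interact) is an assumption worth verifying on your install:

const preview = await removeDuplicates({
  directories: ['./downloads'],
  options: {
    method: 'content',        // hash contents for exact matches
    keep: 'newest',           // retain the most recent copy per group
    action: 'move',           // move duplicates instead of deleting
    archivePath: './archive', // destination for moved files
    sizeThreshold: 10485760,  // never touch files larger than 10MB
    dryRun: true              // preview only; rerun with false to apply
  }
});
console.log(`Would archive ${preview.filesRemoved} files (${preview.spaceSaved} bytes)`);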
clawhub install file-deduplicator
const result = await findDuplicates({
  directories: ['./documents', './downloads', './projects'],
  options: {
    method: 'content',   // content-based comparison
    includeSubdirs: true
  }
});

console.log(`Found ${result.duplicateCount} duplicate groups`);
console.log(`Potential space savings: ${result.spaceSaved}`);
const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest',      // keep newest, delete oldest
    action: 'delete',    // or 'move' to archive
    autoConfirm: false   // show confirmation for each
  }
});

console.log(`Removed ${result.filesRemoved} duplicates`);
console.log(`Space saved: ${result.spaceSaved}`);
const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest',
    action: 'delete',
    dryRun: true   // Preview without actual deletion
  }
});

console.log('Would remove:');
result.duplicates.forEach((dup, i) => {
  console.log(`${i + 1}. ${dup.file}`);
});
findDuplicates
Find duplicate files across directories.
Parameters:
- directories (array|string, required): Directory paths to scan
- options (object, optional):
  - method (string): 'content' | 'size' | 'name' - comparison method
  - includeSubdirs (boolean): Scan recursively (default: true)
  - minSize (number): Minimum size in bytes (default: 0)
  - maxSize (number): Maximum size in bytes (default: 0)
  - excludePatterns (array): Glob patterns to exclude (default: ['.git', 'node_modules'])
  - whitelist (array): Directories to never scan (default: [])
Returns:
- duplicates (array): Array of duplicate groups
- duplicateCount (number): Number of duplicate groups found
- totalFiles (number): Total files scanned
- scanDuration (number): Time taken to scan (ms)
- spaceWasted (number): Total bytes wasted by duplicates
- spaceSaved (number): Potential savings if duplicates removed
removeDuplicates
Remove duplicate files based on findings.
Parameters:
- directories (array|string, required): Same as findDuplicates
- options (object, optional):
  - keep (string): 'newest' | 'oldest' | 'smallest' | 'largest' - which copy to keep
  - action (string): 'delete' | 'move' | 'archive'
  - archivePath (string): Where to move files when action='move'
  - dryRun (boolean): Preview without actual action
  - autoConfirm (boolean): Auto-confirm deletions
  - sizeThreshold (number): Don't remove files larger than this
Returns:
- filesRemoved (number): Number of files removed/moved
- spaceSaved (number): Bytes saved
- groupsProcessed (number): Number of duplicate groups handled
- logPath (string): Path to the action log
- errors (array): Any errors encountered
Directory analysis
Analyze a single directory for duplicates.
Parameters:
- directory (string, required): Path to the directory
- options (object, optional): Same as findDuplicates options
Returns:
- fileCount (number): Total files in the directory
- totalSize (number): Total bytes in the directory
- duplicateSize (number): Bytes in duplicate files
- duplicateRatio (number): Percentage of files that are duplicates
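The package docs don't name this function, so the sketch below assumes it is exported as analyzeDirectory and takes a single options object like the other calls; check SKILL.md for the actual identifier before relying on it:

const stats = await analyzeDirectory({
  directory: './downloads',        // 'analyzeDirectory' is an assumed name
  options: { method: 'content' }
});
console.log(`${stats.fileCount} files, ${stats.totalSize} bytes scanned`);
console.log(`${stats.duplicateSize} bytes in duplicates (${stats.duplicateRatio}% of files)`);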
Media libraries:
- Find duplicate photos/videos
- Identify wasted storage space
- Remove old duplicates, keep the newest
- Clean up download folders

Documents:
- Find duplicate PDFs, docs, reports
- Keep the latest version, archive old versions
- Prevent version confusion
- Reduce backup bloat

Development:
- Find duplicate source files
- Remove duplicate build artifacts
- Clean up node_modules duplicates
- Save storage on SSD/HDD

Backups:
- Find duplicate backup files
- Remove redundant backups
- Identify what's actually duplicated
- Save space on backup drives
{ "detection": { "defaultMethod": "content", "sizeTolerancePercent": 0, // exact match only "nameSimilarity": 0.7, // 0-1, lower = more similar "includeSubdirs": true }, "removal": { "defaultAction": "delete", "defaultKeep": "newest", "archivePath": "./archive", "sizeThreshold": 10485760, // 10MB threshold "autoConfirm": false, "dryRunDefault": false }, "exclude": { "patterns": [".git", "node_modules", ".vscode", ".idea"], "whitelist": ["important", "work", "projects"] } }
Content-based (method: 'content'):
- Fast MD5 hashing
- Detects exact duplicates regardless of filename
- Works across renamed files
- Perfect for documents, code, archives

Size-based (method: 'size'):
- Compares file sizes
- Faster than content hashing
- Good for media files where content hashing is slow
- Finds near-duplicates (similar but not exact)

Name-based (method: 'name'):
- Compares filenames
- Detects similarly named files
- Good for finding version duplicates (file_v1, file_v2)
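When speed matters more than certainty, the method can be switched per scan. A minimal sketch of a size-based first pass over a media folder; since a size match alone doesn't prove identical content, re-check any candidate groups with method: 'content' before deleting:

const quick = await findDuplicates({
  directories: ['./videos'],
  options: { method: 'size', minSize: 1048576 } // only files >= 1MB
});
console.log(`${quick.duplicateCount} same-size groups to verify with 'content'`);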
const result = await findDuplicates({
  directories: '~/Documents',
  options: { method: 'content', includeSubdirs: true }
});

console.log(`Found ${result.duplicateCount} duplicate sets`);
result.duplicates.slice(0, 5).forEach((set, i) => {
  console.log(`Set ${i + 1}: ${set.files.length} files`);
  console.log(`  Total size: ${set.totalSize} bytes`);
});
const result = await removeDuplicates({
  directories: '~/Documents',
  options: { keep: 'newest', action: 'delete' }
});

console.log(`Removed ${result.filesRemoved} files`);
console.log(`Saved ${result.spaceSaved} bytes`);
const result = await removeDuplicates({
  directories: '~/Downloads',
  options: { keep: 'newest', action: 'move', archivePath: '~/Documents/Archive' }
});

console.log(`Archived ${result.filesRemoved} files`);
console.log(`Safe in: ~/Documents/Archive`);
const result = await removeDuplicates({
  directories: '~/Documents',
  options: { dryRun: true }  // Just show what would happen
});

console.log('=== Dry Run Preview ===');
result.duplicates.forEach((set) => {
  console.log(`Would delete: ${set.toDelete.join(', ')}`);
});
Scan speed:
- Small directories (<1000 files): <1s
- Medium directories (1000-10000 files): 1-5s
- Large directories (10000+ files): 5-20s

Accuracy:
- Content-based: 100% (exact duplicates)
- Size-based: fast, but a size match alone doesn't prove identical content
- Name-based: detects naming patterns only

Memory:
- Hash cache: ~1MB per 100,000 files
- Batch processing: processes 1000 files at a time
- Peak memory: ~200MB for 1M files
- Size threshold: won't remove files larger than a configurable threshold (default: 10MB), preventing accidental deletion of important large files.
- Archive mode: moves files to an archive directory instead of deleting them; no data loss, full recoverability.
- Action log: all deletions/moves are logged to a file for recovery and audit.
- Undo: the log file can be used to restore accidentally deleted files (limited undo window).
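The log format isn't specified in these docs, so the sketch below only locates and prints the log via the documented logPath return value; how to replay it for an undo is left to the package docs:

import { readFileSync } from 'node:fs';

const result = await removeDuplicates({
  directories: ['./downloads'],
  options: { keep: 'newest', action: 'move', archivePath: './archive' }
});
// Print the action log raw for review; its format is undocumented here.
console.log(`Action log: ${result.logPath}`);
console.log(readFileSync(result.logPath, 'utf8'));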
Permission denied:
- Clear error message
- Suggests running with sudo
- Skips files that can't be accessed

Locked files:
- Detects locked files
- Skips and reports them
- Suggests closing the applications using the files

Low disk space:
- Checks available disk space before deletion
- Warns if space is critically low
- Prevents disk-full scenarios
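Because removeDuplicates returns an errors array, callers can surface these cases programmatically. A minimal sketch; the shape of each error entry is an assumption, since the docs only say it's an array:

const result = await removeDuplicates({
  directories: ['./documents'],
  options: { keep: 'newest', action: 'delete' }
});
// Surface whatever the run couldn't handle (locked files, permissions, ...).
// Each entry's shape is assumed; adjust after inspecting real output.
if (result.errors.length > 0) {
  console.warn(`${result.errors.length} problems encountered:`);
  result.errors.forEach((err) => console.warn(` - ${JSON.stringify(err)}`));
}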
Not finding expected duplicates:
- Check the detection method (content vs size vs name)
- Verify exclude patterns aren't too broad
- Check whether the files are in whitelisted directories
- Try with includeSubdirs: false

Files not being removed:
- Check write permissions on the directories
- Check that dryRun isn't enabled and that confirmation prompts (autoConfirm: false) aren't being declined
- Check that the size threshold isn't blocking all deletions
- Check for file locks (is another program using the files?)

Slow scans:
- Reduce the includeSubdirs scope
- Use size-based detection (faster)
- Exclude large directories (node_modules, .git)
- Process directories individually instead of in one batch (see the sketch below)
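A minimal sketch of that last tip, assuming findDuplicates accepts a single directory string per call as the API reference states:

const targets = ['./documents', './downloads', './projects'];
for (const dir of targets) {
  // One directory per call keeps each scan small and isolates failures.
  const res = await findDuplicates({
    directories: dir,  // the API accepts array|string
    options: { method: 'size', excludePatterns: ['.git', 'node_modules'] }
  });
  console.log(`${dir}: ${res.duplicateCount} groups in ${res.scanDuration}ms`);
}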
Accuracy and safety:
- Use content-based detection for documents (100% accurate for exact duplicates)
- Run a dry run first to preview changes
- Archive instead of delete for important files
- Check the logs if anything unexpected was deleted

Performance:
- Process frequently used directories first
- Use the size threshold to skip large media files
- Exclude hidden directories from the scan
- Process directories in parallel when possible

Housekeeping:
- Clean up duplicates regularly to prevent storage bloat
- Delete temp directories regularly
- Clear download folders of installers
- Empty the trash before large scans
Future features:
- Duplicate detection by image similarity
- Near-duplicate detection (similar but not exact)
- Duplicate detection across network drives
- Cloud storage integration (S3, Google Drive)
- Automatic scheduling of scans
- Heuristic duplicate detection (ML-based)
- Recovery of deleted files from backup
- Duplicate detection by file content similarity (not just hashes)
License: MIT

Find duplicates. Save space. Keep your system clean.