{
  "schemaVersion": "1.0",
  "item": {
    "slug": "emergency-rescue",
    "name": "Emergency Rescue Kit",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/gitgoodordietrying/emergency-rescue",
    "canonicalUrl": "https://clawhub.ai/gitgoodordietrying/emergency-rescue",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/emergency-rescue",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=emergency-rescue",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/emergency-rescue"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/emergency-rescue",
    "agentPageUrl": "https://openagent3.xyz/skills/emergency-rescue/agent",
    "manifestUrl": "https://openagent3.xyz/skills/emergency-rescue/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/emergency-rescue/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Emergency Rescue Kit",
        "body": "Step-by-step recovery procedures for the worst moments in a developer's day. Every section follows the same pattern: diagnose → fix → verify. Commands are non-destructive by default. Destructive steps are flagged.\n\nWhen something has gone wrong, find your situation below and follow the steps in order."
      },
      {
        "title": "When to Use",
        "body": "Someone force-pushed to main and overwrote history\nCredentials were committed to a public repository\nA rebase or reset destroyed commits you need\nDisk is full and nothing works\nA process is consuming all memory or won't die\nA database migration failed halfway through\nA deploy needs to be rolled back immediately\nSSH access is locked out\nSSL certificates expired in production\nYou don't know what went wrong, but it's broken"
      },
      {
        "title": "Force-pushed to main (or any shared branch)",
        "body": "Someone ran git push --force and overwrote remote history.\n\n# DIAGNOSE: Check the reflog on any machine that had the old state\ngit reflog show origin/main\n# Look for the last known-good commit hash\n\n# FIX (if you have the old state locally):\ngit push origin <good-commit-hash>:main --force-with-lease\n# --force-with-lease is safer than --force: it fails if remote changed again\n\n# FIX (if you DON'T have the old state locally):\n# GitHub/GitLab retain force-pushed refs temporarily\n\n# GitHub: check the \"push\" event in the audit log or use the API\ngh api repos/{owner}/{repo}/events --jq '.[] | select(.type==\"PushEvent\") | .payload.before'\n\n# GitLab: check the reflog on the server (admin access needed)\n# Or restore from any CI runner or team member's local clone\n\n# VERIFY:\ngit log --oneline -10 origin/main\n# Confirm the history looks correct"
      },
      {
        "title": "Lost commits after rebase or reset --hard",
        "body": "You ran git rebase or git reset --hard and commits disappeared.\n\n# DIAGNOSE: Your commits are NOT gone. Git keeps everything for 30+ days.\ngit reflog\n# Find the commit hash from BEFORE the rebase/reset\n# Look for entries like \"rebase (start)\" or \"reset: moving to\"\n\n# FIX: Reset back to the pre-disaster state\ngit reset --hard <commit-hash-before-disaster>\n\n# FIX (alternative): Cherry-pick specific lost commits\ngit cherry-pick <lost-commit-hash>\n\n# FIX (if reflog is empty — rare, usually means different repo):\ngit fsck --lost-found\n# Look in .git/lost-found/commit/ for dangling commits\nls .git/lost-found/commit/\ngit show <hash>  # Inspect each one\n\n# VERIFY:\ngit log --oneline -10\n# Your commits should be back"
      },
      {
        "title": "Committed to the wrong branch",
        "body": "You made commits on main that should be on a feature branch.\n\n# DIAGNOSE: Check where you are and what you committed\ngit log --oneline -5\ngit branch\n\n# FIX: Create the feature branch at current position, then reset main\ngit branch feature-branch          # Create branch pointing at current commit\ngit reset --hard HEAD~<N>          # Move main back N commits (⚠️ destructive)\ngit checkout feature-branch        # Switch to the new branch\n\n# FIX (safer alternative using cherry-pick):\ngit checkout -b feature-branch     # Create and switch to new branch\ngit checkout main\ngit reset --hard origin/main       # Reset main to remote state\n# Your commits are safely on feature-branch\n\n# VERIFY:\ngit log --oneline main -5\ngit log --oneline feature-branch -5"
      },
      {
        "title": "Merge gone wrong (conflicts everywhere, wrong result)",
        "body": "A merge produced a bad result and you want to start over.\n\n# FIX (merge not yet committed — still in conflict state):\ngit merge --abort\n\n# FIX (merge was committed but not pushed):\ngit reset --hard HEAD~1\n\n# FIX (merge was already pushed): Create a revert commit\ngit revert -m 1 <merge-commit-hash>\n# -m 1 means \"keep the first parent\" (your branch before merge)\ngit push\n\n# VERIFY:\ngit log --oneline --graph -10\ngit diff HEAD~1  # Review what changed"
      },
      {
        "title": "Corrupted git repository",
        "body": "Git commands fail with \"bad object\", \"corrupt\", or \"broken link\" errors.\n\n# DIAGNOSE: Check repository integrity\ngit fsck --full\n\n# FIX (if remote is intact — most common):\n# Save any uncommitted work first\ncp -r . ../repo-backup\n\n# Re-clone and restore local work\ncd ..\ngit clone <remote-url> repo-fresh\ncp -r repo-backup/path/to/uncommitted/files repo-fresh/\n\n# FIX (repair without re-cloning):\n# Remove corrupt objects and fetch them again\ngit fsck --full 2>&1 | grep \"corrupt\\|missing\" | awk '{print $NF}'\n# For each corrupt object:\nrm .git/objects/<first-2-chars>/<remaining-hash>\ngit fetch origin  # Re-download from remote\n\n# VERIFY:\ngit fsck --full  # Should report no errors\ngit log --oneline -5"
      },
      {
        "title": "Secret committed to git (API key, password, token)",
        "body": "A credential is in the git history. Every second counts — automated scrapers monitor public GitHub repos for leaked keys.\n\n# STEP 1: REVOKE THE CREDENTIAL IMMEDIATELY\n# Do this FIRST, before cleaning git history.\n# The credential is already compromised the moment it was pushed publicly.\n\n# AWS keys:\naws iam delete-access-key --access-key-id AKIAXXXXXXXXXXXXXXXX --user-name <user>\n# Then create a new key pair\n\n# GitHub tokens:\n# Go to github.com → Settings → Developer settings → Tokens → Revoke\n\n# Database passwords:\n# Change the password in the database immediately\n# ALTER USER myuser WITH PASSWORD 'new-secure-password';\n\n# Generic API tokens:\n# Revoke in the provider's dashboard, generate new ones\n\n# STEP 2: Remove from current branch\ngit rm --cached <file-with-secret>    # If the whole file is secret\n# OR edit the file to remove the secret, then:\ngit add <file>\n\n# STEP 3: Add to .gitignore\necho \".env\" >> .gitignore\necho \"credentials.json\" >> .gitignore\ngit add .gitignore\n\n# STEP 4: Remove from git history (⚠️ rewrites history)\n# Option A: git-filter-repo (recommended, install with pip install git-filter-repo)\ngit filter-repo --path <file-with-secret> --invert-paths\n\n# Option B: BFG Repo Cleaner (faster for large repos)\n# Download from https://rtyley.github.io/bfg-repo-cleaner/\njava -jar bfg.jar --delete-files <filename> .\ngit reflog expire --expire=now --all\ngit gc --prune=now --aggressive\n\n# STEP 5: Force push the cleaned history\ngit push origin --force --all\ngit push origin --force --tags\n\n# STEP 6: Notify all collaborators to re-clone\n# Their local copies still have the secret in reflog\n\n# VERIFY:\ngit log --all -p -S '<the-secret-string>' --diff-filter=A\n# Should return nothing"
      },
      {
        "title": ".env file pushed to public repo",
        "body": "# STEP 1: Revoke ALL credentials in that .env file. All of them. Now.\n\n# STEP 2: Remove and ignore\ngit rm --cached .env\necho \".env\" >> .gitignore\ngit add .gitignore\ngit commit -m \"Remove .env from tracking\"\n\n# STEP 3: Remove from history (see credential removal above)\ngit filter-repo --path .env --invert-paths\n\n# STEP 4: Check what was exposed\n# List every variable that was in the .env:\ngit show HEAD~1:.env 2>/dev/null || git log --all -p -- .env | head -50\n# Rotate every single value.\n\n# PREVENTION: Add a pre-commit hook\ncat > .git/hooks/pre-commit << 'HOOK'\n#!/bin/bash\nif git diff --cached --name-only | grep -qE '\\.env$|\\.env\\.local$|credentials'; then\n    echo \"ERROR: Attempting to commit potential secrets file\"\n    echo \"Files: $(git diff --cached --name-only | grep -E '\\.env|credentials')\"\n    exit 1\nfi\nHOOK\nchmod +x .git/hooks/pre-commit"
      },
      {
        "title": "Secret visible in CI/CD logs",
        "body": "# STEP 1: Revoke the credential immediately\n\n# STEP 2: Delete the CI run/logs if possible\n# GitHub Actions:\ngh run delete <run-id>\n# Or: Settings → Actions → delete specific run\n\n# STEP 3: Fix the pipeline\n# Never echo secrets. Mask them:\n# GitHub Actions: echo \"::add-mask::$MY_SECRET\"\n# GitLab CI: variables are masked if marked as \"Masked\" in settings\n\n# STEP 4: Audit what was exposed\n# Check the log output for patterns like:\n# AKIAXXXXXXXXX (AWS)\n# ghp_XXXXXXXXX (GitHub)\n# sk-XXXXXXXXXXX (OpenAI/Stripe)\n# Any connection strings with passwords"
      },
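      {
        "title": "Scan history for other leaked secrets",
        "body": "Before force-pushing a cleaned history, confirm no other credentials are hiding in it. A minimal sketch using the token shapes listed above; the regexes and length assumptions (16 characters after AKIA, 36 after ghp_) are approximate, and a dedicated scanner is more thorough.\n\n# Sketch: grep the full history for common credential shapes\ngit log --all -p | grep -nE 'AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|sk-[A-Za-z0-9]{20,}' | head -20\n\n# Dedicated scanners (if installed):\n# gitleaks detect --source .\n# trufflehog git file://."
      },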
      {
        "title": "System or container disk is full",
        "body": "Nothing works — builds fail, logs can't write, services crash.\n\n# DIAGNOSE: What's using space?\ndf -h                          # Which filesystem is full?\ndu -sh /* 2>/dev/null | sort -rh | head -20    # Biggest top-level dirs\ndu -sh /var/log/* | sort -rh | head -10        # Log bloat?\n\n# QUICK WINS (safe to run immediately):\n\n# 1. Docker cleanup (often the #1 cause)\ndocker system df               # See Docker disk usage\ndocker system prune -a -f      # Remove all unused images, containers, networks\ndocker volume prune -f          # Remove unused volumes\ndocker builder prune -a -f      # Remove build cache\n# ⚠️ This removes ALL unused Docker data. Safe if you can re-pull/rebuild.\n\n# 2. Package manager caches\n# npm\nnpm cache clean --force\nrm -rf ~/.npm/_cacache\n\n# pip\npip cache purge\n\n# apt\nsudo apt-get clean\nsudo apt-get autoremove -y\n\n# brew\nbrew cleanup --prune=all\n\n# 3. Log rotation (immediate)\n# Truncate (not delete) large log files to free space instantly\nsudo truncate -s 0 /var/log/syslog\nsudo truncate -s 0 /var/log/journal/*/*.journal  # systemd journals\nfind /var/log -name \"*.log\" -size +100M -exec truncate -s 0 {} \\;\n# Truncate preserves the file handle so services don't break\n\n# 4. Old build artifacts\nfind . -name \"node_modules\" -type d -prune -exec rm -rf {} + 2>/dev/null\nfind . -name \".next\" -type d -exec rm -rf {} + 2>/dev/null\nfind . -name \"dist\" -type d -exec rm -rf {} + 2>/dev/null\nfind /tmp -type f -mtime +7 -delete 2>/dev/null\n\n# 5. Find the actual culprit\nfind / -xdev -type f -size +100M -exec ls -lh {} \\; 2>/dev/null | sort -k5 -rh | head -20\n# Shows files over 100MB, sorted by size\n\n# VERIFY:\ndf -h  # Check free space increased"
      },
      {
        "title": "Docker-specific disk full",
        "body": "# DIAGNOSE:\ndocker system df -v\n\n# Common culprits:\n# 1. Dangling images from builds\ndocker image prune -f\n\n# 2. Stopped containers accumulating\ndocker container prune -f\n\n# 3. Build cache (often the biggest)\ndocker builder prune -a -f\n\n# 4. Volumes from old containers\ndocker volume ls -qf dangling=true\ndocker volume prune -f\n\n# NUCLEAR OPTION (⚠️ removes EVERYTHING):\ndocker system prune -a --volumes -f\n# You will need to re-pull all images and recreate all volumes\n\n# VERIFY:\ndocker system df\ndf -h"
      },
      {
        "title": "Port already in use",
        "body": "# DIAGNOSE: What's using the port?\n# Linux:\nlsof -i :8080\nss -tlnp | grep 8080\n# macOS:\nlsof -i :8080\n# Windows:\nnetstat -ano | findstr :8080\n\n# FIX: Kill the process\nkill $(lsof -t -i :8080)           # Graceful\nkill -9 $(lsof -t -i :8080)       # Force (if graceful didn't work)\n\n# FIX (Windows):\n# Find PID from netstat output, then:\ntaskkill /PID <pid> /F\n\n# FIX (if it's a leftover Docker container):\ndocker ps | grep 8080\ndocker stop <container-id>\n\n# VERIFY:\nlsof -i :8080  # Should return nothing"
      },
      {
        "title": "Process won't die",
        "body": "# DIAGNOSE:\nps aux | grep <process-name>\n# Note the PID\n\n# ESCALATION LADDER:\nkill <pid>                # SIGTERM (graceful shutdown)\nsleep 5\nkill -9 <pid>             # SIGKILL (cannot be caught, immediate death)\n\n# If SIGKILL doesn't work, it's a zombie or kernel-stuck process:\n# Check if zombie:\nps aux | grep <pid>\n# State \"Z\" = zombie. The parent must reap it:\nkill -SIGCHLD $(ps -o ppid= -p <pid>)\n# Or kill the parent process\n\n# If truly stuck in kernel (state \"D\"):\n# Only a reboot will fix it. The process is stuck in an I/O syscall.\n\n# MASS CLEANUP: Kill all processes matching a name\npkill -f <pattern>          # Graceful\npkill -9 -f <pattern>      # Force"
      },
      {
        "title": "Out of memory (OOM killed)",
        "body": "# DIAGNOSE: Was your process OOM-killed?\ndmesg | grep -i \"oom\\|killed process\" | tail -20\njournalctl -k | grep -i \"oom\\|killed\" | tail -20\n\n# Check what's using memory right now:\nps aux --sort=-%mem | head -20        # Top memory consumers\nfree -h                                 # System memory overview\n\n# FIX: Free memory immediately\n# 1. Kill the biggest consumer (if safe to do so)\nkill $(ps aux --sort=-%mem | awk 'NR==2{print $2}')\n\n# 2. Drop filesystem caches (safe, no data loss)\nsync && echo 3 | sudo tee /proc/sys/vm/drop_caches\n\n# 3. Disable swap thrashing (if swap is full)\nsudo swapoff -a && sudo swapon -a\n\n# PREVENT: Set memory limits\n# Docker:\ndocker run --memory=512m --memory-swap=1g myapp\n\n# Systemd service:\n# Add to [Service] section:\n# MemoryMax=512M\n# MemoryHigh=400M\n\n# Node.js:\nnode --max-old-space-size=512 app.js\n\n# VERIFY:\nfree -h\nps aux --sort=-%mem | head -5"
      },
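      {
        "title": "Systemd memory limit example",
        "body": "The MemoryMax/MemoryHigh settings above belong in a drop-in override so they survive package upgrades. A minimal sketch; \"myapp.service\" is a placeholder for your unit name and the limits are illustrative.\n\n# Create a drop-in override with memory limits\nsudo mkdir -p /etc/systemd/system/myapp.service.d\nprintf '[Service]\\nMemoryHigh=400M\\nMemoryMax=512M\\n' | sudo tee /etc/systemd/system/myapp.service.d/memory.conf\n\n# Apply and restart\nsudo systemctl daemon-reload\nsudo systemctl restart myapp.service\n\n# VERIFY:\nsystemctl show myapp.service -p MemoryHigh -p MemoryMax"
      },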
      {
        "title": "Failed migration (partially applied)",
        "body": "# DIAGNOSE: What state is the database in?\n# Check which migrations have run:\n\n# Rails:\nrails db:migrate:status\n\n# Django:\npython manage.py showmigrations\n\n# Knex/Node:\nnpx knex migrate:status\n\n# Prisma:\nnpx prisma migrate status\n\n# Raw SQL — check migration table:\n# PostgreSQL/MySQL:\nSELECT * FROM schema_migrations ORDER BY version DESC LIMIT 10;\n# Or: SELECT * FROM _migrations ORDER BY id DESC LIMIT 10;\n\n# FIX: Roll back the failed migration\n# Most frameworks track migration state. Roll back to last good state:\n\n# Rails:\nrails db:rollback STEP=1\n\n# Django:\npython manage.py migrate <app_name> <previous_migration_number>\n\n# Knex:\nnpx knex migrate:rollback\n\n# FIX (manual): If the framework is confused about state:\n# 1. Check what the migration actually did\n# 2. Manually undo partial changes\n# 3. Delete the migration record from the migrations table\n# 4. Fix the migration code\n# 5. Re-run\n\n# VERIFY:\n# Run the migration again and confirm it applies cleanly\n# Check the affected tables/columns exist correctly"
      },
      {
        "title": "Accidentally dropped a table or database",
        "body": "# PostgreSQL:\n# If you have WAL archiving / point-in-time recovery configured:\npg_restore -d mydb /backups/latest.dump -t dropped_table\n\n# If no backup exists, check if the transaction is still open:\n# (Only works if you haven't committed yet)\n# Just run ROLLBACK; in your SQL session.\n\n# MySQL:\n# If binary logging is enabled:\nmysqlbinlog /var/log/mysql/mysql-bin.000001 \\\n  --start-datetime=\"2026-02-03 10:00:00\" \\\n  --stop-datetime=\"2026-02-03 10:30:00\" > recovery.sql\n# Review recovery.sql, then apply\n\n# SQLite:\n# If the file still exists, it's fine — SQLite DROP TABLE is within the file\n# Restore from backup:\ncp /backups/db.sqlite3 ./db.sqlite3\n\n# PREVENTION: Always run destructive SQL in a transaction\nBEGIN;\nDROP TABLE users;  -- oops\nROLLBACK;          -- saved"
      },
      {
        "title": "Database locked / deadlocked",
        "body": "# PostgreSQL:\n-- Find blocking queries\nSELECT pid, usename, state, query, wait_event_type, query_start\nFROM pg_stat_activity\nWHERE state != 'idle'\nORDER BY query_start;\n\n-- Find locks\nSELECT blocked_locks.pid AS blocked_pid,\n       blocking_locks.pid AS blocking_pid,\n       blocked_activity.query AS blocked_query,\n       blocking_activity.query AS blocking_query\nFROM pg_catalog.pg_locks blocked_locks\nJOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid\nJOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype\nJOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid\nWHERE NOT blocked_locks.granted;\n\n-- Kill blocking query\nSELECT pg_terminate_backend(<blocking_pid>);\n\n# MySQL:\nSHOW PROCESSLIST;\nSHOW ENGINE INNODB STATUS\\G  -- Look for \"LATEST DETECTED DEADLOCK\"\nKILL <process_id>;\n\n# SQLite:\n# SQLite uses file-level locking. Common fix:\n# 1. Find and close all connections\n# 2. Check for .db-journal or .db-wal files (active transactions)\n# 3. If stuck: cp database.db database-fixed.db && mv database-fixed.db database.db\n# This forces SQLite to release the lock by creating a fresh file handle\n\n# VERIFY:\n# Run a simple query to confirm database is responsive\nSELECT 1;"
      },
      {
        "title": "Connection pool exhausted",
        "body": "# DIAGNOSE:\n# Error messages like: \"too many connections\", \"connection pool exhausted\",\n# \"FATAL: remaining connection slots are reserved for superuser\"\n\n# PostgreSQL — check connection count:\nSELECT count(*), state FROM pg_stat_activity GROUP BY state;\nSELECT max_conn, used, max_conn - used AS available\nFROM (SELECT count(*) AS used FROM pg_stat_activity) t,\n     (SELECT setting::int AS max_conn FROM pg_settings WHERE name='max_connections') m;\n\n# FIX: Kill idle connections\n-- Terminate idle connections older than 5 minutes\nSELECT pg_terminate_backend(pid)\nFROM pg_stat_activity\nWHERE state = 'idle'\nAND query_start < now() - interval '5 minutes';\n\n# FIX: Increase max connections (requires restart)\n# postgresql.conf:\n# max_connections = 200  (default is 100)\n\n# BETTER FIX: Use a connection pooler\n# PgBouncer or pgcat in front of PostgreSQL\n# Application-level: set pool size to match your needs\n# Node.js (pg): { max: 20 }\n# Python (SQLAlchemy): pool_size=20, max_overflow=10\n# Go (database/sql): db.SetMaxOpenConns(20)\n\n# VERIFY:\nSELECT count(*) FROM pg_stat_activity;\n# Should be well below max_connections"
      },
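      {
        "title": "Checking a PgBouncer pool",
        "body": "If you put PgBouncer in front of PostgreSQL as suggested above, its admin console shows pool pressure directly. A minimal sketch assuming PgBouncer on its default port 6432 and a user listed in admin_users.\n\n# Connect to the PgBouncer admin console\npsql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer\n\n-- Inside the console:\nSHOW POOLS;    -- active/waiting clients and server connections per pool\nSHOW CLIENTS;  -- connected clients and their states\nSHOW STATS;    -- per-database query and traffic counters"
      },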
      {
        "title": "Quick rollback",
        "body": "# Git-based deploys:\ngit log --oneline -5 origin/main\ngit revert HEAD                    # Create a revert commit\ngit push origin main               # Deploy the revert\n# Revert is safer than reset because it preserves history\n\n# Docker/container deploys:\n# Roll back to previous image tag\ndocker pull myapp:previous-tag\ndocker stop myapp-current\ndocker run -d --name myapp myapp:previous-tag\n\n# Kubernetes:\nkubectl rollout undo deployment/myapp\nkubectl rollout status deployment/myapp    # Watch rollback progress\n\n# Heroku:\nheroku releases\nheroku rollback v<previous-version>\n\n# AWS ECS:\naws ecs update-service --cluster mycluster --service myservice \\\n  --task-definition myapp:<previous-revision>\n\n# VERIFY:\n# Hit the health check endpoint\ncurl -s -o /dev/null -w \"%{http_code}\" https://myapp.example.com/health\n# Should return 200"
      },
      {
        "title": "Container won't start",
        "body": "# DIAGNOSE: Why did it fail?\ndocker logs <container-id> --tail 100\ndocker inspect <container-id> | grep -A5 \"State\"\n\n# Common causes and fixes:\n\n# 1. \"exec format error\" — wrong platform (built for arm64, running on amd64)\ndocker build --platform linux/amd64 -t myapp .\n\n# 2. \"permission denied\" — file not executable or wrong user\n# In Dockerfile:\nRUN chmod +x /app/entrypoint.sh\n# Or: USER root before the command, then drop back\n\n# 3. \"port already allocated\" — another container or process on that port\ndocker ps -a | grep <port>\ndocker stop <conflicting-container>\n\n# 4. \"no such file or directory\" — entrypoint or CMD path is wrong\ndocker run -it --entrypoint sh myapp  # Get a shell to debug\nls -la /app/                           # Check what's actually there\n\n# 5. Healthcheck failing → container keeps restarting\ndocker inspect <container-id> --format='{{json .State.Health}}'\n# Temporarily disable healthcheck to get logs:\ndocker run --no-healthcheck myapp\n\n# 6. Out of memory — container OOM killed\ndocker inspect <container-id> --format='{{.State.OOMKilled}}'\n# If true: docker run --memory=1g myapp\n\n# VERIFY:\ndocker ps  # Container should show \"Up\" status\ndocker logs <container-id> --tail 5  # No errors"
      },
      {
        "title": "SSL certificate expired",
        "body": "# DIAGNOSE: Check certificate expiry\necho | openssl s_client -connect mysite.com:443 -servername mysite.com 2>/dev/null | \\\n  openssl x509 -noout -dates\n# notAfter shows expiry date\n\n# FIX (Let's Encrypt — most common):\nsudo certbot renew --force-renewal\nsudo systemctl reload nginx   # or: sudo systemctl reload apache2\n\n# FIX (manual certificate):\n# 1. Get new certificate from your CA\n# 2. Replace files:\nsudo cp new-cert.pem /etc/ssl/certs/mysite.pem\nsudo cp new-key.pem /etc/ssl/private/mysite.key\n# 3. Reload web server\nsudo nginx -t && sudo systemctl reload nginx\n\n# FIX (AWS ACM):\n# ACM auto-renews if DNS validation is configured.\n# If email validation: check the admin email for renewal link\n# If stuck: request a new certificate in ACM and update the load balancer\n\n# PREVENTION: Auto-renewal with monitoring\n# Cron job to check expiry and alert:\necho '0 9 * * 1 echo | openssl s_client -connect mysite.com:443 2>/dev/null | openssl x509 -checkend 604800 -noout || echo \"CERT EXPIRES WITHIN 7 DAYS\" | mail -s \"SSL ALERT\" admin@example.com' | crontab -\n\n# VERIFY:\ncurl -sI https://mysite.com | head -5\n# Should return HTTP/2 200, not certificate errors"
      },
      {
        "title": "SSH locked out",
        "body": "# DIAGNOSE: Why can't you connect?\nssh -vvv user@host  # Verbose output shows where it fails\n\n# Common causes:\n\n# 1. Key not accepted — wrong key, permissions, or authorized_keys issue\nssh -i ~/.ssh/specific_key user@host  # Try explicit key\nchmod 600 ~/.ssh/id_rsa               # Fix key permissions\nchmod 700 ~/.ssh                       # Fix .ssh dir permissions\n\n# 2. \"Connection refused\" — sshd not running or firewall blocking\n# If you have console access (cloud provider's web console):\nsudo systemctl start sshd\nsudo systemctl status sshd\n\n# 3. Firewall blocking port 22\n# Cloud console:\nsudo ufw allow 22/tcp       # Ubuntu\nsudo firewall-cmd --add-service=ssh --permanent && sudo firewall-cmd --reload  # CentOS\n\n# 4. Changed SSH port and forgot\n# Try common alternate ports:\nssh -p 2222 user@host\nssh -p 22222 user@host\n# Or check from console: grep -i port /etc/ssh/sshd_config\n\n# 5. IP changed / DNS stale\nping hostname    # Verify IP resolution\nssh user@<direct-ip>  # Try IP instead of hostname\n\n# 6. Locked out after too many attempts (fail2ban)\n# From console:\nsudo fail2ban-client set sshd unbanip <your-ip>\n# Or wait for the ban to expire (usually 10 min)\n\n# CLOUD PROVIDER ESCAPE HATCHES:\n# AWS: EC2 → Instance → Connect → Session Manager (no SSH needed)\n# GCP: Compute → VM instances → SSH (browser-based)\n# Azure: VM → Serial console\n# DigitalOcean: Droplet → Access → Console\n\n# VERIFY:\nssh user@host echo \"connection works\""
      },
      {
        "title": "Lost sudo access",
        "body": "# If you have physical/console access:\n# 1. Boot into single-user/recovery mode\n#    - Reboot, hold Shift (GRUB), select \"recovery mode\"\n#    - Or add init=/bin/bash to kernel command line\n\n# 2. Remount filesystem read-write\nmount -o remount,rw /\n\n# 3. Fix sudo access\nusermod -aG sudo <username>    # Debian/Ubuntu\nusermod -aG wheel <username>   # CentOS/RHEL\n# Or edit directly:\nvisudo\n# Add: username ALL=(ALL:ALL) ALL\n\n# 4. Reboot normally\nreboot\n\n# If you have another sudo/root user:\nsu - other-admin\nsudo usermod -aG sudo <locked-user>\n\n# CLOUD: Use the provider's console or reset the instance\n# AWS: Create an AMI, launch new instance, mount old root volume, fix"
      },
      {
        "title": "Nothing connects (total network failure)",
        "body": "# DIAGNOSE: Isolate the layer\n# 1. Is the network interface up?\nip addr show         # or: ifconfig\nping 127.0.0.1       # Loopback works?\n\n# 2. Can you reach the gateway?\nip route | grep default\nping <gateway-ip>\n\n# 3. Can you reach the internet by IP?\nping 8.8.8.8          # Google DNS\nping 1.1.1.1          # Cloudflare DNS\n\n# 4. Is DNS working?\nnslookup google.com\ndig google.com\n\n# DECISION TREE:\n# ping 127.0.0.1 fails      → network stack broken, restart networking\n# ping gateway fails         → local network issue (cable, wifi, DHCP)\n# ping 8.8.8.8 fails        → routing/firewall issue\n# ping 8.8.8.8 works but    → DNS issue\n#   nslookup fails\n\n# FIX: DNS broken\necho \"nameserver 8.8.8.8\" | sudo tee /etc/resolv.conf\n# Or: sudo systemd-resolve --flush-caches\n\n# FIX: Interface down\nsudo ip link set eth0 up\nsudo dhclient eth0        # Request new DHCP lease\n\n# FIX: Restart networking entirely\nsudo systemctl restart NetworkManager    # Desktop Linux\nsudo systemctl restart networking        # Server\nsudo systemctl restart systemd-networkd  # Systemd-based\n\n# Docker: Container can't reach the internet\ndocker run --rm alpine ping 8.8.8.8  # Test from container\n# If fails:\nsudo systemctl restart docker    # Often fixes Docker networking\n# Or: docker network prune"
      },
      {
        "title": "DNS not propagating after change",
        "body": "# DIAGNOSE: Check what different DNS servers see\ndig @8.8.8.8 mysite.com        # Google\ndig @1.1.1.1 mysite.com        # Cloudflare\ndig @ns1.yourdns.com mysite.com # Authoritative nameserver\n\n# Check TTL (time remaining before caches expire):\ndig mysite.com | grep -i ttl\n\n# REALITY CHECK:\n# DNS propagation takes time. TTL controls this.\n# TTL 300 = 5 minutes. TTL 86400 = 24 hours.\n# You cannot speed this up. You can only wait.\n\n# FIX: If authoritative nameserver has wrong records\n# Update the record at your DNS provider (Cloudflare, Route53, etc.)\n# Then flush your local cache:\n# macOS:\nsudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder\n# Linux:\nsudo systemd-resolve --flush-caches\n# Windows:\nipconfig /flushdns\n\n# WORKAROUND: While waiting for propagation\n# Add to /etc/hosts for immediate local effect:\necho \"93.184.216.34 mysite.com\" | sudo tee -a /etc/hosts\n# Remove this after propagation completes!\n\n# VERIFY:\ndig +short mysite.com  # Should show new IP/record"
      },
      {
        "title": "Accidentally deleted files (not in git)",
        "body": "# DIAGNOSE: Are the files recoverable?\n\n# If the process still has the file open:\nlsof | grep deleted\n# Then recover from /proc:\ncp /proc/<pid>/fd/<fd-number> /path/to/restored-file\n\n# If recently deleted on ext4 (Linux):\n# Install extundelete or testdisk\nsudo extundelete /dev/sda1 --restore-file path/to/file\n# Or use testdisk interactively for a better UI\n\n# macOS:\n# Check Trash first: ~/.Trash/\n# Time Machine: tmutil restore /path/to/file\n\n# PREVENTION:\n# Use trash-cli instead of rm:\n# npm install -g trash-cli\n# trash file.txt  (moves to trash instead of permanent delete)\n# Or alias: alias rm='echo \"Use trash instead\"; false'"
      },
      {
        "title": "Wrong permissions applied recursively",
        "body": "# \"I ran chmod -R 777 /\" or \"chmod -R 000 /important/dir\"\n\n# FIX: Common default permissions\n# For a web project:\nfind /path -type d -exec chmod 755 {} \\;  # Directories: rwxr-xr-x\nfind /path -type f -exec chmod 644 {} \\;  # Files: rw-r--r--\nfind /path -name \"*.sh\" -exec chmod 755 {} \\;  # Scripts: executable\n\n# For SSH:\nchmod 700 ~/.ssh\nchmod 600 ~/.ssh/id_rsa\nchmod 644 ~/.ssh/id_rsa.pub\nchmod 600 ~/.ssh/authorized_keys\nchmod 644 ~/.ssh/config\n\n# For a system directory (⚠️ serious — may need rescue boot):\n# If /etc permissions are broken:\n# Boot from live USB, mount the drive, fix permissions\n# Reference: dpkg --verify (Debian) or rpm -Va (RHEL) to compare against package defaults\n\n# VERIFY:\nls -la /path/to/fixed/directory"
      },
      {
        "title": "The Universal Diagnostic",
        "body": "When you don't know what's wrong, run this sequence:\n\n#!/bin/bash\n# emergency-diagnostic.sh — Quick system health check\n\necho \"=== DISK ===\"\ndf -h | grep -E '^/|Filesystem'\n\necho -e \"\\n=== MEMORY ===\"\nfree -h\n\necho -e \"\\n=== CPU / LOAD ===\"\nuptime\n\necho -e \"\\n=== TOP PROCESSES (by CPU) ===\"\nps aux --sort=-%cpu | head -6\n\necho -e \"\\n=== TOP PROCESSES (by MEM) ===\"\nps aux --sort=-%mem | head -6\n\necho -e \"\\n=== NETWORK ===\"\nping -c 1 -W 2 8.8.8.8 > /dev/null 2>&1 && echo \"Internet: OK\" || echo \"Internet: UNREACHABLE\"\nping -c 1 -W 2 $(ip route | awk '/default/{print $3}') > /dev/null 2>&1 && echo \"Gateway: OK\" || echo \"Gateway: UNREACHABLE\"\n\necho -e \"\\n=== RECENT ERRORS ===\"\njournalctl -p err --since \"1 hour ago\" --no-pager | tail -20 2>/dev/null || \\\n  dmesg | tail -20\n\necho -e \"\\n=== DOCKER (if running) ===\"\ndocker ps --format \"table {{.Names}}\\t{{.Status}}\\t{{.Ports}}\" 2>/dev/null || echo \"Docker not running\"\ndocker system df 2>/dev/null || true\n\necho -e \"\\n=== LISTENING PORTS ===\"\nss -tlnp 2>/dev/null | head -15 || netstat -tlnp 2>/dev/null | head -15\n\necho -e \"\\n=== FAILED SERVICES ===\"\nsystemctl --failed 2>/dev/null || true\n\nRun it, read the output, then jump to the relevant section above."
      },
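      {
        "title": "Running the diagnostic",
        "body": "A usage sketch for the script above: save it, make it executable, and keep a timestamped copy of the output so you can compare before and after a fix.\n\nchmod +x emergency-diagnostic.sh\n./emergency-diagnostic.sh | tee \"diag-$(date +%Y%m%d-%H%M%S).txt\"\n\n# After applying a fix, capture again and diff the two files:\ndiff diag-*.txt | head -40"
      },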
      {
        "title": "Tips",
        "body": "Revoke credentials before cleaning git history. The moment a secret is pushed publicly, automated scrapers have it within minutes. Cleaning the history is important but secondary to revocation.\ngit reflog is your undo button. It records every HEAD movement for 30+ days. Lost commits, bad rebases, accidental resets — the reflog has the recovery hash. Learn to read it before you need it.\nTruncate log files, don't delete them. truncate -s 0 file.log frees disk space instantly while keeping the file handle open. Deleting a log file that a process has open won't free space until the process restarts.\n--force-with-lease instead of --force. Always. It fails if someone else has pushed, preventing you from overwriting their work on top of your recovery.\nEvery recovery operation should end with verification. Run the diagnostic command, check the output, confirm the fix worked. Don't assume — confirm.\nDocker is the #1 disk space thief on developer machines. docker system prune -a is almost always safe on development machines and can recover tens of gigabytes.\nDatabase emergencies: wrap destructive operations in transactions. BEGIN; DROP TABLE users; ROLLBACK; costs nothing and saves everything. Make it muscle memory.\nWhen SSH is locked out, every cloud provider has a console escape hatch. AWS Session Manager, GCP browser SSH, Azure Serial Console. Know where yours is before you need it.\nThe order matters: diagnose → fix → verify. Skipping diagnosis leads to wrong fixes. Skipping verification leads to false confidence. Follow the sequence every time.\nKeep this skill installed. You won't need it most days. The day you do need it, you'll need it immediately."
      }
    ],
    "body": "Emergency Rescue Kit\n\nStep-by-step recovery procedures for the worst moments in a developer's day. Every section follows the same pattern: diagnose → fix → verify. Commands are non-destructive by default. Destructive steps are flagged.\n\nWhen something has gone wrong, find your situation below and follow the steps in order.\n\nWhen to Use\nSomeone force-pushed to main and overwrote history\nCredentials were committed to a public repository\nA rebase or reset destroyed commits you need\nDisk is full and nothing works\nA process is consuming all memory or won't die\nA database migration failed halfway through\nA deploy needs to be rolled back immediately\nSSH access is locked out\nSSL certificates expired in production\nYou don't know what went wrong, but it's broken\nGit Disasters\nForce-pushed to main (or any shared branch)\n\nSomeone ran git push --force and overwrote remote history.\n\n# DIAGNOSE: Check the reflog on any machine that had the old state\ngit reflog show origin/main\n# Look for the last known-good commit hash\n\n# FIX (if you have the old state locally):\ngit push origin <good-commit-hash>:main --force-with-lease\n# --force-with-lease is safer than --force: it fails if remote changed again\n\n# FIX (if you DON'T have the old state locally):\n# GitHub/GitLab retain force-pushed refs temporarily\n\n# GitHub: check the \"push\" event in the audit log or use the API\ngh api repos/{owner}/{repo}/events --jq '.[] | select(.type==\"PushEvent\") | .payload.before'\n\n# GitLab: check the reflog on the server (admin access needed)\n# Or restore from any CI runner or team member's local clone\n\n# VERIFY:\ngit log --oneline -10 origin/main\n# Confirm the history looks correct\n\nLost commits after rebase or reset --hard\n\nYou ran git rebase or git reset --hard and commits disappeared.\n\n# DIAGNOSE: Your commits are NOT gone. 
Git keeps everything for 30+ days.\ngit reflog\n# Find the commit hash from BEFORE the rebase/reset\n# Look for entries like \"rebase (start)\" or \"reset: moving to\"\n\n# FIX: Reset back to the pre-disaster state\ngit reset --hard <commit-hash-before-disaster>\n\n# FIX (alternative): Cherry-pick specific lost commits\ngit cherry-pick <lost-commit-hash>\n\n# FIX (if reflog is empty — rare, usually means different repo):\ngit fsck --lost-found\n# Look in .git/lost-found/commit/ for dangling commits\nls .git/lost-found/commit/\ngit show <hash>  # Inspect each one\n\n# VERIFY:\ngit log --oneline -10\n# Your commits should be back\n\nCommitted to the wrong branch\n\nYou made commits on main that should be on a feature branch.\n\n# DIAGNOSE: Check where you are and what you committed\ngit log --oneline -5\ngit branch\n\n# FIX: Create the feature branch at current position, then reset main\ngit branch feature-branch          # Create branch pointing at current commit\ngit reset --hard HEAD~<N>          # Move main back N commits (⚠️ destructive)\ngit checkout feature-branch        # Switch to the new branch\n\n# FIX (safer alternative using cherry-pick):\ngit checkout -b feature-branch     # Create and switch to new branch\ngit checkout main\ngit reset --hard origin/main       # Reset main to remote state\n# Your commits are safely on feature-branch\n\n# VERIFY:\ngit log --oneline main -5\ngit log --oneline feature-branch -5\n\nMerge gone wrong (conflicts everywhere, wrong result)\n\nA merge produced a bad result and you want to start over.\n\n# FIX (merge not yet committed — still in conflict state):\ngit merge --abort\n\n# FIX (merge was committed but not pushed):\ngit reset --hard HEAD~1\n\n# FIX (merge was already pushed): Create a revert commit\ngit revert -m 1 <merge-commit-hash>\n# -m 1 means \"keep the first parent\" (your branch before merge)\ngit push\n\n# VERIFY:\ngit log --oneline --graph -10\ngit diff HEAD~1  # Review what changed\n\nCorrupted git repository\n\nGit commands fail with \"bad object\", \"corrupt\", or \"broken link\" errors.\n\n# DIAGNOSE: Check repository integrity\ngit fsck --full\n\n# FIX (if remote is intact — most common):\n# Save any uncommitted work first\ncp -r . ../repo-backup\n\n# Re-clone and restore local work\ncd ..\ngit clone <remote-url> repo-fresh\ncp -r repo-backup/path/to/uncommitted/files repo-fresh/\n\n# FIX (repair without re-cloning):\n# Remove corrupt objects and fetch them again\ngit fsck --full 2>&1 | grep \"corrupt\\|missing\" | awk '{print $NF}'\n# For each corrupt object:\nrm .git/objects/<first-2-chars>/<remaining-hash>\ngit fetch origin  # Re-download from remote\n\n# VERIFY:\ngit fsck --full  # Should report no errors\ngit log --oneline -5\n\nCredential Leaks\nSecret committed to git (API key, password, token)\n\nA credential is in the git history. 
Every second counts — automated scrapers monitor public GitHub repos for leaked keys.\n\n# STEP 1: REVOKE THE CREDENTIAL IMMEDIATELY\n# Do this FIRST, before cleaning git history.\n# The credential is already compromised the moment it was pushed publicly.\n\n# AWS keys:\naws iam delete-access-key --access-key-id AKIAXXXXXXXXXXXXXXXX --user-name <user>\n# Then create a new key pair\n\n# GitHub tokens:\n# Go to github.com → Settings → Developer settings → Tokens → Revoke\n\n# Database passwords:\n# Change the password in the database immediately\n# ALTER USER myuser WITH PASSWORD 'new-secure-password';\n\n# Generic API tokens:\n# Revoke in the provider's dashboard, generate new ones\n\n# STEP 2: Remove from current branch\ngit rm --cached <file-with-secret>    # If the whole file is secret\n# OR edit the file to remove the secret, then:\ngit add <file>\n\n# STEP 3: Add to .gitignore\necho \".env\" >> .gitignore\necho \"credentials.json\" >> .gitignore\ngit add .gitignore\n\n# STEP 4: Remove from git history (⚠️ rewrites history)\n# Option A: git-filter-repo (recommended, install with pip install git-filter-repo)\ngit filter-repo --path <file-with-secret> --invert-paths\n\n# Option B: BFG Repo Cleaner (faster for large repos)\n# Download from https://rtyley.github.io/bfg-repo-cleaner/\njava -jar bfg.jar --delete-files <filename> .\ngit reflog expire --expire=now --all\ngit gc --prune=now --aggressive\n\n# STEP 5: Force push the cleaned history\ngit push origin --force --all\ngit push origin --force --tags\n\n# STEP 6: Notify all collaborators to re-clone\n# Their local copies still have the secret in reflog\n\n# VERIFY:\ngit log --all -p -S '<the-secret-string>' --diff-filter=A\n# Should return nothing\n\n.env file pushed to public repo\n# STEP 1: Revoke ALL credentials in that .env file. All of them. Now.\n\n# STEP 2: Remove and ignore\ngit rm --cached .env\necho \".env\" >> .gitignore\ngit add .gitignore\ngit commit -m \"Remove .env from tracking\"\n\n# STEP 3: Remove from history (see credential removal above)\ngit filter-repo --path .env --invert-paths\n\n# STEP 4: Check what was exposed\n# List every variable that was in the .env:\ngit show HEAD~1:.env 2>/dev/null || git log --all -p -- .env | head -50\n# Rotate every single value.\n\n# PREVENTION: Add a pre-commit hook\ncat > .git/hooks/pre-commit << 'HOOK'\n#!/bin/bash\nif git diff --cached --name-only | grep -qE '\\.env$|\\.env\\.local$|credentials'; then\n    echo \"ERROR: Attempting to commit potential secrets file\"\n    echo \"Files: $(git diff --cached --name-only | grep -E '\\.env|credentials')\"\n    exit 1\nfi\nHOOK\nchmod +x .git/hooks/pre-commit\n\nSecret visible in CI/CD logs\n# STEP 1: Revoke the credential immediately\n\n# STEP 2: Delete the CI run/logs if possible\n# GitHub Actions:\ngh run delete <run-id>\n# Or: Settings → Actions → delete specific run\n\n# STEP 3: Fix the pipeline\n# Never echo secrets. 
Mask them:\n# GitHub Actions: echo \"::add-mask::$MY_SECRET\"\n# GitLab CI: variables are masked if marked as \"Masked\" in settings\n\n# STEP 4: Audit what was exposed\n# Check the log output for patterns like:\n# AKIAXXXXXXXXX (AWS)\n# ghp_XXXXXXXXX (GitHub)\n# sk-XXXXXXXXXXX (OpenAI/Stripe)\n# Any connection strings with passwords\n\nDisk Full Emergencies\nSystem or container disk is full\n\nNothing works — builds fail, logs can't write, services crash.\n\n# DIAGNOSE: What's using space?\ndf -h                          # Which filesystem is full?\ndu -sh /* 2>/dev/null | sort -rh | head -20    # Biggest top-level dirs\ndu -sh /var/log/* | sort -rh | head -10        # Log bloat?\n\n# QUICK WINS (safe to run immediately):\n\n# 1. Docker cleanup (often the #1 cause)\ndocker system df               # See Docker disk usage\ndocker system prune -a -f      # Remove all unused images, containers, networks\ndocker volume prune -f          # Remove unused volumes\ndocker builder prune -a -f      # Remove build cache\n# ⚠️ This removes ALL unused Docker data. Safe if you can re-pull/rebuild.\n\n# 2. Package manager caches\n# npm\nnpm cache clean --force\nrm -rf ~/.npm/_cacache\n\n# pip\npip cache purge\n\n# apt\nsudo apt-get clean\nsudo apt-get autoremove -y\n\n# brew\nbrew cleanup --prune=all\n\n# 3. Log rotation (immediate)\n# Truncate (not delete) large log files to free space instantly\nsudo truncate -s 0 /var/log/syslog\nsudo truncate -s 0 /var/log/journal/*/*.journal  # systemd journals\nfind /var/log -name \"*.log\" -size +100M -exec truncate -s 0 {} \\;\n# Truncate preserves the file handle so services don't break\n\n# 4. Old build artifacts\nfind . -name \"node_modules\" -type d -prune -exec rm -rf {} + 2>/dev/null\nfind . -name \".next\" -type d -exec rm -rf {} + 2>/dev/null\nfind . -name \"dist\" -type d -exec rm -rf {} + 2>/dev/null\nfind /tmp -type f -mtime +7 -delete 2>/dev/null\n\n# 5. Find the actual culprit\nfind / -xdev -type f -size +100M -exec ls -lh {} \\; 2>/dev/null | sort -k5 -rh | head -20\n# Shows files over 100MB, sorted by size\n\n# VERIFY:\ndf -h  # Check free space increased\n\nDocker-specific disk full\n# DIAGNOSE:\ndocker system df -v\n\n# Common culprits:\n# 1. Dangling images from builds\ndocker image prune -f\n\n# 2. Stopped containers accumulating\ndocker container prune -f\n\n# 3. Build cache (often the biggest)\ndocker builder prune -a -f\n\n# 4. 
Volumes from old containers\ndocker volume ls -qf dangling=true\ndocker volume prune -f\n\n# NUCLEAR OPTION (⚠️ removes EVERYTHING):\ndocker system prune -a --volumes -f\n# You will need to re-pull all images and recreate all volumes\n\n# VERIFY:\ndocker system df\ndf -h\n\nProcess Emergencies\nPort already in use\n# DIAGNOSE: What's using the port?\n# Linux:\nlsof -i :8080\nss -tlnp | grep 8080\n# macOS:\nlsof -i :8080\n# Windows:\nnetstat -ano | findstr :8080\n\n# FIX: Kill the process\nkill $(lsof -t -i :8080)           # Graceful\nkill -9 $(lsof -t -i :8080)       # Force (if graceful didn't work)\n\n# FIX (Windows):\n# Find PID from netstat output, then:\ntaskkill /PID <pid> /F\n\n# FIX (if it's a leftover Docker container):\ndocker ps | grep 8080\ndocker stop <container-id>\n\n# VERIFY:\nlsof -i :8080  # Should return nothing\n\nProcess won't die\n# DIAGNOSE:\nps aux | grep <process-name>\n# Note the PID\n\n# ESCALATION LADDER:\nkill <pid>                # SIGTERM (graceful shutdown)\nsleep 5\nkill -9 <pid>             # SIGKILL (cannot be caught, immediate death)\n\n# If SIGKILL doesn't work, it's a zombie or kernel-stuck process:\n# Check if zombie:\nps aux | grep <pid>\n# State \"Z\" = zombie. The parent must reap it:\nkill -SIGCHLD $(ps -o ppid= -p <pid>)\n# Or kill the parent process\n\n# If truly stuck in kernel (state \"D\"):\n# Only a reboot will fix it. The process is stuck in an I/O syscall.\n\n# MASS CLEANUP: Kill all processes matching a name\npkill -f <pattern>          # Graceful\npkill -9 -f <pattern>      # Force\n\nOut of memory (OOM killed)\n# DIAGNOSE: Was your process OOM-killed?\ndmesg | grep -i \"oom\\|killed process\" | tail -20\njournalctl -k | grep -i \"oom\\|killed\" | tail -20\n\n# Check what's using memory right now:\nps aux --sort=-%mem | head -20        # Top memory consumers\nfree -h                                 # System memory overview\n\n# FIX: Free memory immediately\n# 1. Kill the biggest consumer (if safe to do so)\nkill $(ps aux --sort=-%mem | awk 'NR==2{print $2}')\n\n# 2. Drop filesystem caches (safe, no data loss)\nsync && echo 3 | sudo tee /proc/sys/vm/drop_caches\n\n# 3. Disable swap thrashing (if swap is full)\nsudo swapoff -a && sudo swapon -a\n\n# PREVENT: Set memory limits\n# Docker:\ndocker run --memory=512m --memory-swap=1g myapp\n\n# Systemd service:\n# Add to [Service] section:\n# MemoryMax=512M\n# MemoryHigh=400M\n\n# Node.js:\nnode --max-old-space-size=512 app.js\n\n# VERIFY:\nfree -h\nps aux --sort=-%mem | head -5\n\nDatabase Emergencies\nFailed migration (partially applied)\n# DIAGNOSE: What state is the database in?\n# Check which migrations have run:\n\n# Rails:\nrails db:migrate:status\n\n# Django:\npython manage.py showmigrations\n\n# Knex/Node:\nnpx knex migrate:status\n\n# Prisma:\nnpx prisma migrate status\n\n# Raw SQL — check migration table:\n# PostgreSQL/MySQL:\nSELECT * FROM schema_migrations ORDER BY version DESC LIMIT 10;\n# Or: SELECT * FROM _migrations ORDER BY id DESC LIMIT 10;\n\n# FIX: Roll back the failed migration\n# Most frameworks track migration state. Roll back to last good state:\n\n# Rails:\nrails db:rollback STEP=1\n\n# Django:\npython manage.py migrate <app_name> <previous_migration_number>\n\n# Knex:\nnpx knex migrate:rollback\n\n# FIX (manual): If the framework is confused about state:\n# 1. Check what the migration actually did\n# 2. Manually undo partial changes\n# 3. Delete the migration record from the migrations table\n# 4. Fix the migration code\n# 5. 
Re-run\n\n# VERIFY:\n# Run the migration again and confirm it applies cleanly\n# Check the affected tables/columns exist correctly\n\nAccidentally dropped a table or database\n# PostgreSQL:\n# If you have WAL archiving / point-in-time recovery configured:\npg_restore -d mydb /backups/latest.dump -t dropped_table\n\n# If no backup exists, check if the transaction is still open:\n# (Only works if you haven't committed yet)\n# Just run ROLLBACK; in your SQL session.\n\n# MySQL:\n# If binary logging is enabled:\nmysqlbinlog /var/log/mysql/mysql-bin.000001 \\\n  --start-datetime=\"2026-02-03 10:00:00\" \\\n  --stop-datetime=\"2026-02-03 10:30:00\" > recovery.sql\n# Review recovery.sql, then apply\n\n# SQLite:\n# If the file still exists, it's fine — SQLite DROP TABLE is within the file\n# Restore from backup:\ncp /backups/db.sqlite3 ./db.sqlite3\n\n# PREVENTION: Always run destructive SQL in a transaction\nBEGIN;\nDROP TABLE users;  -- oops\nROLLBACK;          -- saved\n\nDatabase locked / deadlocked\n# PostgreSQL:\n-- Find blocking queries\nSELECT pid, usename, state, query, wait_event_type, query_start\nFROM pg_stat_activity\nWHERE state != 'idle'\nORDER BY query_start;\n\n-- Find locks\nSELECT blocked_locks.pid AS blocked_pid,\n       blocking_locks.pid AS blocking_pid,\n       blocked_activity.query AS blocked_query,\n       blocking_activity.query AS blocking_query\nFROM pg_catalog.pg_locks blocked_locks\nJOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid\nJOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype\nJOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid\nWHERE NOT blocked_locks.granted;\n\n-- Kill blocking query\nSELECT pg_terminate_backend(<blocking_pid>);\n\n# MySQL:\nSHOW PROCESSLIST;\nSHOW ENGINE INNODB STATUS\\G  -- Look for \"LATEST DETECTED DEADLOCK\"\nKILL <process_id>;\n\n# SQLite:\n# SQLite uses file-level locking. Common fix:\n# 1. Find and close all connections\n# 2. Check for .db-journal or .db-wal files (active transactions)\n# 3. 
If stuck: cp database.db database-fixed.db && mv database-fixed.db database.db\n# This forces SQLite to release the lock by creating a fresh file handle\n\n# VERIFY:\n# Run a simple query to confirm database is responsive\nSELECT 1;\n\nConnection pool exhausted\n# DIAGNOSE:\n# Error messages like: \"too many connections\", \"connection pool exhausted\",\n# \"FATAL: remaining connection slots are reserved for superuser\"\n\n# PostgreSQL — check connection count:\nSELECT count(*), state FROM pg_stat_activity GROUP BY state;\nSELECT max_conn, used, max_conn - used AS available\nFROM (SELECT count(*) AS used FROM pg_stat_activity) t,\n     (SELECT setting::int AS max_conn FROM pg_settings WHERE name='max_connections') m;\n\n# FIX: Kill idle connections\n-- Terminate idle connections older than 5 minutes\nSELECT pg_terminate_backend(pid)\nFROM pg_stat_activity\nWHERE state = 'idle'\nAND query_start < now() - interval '5 minutes';\n\n# FIX: Increase max connections (requires restart)\n# postgresql.conf:\n# max_connections = 200  (default is 100)\n\n# BETTER FIX: Use a connection pooler\n# PgBouncer or pgcat in front of PostgreSQL\n# Application-level: set pool size to match your needs\n# Node.js (pg): { max: 20 }\n# Python (SQLAlchemy): pool_size=20, max_overflow=10\n# Go (database/sql): db.SetMaxOpenConns(20)\n\n# VERIFY:\nSELECT count(*) FROM pg_stat_activity;\n# Should be well below max_connections\n\nDeploy Emergencies\nQuick rollback\n# Git-based deploys:\ngit log --oneline -5 origin/main\ngit revert HEAD                    # Create a revert commit\ngit push origin main               # Deploy the revert\n# Revert is safer than reset because it preserves history\n\n# Docker/container deploys:\n# Roll back to previous image tag\ndocker pull myapp:previous-tag\ndocker stop myapp-current\ndocker run -d --name myapp myapp:previous-tag\n\n# Kubernetes:\nkubectl rollout undo deployment/myapp\nkubectl rollout status deployment/myapp    # Watch rollback progress\n\n# Heroku:\nheroku releases\nheroku rollback v<previous-version>\n\n# AWS ECS:\naws ecs update-service --cluster mycluster --service myservice \\\n  --task-definition myapp:<previous-revision>\n\n# VERIFY:\n# Hit the health check endpoint\ncurl -s -o /dev/null -w \"%{http_code}\" https://myapp.example.com/health\n# Should return 200\n\nContainer won't start\n# DIAGNOSE: Why did it fail?\ndocker logs <container-id> --tail 100\ndocker inspect <container-id> | grep -A5 \"State\"\n\n# Common causes and fixes:\n\n# 1. \"exec format error\" — wrong platform (built for arm64, running on amd64)\ndocker build --platform linux/amd64 -t myapp .\n\n# 2. \"permission denied\" — file not executable or wrong user\n# In Dockerfile:\nRUN chmod +x /app/entrypoint.sh\n# Or: USER root before the command, then drop back\n\n# 3. \"port already allocated\" — another container or process on that port\ndocker ps -a | grep <port>\ndocker stop <conflicting-container>\n\n# 4. \"no such file or directory\" — entrypoint or CMD path is wrong\ndocker run -it --entrypoint sh myapp  # Get a shell to debug\nls -la /app/                           # Check what's actually there\n\n# 5. Healthcheck failing → container keeps restarting\ndocker inspect <container-id> --format='{{json .State.Health}}'\n# Temporarily disable healthcheck to get logs:\ndocker run --no-healthcheck myapp\n\n# 6. 
Out of memory — container OOM killed\ndocker inspect <container-id> --format='{{.State.OOMKilled}}'\n# If true: docker run --memory=1g myapp\n\n# VERIFY:\ndocker ps  # Container should show \"Up\" status\ndocker logs <container-id> --tail 5  # No errors\n\nSSL certificate expired\n# DIAGNOSE: Check certificate expiry\necho | openssl s_client -connect mysite.com:443 -servername mysite.com 2>/dev/null | \\\n  openssl x509 -noout -dates\n# notAfter shows expiry date\n\n# FIX (Let's Encrypt — most common):\nsudo certbot renew --force-renewal\nsudo systemctl reload nginx   # or: sudo systemctl reload apache2\n\n# FIX (manual certificate):\n# 1. Get new certificate from your CA\n# 2. Replace files:\nsudo cp new-cert.pem /etc/ssl/certs/mysite.pem\nsudo cp new-key.pem /etc/ssl/private/mysite.key\n# 3. Reload web server\nsudo nginx -t && sudo systemctl reload nginx\n\n# FIX (AWS ACM):\n# ACM auto-renews if DNS validation is configured.\n# If email validation: check the admin email for renewal link\n# If stuck: request a new certificate in ACM and update the load balancer\n\n# PREVENTION: Auto-renewal with monitoring\n# Cron job to check expiry and alert:\necho '0 9 * * 1 echo | openssl s_client -connect mysite.com:443 2>/dev/null | openssl x509 -checkend 604800 -noout || echo \"CERT EXPIRES WITHIN 7 DAYS\" | mail -s \"SSL ALERT\" admin@example.com' | crontab -\n\n# VERIFY:\ncurl -sI https://mysite.com | head -5\n# Should return HTTP/2 200, not certificate errors\n\nAccess Emergencies\nSSH locked out\n# DIAGNOSE: Why can't you connect?\nssh -vvv user@host  # Verbose output shows where it fails\n\n# Common causes:\n\n# 1. Key not accepted — wrong key, permissions, or authorized_keys issue\nssh -i ~/.ssh/specific_key user@host  # Try explicit key\nchmod 600 ~/.ssh/id_rsa               # Fix key permissions\nchmod 700 ~/.ssh                       # Fix .ssh dir permissions\n\n# 2. \"Connection refused\" — sshd not running or firewall blocking\n# If you have console access (cloud provider's web console):\nsudo systemctl start sshd\nsudo systemctl status sshd\n\n# 3. Firewall blocking port 22\n# Cloud console:\nsudo ufw allow 22/tcp       # Ubuntu\nsudo firewall-cmd --add-service=ssh --permanent && sudo firewall-cmd --reload  # CentOS\n\n# 4. Changed SSH port and forgot\n# Try common alternate ports:\nssh -p 2222 user@host\nssh -p 22222 user@host\n# Or check from console: grep -i port /etc/ssh/sshd_config\n\n# 5. IP changed / DNS stale\nping hostname    # Verify IP resolution\nssh user@<direct-ip>  # Try IP instead of hostname\n\n# 6. Locked out after too many attempts (fail2ban)\n# From console:\nsudo fail2ban-client set sshd unbanip <your-ip>\n# Or wait for the ban to expire (usually 10 min)\n\n# CLOUD PROVIDER ESCAPE HATCHES:\n# AWS: EC2 → Instance → Connect → Session Manager (no SSH needed)\n# GCP: Compute → VM instances → SSH (browser-based)\n# Azure: VM → Serial console\n# DigitalOcean: Droplet → Access → Console\n\n# VERIFY:\nssh user@host echo \"connection works\"\n\nLost sudo access\n# If you have physical/console access:\n# 1. Boot into single-user/recovery mode\n#    - Reboot, hold Shift (GRUB), select \"recovery mode\"\n#    - Or add init=/bin/bash to kernel command line\n\n# 2. Remount filesystem read-write\nmount -o remount,rw /\n\n# 3. Fix sudo access\nusermod -aG sudo <username>    # Debian/Ubuntu\nusermod -aG wheel <username>   # CentOS/RHEL\n# Or edit directly:\nvisudo\n# Add: username ALL=(ALL:ALL) ALL\n\n# 4. 
Access Emergencies\nSSH locked out\n# DIAGNOSE: Why can't you connect?\nssh -vvv user@host  # Verbose output shows where it fails\n\n# Common causes:\n\n# 1. Key not accepted — wrong key, permissions, or authorized_keys issue\nssh -i ~/.ssh/specific_key user@host  # Try an explicit key\nchmod 600 ~/.ssh/id_rsa               # Fix key permissions\nchmod 700 ~/.ssh                      # Fix .ssh dir permissions\n\n# 2. \"Connection refused\" — sshd not running or firewall blocking\n# If you have console access (cloud provider's web console):\nsudo systemctl start sshd\nsudo systemctl status sshd\n\n# 3. Firewall blocking port 22\n# From the cloud console:\nsudo ufw allow 22/tcp       # Ubuntu\nsudo firewall-cmd --add-service=ssh --permanent && sudo firewall-cmd --reload  # CentOS\n\n# 4. Changed SSH port and forgot\n# Try common alternate ports:\nssh -p 2222 user@host\nssh -p 22222 user@host\n# Or check from the console: grep -i port /etc/ssh/sshd_config\n\n# 5. IP changed / DNS stale\nping hostname    # Verify IP resolution\nssh user@<direct-ip>  # Try the IP instead of the hostname\n\n# 6. Locked out after too many attempts (fail2ban)\n# From the console:\nsudo fail2ban-client set sshd unbanip <your-ip>\n# Or wait for the ban to expire (usually 10 min)\n\n# CLOUD PROVIDER ESCAPE HATCHES:\n# AWS: EC2 → Instance → Connect → Session Manager (no SSH needed)\n# GCP: Compute → VM instances → SSH (browser-based)\n# Azure: VM → Serial console\n# DigitalOcean: Droplet → Access → Console\n\n# VERIFY:\nssh user@host echo \"connection works\"\n\nLost sudo access\n# If you have physical/console access:\n# 1. Boot into single-user/recovery mode\n#    - Reboot, hold Shift (GRUB), select \"recovery mode\"\n#    - Or add init=/bin/bash to the kernel command line\n\n# 2. Remount the filesystem read-write\nmount -o remount,rw /\n\n# 3. Fix sudo access\nusermod -aG sudo <username>    # Debian/Ubuntu\nusermod -aG wheel <username>   # CentOS/RHEL\n# Or edit directly:\nvisudo\n# Add: username ALL=(ALL:ALL) ALL\n\n# 4. Reboot normally\nreboot\n\n# If you have another sudo/root user:\nsu - other-admin\nsudo usermod -aG sudo <locked-user>\n\n# CLOUD: Use the provider's console or reset the instance\n# AWS: Create an AMI, launch a new instance, mount the old root volume, fix\n\nNetwork Emergencies\nNothing connects (total network failure)\n# DIAGNOSE: Isolate the layer\n# 1. Is the network interface up?\nip addr show         # or: ifconfig\nping 127.0.0.1       # Loopback works?\n\n# 2. Can you reach the gateway?\nip route | grep default\nping <gateway-ip>\n\n# 3. Can you reach the internet by IP?\nping 8.8.8.8          # Google DNS\nping 1.1.1.1          # Cloudflare DNS\n\n# 4. Is DNS working?\nnslookup google.com\ndig google.com\n\n# DECISION TREE:\n# ping 127.0.0.1 fails       → network stack broken, restart networking\n# ping gateway fails         → local network issue (cable, wifi, DHCP)\n# ping 8.8.8.8 fails         → routing/firewall issue\n# ping 8.8.8.8 works but     → DNS issue\n#   nslookup fails\n\n# FIX: DNS broken\necho \"nameserver 8.8.8.8\" | sudo tee /etc/resolv.conf\n# Or: sudo resolvectl flush-caches   (systemd-resolve --flush-caches on older systems)\n\n# FIX: Interface down\nsudo ip link set eth0 up\nsudo dhclient eth0        # Request a new DHCP lease\n\n# FIX: Restart networking entirely\nsudo systemctl restart NetworkManager    # Desktop Linux\nsudo systemctl restart networking        # Server\nsudo systemctl restart systemd-networkd  # Systemd-based\n\n# Docker: Container can't reach the internet\ndocker run --rm alpine ping -c 1 8.8.8.8  # Test from a container (-c 1 so it exits)\n# If it fails:\nsudo systemctl restart docker    # Often fixes Docker networking\n# Or: docker network prune\n\n
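The decision tree above is mechanical enough to script. A rough sketch that stops at the first failing layer; adjust targets as needed:\n#!/bin/bash\n# net-triage.sh: walk the network layers in order, report the first failure\nping -c 1 -W 2 127.0.0.1 > /dev/null 2>&1 || { echo \"FAIL: loopback (network stack broken)\"; exit 1; }\ngw=$(ip route | awk '/default/{print $3; exit}')\n[ -n \"$gw\" ] && ping -c 1 -W 2 \"$gw\" > /dev/null 2>&1 || { echo \"FAIL: gateway (cable, wifi, or DHCP)\"; exit 1; }\nping -c 1 -W 2 8.8.8.8 > /dev/null 2>&1 || { echo \"FAIL: internet by IP (routing/firewall)\"; exit 1; }\nnslookup google.com > /dev/null 2>&1 || { echo \"FAIL: DNS (check /etc/resolv.conf)\"; exit 1; }\necho \"All layers OK: the problem is probably application-level\"\n\n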
DNS not propagating after change\n# DIAGNOSE: Check what different DNS servers see\ndig @8.8.8.8 mysite.com        # Google\ndig @1.1.1.1 mysite.com        # Cloudflare\ndig @ns1.yourdns.com mysite.com # Authoritative nameserver\n\n# Check TTL (time remaining before caches expire):\ndig mysite.com | grep -i ttl\n\n# REALITY CHECK:\n# DNS propagation takes time. TTL controls this.\n# TTL 300 = 5 minutes. TTL 86400 = 24 hours.\n# Once a record is cached, you cannot speed this up; you can only wait.\n# (For planned changes, lower the TTL a day in advance.)\n\n# FIX: If the authoritative nameserver has wrong records\n# Update the record at your DNS provider (Cloudflare, Route53, etc.)\n# Then flush your local cache:\n# macOS:\nsudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder\n# Linux:\nsudo resolvectl flush-caches   # systemd-resolve --flush-caches on older systems\n# Windows:\nipconfig /flushdns\n\n# WORKAROUND: While waiting for propagation\n# Add to /etc/hosts for immediate local effect:\necho \"93.184.216.34 mysite.com\" | sudo tee -a /etc/hosts\n# Remove this after propagation completes!\n\n# VERIFY:\ndig +short mysite.com  # Should show the new IP/record\n\nFile Emergencies\nAccidentally deleted files (not in git)\n# DIAGNOSE: Are the files recoverable?\n\n# If a process still has the file open:\nlsof | grep deleted\n# Then recover from /proc:\ncp /proc/<pid>/fd/<fd-number> /path/to/restored-file\n\n# If recently deleted on ext4 (Linux):\n# Unmount (or remount read-only) first so the freed blocks aren't overwritten,\n# then install extundelete or testdisk\nsudo extundelete /dev/sda1 --restore-file path/to/file\n# Or use testdisk interactively for a better UI\n\n# macOS:\n# Check the Trash first: ~/.Trash/\n# Time Machine: tmutil restore /path/to/file\n\n# PREVENTION:\n# Use trash-cli instead of rm:\n# npm install -g trash-cli\n# trash file.txt  (moves to trash instead of permanently deleting)\n# Or alias: alias rm='echo \"Use trash instead\"; false'\n\nWrong permissions applied recursively\n# \"I ran chmod -R 777 /\" or \"chmod -R 000 /important/dir\"\n\n# FIX: Common default permissions\n# For a web project:\nfind /path -type d -exec chmod 755 {} \\;  # Directories: rwxr-xr-x\nfind /path -type f -exec chmod 644 {} \\;  # Files: rw-r--r--\nfind /path -name \"*.sh\" -exec chmod 755 {} \\;  # Scripts: executable\n\n# For SSH:\nchmod 700 ~/.ssh\nchmod 600 ~/.ssh/id_rsa\nchmod 644 ~/.ssh/id_rsa.pub\nchmod 600 ~/.ssh/authorized_keys\nchmod 644 ~/.ssh/config\n\n# For a system directory (⚠️ serious — may need a rescue boot):\n# If /etc permissions are broken:\n# Boot from a live USB, mount the drive, fix permissions\n# Reference: dpkg --verify (Debian) or rpm -Va (RHEL) to compare against package defaults\n\n# VERIFY:\nls -la /path/to/fixed/directory\n\nThe Universal Diagnostic\n\nWhen you don't know what's wrong, run this sequence:\n\n#!/bin/bash\n# emergency-diagnostic.sh — Quick system health check\n\necho \"=== DISK ===\"\ndf -h | grep -E '^/|Filesystem'\n\necho -e \"\\n=== MEMORY ===\"\nfree -h\n\necho -e \"\\n=== CPU / LOAD ===\"\nuptime\n\necho -e \"\\n=== TOP PROCESSES (by CPU) ===\"\nps aux --sort=-%cpu | head -6\n\necho -e \"\\n=== TOP PROCESSES (by MEM) ===\"\nps aux --sort=-%mem | head -6\n\necho -e \"\\n=== NETWORK ===\"\nping -c 1 -W 2 8.8.8.8 > /dev/null 2>&1 && echo \"Internet: OK\" || echo \"Internet: UNREACHABLE\"\nping -c 1 -W 2 $(ip route | awk '/default/{print $3}') > /dev/null 2>&1 && echo \"Gateway: OK\" || echo \"Gateway: UNREACHABLE\"\n\necho -e \"\\n=== RECENT ERRORS ===\"\n# (|| after a pipe only checks the last command, so test for journalctl explicitly)\nif command -v journalctl > /dev/null 2>&1; then\n  journalctl -p err --since \"1 hour ago\" --no-pager | tail -20\nelse\n  dmesg | tail -20\nfi\n\necho -e \"\\n=== DOCKER (if running) ===\"\ndocker ps --format \"table {{.Names}}\\t{{.Status}}\\t{{.Ports}}\" 2>/dev/null || echo \"Docker not running\"\ndocker system df 2>/dev/null || true\n\necho -e \"\\n=== LISTENING PORTS ===\"\nif command -v ss > /dev/null 2>&1; then\n  ss -tlnp | head -15\nelse\n  netstat -tlnp 2>/dev/null | head -15\nfi\n\necho -e \"\\n=== FAILED SERVICES ===\"\nsystemctl --failed 2>/dev/null || true\n\n\nRun it, read the output, then jump to the relevant section above.\n\n
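The diagnostic only helps if it is already on the machine when things break. A setup sketch; the install path and log location are suggestions, not requirements:\n# Save it somewhere on PATH once:\nsudo cp emergency-diagnostic.sh /usr/local/bin/emergency-diagnostic.sh\nsudo chmod +x /usr/local/bin/emergency-diagnostic.sh\n# During an incident, keep a copy of the output for the postmortem:\nemergency-diagnostic.sh | tee \"/tmp/diag-$(date +%s).txt\"\n\n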
Tips\nRevoke credentials before cleaning git history. The moment a secret is pushed publicly, automated scrapers have it within minutes. Cleaning the history is important but secondary to revocation.\ngit reflog is your undo button. It records every HEAD movement, and entries are kept for 90 days by default (30 for unreachable ones). Lost commits, bad rebases, accidental resets — the reflog has the recovery hash. Learn to read it before you need it.\nTruncate log files, don't delete them. truncate -s 0 file.log frees disk space instantly while keeping the file handle open. Deleting a log file that a process has open won't free space until the process restarts.\n--force-with-lease instead of --force. Always. It fails if someone else has pushed, preventing you from overwriting their work on top of your recovery.\nEvery recovery operation should end with verification. Run the diagnostic command, check the output, confirm the fix worked. Don't assume — confirm.\nDocker is the #1 disk space thief on developer machines. docker system prune -a is almost always safe on development machines and can recover tens of gigabytes.\nDatabase emergencies: wrap destructive operations in transactions. BEGIN; DROP TABLE users; ROLLBACK; costs nothing and saves everything (PostgreSQL and SQLite roll DDL back; MySQL auto-commits DDL, so lean on backups there). Make it muscle memory.\nWhen SSH is locked out, every cloud provider has a console escape hatch. AWS Session Manager, GCP browser SSH, Azure Serial Console. Know where yours is before you need it.\nThe order matters: diagnose → fix → verify. Skipping diagnosis leads to wrong fixes. Skipping verification leads to false confidence. Follow the sequence every time.\nKeep this skill installed. You won't need it most days. The day you do need it, you'll need it immediately."
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/gitgoodordietrying/emergency-rescue",
    "publisherUrl": "https://clawhub.ai/gitgoodordietrying/emergency-rescue",
    "owner": "gitgoodordietrying",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/emergency-rescue",
    "downloadUrl": "https://openagent3.xyz/downloads/emergency-rescue",
    "agentUrl": "https://openagent3.xyz/skills/emergency-rescue/agent",
    "manifestUrl": "https://openagent3.xyz/skills/emergency-rescue/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/emergency-rescue/agent.md"
  }
}