Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Manages technology incidents end-to-end with severity classification, timeline reconstruction, post-incident reviews, communication templates, and runbook in...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
- Category: Engineering Team
- Tier: POWERFUL
- Author: Claude Skills Team
- Version: 1.0.0
- Last Updated: February 2026
The Incident Commander skill provides a comprehensive incident response framework for managing technology incidents from detection through resolution and post-incident review. This skill implements battle-tested practices from SRE and DevOps teams at scale, providing structured tools for severity classification, timeline reconstruction, and thorough post-incident analysis.
- Automated Severity Classification: Intelligent incident triage based on impact and urgency metrics
- Timeline Reconstruction: Transform scattered logs and events into coherent incident narratives
- Post-Incident Review Generation: Structured PIRs with multiple RCA frameworks
- Communication Templates: Pre-built templates for stakeholder updates and escalations
- Runbook Integration: Generate actionable runbooks from incident patterns
- Incident Classifier (incident_classifier.py)
  - Analyzes incident descriptions and outputs severity levels
  - Recommends response teams and initial actions
  - Generates communication templates based on severity
- Timeline Reconstructor (timeline_reconstructor.py)
  - Processes timestamped events from multiple sources
  - Reconstructs chronological incident timeline
  - Identifies gaps and provides duration analysis
- PIR Generator (pir_generator.py)
  - Creates comprehensive Post-Incident Review documents
  - Applies multiple RCA frameworks (5 Whys, Fishbone, Timeline)
  - Generates actionable follow-up items
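The timeline reconstruction steps listed above (merge timestamped events from several sources, order them, find gaps, measure duration) can be sketched as follows. This is a minimal illustration, not the code in timeline_reconstructor.py: the `Event` fields, the `reconstruct_timeline` function, the 15-minute gap threshold, and the sample events are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real tool's input format may differ.
    timestamp: datetime
    source: str
    message: str


def reconstruct_timeline(sources, gap_threshold=timedelta(minutes=15)):
    """Merge events from several sources, sort them, and flag quiet gaps."""
    # Flatten all sources into one chronologically ordered list.
    events = sorted(
        (event for source_events in sources for event in source_events),
        key=lambda event: event.timestamp,
    )
    # A gap is any pair of consecutive events further apart than the threshold.
    gaps = [
        (earlier.timestamp, later.timestamp)
        for earlier, later in zip(events, events[1:])
        if later.timestamp - earlier.timestamp > gap_threshold
    ]
    duration = events[-1].timestamp - events[0].timestamp if events else timedelta(0)
    return events, gaps, duration


# Made-up sample events, for illustration only.
pager = [Event(datetime(2026, 2, 1, 10, 0), "pager", "High error rate alert fired")]
chat = [
    Event(datetime(2026, 2, 1, 10, 4), "chat", "Incident Commander assigned"),
    Event(datetime(2026, 2, 1, 10, 50), "chat", "Rollback completed"),
]
deploys = [Event(datetime(2026, 2, 1, 9, 55), "deploy", "Release deployed")]

events, gaps, duration = reconstruct_timeline([pager, chat, deploys])
for event in events:
    print(event.timestamp.isoformat(), event.source, event.message)
print("gaps:", gaps)
print("duration:", duration)
```

A gap in the merged timeline usually means a source is missing or nobody was recording, so it is worth following up during the post-incident review.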
SEV1 - Critical Outage
Definition: Complete service failure affecting all users or critical business functions
Characteristics:
- Customer-facing services completely unavailable
- Data loss or corruption affecting users
- Security breaches with customer data exposure
- Revenue-generating systems down
- SLA violations with financial penalties
Response Requirements:
- Immediate escalation to on-call engineer
- Incident Commander assigned within 5 minutes
- Executive notification within 15 minutes
- Public status page update within 15 minutes
- War room established
- All hands on deck if needed
Communication Frequency: Every 15 minutes until resolution

SEV2 - Major Impact
Definition: Significant degradation affecting subset of users or non-critical functions
Characteristics:
- Partial service degradation (>25% of users affected)
- Performance issues causing user frustration
- Non-critical features unavailable
- Internal tools impacting productivity
- Data inconsistencies not affecting user experience
Response Requirements:
- On-call engineer response within 15 minutes
- Incident Commander assigned within 30 minutes
- Status page update within 30 minutes
- Stakeholder notification within 1 hour
- Regular team updates
Communication Frequency: Every 30 minutes during active response

SEV3 - Minor Impact
Definition: Limited impact with workarounds available
Characteristics:
- Single feature or component affected
- <25% of users impacted
- Workarounds available
- Performance degradation not significantly impacting UX
- Non-urgent monitoring alerts
Response Requirements:
- Response within 2 hours during business hours
- Next business day response acceptable outside hours
- Internal team notification
- Optional status page update
Communication Frequency: At key milestones only

SEV4 - Low Impact
Definition: Minimal impact, cosmetic issues, or planned maintenance
Characteristics:
- Cosmetic bugs
- Documentation issues
- Logging or monitoring gaps
- Performance issues with no user impact
- Development/test environment issues
Response Requirements:
- Response within 1-2 business days
- Standard ticket/issue tracking
- No special escalation required
Communication Frequency: Standard development cycle updates
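The severity definitions above can be read as a small decision procedure. The sketch below is one possible encoding, not the logic of incident_classifier.py: the `Impact` fields and the `classify` function are hypothetical names, and treating exactly 25% of users as SEV3 is an assumption, since the definitions only cover ">25%" and "<25%".

```python
from dataclasses import dataclass


@dataclass
class Impact:
    # Hypothetical inputs chosen for this sketch.
    users_affected_pct: float
    complete_outage: bool = False
    data_loss: bool = False
    customer_data_exposed: bool = False


# Update cadence in minutes, taken from the Communication Frequency lines above.
# None means there is no fixed interval (milestone or development-cycle updates).
UPDATE_EVERY_MIN = {"SEV1": 15, "SEV2": 30, "SEV3": None, "SEV4": None}


def classify(impact: Impact) -> str:
    """Map an impact description onto SEV1-SEV4."""
    # Any SEV1 characteristic wins regardless of how many users are affected.
    if impact.complete_outage or impact.data_loss or impact.customer_data_exposed:
        return "SEV1"
    # No user impact at all: cosmetic, documentation, or test-environment issues.
    if impact.users_affected_pct == 0:
        return "SEV4"
    # Assumption: exactly 25% falls into SEV3.
    if impact.users_affected_pct > 25:
        return "SEV2"
    return "SEV3"


severity = classify(Impact(users_affected_pct=40))
print(severity, "- update every", UPDATE_EVERY_MIN[severity], "minutes")
```

Checking the SEV1 flags first keeps a small-but-severe incident (for example, data exposure for a handful of customers) from being downgraded on user count alone.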
Primary Responsibilities

Command and Control
- Own the incident response process
- Make critical decisions about resource allocation
- Coordinate between technical teams and stakeholders
- Maintain situational awareness across all response streams

Communication Hub
- Provide regular updates to stakeholders
- Manage external communications (status pages, customer notifications)
- Facilitate effective communication between response teams
- Shield responders from external distractions

Process Management
- Ensure proper incident tracking and documentation
- Drive toward resolution while maintaining quality
- Coordinate handoffs between team members
- Plan and execute rollback strategies if needed

Post-Incident Leadership
- Ensure thorough post-incident reviews are conducted
- Drive implementation of preventive measures
- Share learnings with broader organization

Decision-Making Framework

Emergency Decisions (SEV1/2):
- Incident Commander has full authority
- Bias toward action over analysis
- Document decisions for later review
- Consult subject matter experts but don't get blocked

Resource Allocation:
- Can pull in any necessary team members
- Authority to escalate to senior leadership
- Can approve emergency spend for external resources
- Make call on communication channels and timing

Technical Decisions:
- Lean on technical leads for implementation details
- Make final calls on trade-offs between speed and risk
- Approve rollback vs. fix-forward strategies
- Coordinate testing and validation approaches
Stakeholder Classification

Internal Stakeholders:
- Engineering Leadership: Technical decisions and resource allocation
- Product Management: Customer impact assessment and feature implications
- Customer Support: User communication and support ticket management
- Sales/Account Management: Customer relationship management for enterprise clients
- Executive Team: Business impact decisions and external communication approval
- Legal/Compliance: Regulatory reporting and liability assessment

External Stakeholders:
- Customers: Service availability and impact communication
- Partners: API availability and integration impacts
- Vendors: Third-party service dependencies and support escalation
- Regulators: Compliance reporting for regulated industries
- Public/Media: Transparency for public-facing outages

Communication Cadence by Stakeholder

| Stakeholder | SEV1 | SEV2 | SEV3 | SEV4 |
|---|---|---|---|---|
| Engineering Leadership | Real-time | 30min | 4hrs | Daily |
| Executive Team | 15min | 1hr | EOD | Weekly |
| Customer Support | Real-time | 30min | 2hrs | As needed |
| Customers | 15min | 1hr | Optional | None |
| Partners | 30min | 2hrs | Optional | None |
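For tooling that needs the cadence programmatically, the table above can be held as a plain lookup. This is a minimal sketch: the `CADENCE` mapping copies the table values verbatim, while the helper names `update_cadence` and `stakeholders_to_notify` are hypothetical and not part of the package.

```python
# Communication cadence by stakeholder and severity, copied from the table above.
CADENCE = {
    "Engineering Leadership": {"SEV1": "Real-time", "SEV2": "30min", "SEV3": "4hrs", "SEV4": "Daily"},
    "Executive Team": {"SEV1": "15min", "SEV2": "1hr", "SEV3": "EOD", "SEV4": "Weekly"},
    "Customer Support": {"SEV1": "Real-time", "SEV2": "30min", "SEV3": "2hrs", "SEV4": "As needed"},
    "Customers": {"SEV1": "15min", "SEV2": "1hr", "SEV3": "Optional", "SEV4": "None"},
    "Partners": {"SEV1": "30min", "SEV2": "2hrs", "SEV3": "Optional", "SEV4": "None"},
}


def update_cadence(stakeholder: str, severity: str) -> str:
    """Return the cadence cell for one stakeholder and severity."""
    return CADENCE[stakeholder][severity]


def stakeholders_to_notify(severity: str) -> list[str]:
    """List stakeholders whose cadence for this severity is anything but 'None'."""
    return [name for name, by_sev in CADENCE.items() if by_sev[severity] != "None"]


print(update_cadence("Executive Team", "SEV2"))
print(stakeholders_to_notify("SEV4"))
```

Keeping the values as the table's own strings avoids guessing at exact intervals for entries such as "Real-time" or "EOD".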
    # Quick classification from stdin
    echo "API rate limits causing customer API calls to fail" | python scripts/incident_classifier.py --format text

    # Build timeline from multiple sources
    python scripts/timeline_reconstructor.py --input assets/api_incident_logs.json --detect-phases --gap-analysis

    # Generate comprehensive PIR
    python scripts/pir_generator.py --incident assets/api_incident_summary.json --rca-method fishbone --action-items
Maintain Calm Leadership
- Stay composed under pressure
- Make decisive calls with incomplete information
- Communicate confidence while acknowledging uncertainty

Document Everything
- All actions taken and their outcomes
- Decision rationale, especially for controversial calls
- Timeline of events as they happen

Effective Communication
- Use clear, jargon-free language
- Provide regular updates even when there's no new information
- Manage stakeholder expectations proactively

Technical Excellence
- Prefer rollbacks to risky fixes under pressure
- Validate fixes before declaring resolution
- Plan for secondary failures and cascading effects
Blameless Culture
- Focus on system failures, not individual mistakes
- Encourage honest reporting of what went wrong
- Celebrate learning and improvement opportunities

Action Item Discipline
- Assign specific owners and due dates
- Track progress publicly
- Prioritize based on risk and effort

Knowledge Sharing
- Share PIRs broadly within the organization
- Update runbooks based on lessons learned
- Conduct training sessions for common failure modes

Continuous Improvement
- Look for patterns across multiple incidents
- Invest in tooling and automation
- Regularly review and update processes
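The action item guidance above (specific owners, due dates, prioritization by risk and effort) could be modeled along these lines. The source does not define a scoring formula, so the 1-5 scales, the risk-to-effort ratio, the `ActionItem` fields, and the sample items are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    owner: str   # every item gets a specific owner
    due: date    # and a specific due date
    risk: int    # 1 (low) to 5 (high); hypothetical scale
    effort: int  # 1 (small) to 5 (large); hypothetical scale


def prioritize(items):
    """Order items by risk addressed per unit of effort, earliest due date first on ties."""
    return sorted(items, key=lambda item: (-item.risk / item.effort, item.due))


# Made-up follow-up items, for illustration only.
items = [
    ActionItem("Rewrite retry logic", "alice", date(2026, 3, 15), risk=4, effort=4),
    ActionItem("Add alert on queue depth", "bob", date(2026, 3, 1), risk=5, effort=1),
    ActionItem("Update rollback runbook", "carol", date(2026, 3, 5), risk=3, effort=1),
]

for item in prioritize(items):
    print(f"{item.title} (owner: {item.owner}, due: {item.due.isoformat()})")
```

A ratio favors cheap, high-risk fixes; teams that want large high-risk items to surface first might sort by risk alone and use effort only as a tie-breaker.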
PagerDuty/Opsgenie integration for escalation; Datadog/Grafana for metrics and dashboards; ELK/Splunk for log analysis and correlation
Slack/Teams for war room coordination; Zoom/Meet for video bridges; Status page providers (Statuspage.io, etc.)
Confluence/Notion for PIR storage; GitHub/GitLab for runbook version control; JIRA/Linear for action item tracking
CI/CD pipeline integration; Deployment tracking systems; Feature flag platforms for quick rollbacks
The Incident Commander skill provides a comprehensive framework for managing incidents from detection through post-incident review. By implementing structured processes, clear communication templates, and thorough analysis tools, teams can improve their incident response capabilities and build more resilient systems. The key to successful incident management is preparation, practice, and continuous learning. Use this framework as a starting point, but adapt it to your organization's specific needs, culture, and technical environment. Remember: The goal isn't to prevent all incidents (which is impossible), but to detect them quickly, respond effectively, communicate clearly, and learn continuously.