Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Guide structured, blameless post-mortems with root cause analysis, action tracking, and prevention steps to reduce repeat production incidents and outages.
Hand the extracted package to your coding agent with a concrete install brief instead of stepping through the installation yourself.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Run structured post-mortems that actually prevent repeat failures. Blameless analysis, root cause identification, and action tracking.
- After any production incident, outage, or service degradation
- After a missed deadline, failed launch, or lost deal
- After any event costing >$5K or >4 hours of team time
- Quarterly review of recurring incident patterns
- Incident ID: [AUTO-GENERATED]
- Date/Time: [Start] → [End] (Duration: X hours)
- Severity: SEV-1 (revenue impact) | SEV-2 (customer impact) | SEV-3 (internal impact)
- Impact: [Users affected] | [Revenue lost] | [SLA breached Y/N]
- Detection: How was it found? (Monitoring / Customer report / Internal discovery)
- Detection Delay: Time from incident start → first alert
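As a sketch of how these fields might be captured for later trend analysis, here is a minimal Python record; the field names and types are illustrative assumptions, not a schema the skill requires:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Incident:
    """One incident record mirroring the summary fields above.
    Field names are illustrative, not a required schema."""
    incident_id: str
    severity: str            # "SEV-1" | "SEV-2" | "SEV-3"
    started_at: datetime
    first_alert_at: datetime
    resolved_at: datetime
    users_affected: int
    revenue_lost: float
    sla_breached: bool
    detection: str           # "monitoring" | "customer report" | "internal"

    @property
    def duration_hours(self) -> float:
        return (self.resolved_at - self.started_at).total_seconds() / 3600

    @property
    def detection_delay_minutes(self) -> float:
        # Time from incident start to first alert.
        return (self.first_alert_at - self.started_at).total_seconds() / 60
```

Keeping duration and detection delay as computed properties means the quarterly MTTD/MTTR rollup can read them straight off the records.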
- HH:MM - Event description
- HH:MM - First alert triggered
- HH:MM - Team notified
- HH:MM - Investigation started
- HH:MM - Root cause identified
- HH:MM - Fix deployed
- HH:MM - Confirmed resolved
- Why 1: [Direct cause]
- Why 2: [Why did that happen?]
- Why 3: [Why did THAT happen?]
- Why 4: [Systemic cause]
- Why 5: [Organizational/cultural root]
Score each factor 0-3 (0 = not a factor, 3 = primary contributor):

| Factor | Score | Notes |
|---|---|---|
| Missing/inadequate monitoring | | |
| Insufficient testing | | |
| Documentation gaps | | |
| Process not followed | | |
| Knowledge concentration (bus factor) | | |
| Capacity/scaling limits | | |
| Third-party dependency | | |
| Communication breakdown | | |
| Change management failure | | |
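If you track the scores programmatically, a small sketch like this (with made-up example scores) can rank factors so discussion starts with the biggest contributors:

```python
# Factor names come from the table above; the scores are made-up examples.
factor_scores = {
    "Missing/inadequate monitoring": 3,
    "Insufficient testing": 1,
    "Documentation gaps": 0,
    "Process not followed": 2,
    "Knowledge concentration (bus factor)": 2,
    "Capacity/scaling limits": 0,
    "Third-party dependency": 1,
    "Communication breakdown": 0,
    "Change management failure": 2,
    "Technical debt": 1,
}

# Surface primary contributors (3) first, then secondary ones in rank order.
primary = [f for f, s in factor_scores.items() if s == 3]
contributing = sorted(
    ((f, s) for f, s in factor_scores.items() if 0 < s < 3),
    key=lambda pair: pair[1],
    reverse=True,
)
print("Primary contributors:", primary)
for factor, score in contributing:
    print(f"  {score}/3  {factor}")
```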
List 3-5 things that worked during the response: Fast detection? Good runbooks? Strong communication? Quick escalation?
Every action MUST have an owner and deadline:

| # | Action | Owner | Deadline | Priority | Status |
|---|---|---|---|---|---|
| 1 | | | | P0/P1/P2 | Open |

Priority definitions:
- P0: Must complete before next business day
- P1: Must complete within 1 week
- P2: Must complete within 1 sprint/month
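A minimal tracking sketch under the priority windows above (P2 approximated as 30 days; names and fields are illustrative):

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Deadline windows per the priority definitions above; P2 is approximated
# as 30 days, so adjust to your sprint length.
PRIORITY_WINDOWS = {
    "P0": timedelta(days=1),
    "P1": timedelta(weeks=1),
    "P2": timedelta(days=30),
}

@dataclass
class ActionItem:
    action: str
    owner: str        # every action MUST have an owner
    priority: str     # "P0" | "P1" | "P2"
    opened_on: date
    status: str = "Open"

    @property
    def deadline(self) -> date:
        return self.opened_on + PRIORITY_WINDOWS[self.priority]

    def is_overdue(self, today: date) -> bool:
        return self.status != "Done" and today > self.deadline

# Example: flag anything that slipped past its window.
items = [ActionItem("Add alert on queue depth", "dana", "P0", date(2024, 5, 1))]
overdue = [i.action for i in items if i.is_overdue(date.today())]
print("Overdue actions:", overdue or "none")
```

Deriving the deadline from priority plus open date keeps the table honest: an item cannot sit "Open" past its window without showing up in the overdue list.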
- Monitoring added/improved for this failure mode
- Runbook created/updated
- Test coverage added
- Architecture change needed? (If yes, create RFC)
- Training needed for team?
- Focus on systems, not individuals
- "What happened," not "who did it"
- Assume everyone acted with best intentions and available information
- The goal is learning, not punishment
- If you find yourself writing someone's name next to a mistake, rewrite it as a process gap
Direct costs:
- Revenue lost during downtime: $___
- SLA credits issued: $___
- Emergency vendor/contractor costs: $___

Indirect costs:
- Engineering hours × loaded rate: ___ hrs × $___/hr = $___
- Customer churn risk (affected users × churn probability × LTV): $___
- Brand/reputation (estimate): $___

Total incident cost: $___
Cost per minute of downtime: $___
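A worked example of the cost model with made-up numbers, showing how the line items combine:

```python
# Worked example of the incident cost model; all figures are illustrative.
downtime_minutes = 95

# Direct costs
revenue_lost = 12_000.0
sla_credits = 3_500.0
vendor_costs = 0.0

# Indirect costs
eng_hours, loaded_rate = 22, 140.0               # hrs × $/hr
eng_cost = eng_hours * loaded_rate               # $3,080
affected_users, churn_prob, ltv = 400, 0.02, 900.0
churn_risk = affected_users * churn_prob * ltv   # $7,200
brand_estimate = 2_000.0

total_cost = (revenue_lost + sla_credits + vendor_costs
              + eng_cost + churn_risk + brand_estimate)
print(f"Total incident cost: ${total_cost:,.0f}")              # $27,780
print(f"Cost per minute: ${total_cost / downtime_minutes:,.2f}")
```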
Every quarter, analyze patterns across all post-mortems:
- Top 3 root cause categories → Where should you invest in prevention?
- Mean time to detect (MTTD) → Is monitoring improving?
- Mean time to resolve (MTTR) → Is response getting faster?
- Action item completion rate → Are you actually fixing things?
- Repeat incidents → Same root cause twice = systemic failure
- Cost trend → Total incident cost per quarter (should decrease)
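A minimal rollup sketch with illustrative records, computing MTTD, MTTR, action completion rate, and repeat root causes:

```python
from collections import Counter
from statistics import mean

# Each record is (detect_minutes, resolve_minutes, root_cause,
# actions_done, actions_total); the numbers are illustrative.
quarter = [
    (12, 95, "Missing/inadequate monitoring", 3, 4),
    (45, 210, "Third-party dependency", 2, 2),
    (8, 60, "Missing/inadequate monitoring", 1, 3),
]

mttd = mean(r[0] for r in quarter)   # mean time to detect, minutes
mttr = mean(r[1] for r in quarter)   # mean time to resolve, minutes
completion = sum(r[3] for r in quarter) / sum(r[4] for r in quarter)

# Same root cause twice in a quarter = systemic failure to flag.
repeats = [cause for cause, n in Counter(r[2] for r in quarter).items() if n >= 2]

print(f"MTTD: {mttd:.0f} min | MTTR: {mttr:.0f} min | actions closed: {completion:.0%}")
print("Repeat root causes:", repeats or "none")
```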
| Industry | Key Focus | Regulatory Requirement |
|---|---|---|
| Fintech | Transaction integrity, audit trail | SOX, PCI-DSS incident reporting |
| Healthcare | PHI exposure, patient safety | HIPAA breach notification (60 days) |
| SaaS | SLA compliance, data integrity | SOC 2 incident management |
| E-commerce | Order integrity, payment processing | PCI-DSS, consumer protection |
| Manufacturing | Safety incidents, production loss | OSHA reporting requirements |
Your post-mortems reveal where AI agents should be deployed first: the repetitive failures, the manual monitoring gaps, the processes that break under load.
- Find your highest-cost gaps: AI Revenue Leak Calculator
- Industry-specific deployment playbooks: AfrexAI Context Packs ($47 | Pick 3: $97 | All 10: $197 | Everything: $247)
- Deploy your first agent: Agent Setup Wizard

Built by AfrexAI, turning incident patterns into automation opportunities.