Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Meta-skill that orchestrates logging, monitoring, error handling, performance, security, deployment, and testing skills to ensure a service is fully production-ready before launch. Use before first deploy, major releases, quarterly reviews, or after incidents.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Coordinates all operational concerns into a single readiness review. Instead of duplicating domain expertise, this skill routes to specialized skills and agents for each area, then synthesizes results into a unified go/no-go assessment.
npx clawhub@latest install production-readiness
Ensure a service is production-ready by systematically checking every operational concern (logging, error handling, performance, security, deployment, testing, and documentation) before traffic hits it.

A production-ready service:
- Fails gracefully under load and partial outages
- Observes itself with structured logs, metrics, and traces
- Recovers automatically from transient failures
- Communicates health to orchestrators and operators
- Documents operations so on-call engineers can respond without tribal knowledge
| Trigger | Context |
| --- | --- |
| Before first deploy | New service going to production for the first time |
| Before major release | Significant feature or architectural change shipping |
| Quarterly production review | Scheduled audit of existing services |
| After incident | Post-incident hardening to prevent recurrence |
| Dependency upgrade | Major framework, runtime, or infrastructure change |
| Team handoff | Transferring ownership of a service to another team |
Run each area sequentially or in parallel. Each step delegates to a specialized skill or agent; this skill does not re-implement their logic.

Production Readiness Review
1. Logging & Observability -> logging-observability skill
2. Error Handling -> error-handling-patterns skill
3. Performance -> performance-agent
4. Security -> security-review meta-skill
5. Deployment -> deployment-agent + docker-expert skill
6. Testing -> testing-workflow meta-skill
7. Documentation -> /generate-docs command

Then synthesize the results into a go/no-go report.
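The synthesis step can be sketched as follows. This is a minimal illustration, not the skill's actual report format: the `AreaResult` type, the `blocking` flag, and the shape of the returned dictionary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AreaResult:
    """Outcome reported back by one delegated skill or agent (hypothetical shape)."""
    area: str            # e.g. "Security"
    passed: bool         # did the delegated review pass?
    blocking: bool       # assumption: a failure in this area blocks launch
    notes: str = ""


def synthesize(results: list[AreaResult]) -> dict:
    """Combine per-area results into a single go/no-go decision."""
    blockers = [r.area for r in results if not r.passed and r.blocking]
    warnings = [r.area for r in results if not r.passed and not r.blocking]
    return {
        "decision": "no-go" if blockers else "go",
        "blockers": blockers,
        "warnings": warnings,
    }


report = synthesize([
    AreaResult("Logging & Observability", passed=True, blocking=True),
    AreaResult("Security", passed=False, blocking=True, notes="secrets in image"),
    AreaResult("Documentation", passed=False, blocking=False),
])
print(report["decision"], report["blockers"], report["warnings"])
```

Any failing blocking area yields "no-go"; non-blocking failures are reported as warnings so they are visible without stopping the launch.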
- Logging & Observability: Structured logging, log levels, correlation IDs, metrics endpoints, distributed tracing, alerting rules
- Error Handling: Global error boundaries, retry policies, dead-letter queues, error classification, user-facing error messages
- Performance: Load testing results, P95/P99 latency baselines, memory/CPU profiling, database query analysis, caching strategy
- Security: Auth/authz verification, input validation, dependency audit, secrets management, OWASP top-10 review
- Deployment: Container hardening, rollback strategy, blue-green/canary configuration, infrastructure-as-code review
- Testing: Unit/integration/e2e coverage, contract tests, chaos/failure injection, smoke tests in staging
- Documentation: API docs, runbooks, architecture diagrams, on-call playbooks, ADRs for key decisions
| Concern | Skill / Agent | Path |
| --- | --- | --- |
| Logging & Observability | logging-observability | ai/skills/tools/logging-observability/SKILL.md |
| Error Handling | error-handling-patterns | ai/skills/backend/error-handling-patterns/SKILL.md |
| Performance | performance-agent | ai/agents/performance/ |
| Security | security-review | ai/skills/meta/security-review/SKILL.md |
| Deployment (containers) | docker-expert | ai/skills/devops/docker/SKILL.md |
| Deployment (pipelines) | deployment-agent | ai/agents/deployment/ |
| Testing | testing-workflow | ai/skills/testing/testing-workflow/SKILL.md |
| Rate Limiting | rate-limiting-patterns | ai/skills/backend/rate-limiting-patterns/SKILL.md |
| Documentation | /generate-docs | ai/commands/documentation/ |

Routing rule: Read the target skill first, follow its instructions, then return results here for synthesis.
- Health check endpoint (/healthz or /health) returns dependency status
- Readiness probe distinguishes "starting" from "ready to serve"
- Liveness probe detects deadlocks and unrecoverable states
- Graceful shutdown drains in-flight requests before exit
- Startup probe allows for slow initialization without false restarts
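As a rough, framework-independent sketch of the first two items, the handler logic behind a health endpoint and a readiness probe might look like this. The function and class names, the response body shape, and the use of HTTP 503 for "not healthy" are assumptions for illustration; wiring into a real web framework is omitted.

```python
from typing import Callable, Dict, Tuple


def health_report(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run each dependency check and return (HTTP status, response body)."""
    deps = {}
    for name, check in checks.items():
        try:
            deps[name] = "ok" if check() else "failing"
        except Exception as exc:  # a crashing check counts as unhealthy
            deps[name] = f"error: {exc}"
    healthy = all(state == "ok" for state in deps.values())
    body = {"status": "ok" if healthy else "degraded", "dependencies": deps}
    return (200 if healthy else 503), body


class Readiness:
    """Tracks whether startup has finished, separate from liveness."""

    def __init__(self) -> None:
        self._ready = False

    def mark_ready(self) -> None:
        self._ready = True

    def probe(self) -> int:
        # 503 while starting, 200 once the service can take traffic
        return 200 if self._ready else 503
```

Keeping readiness as explicit state, set only after initialization completes, is what lets an orchestrator tell "starting" apart from "ready to serve".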
- Circuit breakers on all external service calls
- Retry with exponential backoff and jitter on transient failures
- Rate limiting configured per endpoint and per client
- Backpressure mechanisms prevent cascade failures under load
- Timeouts set on every outbound call (HTTP, DB, queue)
- Bulkhead isolation separates critical from non-critical paths
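Two of these patterns are small enough to sketch directly. The example below shows retry with exponential backoff and full jitter, plus a deliberately minimal circuit breaker. The names (`TransientError`, `retry`, `CircuitBreaker`) and all default values are assumptions for illustration; a production service would more likely use an established resilience library.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(fn, attempts=4, base_delay=0.1, max_delay=2.0,
          sleep=time.sleep, rng=random.random):
    """Call fn, retrying TransientError with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)  # full jitter: uniform delay in [0, cap)


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call after reset_after seconds."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open")  # fail fast, skip the call
            # otherwise half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

The `sleep`, `rng`, and `clock` parameters exist only so the behavior can be tested deterministically without real waiting.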
- All configuration externalized (env vars, config service, or feature flags)
- No secrets in code, images, or environment variable defaults
- Secrets loaded from a vault (e.g., AWS Secrets Manager, HashiCorp Vault)
- Configuration changes do not require redeployment
- Feature flags in place for high-risk changes
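One way to read the first two items together: non-sensitive settings may have defaults, while secrets must be supplied at runtime and fail loudly when absent. The sketch below assumes hypothetical variable names (`PORT`, `LOG_LEVEL`, `DB_PASSWORD`) and reads from the process environment; in practice the secret would be injected from a vault rather than set by hand.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    log_level: str
    db_password: str


REQUIRED_SECRETS = ("DB_PASSWORD",)  # hypothetical; no defaults for these


def load_settings(env=None) -> Settings:
    """Build Settings from the environment, refusing to start without secrets."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_SECRETS if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        port=int(env.get("PORT", "8080")),        # safe default
        log_level=env.get("LOG_LEVEL", "INFO"),   # safe default
        db_password=env["DB_PASSWORD"],           # never defaulted
    )
```

Failing at startup is intentional here: a missing secret surfaces immediately instead of as a runtime error on the first database call.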
- Backup strategy defined and tested (RPO/RTO documented)
- Restore procedure verified in a non-production environment
- Database migrations are backward-compatible and reversible
- Data retention policies implemented and enforced
- Runbooks exist for top 5 most likely failure scenarios
- SLOs defined (availability, latency, error rate) with error budgets
- SLAs communicated to dependent teams or customers
- On-call rotation staffed and escalation path documented
- Dashboards show golden signals (latency, traffic, errors, saturation)
- Alerting rules configured with appropriate thresholds and severity
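The relationship between an SLO and its error budget is simple arithmetic: the budget is the fraction of requests the SLO allows to fail. A small sketch, assuming a request-based availability SLO and a hypothetical `error_budget` helper:

```python
def error_budget(slo: float, total_requests: int, failed_requests: int) -> dict:
    """Compute how much of a request-based error budget is left.

    slo is the target success ratio, e.g. 0.999 for "99.9% of requests succeed".
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed = (1.0 - slo) * total_requests   # failures the SLO tolerates
    remaining = allowed - failed_requests
    return {
        "allowed_failures": allowed,
        "remaining": remaining,
        "exhausted": remaining <= 0,
    }


# With a 99.9% SLO over 1,000,000 requests, roughly 1,000 failures are allowed.
budget = error_budget(0.999, 1_000_000, 400)
print(budget["exhausted"])
```

The values are floating point, so compare them with a tolerance rather than for exact equality.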
| Level | Name | Requirements |
| --- | --- | --- |
| L1 | MVP | Health check, basic logging, error handling, manual deploy, unit tests, README |
| L2 | Stable | Structured logging, metrics, graceful shutdown, CI/CD pipeline, integration tests, runbooks |
| L3 | Resilient | Distributed tracing, circuit breakers, auto-scaling, chaos testing, SLOs, on-call rotation |
| L4 | Optimized | Adaptive rate limiting, predictive alerting, canary deploys, full observability, error budgets, postmortem culture |
- L1 -> L2: Add structured logging, metrics endpoint, and a CI/CD pipeline. Write runbooks for known failure modes.
- L2 -> L3: Instrument distributed tracing. Add circuit breakers to external calls. Define SLOs and set up on-call.
- L3 -> L4: Implement canary deployments. Adopt error budgets. Run regular game days. Build predictive alerting.
- Minimum two engineers per rotation (primary + secondary)
- Handoff includes review of recent deploys, open issues, and known risks
- Escalation targets defined: primary -> secondary -> engineering lead -> VP Eng
| Severity | Response Time | Escalation After | Stakeholder Notification |
| --- | --- | --- | --- |
| SEV-1 (outage) | 15 min | 30 min | Immediate: exec + customers |
| SEV-2 (degraded) | 30 min | 1 hour | Within 1 hour: eng lead |
| SEV-3 (minor) | 4 hours | Next business day | Daily standup |
| SEV-4 (cosmetic) | Next sprint | N/A | Backlog |
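The severity table and the escalation chain can be encoded as data so that paging tooling applies them consistently. The sketch below is illustrative only: `POLICY`, `should_escalate`, and `next_target` are hypothetical names, and entries the table gives as non-numeric ("Next business day", "Next sprint", "N/A") are represented as `None` rather than guessed.

```python
from datetime import timedelta
from typing import Optional

# Timings taken from the severity table; None means no fixed clock-time threshold.
POLICY = {
    "SEV-1": {"respond_within": timedelta(minutes=15), "escalate_after": timedelta(minutes=30)},
    "SEV-2": {"respond_within": timedelta(minutes=30), "escalate_after": timedelta(hours=1)},
    "SEV-3": {"respond_within": timedelta(hours=4), "escalate_after": None},
    "SEV-4": {"respond_within": None, "escalate_after": None},
}

ESCALATION_CHAIN = ["primary", "secondary", "engineering lead", "VP Eng"]


def should_escalate(severity: str, unresolved_for: timedelta) -> bool:
    """True once an incident has been unresolved past its escalation threshold."""
    limit = POLICY[severity]["escalate_after"]
    return limit is not None and unresolved_for >= limit


def next_target(current: str) -> Optional[str]:
    """Return the next person in the chain, or None at the top."""
    index = ESCALATION_CHAIN.index(current)
    if index + 1 < len(ESCALATION_CHAIN):
        return ESCALATION_CHAIN[index + 1]
    return None
```

For example, a SEV-1 unresolved for 30 minutes with the primary on-call would escalate to the secondary.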
- NEVER skip health checks: every service must expose health endpoints; no exceptions for "simple" services
- NEVER store secrets in code or container images: use a secrets manager; never default env vars with real values
- NEVER deploy without a rollback plan: if you cannot roll back in under 5 minutes, you are not ready to deploy
- NEVER ignore error budget violations: when the error budget is exhausted, freeze feature work and fix reliability
- NEVER treat logging as optional: a service without structured logging is a service you cannot debug at 3 AM
- NEVER go to production without runbooks: if on-call cannot resolve the top 5 failure modes without the original author, the service is not production-ready