Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Set up SLA monitoring and uptime tracking for AI agents and services. Generates monitoring configs, alert rules, and incident response playbooks. Use when de...
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Help teams set up production-grade monitoring for AI agents and automated services. Covers uptime tracking, response time SLAs, error budgets, and incident escalation.
- Deploying AI agents to production
- Setting up monitoring for client-facing automation
- Creating SLA documentation for service agreements
- Building incident response procedures
- 50 monitors free, 5-minute intervals
- HTTP, keyword, ping, port monitors
- Email + Slack + webhook alerts

- Status pages included
- Incident management built-in
- Free tier: 10 monitors
docker run -d --restart=always -p 3001:3001 -v uptime-kuma:/app/data --name uptime-kuma louislam/uptime-kuma:1
- 99.5% uptime guarantee (43.8h downtime/year)
- Response within 4 hours (business hours)
- Monthly performance report

- 99.9% uptime guarantee (8.76h downtime/year)
- Response within 1 hour (business hours)
- Weekly performance reports
- Quarterly optimization reviews

- 99.95% uptime (4.38h downtime/year)
- Response within 15 minutes (24/7)
- Real-time dashboard access
- Dedicated support channel
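The downtime figures quoted for the three tiers above follow from a 365-day year (8,760 hours). A minimal Python sketch of that arithmetic; the function name is illustrative and not part of the skill package:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(uptime_pct: float) -> float:
    """Return the hours of downtime per year allowed by an uptime guarantee."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return round((100 - uptime_pct) / 100 * HOURS_PER_YEAR, 2)


for tier in (99.5, 99.9, 99.95):
    print(f"{tier}% uptime allows {annual_downtime_hours(tier)} h of downtime per year")
```

The same function can be used to check any other uptime figure before it goes into a service agreement.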
monitors:
  - name: "Agent Health Check"
    type: http
    url: "https://your-agent-endpoint/health"
    interval: 300  # 5 minutes
    alerts:
      - type: email
        threshold: 1  # alert after 1 failure
      - type: slack
        webhook: "${SLACK_WEBHOOK}"
        threshold: 2  # alert after 2 consecutive failures
      - type: sms
        threshold: 3  # escalate after 3 failures
  - name: "API Response Time"
    type: http
    url: "https://your-agent-endpoint/api"
    interval: 60
    expected_response_time: 2000  # ms
    alerts:
      - type: slack
        condition: "response_time > 5000"
error_budget:
  monthly_target: 99.9
  burn_rate_alert: 2.0  # Alert if burning 2x normal rate
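The escalation thresholds in the "Agent Health Check" monitor above can be read as "notify every channel whose threshold has been reached". The sketch below illustrates that reading in Python. It assumes `threshold` counts consecutive failed checks, and the function names are hypothetical rather than part of any monitoring tool's API:

```python
# Alert rules mirroring the "Agent Health Check" monitor in the config.
ALERTS = [
    {"type": "email", "threshold": 1},
    {"type": "slack", "threshold": 2},
    {"type": "sms", "threshold": 3},
]


def count_consecutive_failures(results: list[bool]) -> int:
    """Count failed checks at the end of a history (True = check passed)."""
    count = 0
    for ok in reversed(results):
        if ok:
            break
        count += 1
    return count


def channels_to_notify(consecutive_failures: int, alerts=ALERTS) -> list[str]:
    """Return the alert types whose threshold has been reached.

    Assumption: `threshold` means "number of consecutive failed checks".
    """
    return [a["type"] for a in alerts if consecutive_failures >= a["threshold"]]


history = [True, True, False, False]  # the last two checks failed
print(channels_to_notify(count_consecutive_failures(history)))  # ['email', 'slack']
```

A single passing check resets the count to zero, so SMS escalation only fires on a sustained failure.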
- Acknowledge within 5 minutes
- Status page update within 10 minutes
- Root cause identification within 30 minutes
- Resolution or workaround within 2 hours
- Post-mortem within 24 hours

- Acknowledge within 15 minutes
- Investigation within 30 minutes
- Resolution within 4 hours
- Summary report within 48 hours

- Acknowledge within 1 hour
- Resolution within 24 hours
- Logged for next review cycle
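The three response timelines above can be turned into concrete deadlines for a given incident. This is a rough Python sketch. The severity labels `sev1` to `sev3` are hypothetical placeholders, and it assumes every target is measured from the moment the incident is detected, which the text does not state. The "logged for next review cycle" step has no fixed duration and is left out.

```python
from datetime import datetime, timedelta

# Severity labels are hypothetical; the durations come from the timelines above.
RESPONSE_TARGETS = {
    "sev1": {
        "acknowledge": timedelta(minutes=5),
        "status_page_update": timedelta(minutes=10),
        "root_cause": timedelta(minutes=30),
        "resolution_or_workaround": timedelta(hours=2),
        "post_mortem": timedelta(hours=24),
    },
    "sev2": {
        "acknowledge": timedelta(minutes=15),
        "investigation": timedelta(minutes=30),
        "resolution": timedelta(hours=4),
        "summary_report": timedelta(hours=48),
    },
    "sev3": {
        "acknowledge": timedelta(hours=1),
        "resolution": timedelta(hours=24),
    },
}


def incident_deadlines(severity: str, detected_at: datetime) -> dict[str, datetime]:
    """Map each response step to its deadline, measured from detection time."""
    targets = RESPONSE_TARGETS[severity]
    return {step: detected_at + delta for step, delta in targets.items()}


detected = datetime(2024, 1, 15, 9, 0)
for step, due in incident_deadlines("sev1", detected).items():
    print(f"{step}: due by {due:%Y-%m-%d %H:%M}")
```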
Monthly minutes: 43,200 (30 days)
99.9% SLA = 43.2 minutes downtime allowed
99.5% SLA = 216 minutes downtime allowed
99.0% SLA = 432 minutes downtime allowed

Burn rate = (actual downtime / budget) × 100
If burn rate > 50% with 2+ weeks remaining → review needed
If burn rate > 80% → freeze deployments
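The budget arithmetic and the two burn-rate thresholds above can be sketched in Python as follows. The function names and the returned action labels are illustrative, and "2+ weeks remaining" is interpreted here as 14 or more days left in the month:

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, matching the 30-day month used above


def error_budget_minutes(sla_pct: float) -> float:
    """Downtime allowed per 30-day month for a given SLA percentage."""
    return round((100 - sla_pct) / 100 * MINUTES_PER_MONTH, 2)


def burn_rate_pct(actual_downtime_min: float, budget_min: float) -> float:
    """Share of the monthly budget already consumed, as a percentage."""
    return actual_downtime_min / budget_min * 100


def budget_action(burn_pct: float, days_remaining: int) -> str:
    """Apply the two thresholds from the text; the labels are illustrative."""
    if burn_pct > 80:
        return "freeze deployments"
    if burn_pct > 50 and days_remaining >= 14:
        return "review needed"
    return "ok"


budget = error_budget_minutes(99.9)
burn = burn_rate_pct(30.0, budget)
print(f"budget={budget} min, burn={burn:.1f}%, action={budget_action(burn, 16)}")
```

Burn rate here means the percentage of the budget already spent. That is a different quantity from the `burn_rate_alert: 2.0` multiplier in the monitoring config, which compares the current consumption rate to the normal one.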
Provide clients with a public status page showing:
- Current system status (operational / degraded / outage)
- Component-level status (Agent A, Agent B, API, Dashboard)
- Uptime percentage (30-day rolling)
- Incident history with resolution notes
- Scheduled maintenance windows
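One simple way to produce the 30-day rolling uptime percentage for such a status page is to take the share of successful health checks inside the trailing window. This Python sketch assumes checks are stored as `(timestamp, ok)` pairs and run at roughly even intervals. Both the data shape and the function name are assumptions, not something the skill package prescribes.

```python
from datetime import datetime, timedelta


def rolling_uptime_pct(checks, now, window_days=30):
    """Percentage of successful checks within the trailing window.

    `checks` is an iterable of (timestamp, ok) pairs (an assumed shape).
    Returns None when no checks fall inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [ok for ts, ok in checks if cutoff <= ts <= now]
    if not recent:
        return None
    return round(100 * sum(recent) / len(recent), 3)


now = datetime(2024, 1, 31, 12, 0)
# 1,000 checks at 5-minute intervals; exactly one of them failed.
checks = [(now - timedelta(minutes=5 * i), i != 10) for i in range(1000)]
print(rolling_uptime_pct(checks, now))  # 99.9
```

Counting checks approximates time-based uptime only when the check interval is constant. With irregular intervals, weighting each result by the time it covers would be more accurate.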
Need managed AI agents with built-in SLA monitoring?
→ AfrexAI handles deployment, monitoring, and maintenance for $1,500/mo
→ Book a call: https://calendly.com/cbeckford-afrexai/30min
→ Learn more: https://afrexai-cto.github.io/aaas/landing.html