# Send Prometheus to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "prometheus-devops",
    "name": "Prometheus",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/wpank/prometheus-devops",
    "canonicalUrl": "https://clawhub.ai/wpank/prometheus-devops",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/prometheus-devops",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=prometheus-devops",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "README.md",
      "SKILL.md",
      "templates/recording-rules.yml",
      "templates/alert-rules.yml",
      "templates/prometheus.yml"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "prometheus-devops",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-07T11:16:56.222Z",
      "expiresAt": "2026-05-14T11:16:56.222Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=prometheus-devops",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=prometheus-devops",
        "contentDisposition": "attachment; filename=\"prometheus-devops-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "prometheus-devops"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/prometheus-devops"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/prometheus-devops",
    "downloadUrl": "https://openagent3.xyz/downloads/prometheus-devops",
    "agentUrl": "https://openagent3.xyz/skills/prometheus-devops/agent",
    "manifestUrl": "https://openagent3.xyz/skills/prometheus-devops/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/prometheus-devops/agent.md"
  }
}
```
## Documentation

### Prometheus

Production Prometheus setup covering scrape configuration, service discovery,
recording rules, alert rules, and operational best practices for infrastructure
and application monitoring.

### When to Use

| Scenario | Example |
| --- | --- |
| Set up metrics collection | New service needs Prometheus scraping |
| Configure service discovery | K8s pods, file-based, or static targets |
| Create recording rules | Pre-compute expensive PromQL queries |
| Design alert rules | SLO-based alerts for availability and latency |
| Production deployment | HA setup with retention and storage planning |
| Troubleshoot scraping | Targets down, metrics missing, relabeling issues |

### Architecture

```text
Applications ──(/metrics)──→ Prometheus Server ──→ AlertManager → Slack/PD
      ↑                           │
  client libraries                ├──→ Grafana (dashboards)
  (prom client)                   └──→ Thanos/Cortex (long-term storage)
```

### Kubernetes (Helm)

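```bash
# Add the community chart repository
helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts

# Install the stack with 30d retention and a 50Gi persistent volume.
# The chart sizes storage through storageSpec.volumeClaimTemplate.
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
```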

### prometheus.yml

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: production
    region: us-west-2

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  # Self-monitoring
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Node exporters
  - job_name: node-exporter
    static_configs:
      - targets: ["node1:9100", "node2:9100", "node3:9100"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: "([^:]+)(:[0-9]+)?"
        replacement: "${1}"

  # Application metrics (TLS)
  - job_name: my-app
    scheme: https
    metrics_path: /metrics
    tls_config:
      ca_file: /etc/prometheus/ca.crt
    static_configs:
      - targets: ["app1:9090", "app2:9090"]
```
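The `relabel_configs` entry on the `node-exporter` job copies `__address__` into `instance` with the port removed. A minimal Python sketch of that rewrite, assuming Prometheus's documented behavior of anchoring relabel regexes at both ends; the function name `relabel_instance` is illustrative and not part of the package:

```python
import re

# Same pattern as the node-exporter relabel rule. Prometheus anchors relabel
# regexes at both ends, which re.fullmatch mirrors here.
PORT_STRIP = re.compile(r"([^:]+)(:[0-9]+)?")


def relabel_instance(address: str) -> str:
    """Return the `instance` value the rule would produce for an __address__."""
    match = PORT_STRIP.fullmatch(address)
    if match is None:
        # With no match the replace action does nothing, so the label keeps
        # the default value derived from the address.
        return address
    return match.group(1)  # corresponds to replacement "${1}"


if __name__ == "__main__":
    for target in ["node1:9100", "node2", "[::1]:9100"]:
        print(target, "->", relabel_instance(target))
```

Bracketed IPv6 addresses do not match the pattern, so in this sketch they pass through unchanged.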

### Kubernetes Pods (Annotation-Based)

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels:
          [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Override the metrics path when prometheus.io/path is set
      - source_labels:
          [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the scrape address to use the prometheus.io/port annotation
      - source_labels:
          [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

Pod annotations to enable scraping:

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
```
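The address rewrite in the pod job relies on Prometheus joining multiple `source_labels` with `;` before matching. The sketch below reproduces that step in Python under that assumption; `rewrite_address` is a hypothetical helper, and the sample addresses are made up:

```python
import re

# Pattern from the kubernetes-pods job: host, optional existing port,
# the ";" separator, then the port taken from the pod annotation.
ADDRESS_PORT = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Mimic the __address__ rewrite for one discovered pod."""
    joined = ";".join([address, port_annotation])  # default separator is ";"
    match = ADDRESS_PORT.fullmatch(joined)
    if match is None:
        # No annotation (or an unparsable one): the address stays as discovered.
        return address
    return f"{match.group(1)}:{match.group(2)}"  # replacement $1:$2


if __name__ == "__main__":
    print(rewrite_address("10.0.0.5:8080", "9090"))
    print(rewrite_address("10.0.0.5", "9090"))
    print(rewrite_address("10.0.0.5:8080", ""))
```

A pod without the port annotation keeps its discovered address, which is why the `keep` rule on the scrape annotation comes first.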

### File-Based Discovery

```yaml
scrape_configs:
  - job_name: file-sd
    file_sd_configs:
      - files: ["/etc/prometheus/targets/*.json"]
        refresh_interval: 5m
```

targets/production.json:

```json
[
  {
    "targets": ["app1:9090", "app2:9090"],
    "labels": { "env": "production", "service": "api" }
  }
]
```
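Target files like the one above are usually generated by tooling rather than written by hand. One possible approach, sketched in Python: write to a temporary file in the same directory and rename it into place, so a reader never sees a half-written file. The helper name `write_targets` and the `.tmp` suffix are assumptions for illustration:

```python
import json
import os
import tempfile


def write_targets(path, targets, labels):
    """Write one file_sd target group to `path` using an atomic rename."""
    payload = [{"targets": sorted(targets), "labels": labels}]
    directory = os.path.dirname(os.path.abspath(path))
    # The ".tmp" suffix keeps the scratch file outside the "*.json" glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle, indent=2)
    os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    return payload


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "production.json")
        write_targets(out, ["app2:9090", "app1:9090"],
                      {"env": "production", "service": "api"})
        with open(out) as handle:
            print(handle.read())
```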

### Discovery Method Comparison

| Method | Best For | Dynamic |
| --- | --- | --- |
| `static_configs` | Fixed infrastructure, dev | No |
| `file_sd_configs` | CM-managed inventories | Yes (file watch) |
| `kubernetes_sd_configs` | K8s workloads | Yes (API watch) |
| `consul_sd_configs` | Consul service mesh | Yes (Consul watch) |
| `ec2_sd_configs` | AWS EC2 instances | Yes (API poll) |

### Recording Rules

Pre-compute expensive queries for dashboard and alert performance:

```yaml
# /etc/prometheus/rules/recording_rules.yml
groups:
  - name: api_metrics
    interval: 15s
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      - record: job:http_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))

      - record: job:http_error_rate:ratio
        expr: job:http_errors:rate5m / job:http_requests:rate5m

      - record: job:http_duration:p95
        expr: >
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )

  - name: resource_metrics
    interval: 30s
    rules:
      - record: instance:node_cpu:utilization
        expr: >
          100 - (avg by (instance)
            (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

      - record: instance:node_memory:utilization
        expr: >
          100 - ((node_memory_MemAvailable_bytes
            / node_memory_MemTotal_bytes) * 100)

      - record: instance:node_disk:utilization
        expr: >
          100 - ((node_filesystem_avail_bytes
            / node_filesystem_size_bytes) * 100)
```

### Naming Convention

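Recording rule names follow the pattern `level:metric_name:operations`.

| Part | Example | Meaning |
| --- | --- | --- |
| `level` | `job:`, `instance:` | Aggregation level |
| `metric_name` | `http_requests` | Base metric |
| `operations` | `:rate5m`, `:ratio` | Applied functions |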
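A naming convention is easier to keep when it is checked mechanically. The following sketch splits a rule name into its three parts and rejects anything that does not fit; `split_rule_name` is a hypothetical helper, and the identifier pattern is a simplifying assumption rather than Prometheus's full metric-name grammar:

```python
import re

# Each part must look like a plain identifier (simplified assumption).
IDENT = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def split_rule_name(name: str) -> dict:
    """Split `level:metric_name:operations` into its three named parts."""
    parts = name.split(":")
    if len(parts) != 3 or not all(IDENT.fullmatch(part) for part in parts):
        raise ValueError(f"{name!r} does not follow level:metric_name:operations")
    level, metric, operations = parts
    return {"level": level, "metric": metric, "operations": operations}


if __name__ == "__main__":
    for rule in ["job:http_requests:rate5m", "instance:node_cpu:utilization"]:
        print(split_rule_name(rule))
```

A check like this could run in CI over the `record:` fields of the rule files, alongside `promtool check rules`.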

### Alert Rules

```yaml
# /etc/prometheus/rules/alert_rules.yml
groups:
  - name: availability
    rules:
      - alert: ServiceDown
        expr: up{job="my-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is down"
          description: "{{ $labels.job }} down for >1 minute"

      - alert: HighErrorRate
        expr: job:http_error_rate:ratio > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Error rate {{ $value | humanizePercentage }} for {{ $labels.job }}"

      - alert: HighP95Latency
        expr: job:http_duration:p95 > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency {{ $value }}s for {{ $labels.job }}"

  - name: resources
    rules:
      - alert: HighCPU
        expr: instance:node_cpu:utilization > 80
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "CPU {{ $value }}% on {{ $labels.instance }}"

      - alert: HighMemory
        expr: instance:node_memory:utilization > 85
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Memory {{ $value }}% on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: instance:node_disk:utilization > 90
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "Disk {{ $value }}% on {{ $labels.instance }}"
```
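Every rule above sets a `for` duration, which holds an alert in the pending state until its expression has been true continuously for that long. A simplified Python model of that behavior, assuming one evaluation per tick and ignoring options such as `keep_firing_for`; `alert_states` and the sample timeline are illustrative only:

```python
def alert_states(evaluations, hold_seconds):
    """Map (timestamp, condition_is_true) pairs to inactive/pending/firing."""
    states = []
    active_since = None
    for timestamp, condition in evaluations:
        if not condition:
            # Any false evaluation resets the hold timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= hold_seconds else "pending")
    return states


if __name__ == "__main__":
    # 15s evaluation interval with a one-evaluation dip at t=30.
    timeline = [(0, True), (15, True), (30, False), (45, True),
                (60, True), (75, True), (90, True), (105, True)]
    for (ts, _), state in zip(timeline, alert_states(timeline, hold_seconds=60)):
        print(f"t={ts:>3}s {state}")
```

In this model the dip at t=30 restarts the timer, so a rule with `for: 1m` only fires at t=105. That reset is what filters out transient spikes.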

### Alert Severity Guide

| Severity | Threshold | Response |
| --- | --- | --- |
| critical | Service down, data loss risk | Page on-call immediately |
| warning | Degraded, approaching limit | Investigate within hours |
| info | Notable but not urgent | Review in next business day |

### Validation

```bash
# Validate config syntax
promtool check config prometheus.yml

# Validate rule files
promtool check rules /etc/prometheus/rules/*.yml

# Test a query
promtool query instant http://localhost:9090 'up'

# Reload config without restart
# (the server must be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
```
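The `up` query can also be checked from a script through the Prometheus HTTP API (`/api/v1/query`). The sketch below assumes the standard instant-vector response shape; `fetch_instant` and `down_targets` are hypothetical helper names, and the sample payload is invented for illustration:

```python
import json
import urllib.parse
import urllib.request


def fetch_instant(base_url, query):
    """Run an instant query against a Prometheus server and return the JSON body."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def down_targets(payload):
    """List job/instance pairs whose `up` sample is 0 in a query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # the value arrives as a string
        if float(value) == 0.0:
            labels = series["metric"]
            down.append(f'{labels.get("job", "?")}/{labels.get("instance", "?")}')
    return sorted(down)


if __name__ == "__main__":
    # Invented sample response; replace with fetch_instant("http://localhost:9090", "up").
    sample = {"status": "success", "data": {"resultType": "vector", "result": [
        {"metric": {"job": "my-app", "instance": "app1:9090"}, "value": [1715080616, "1"]},
        {"metric": {"job": "my-app", "instance": "app2:9090"}, "value": [1715080616, "0"]},
    ]}}
    print(down_targets(sample))
```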

### Best Practices

| Practice | Detail |
| --- | --- |
| Naming: `prefix_name_unit` | Snake_case, `_total` for counters, `_seconds`/`_bytes` for units |
| Scrape intervals 15–60s | Shorter wastes resources and storage |
| Recording rules for dashboards | Pre-compute anything queried repeatedly |
| Monitor Prometheus itself | `prometheus_tsdb_*`, `scrape_duration_seconds` |
| HA deployment | 2+ instances scraping same targets |
| Retention planning | Match `--storage.tsdb.retention.time` to disk capacity |
| Federation for scale | Global Prometheus aggregates from regional instances |
| Long-term storage | Thanos or Cortex for >30d retention |
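Retention planning comes down to simple arithmetic. The Prometheus storage documentation gives a rough capacity formula (retention time × ingested samples per second × bytes per sample) and cites about 1 to 2 bytes per sample on average. A sketch of that estimate, where the function name and the ingestion rate are assumptions for illustration:

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough TSDB size: retention seconds * ingestion rate * bytes per sample."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical workload: 10,000 samples/s kept for 30 days.
    needed = estimate_disk_bytes(30, 10_000)
    print(f"{needed / 1e9:.2f} GB")
```

The ingestion rate for a running server can be read from `rate(prometheus_tsdb_head_samples_appended_total[5m])`. Treat the result as a lower bound and leave headroom for the WAL and compaction.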

### Troubleshooting Quick Reference

| Problem | Diagnosis | Fix |
| --- | --- | --- |
| Target shows DOWN | Check `/targets` page for error | Fix firewall, verify endpoint, check TLS |
| Metrics missing | Query `up{job="x"}` | Verify scrape config, check `/metrics` endpoint |
| High cardinality | `prometheus_tsdb_head_series` growing | Drop high-cardinality labels with `metric_relabel_configs` |
| Storage filling up | Check `prometheus_tsdb_storage_*` | Reduce retention, add disk, enable compaction |
| Slow queries | Check `prometheus_engine_query_duration_seconds` | Add recording rules, reduce range, limit series |
| Config not applied | Check `prometheus_config_last_reload_successful` | Fix syntax, `POST /-/reload` |

### NEVER Do

| Anti-Pattern | Why | Do Instead |
| --- | --- | --- |
| Scrape interval < 5s | Overwhelms targets and storage | Use 15–60s intervals |
| High-cardinality labels (user ID, request ID) | Explodes TSDB series count | Use logs for high-cardinality data |
| Alert without `for` duration | Fires on transient spikes | Always set `for: 1m` minimum |
| Skip recording rules | Dashboards compute expensive queries every load | Pre-compute with recording rules |
| Store secrets in `prometheus.yml` | Config often in Git | Use file-based secrets or env substitution |
| Ignore `up` metric | Miss targets silently going down | Alert on `up == 0` for all jobs |
| Single Prometheus instance in prod | Single point of failure | Run 2+ replicas with shared targets |
| Unbounded retention | Disk fills, Prometheus crashes | Set explicit `--storage.tsdb.retention.time` |
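The cardinality warning is easiest to see with numbers: the worst-case series count for one metric is the product of the distinct values of each label. A small sketch under that assumption, using invented label counts and a hypothetical helper name:

```python
import math


def series_upper_bound(label_value_counts):
    """Worst-case series count for one metric: product of per-label value counts."""
    return math.prod(label_value_counts.values())


if __name__ == "__main__":
    # Invented counts for a single HTTP request counter.
    bounded = {"method": 5, "status": 10, "instance": 20}
    print(series_upper_bound(bounded))

    # Adding a per-user label multiplies the bound by the number of users.
    with_user = {**bounded, "user_id": 100_000}
    print(series_upper_bound(with_user))
```

Real series counts are usually below this bound because not every label combination occurs, but an unbounded label such as a user ID still grows the total without limit.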

### Templates

| Template | Description |
| --- | --- |
| `templates/prometheus.yml` | Full config with static, file-based, and K8s discovery |
| `templates/alert-rules.yml` | 25+ alert rules by category |
| `templates/recording-rules.yml` | Pre-computed metrics for HTTP, latency, resources, SLOs |
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: wpank
- Version: 1.0.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-07T11:16:56.222Z
- Expires at: 2026-05-14T11:16:56.222Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/prometheus-devops)
- [Send to Agent page](https://openagent3.xyz/skills/prometheus-devops/agent)
- [JSON manifest](https://openagent3.xyz/skills/prometheus-devops/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/prometheus-devops/agent.md)
- [Download page](https://openagent3.xyz/downloads/prometheus-devops)