    Microck

    prometheus-configuration

    Microck/prometheus-configuration

    About

    Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications...

    SKILL.md

    Prometheus Configuration

    Complete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.

    Purpose

    Configure Prometheus for comprehensive metric collection, alerting, and monitoring of infrastructure and applications.

    When to Use

    • Set up Prometheus monitoring
    • Configure metric scraping
    • Create recording rules
    • Design alert rules
    • Implement service discovery

    Prometheus Architecture

    ┌──────────────┐
    │ Applications │ ← Instrumented with client libraries
    └──────┬───────┘
           │ /metrics endpoint
           ↓
    ┌──────────────┐
    │  Prometheus  │ ← Scrapes metrics periodically
    │    Server    │
    └──────┬───────┘
           │
           ├─→ Alertmanager (alerts)
           ├─→ Grafana (visualization)
           └─→ Long-term storage (Thanos/Cortex)
    

    Installation

    Kubernetes with Helm

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    
    helm install prometheus prometheus-community/kube-prometheus-stack \
      --namespace monitoring \
      --create-namespace \
      --set prometheus.prometheusSpec.retention=30d \
      --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
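
    The same settings can also be kept in a values file instead of --set flags, which is easier to review and version. The sketch below assumes the kube-prometheus-stack chart's storageSpec.volumeClaimTemplate layout for persistent storage; the file name values.yaml and the ReadWriteOnce access mode are illustrative choices, not part of the original guide.

    ```yaml
    # values.yaml (hypothetical file name)
    prometheus:
      prometheusSpec:
        # How long samples are kept before deletion
        retention: 30d
        # Persistent volume claim template for the TSDB
        storageSpec:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteOnce"]   # assumed; depends on the storage class
              resources:
                requests:
                  storage: 50Gi
    ```

    It would be passed to the install command above with -f values.yaml in place of the two --set flags.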
    

    Docker Compose

    version: '3.8'
    services:
      prometheus:
        image: prom/prometheus:latest
        ports:
          - "9090:9090"
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
          - prometheus-data:/prometheus
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention.time=30d'
    
    volumes:
      prometheus-data:
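
    The prometheus.yml shown in the next section loads rules from /etc/prometheus/rules/ and sends alerts to alertmanager:9093, neither of which exists in the Compose file above. One way to provide both is a Compose override file, sketched here under the assumptions that the file above is saved as docker-compose.yml, that rule files live in a local ./rules directory, and that Alertmanager runs with the default configuration shipped in its image.

    ```yaml
    # docker-compose.override.yml (merged automatically with docker-compose.yml)
    services:
      prometheus:
        volumes:
          # Assumed local directory holding recording and alert rule files
          - ./rules:/etc/prometheus/rules:ro

      # The service name doubles as the DNS name, so alertmanager:9093 resolves
      alertmanager:
        image: prom/alertmanager:latest
        ports:
          - "9093:9093"
    ```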
    

    Configuration File

    prometheus.yml:

    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: 'production'
        region: 'us-west-2'
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              - alertmanager:9093
    
    # Load rules files
    rule_files:
      - /etc/prometheus/rules/*.yml
    
    # Scrape configurations
    scrape_configs:
      # Prometheus itself
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
    
      # Node exporters
      - job_name: 'node-exporter'
        static_configs:
          - targets:
            - 'node1:9100'
            - 'node2:9100'
            - 'node3:9100'
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
            regex: '([^:]+)(:[0-9]+)?'
            replacement: '${1}'
    
      # Kubernetes pods with annotations
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
    
      # Application metrics
      - job_name: 'my-app'
        static_configs:
          - targets:
            - 'app1.example.com:9090'
            - 'app2.example.com:9090'
        metrics_path: '/metrics'
        scheme: 'https'
        tls_config:
          ca_file: /etc/prometheus/ca.crt
          cert_file: /etc/prometheus/client.crt
          key_file: /etc/prometheus/client.key
    

    Reference: See assets/prometheus.yml.template
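
    For the kubernetes-pods job above to keep a pod, the pod has to carry the annotations that the relabel rules read. Kubernetes service discovery exposes each annotation as a __meta_kubernetes_pod_annotation_<name> label with unsupported characters replaced by underscores, which is why prometheus.io/scrape appears as prometheus_io_scrape in the config. A minimal sketch of a matching pod follows; the pod name, image, and port 8080 are hypothetical.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: example-app                  # hypothetical
      namespace: default
      annotations:
        prometheus.io/scrape: "true"     # matched by the keep rule
        prometheus.io/path: "/metrics"   # copied into __metrics_path__
        prometheus.io/port: "8080"       # joined with the pod IP into __address__
    spec:
      containers:
        - name: app
          image: registry.example.com/example-app:1.0   # hypothetical
          ports:
            - containerPort: 8080
    ```

    Annotation values must be strings, so "true" and "8080" are quoted.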

    Scrape Configurations

    Static Targets

    scrape_configs:
      - job_name: 'static-targets'
        static_configs:
          - targets: ['host1:9100', 'host2:9100']
            labels:
              env: 'production'
              region: 'us-west-2'
    

    File-based Service Discovery

    scrape_configs:
      - job_name: 'file-sd'
        file_sd_configs:
          - files:
            - /etc/prometheus/targets/*.json
            - /etc/prometheus/targets/*.yml
            refresh_interval: 5m
    

    targets/production.json:

    [
      {
        "targets": ["app1:9090", "app2:9090"],
        "labels": {
          "env": "production",
          "service": "api"
        }
      }
    ]
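
    Because the job above also globs /etc/prometheus/targets/*.yml, target groups can be written in YAML with the same structure as the JSON file. The sketch below is an illustrative second file; its name, hosts, and label values are hypothetical.

    ```yaml
    # /etc/prometheus/targets/staging.yml (hypothetical file)
    - targets:
        - 'app3:9090'
        - 'app4:9090'
      labels:
        env: staging
        service: api
    ```

    Prometheus watches these files for changes and also re-reads them every refresh_interval, so targets can be added or removed without reloading the server.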
    

    Kubernetes Service Discovery

    scrape_configs:
      - job_name: 'kubernetes-services'
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
    

    Reference: See references/scrape-configs.md

    Recording Rules

    Create pre-computed metrics for frequently queried expressions:

    # /etc/prometheus/rules/recording_rules.yml
    groups:
      - name: api_metrics
        interval: 15s
        rules:
          # HTTP request rate per job
          - record: job:http_requests:rate5m
            expr: sum by (job) (rate(http_requests_total[5m]))
    
          # 5xx (server error) request rate per job
          - record: job:http_requests_errors:rate5m
            expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
    
          # Error rate as a percentage of all requests
          - record: job:http_requests_error_rate:percentage
            expr: |
              (job:http_requests_errors:rate5m / job:http_requests:rate5m) * 100
    
          # P95 latency
          - record: job:http_request_duration:p95
            expr: |
              histogram_quantile(0.95,
                sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
              )
    
      - name: resource_metrics
        interval: 30s
        rules:
          # CPU utilization percentage
          - record: instance:node_cpu:utilization
            expr: |
              100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
    
          # Memory utilization percentage
          - record: instance:node_memory:utilization
            expr: |
              100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)
    
          # Disk usage percentage
          - record: instance:node_disk:utilization
            expr: |
              100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)
    

    Reference: See references/recording-rules.md

    Alert Rules

    # /etc/prometheus/rules/alert_rules.yml
    groups:
      - name: availability
        interval: 30s
        rules:
          - alert: ServiceDown
            expr: up{job="my-app"} == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "Service {{ $labels.instance }} is down"
              description: "{{ $labels.job }} has been down for more than 1 minute"
    
          - alert: HighErrorRate
            expr: job:http_requests_error_rate:percentage > 5
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High error rate for {{ $labels.job }}"
              description: "Error rate is {{ $value }}% (threshold: 5%)"
    
          - alert: HighLatency
            expr: job:http_request_duration:p95 > 1
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High latency for {{ $labels.job }}"
              description: "P95 latency is {{ $value }}s (threshold: 1s)"
    
      - name: resources
        interval: 1m
        rules:
          - alert: HighCPUUsage
            expr: instance:node_cpu:utilization > 80
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High CPU usage on {{ $labels.instance }}"
              description: "CPU usage is {{ $value }}%"
    
          - alert: HighMemoryUsage
            expr: instance:node_memory:utilization > 85
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High memory usage on {{ $labels.instance }}"
              description: "Memory usage is {{ $value }}%"
    
          - alert: DiskSpaceLow
            expr: instance:node_disk:utilization > 90
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Low disk space on {{ $labels.instance }}"
              description: "Disk usage is {{ $value }}%"
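
    Alert rules can be exercised before deployment with promtool's rule unit tests. The sketch below checks that ServiceDown fires once up has been 0 for longer than the 1m "for" window. The test file path, the instance name, and the synthetic sample values are assumptions made for illustration.

    ```yaml
    # tests/alert_rules_test.yml (hypothetical path)
    rule_files:
      - /etc/prometheus/rules/alert_rules.yml

    evaluation_interval: 30s

    tests:
      - interval: 30s   # spacing between the synthetic samples below
        input_series:
          # Up for the first two samples (0s, 30s), then down from 1m onward
          - series: 'up{job="my-app", instance="app1.example.com:9090"}'
            values: '1 1 0 0 0 0 0'
        alert_rule_test:
          # At 3m the target has been down for 2m, longer than "for: 1m"
          - eval_time: 3m
            alertname: ServiceDown
            exp_alerts:
              - exp_labels:
                  severity: critical
                  job: my-app
                  instance: app1.example.com:9090
                exp_annotations:
                  summary: "Service app1.example.com:9090 is down"
                  description: "my-app has been down for more than 1 minute"
    ```

    Run it with promtool test rules tests/alert_rules_test.yml.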
    

    Validation

    # Validate configuration
    promtool check config prometheus.yml
    
    # Validate rules
    promtool check rules /etc/prometheus/rules/*.yml
    
    # Test query
    promtool query instant http://localhost:9090 'up'
    

    Reference: See scripts/validate-prometheus.sh

    Best Practices

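    1. Use consistent naming for metrics (prefix_name_unit, e.g. http_request_duration_seconds)
    2. Set appropriate scrape intervals (15-60s is typical)
    3. Use recording rules for expensive or frequently run queries
    4. Implement high availability (multiple Prometheus instances scraping the same targets)
    5. Configure retention based on storage capacity (--storage.tsdb.retention.time, --storage.tsdb.retention.size)
    6. Use metric_relabel_configs to drop unneeded metrics and labels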
    7. Monitor Prometheus itself
    8. Implement federation for large deployments
    9. Use Thanos/Cortex for long-term storage
    10. Document custom metrics
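
    Practices 6 and 8 can be sketched as scrape configuration. The fragment below is illustrative only: the dropped metric pattern and the downstream Prometheus host name are hypothetical, and the federation job assumes the downstream server already records the job: and instance: series defined in the recording rules above.

    ```yaml
    scrape_configs:
      # Practice 6: drop series after the scrape, before they are stored
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node1:9100']
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_gc_.*'        # example pattern; choose what you do not need
            action: drop

      # Practice 8: a global server pulls only aggregated series from another Prometheus
      - job_name: 'federate'
        scrape_interval: 30s
        honor_labels: true           # keep the job/instance labels of the source series
        metrics_path: '/federate'
        params:
          'match[]':
            - '{__name__=~"job:.*"}'
            - '{__name__=~"instance:.*"}'
        static_configs:
          - targets:
              - 'prometheus-a.example.internal:9090'   # hypothetical downstream server
    ```

    Federating only recorded aggregates keeps the global server's series count small; raw per-instance series stay on the downstream servers.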

    Troubleshooting

    Check scrape targets:

    curl http://localhost:9090/api/v1/targets
    

    Check configuration:

    curl http://localhost:9090/api/v1/status/config
    

    Test query:

    curl 'http://localhost:9090/api/v1/query?query=up'
    

    Reference Files

    • assets/prometheus.yml.template - Complete configuration template
    • references/scrape-configs.md - Scrape configuration patterns
    • references/recording-rules.md - Recording rule examples
    • scripts/validate-prometheus.sh - Validation script

    Related Skills

    • grafana-dashboards - For visualization
    • slo-implementation - For SLO monitoring
    • distributed-tracing - For request tracing
    Repository
    microck/ordinary-claude-skills