    wshobson/grafana-dashboards
    DevOps
    28,185
    18 installs

    Grafana Dashboards

    Create and manage production-ready Grafana dashboards for comprehensive system observability.

    Purpose

    Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.

    When to Use

    • Visualize Prometheus metrics
    • Create custom dashboards
    • Implement SLO dashboards
    • Monitor infrastructure
    • Track business KPIs

    Dashboard Design Principles

    1. Hierarchy of Information

    ┌─────────────────────────────────────┐
    │  Critical Metrics (Big Numbers)     │
    ├─────────────────────────────────────┤
    │  Key Trends (Time Series)           │
    ├─────────────────────────────────────┤
    │  Detailed Metrics (Tables/Heatmaps) │
    └─────────────────────────────────────┘
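The hierarchy above maps naturally onto Grafana's 24-column grid: small stat panels on top, wider trend panels below, and full-width detail panels at the bottom. The following is a minimal sketch of that idea, not part of the original skill; the function name `layout_tiers`, the tier dictionary shape, and the example panel titles are all hypothetical.

```python
from typing import List

GRID_WIDTH = 24  # Grafana's dashboard grid is 24 columns wide


def layout_tiers(tiers: List[dict]) -> List[dict]:
    """Assign a gridPos to each panel title, stacking one tier below the previous.

    Each tier is {"titles": [...], "per_row": int, "height": int} (hypothetical shape).
    """
    panels = []
    y = 0
    for tier in tiers:
        width = GRID_WIDTH // tier["per_row"]
        for i, title in enumerate(tier["titles"]):
            row, col = divmod(i, tier["per_row"])
            panels.append({
                "title": title,
                "gridPos": {
                    "x": col * width,
                    "y": y + row * tier["height"],
                    "w": width,
                    "h": tier["height"],
                },
            })
        # Ceiling division: number of grid rows this tier occupies.
        rows_used = -(-len(tier["titles"]) // tier["per_row"])
        y += rows_used * tier["height"]
    return panels


panels = layout_tiers([
    {"titles": ["Error Rate %", "P95 Latency", "Request Rate", "Uptime"], "per_row": 4, "height": 4},
    {"titles": ["Requests Over Time", "Latency Over Time"], "per_row": 2, "height": 8},
    {"titles": ["Latency Heatmap"], "per_row": 1, "height": 8},
])
```

The returned dictionaries carry only `title` and `gridPos`; the panel type, targets, and options still need to be filled in per panel.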
    

    2. RED Method (Services)

    • Rate - Requests per second
    • Errors - Error rate
    • Duration - Latency/response time
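As an illustration, the three RED signals can be expressed as PromQL strings built per service. This sketch reuses the metric and label names that appear in this document's dashboard examples (`http_requests_total`, `http_request_duration_seconds_bucket`, `status`, `service`); the helper `red_queries` itself is hypothetical, and your metric names may differ.

```python
def red_queries(service: str, window: str = "5m") -> dict:
    """Return PromQL expressions for Rate, Errors, and Duration of one service."""
    selector = f'service="{service}"'
    return {
        # Rate: requests per second
        "rate": f"sum(rate(http_requests_total{{{selector}}}[{window}]))",
        # Errors: fraction of requests answered with a 5xx status
        "errors": (
            f'sum(rate(http_requests_total{{{selector}, status=~"5.."}}[{window}]))'
            f" / sum(rate(http_requests_total{{{selector}}}[{window}]))"
        ),
        # Duration: 95th percentile latency from histogram buckets
        "duration": (
            "histogram_quantile(0.95, "
            f"sum(rate(http_request_duration_seconds_bucket{{{selector}}}[{window}])) by (le))"
        ),
    }


queries = red_queries("checkout")
```

Each value can be used directly as the `expr` of a panel target.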

    3. USE Method (Resources)

    • Utilization - Percentage of time the resource is busy
    • Saturation - Work that is queued or waiting (queue length, wait time)
    • Errors - Count of error events
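For a host, the USE signals might be sketched as one panel per signal. The metric choices below are assumptions based on common node_exporter metrics (`node_cpu_seconds_total`, `node_load1`, `node_network_receive_errs_total`) and are not prescribed by this skill; `use_panels` is a hypothetical helper.

```python
# Assumed node_exporter metrics; substitute whatever your exporters expose.
USE_EXPRESSIONS = {
    "Utilization": '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)',
    "Saturation": "node_load1",
    "Errors": "sum by (instance) (rate(node_network_receive_errs_total[5m]))",
}


def use_panels(panel_type: str = "graph", width: int = 8, height: int = 8) -> list:
    """Build one panel per USE signal, laid out side by side on a 24-column grid.

    The default panel type matches the examples in this document; newer
    Grafana releases use "timeseries" instead of "graph".
    """
    panels = []
    for i, (signal, expr) in enumerate(USE_EXPRESSIONS.items()):
        panels.append({
            "title": f"Node {signal}",
            "type": panel_type,
            "targets": [{"expr": expr, "legendFormat": "{{instance}}"}],
            "gridPos": {"x": i * width, "y": 0, "w": width, "h": height},
        })
    return panels


node_panels = use_panels()
```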

    Dashboard Structure

    API Monitoring Dashboard

    {
      "dashboard": {
        "title": "API Monitoring",
        "tags": ["api", "production"],
        "timezone": "browser",
        "refresh": "30s",
        "panels": [
          {
            "title": "Request Rate",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(http_requests_total[5m])) by (service)",
                "legendFormat": "{{service}}"
              }
            ],
            "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
          },
          {
            "title": "Error Rate %",
            "type": "graph",
            "targets": [
              {
                "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
                "legendFormat": "Error Rate"
              }
            ],
            "alert": {
              "conditions": [
                {
                  "evaluator": { "params": [5], "type": "gt" },
                  "operator": { "type": "and" },
                  "query": { "params": ["A", "5m", "now"] },
                  "reducer": { "type": "avg" },
                  "type": "query"
                }
              ]
            },
            "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
          },
          {
            "title": "P95 Latency",
            "type": "graph",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
                "legendFormat": "{{service}}"
              }
            ],
            "gridPos": { "x": 0, "y": 8, "w": 24, "h": 8 }
          }
        ]
      }
    }
    

    Reference: See assets/api-dashboard.json
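The JSON above has the `{"dashboard": {...}}` shape used by Grafana's HTTP API endpoint `POST /api/dashboards/db`. The sketch below only builds such a request without sending it; the helper name `build_dashboard_request`, the base URL, and the token are placeholders, and it assumes a service account token passed as a Bearer header.

```python
import json
import urllib.request


def build_dashboard_request(base_url: str, token: str, dashboard: dict) -> urllib.request.Request:
    """Build (but do not send) a request that creates or updates a dashboard."""
    payload = {
        "dashboard": dashboard,  # the dashboard model itself
        "overwrite": True,       # replace an existing dashboard with the same uid/title
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and token, for illustration only.
request = build_dashboard_request(
    "http://localhost:3000/",
    "EXAMPLE_TOKEN",
    {"title": "API Monitoring", "tags": ["api", "production"], "panels": []},
)
```

To send it, pass the request to `urllib.request.urlopen(request)`. If you load the JSON shown above from a file, pass its inner `"dashboard"` object rather than the whole document, since the helper adds the wrapper itself.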

    Panel Types

    1. Stat Panel (Single Value)

    {
      "type": "stat",
      "title": "Total Requests",
      "targets": [
        {
          "expr": "sum(http_requests_total)"
        }
      ],
      "options": {
        "reduceOptions": {
          "values": false,
          "calcs": ["lastNotNull"]
        },
        "orientation": "auto",
        "textMode": "auto",
        "colorMode": "value"
      },
      "fieldConfig": {
        "defaults": {
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "value": 0, "color": "green" },
              { "value": 80, "color": "yellow" },
              { "value": 90, "color": "red" }
            ]
          }
        }
      }
    }
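To make the threshold steps concrete: in absolute mode, a value takes the color of the highest step whose `value` it meets or exceeds. The following is a simplified model of that rule for reasoning about step lists, not Grafana's actual implementation; `threshold_color` is a hypothetical name.

```python
def threshold_color(value: float, steps: list) -> str:
    """Return the color of the highest threshold step that `value` reaches.

    A step whose "value" is None (or missing) is treated as the base step
    that matches everything.
    """
    def lower_bound(step: dict) -> float:
        bound = step.get("value")
        return float("-inf") if bound is None else bound

    ordered = sorted(steps, key=lower_bound)
    color = ordered[0]["color"]
    for step in ordered:
        if value >= lower_bound(step):
            color = step["color"]
    return color


steps = [
    {"value": 0, "color": "green"},
    {"value": 80, "color": "yellow"},
    {"value": 90, "color": "red"},
]
```

With these steps, values below 80 map to green, values from 80 up to but not including 90 map to yellow, and 90 or above maps to red.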
    

    2. Time Series Graph

    {
      "type": "graph",
      "title": "CPU Usage",
      "targets": [
        {
          "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
        }
      ],
      "yaxes": [
        { "format": "percent", "max": 100, "min": 0 },
        { "format": "short" }
      ]
    }
    

    3. Table Panel

    {
      "type": "table",
      "title": "Service Status",
      "targets": [
        {
          "expr": "up",
          "format": "table",
          "instant": true
        }
      ],
      "transformations": [
        {
          "id": "organize",
          "options": {
            "excludeByName": { "Time": true },
            "indexByName": {},
            "renameByName": {
              "instance": "Instance",
              "job": "Service",
              "Value": "Status"
            }
          }
        }
      ]
    }
    

    4. Heatmap

    {
      "type": "heatmap",
      "title": "Latency Heatmap",
      "targets": [
        {
          "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
          "format": "heatmap"
        }
      ],
      "dataFormat": "tsbuckets",
      "yAxis": {
        "format": "s"
      }
    }
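Prometheus histogram buckets are cumulative: the bucket with `le="0.5"` also counts every observation in `le="0.1"`. A heatmap cell, by contrast, represents the count within a single bucket. The sketch below shows that conversion on plain numbers; it is an illustration of the idea rather than a description of Grafana's internals, and `per_bucket_counts` is a hypothetical helper.

```python
def per_bucket_counts(cumulative: dict) -> dict:
    """Convert cumulative bucket counts (keyed by `le` label) to per-bucket counts."""
    result = {}
    previous = 0
    # float() accepts "+Inf", so the open-ended bucket sorts last.
    for le in sorted(cumulative, key=float):
        result[le] = cumulative[le] - previous
        previous = cumulative[le]
    return result


counts = per_bucket_counts({"0.1": 5, "0.5": 12, "1": 15, "+Inf": 16})
```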
    

    Variables

    Query Variables

    {
      "templating": {
        "list": [
          {
            "name": "namespace",
            "type": "query",
            "datasource": "Prometheus",
            "query": "label_values(kube_pod_info, namespace)",
            "refresh": 1,
            "multi": false
          },
          {
            "name": "service",
            "type": "query",
            "datasource": "Prometheus",
            "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
            "refresh": 1,
            "multi": true
          }
        ]
      }
    }
    

    Use Variables in Queries

    sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
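The query above uses `=` for the single-value `$namespace` and `=~` for the multi-value `$service`, because a multi-value variable expands to a regex alternation. The sketch below is a simplified model of that substitution; real Grafana interpolation also escapes regex metacharacters and supports other formats, and `interpolate` is a hypothetical helper.

```python
import re


def interpolate(query: str, variables: dict) -> str:
    """Substitute $name placeholders; list values become a (a|b) alternation."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            return match.group(0)  # leave unknown variables untouched
        value = variables[name]
        if isinstance(value, (list, tuple)):
            return "(" + "|".join(value) + ")"
        return str(value)

    return re.sub(r"\$(\w+)", replace, query)


expanded = interpolate(
    'sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))',
    {"namespace": "prod", "service": ["api", "web"]},
)
```

Using `=` with a multi-value variable would compare the label against the literal string `(api|web)` and match nothing, which is why the regex matcher is needed.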
    

    Alerts in Dashboards

    {
      "alert": {
        "name": "High Error Rate",
        "conditions": [
          {
            "evaluator": {
              "params": [5],
              "type": "gt"
            },
            "operator": { "type": "and" },
            "query": {
              "params": ["A", "5m", "now"]
            },
            "reducer": { "type": "avg" },
            "type": "query"
          }
        ],
        "executionErrorState": "alerting",
        "for": "5m",
        "frequency": "1m",
        "message": "Error rate is above 5%",
        "noDataState": "no_data",
        "notifications": [{ "uid": "slack-channel" }]
      }
    }
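Reading the condition above: the reducer collapses the samples of query `A` over the last 5 minutes into one number (here the average), and the evaluator compares that number with the threshold (`gt` 5). The following is a rough model of that single step, written only to illustrate the semantics; it ignores the `for` pending period, `noDataState`, and multi-condition operators, and `condition_fires` is a hypothetical name.

```python
from statistics import mean

# Simplified subsets of the reducer and evaluator types.
REDUCERS = {
    "avg": mean,
    "min": min,
    "max": max,
    "last": lambda values: values[-1],
}
EVALUATORS = {
    "gt": lambda reduced, params: reduced > params[0],
    "lt": lambda reduced, params: reduced < params[0],
}


def condition_fires(condition: dict, samples: list) -> bool:
    """Reduce the samples to one value, then compare it with the threshold."""
    reduced = REDUCERS[condition["reducer"]["type"]](samples)
    evaluator = condition["evaluator"]
    return EVALUATORS[evaluator["type"]](reduced, evaluator["params"])


condition = {
    "evaluator": {"params": [5], "type": "gt"},
    "reducer": {"type": "avg"},
    "type": "query",
}
```

With `"for": "5m"`, the condition additionally has to stay true for five minutes of evaluations before the alert moves to the alerting state.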
    

    Dashboard Provisioning

    dashboards.yml:

    apiVersion: 1
    
    providers:
      - name: "default"
        orgId: 1
        folder: "General"
        type: file
        disableDeletion: false
        updateIntervalSeconds: 10
        allowUiUpdates: true
        options:
          path: /etc/grafana/dashboards
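Before dropping files into the provisioning path, it can help to check that each one parses and looks like a dashboard model. To my knowledge, file provisioning expects the dashboard model at the top level of each file, not the `{"dashboard": {...}}` wrapper used by the HTTP API, so this sketch flags wrapped files. Treat that as an assumption to verify against your Grafana version; `check_dashboard_files` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_dashboard_files(directory: str) -> dict:
    """Return {filename: problem} for JSON files that look unfit for provisioning."""
    problems = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            model = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems[path.name] = f"invalid JSON: {exc.msg}"
            continue
        if not isinstance(model, dict):
            problems[path.name] = "top level is not a JSON object"
        elif "dashboard" in model and "title" not in model:
            problems[path.name] = "wrapped in a top-level 'dashboard' key (API payload shape)"
        elif not model.get("title"):
            problems[path.name] = "missing title"
    return problems
```

An empty result means every file parsed and has a top-level title, for example `check_dashboard_files("/etc/grafana/dashboards")`.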
    

    Common Dashboard Patterns

    Infrastructure Dashboard

    Key Panels:

    • CPU utilization per node
    • Memory usage per node
    • Disk I/O
    • Network traffic
    • Pod count by namespace
    • Node status

    Reference: See assets/infrastructure-dashboard.json

    Database Dashboard

    Key Panels:

    • Queries per second
    • Connection pool usage
    • Query latency (P50, P95, P99)
    • Active connections
    • Database size
    • Replication lag
    • Slow queries

    Reference: See assets/database-dashboard.json

    Application Dashboard

    Key Panels:

    • Request rate
    • Error rate
    • Response time (percentiles)
    • Active users/sessions
    • Cache hit rate
    • Queue length

    Best Practices

    1. Start with templates (Grafana community dashboards)
    2. Use consistent naming for panels and variables
    3. Group related metrics in rows
    4. Set appropriate time ranges (default: Last 6 hours)
    5. Use variables for flexibility
    6. Add panel descriptions for context
    7. Configure units correctly
    8. Set meaningful thresholds for colors
    9. Use consistent colors across dashboards
    10. Test with different time ranges
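Some of these practices can be checked mechanically. The sketch below flags panels without a description, panels without a unit, and duplicate panel titles. It assumes units live at `fieldConfig.defaults.unit`, as in current panel models (legacy `graph` panels store them under `yaxes` instead), and `lint_dashboard` is a hypothetical helper.

```python
def lint_dashboard(dashboard: dict) -> list:
    """Return human-readable findings for a dashboard model."""
    findings = []
    seen_titles = set()
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "<untitled>")
        if title in seen_titles:
            findings.append(f"{title}: duplicate panel title")
        seen_titles.add(title)
        if not panel.get("description"):
            findings.append(f"{title}: missing description")
        unit = panel.get("fieldConfig", {}).get("defaults", {}).get("unit")
        if not unit:
            findings.append(f"{title}: no unit configured")
    return findings


findings = lint_dashboard({
    "title": "API Monitoring",
    "panels": [
        {"title": "Request Rate", "description": "Requests per second by service",
         "fieldConfig": {"defaults": {"unit": "reqps"}}},
        {"title": "P95 Latency"},
    ],
})
```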

    Dashboard as Code

    Terraform Provisioning

    resource "grafana_dashboard" "api_monitoring" {
      config_json = file("${path.module}/dashboards/api-monitoring.json")
      folder      = grafana_folder.monitoring.id
    }
    
    resource "grafana_folder" "monitoring" {
      title = "Production Monitoring"
    }
    

    Ansible Provisioning

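    # Copy every dashboard JSON file into the directory Grafana provisions from.
    # Assumes a handler named "restart grafana" is defined elsewhere in the play or role.
    - name: Deploy Grafana dashboards
      ansible.builtin.copy:
        src: "{{ item }}"
        dest: /etc/grafana/dashboards/
      with_fileglob:
        - "dashboards/*.json"
      notify: restart grafana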
    

    Related Skills

    • prometheus-configuration - For metric collection
    • slo-implementation - For SLO dashboards
    Repository
    wshobson/agents