Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Accelerating the Agent Economy

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    neversight

    troubleshooting-guide

    neversight/troubleshooting-guide
    Productivity
    2

    About

    SKILL.md

    Install

    Install via Skills CLI

    or add to your agent
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    ├─
    ├─
    └─

    About

    Эксперт по troubleshooting гайдам. Используй для создания диагностических процедур, FAQ и решения проблем.

    SKILL.md

    Troubleshooting Guide Creator

    Эксперт по созданию структурированных руководств по диагностике и устранению проблем.

    Core Principles

    Problem-Centric Structure

    troubleshooting_principles:
      - principle: "Start with clear problem statements and symptoms"
        reason: "Users need to quickly identify if guide applies to their issue"
    
      - principle: "Use If-Then logic flows for decision trees"
        reason: "Systematic elimination of possible causes"
    
      - principle: "Organize solutions by likelihood and impact"
        reason: "Try simple/common fixes first, escalate to complex"
    
      - principle: "Follow logical diagnostic sequence (simple to complex)"
        reason: "Minimize time to resolution"
    
      - principle: "Include verification steps after each fix"
        reason: "Confirm the issue is actually resolved"
    
      - principle: "Provide rollback instructions"
        reason: "Allow safe recovery if fix causes new issues"
    

    User Experience Focus

    • Пиши для целевого уровня аудитории
    • Используй консистентное форматирование
    • Указывай оценочное время для каждого шага
    • Включай скриншоты и примеры где возможно

    Standard Guide Template

    # Troubleshooting: [Problem Title]
    
    **Last Updated:** [Date]
    **Applies To:** [Product/Service/Version]
    **Difficulty:** Beginner | Intermediate | Advanced
    **Time Estimate:** X-Y minutes
    
    ---
    
    ## Problem Statement
    
    ### Symptoms
    Users experiencing this issue will observe:
    - [ ] Symptom 1 (observable behavior)
    - [ ] Symptom 2 (error message or code)
    - [ ] Symptom 3 (system state)
    
    ### Error Messages
    

    [Exact error message or code]

    
    ### Affected Components
    - Component A
    - Component B
    
    ### Impact
    - **Severity:** Critical | High | Medium | Low
    - **Affected Users:** All users | Specific group | Single user
    - **Business Impact:** [Description]
    
    ---
    
    ## Quick Checks (2-5 minutes)
    
    Before diving into detailed troubleshooting, verify these common causes:
    
    ### Check 1: [Most Common Cause]
    **Time:** 30 seconds
    
    ```bash
    # Command to verify
    [diagnostic command]
    

    Expected Output: [What you should see] If this fails: Continue to Check 2

    Check 2: [Second Most Common Cause]

    Time: 1 minute

    [Steps to verify]


    Diagnostic Steps

    Step 1: Gather Information

    Collect the following before proceeding:

    • Error logs from [location]
    • System configuration from [location]
    • User actions that triggered the issue
    # Commands to gather diagnostic info
    [command 1]
    [command 2]
    

    Step 2: Identify the Root Cause

    Use this decision tree to identify the cause:

    Start
      │
      ├─ Is [condition A] true?
      │    ├─ YES → Go to Solution A
      │    └─ NO → Continue
      │
      ├─ Is [condition B] true?
      │    ├─ YES → Go to Solution B
      │    └─ NO → Continue
      │
      └─ None of the above → Escalate to Support
    

    Solutions

    Solution A: [Fix Name]

    Difficulty: Easy Time: 5 minutes Risk: Low

    Prerequisites

    • [Prerequisite 1]
    • [Prerequisite 2]

    Steps

    1. Step 1 Title

      [command]
      

      Expected output: [description]

    2. Step 2 Title [Instructions]

    3. Step 3 Title [Instructions]

    Verify Fix

    [verification command]
    

    Success Indicator: [What to look for]

    Rollback (if needed)

    [rollback command]
    

    Solution B: [Fix Name]

    Difficulty: Medium Time: 15 minutes Risk: Medium

    [Same structure as Solution A]


    Prevention

    To prevent this issue from recurring:

    1. Monitoring: Set up alerts for [metric]
    2. Configuration: Ensure [setting] is properly configured
    3. Process: Follow [procedure] when making changes
    4. Training: Educate team on [best practice]

    Escalation

    If the above solutions don't resolve the issue:

    When to Escalate

    • Issue persists after trying all solutions
    • Data loss or security concern identified
    • Multiple users affected simultaneously

    Information to Provide

    • Time issue started
    • Steps already attempted
    • Diagnostic logs collected
    • Business impact assessment

    Contact

    • Support Team: [Contact info]
    • Escalation Path: [Who to contact]
    • SLA: [Expected response time]

    Related Resources

    • [Link to related guide]
    • [Link to documentation]
    • [Link to FAQ]

    Revision History

    Date Author Changes
    [Date] [Name] Initial version
    [Date] [Name] Added Solution C
    
    ---
    
    ## Diagnostic Patterns
    
    ### Layer-by-Layer Approach
    
    ```markdown
    ## Network Connectivity Troubleshooting
    
    ### Layer 1: Physical
    - [ ] Check cable connections
    - [ ] Verify link lights are active
    - [ ] Test with known-good cable
    
    ### Layer 2: Data Link
    - [ ] Verify MAC address is learned
    - [ ] Check for VLAN misconfigurations
    - [ ] Review spanning tree state
    
    ### Layer 3: Network
    - [ ] Verify IP configuration
    - [ ] Test ping to gateway
    - [ ] Check routing table
    
    ### Layer 4: Transport
    - [ ] Verify service is listening on correct port
    - [ ] Check firewall rules
    - [ ] Test with telnet/nc to port
    
    ### Layer 7: Application
    - [ ] Check application logs
    - [ ] Verify configuration files
    - [ ] Test with curl/wget
    

    Binary Elimination Method

    ## Identifying Faulty Component
    
    Use binary search to isolate the issue:
    
    ### Step 1: Test Midpoint
    Test the system at the midpoint of the data flow:
    

    [Client] → [Load Balancer] → [App Server] → [Database] ↑ Test here first

    
    **If working at midpoint:** Issue is between midpoint and client
    **If failing at midpoint:** Issue is between midpoint and database
    
    ### Step 2: Narrow Down
    Repeat the process, testing the midpoint of the remaining segment.
    
    ### Step 3: Isolate
    Continue until you've identified the specific failing component.
    

    Symptom-Based Decision Tree

    ## Application Not Responding
    

    ┌─ Can you reach the server at all? │ ├─ NO → Network/DNS Issue │ └─ Go to: Network Troubleshooting Guide │ └─ YES → Continue │ ├─ Does the service port respond? │ ├─ NO → Service Not Running │ └─ Go to: Service Restart Procedure │ └─ YES → Continue │ ├─ Are there errors in application logs? │ ├─ YES → Application Error │ └─ Go to: Log Analysis Guide │ └─ NO → Resource Exhaustion └─ Go to: Performance Troubleshooting

    
    

    Log Analysis Guide

    Common Log Locations

    linux_logs:
      system:
        - /var/log/syslog
        - /var/log/messages
        - journalctl -xe
    
      application:
        - /var/log/[app-name]/
        - ~/.pm2/logs/
        - docker logs [container]
    
      web_server:
        nginx:
          - /var/log/nginx/error.log
          - /var/log/nginx/access.log
        apache:
          - /var/log/apache2/error.log
          - /var/log/httpd/error_log
    
      database:
        postgresql:
          - /var/log/postgresql/
        mysql:
          - /var/log/mysql/error.log
    

    Log Analysis Commands

    # Find errors in last 100 lines
    tail -100 /var/log/app.log | grep -i error
    
    # Find errors with timestamp
    grep -i error /var/log/app.log | tail -50
    
    # Watch log in real-time
    tail -f /var/log/app.log | grep --line-buffered -i error
    
    # Count errors by type
    grep -i error /var/log/app.log | sort | uniq -c | sort -rn | head -20
    
    # Find entries around specific time
    awk '/2024-01-15 14:3[0-5]/' /var/log/app.log
    
    # Extract specific fields (JSON logs)
    cat /var/log/app.json | jq 'select(.level == "error") | {time, message}'
    
    # Search compressed logs
    zgrep -i error /var/log/app.log.*.gz
    

    Error Pattern Recognition

    ## Common Error Patterns
    
    ### Connection Errors
    

    Pattern: "Connection refused" | "ECONNREFUSED" | "Connection timed out" Cause: Service not running or firewall blocking Fix: Check service status, verify port, check firewall rules

    
    ### Memory Errors
    

    Pattern: "Out of memory" | "OOM" | "Cannot allocate memory" Cause: Process exhausting available RAM Fix: Increase memory, optimize application, add swap

    
    ### Disk Errors
    

    Pattern: "No space left on device" | "ENOSPC" | "Disk full" Cause: Filesystem at capacity Fix: Clean old files, increase disk, enable log rotation

    
    ### Permission Errors
    

    Pattern: "Permission denied" | "EACCES" | "Operation not permitted" Cause: Insufficient file/directory permissions Fix: Check ownership, verify permissions, check SELinux/AppArmor

    
    ### Database Errors
    

    Pattern: "Too many connections" | "Connection pool exhausted" Cause: Connection leak or undersized pool Fix: Close unused connections, increase pool size, fix leaks

    
    

    Specific Problem Templates

    API Not Responding

    # Troubleshooting: API Not Responding
    
    ## Quick Diagnosis Script
    
    ```bash
    #!/bin/bash
    # api-health-check.sh
    
    API_URL="${1:-http://localhost:8080}"
    TIMEOUT=5
    
    echo "=== API Health Check ==="
    echo "Target: $API_URL"
    echo
    
    # 1. DNS Resolution
    echo "1. DNS Resolution..."
    if host=$(dig +short $(echo $API_URL | sed 's|.*://||' | cut -d'/' -f1 | cut -d':' -f1) 2>/dev/null); then
        echo "   ✅ DNS resolves to: $host"
    else
        echo "   ❌ DNS resolution failed"
    fi
    
    # 2. Port Connectivity
    echo "2. Port Connectivity..."
    PORT=$(echo $API_URL | grep -oP ':\K[0-9]+' || echo "80")
    HOST=$(echo $API_URL | sed 's|.*://||' | cut -d'/' -f1 | cut -d':' -f1)
    if nc -z -w $TIMEOUT $HOST $PORT 2>/dev/null; then
        echo "   ✅ Port $PORT is open"
    else
        echo "   ❌ Port $PORT is not reachable"
    fi
    
    # 3. HTTP Response
    echo "3. HTTP Response..."
    HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout $TIMEOUT "$API_URL/health" 2>/dev/null)
    if [ "$HTTP_CODE" = "200" ]; then
        echo "   ✅ Health endpoint returns 200"
    elif [ -n "$HTTP_CODE" ] && [ "$HTTP_CODE" != "000" ]; then
        echo "   ⚠️  Health endpoint returns $HTTP_CODE"
    else
        echo "   ❌ No HTTP response"
    fi
    
    # 4. Response Time
    echo "4. Response Time..."
    RESPONSE_TIME=$(curl -s -o /dev/null -w "%{time_total}" --connect-timeout $TIMEOUT "$API_URL/health" 2>/dev/null)
    if (( $(echo "$RESPONSE_TIME < 1" | bc -l) )); then
        echo "   ✅ Response time: ${RESPONSE_TIME}s"
    else
        echo "   ⚠️  Slow response: ${RESPONSE_TIME}s"
    fi
    
    echo
    echo "=== Check Complete ==="
    

    Decision Tree

    API Not Responding
           │
           ├─ Can you ping the server?
           │   ├─ NO → Check network/DNS
           │   └─ YES ↓
           │
           ├─ Is the service running?
           │   ├─ NO → Start/restart service
           │   └─ YES ↓
           │
           ├─ Is the port listening?
           │   ├─ NO → Check service configuration
           │   └─ YES ↓
           │
           ├─ Does health check pass?
           │   ├─ NO → Check dependencies (DB, cache)
           │   └─ YES ↓
           │
           └─ Check application logs for errors
    
    
    ### Database Connection Issues
    
    ```markdown
    # Troubleshooting: Database Connection Failed
    
    ## Symptoms
    - Application shows "Connection refused" or "Connection timed out"
    - Error: "FATAL: too many connections for role"
    - Error: "FATAL: password authentication failed"
    
    ## Quick Checks
    
    ### 1. Verify Database is Running
    ```bash
    # PostgreSQL
    sudo systemctl status postgresql
    pg_isready -h localhost -p 5432
    
    # MySQL
    sudo systemctl status mysql
    mysqladmin -u root -p ping
    

    2. Test Connection

    # PostgreSQL
    psql -h localhost -U username -d database -c "SELECT 1"
    
    # MySQL
    mysql -h localhost -u username -p -e "SELECT 1"
    

    3. Check Connection Count

    -- PostgreSQL
    SELECT count(*) FROM pg_stat_activity;
    SELECT max_connections FROM pg_settings WHERE name = 'max_connections';
    
    -- MySQL
    SHOW STATUS LIKE 'Threads_connected';
    SHOW VARIABLES LIKE 'max_connections';
    

    Solutions

    Solution 1: Restart Connection Pool

    # If using PgBouncer
    sudo systemctl restart pgbouncer
    
    # Application restart
    sudo systemctl restart myapp
    

    Solution 2: Clear Idle Connections

    -- PostgreSQL: Kill idle connections older than 10 minutes
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle'
      AND state_change < NOW() - INTERVAL '10 minutes';
    

    Solution 3: Increase Max Connections

    -- PostgreSQL (requires restart)
    ALTER SYSTEM SET max_connections = 200;
    
    -- MySQL (can be done live)
    SET GLOBAL max_connections = 200;
    
    
    ---
    
    ## Quality Assurance Checklist
    
    ### Pre-Publication Review
    
    ```markdown
    ## Troubleshooting Guide Quality Checklist
    
    ### Accuracy
    - [ ] All commands tested and verified
    - [ ] Output examples are accurate
    - [ ] Links to resources are valid
    - [ ] Version numbers are current
    
    ### Completeness
    - [ ] All common causes covered
    - [ ] Rollback instructions provided
    - [ ] Escalation path defined
    - [ ] Prevention tips included
    
    ### Usability
    - [ ] Clear success/failure criteria
    - [ ] Time estimates accurate
    - [ ] Difficulty levels appropriate
    - [ ] Tested by someone unfamiliar with issue
    
    ### Formatting
    - [ ] Consistent heading structure
    - [ ] Code blocks properly formatted
    - [ ] Decision trees clear
    - [ ] Screenshots/diagrams where helpful
    
    ### Maintenance
    - [ ] Last updated date included
    - [ ] Revision history maintained
    - [ ] Owner/contact identified
    - [ ] Review schedule established
    

    Лучшие практики

    1. Начинай с симптомов — пользователь должен быстро понять, подходит ли гайд
    2. Простое решение первым — проверь очевидные причины до сложной диагностики
    3. Включай verification steps — как понять, что проблема решена
    4. Документируй rollback — возможность отката если fix не помог
    5. Указывай время — пользователь должен знать сколько займёт каждый шаг
    6. Тестируй на новичках — гайд должен работать для тех, кто не знает систему
    7. Обновляй регулярно — устаревший гайд хуже чем его отсутствие
    8. Включай escalation path — когда и к кому обращаться
    Recommended Servers
    Google search console
    Google search console
    Astro Docs
    Astro Docs
    Ref
    Ref
    Repository
    neversight/skills_feed