    DNYoussef/when-profiling-performance-use-performance-profiler
    Data & Analytics

    About

    Comprehensive performance profiling, bottleneck detection, and optimization system

    SKILL.md

    Performance Profiler Skill

    Overview

    When profiling performance, use performance-profiler to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions.

    MECE Breakdown

    Mutually Exclusive Components:

    1. Baseline Phase: Establish current performance metrics
    2. Detection Phase: Identify bottlenecks and hot paths
    3. Analysis Phase: Root cause analysis and impact assessment
    4. Optimization Phase: Generate and prioritize recommendations
    5. Implementation Phase: Apply optimizations with agent assistance
    6. Validation Phase: Benchmark improvements and verify gains

    Collectively Exhaustive Coverage:

    • CPU Profiling: Function execution time, hot paths, call graphs
    • Memory Profiling: Heap usage, allocations, leaks, garbage collection
    • I/O Profiling: File system, database, network latency
    • Network Profiling: Request timing, bandwidth, connection pooling
    • Concurrency: Thread utilization, lock contention, async operations
    • Algorithm Analysis: Time complexity, space complexity
    • Cache Analysis: Hit rates, cache misses, invalidation patterns
    • Database: Query performance, N+1 problems, index usage

    Features

    Core Capabilities:

    • Multi-dimensional performance profiling (CPU, memory, I/O, network)
    • Automated bottleneck detection with prioritization
    • Real-time profiling and historical analysis
    • Flame graph generation for visual analysis
    • Memory leak detection and heap snapshots
    • Database query optimization
    • Algorithmic complexity analysis
    • A/B comparison of before/after optimizations
    • Production-safe profiling with minimal overhead
    • Integration with APM tools (New Relic, DataDog, etc.)

    Profiling Modes:

    • Quick Scan: 30-second lightweight profiling
    • Standard: 5-minute comprehensive analysis
    • Deep: 30-minute detailed investigation
    • Continuous: Long-running production monitoring
    • Stress Test: Load-based profiling under high traffic

    Usage

    Slash Command:

    /profile [path] [--mode quick|standard|deep] [--target cpu|memory|io|network|all]
    

    Subagent Invocation:

    Task("Performance Profiler", "Profile ./app with deep CPU and memory analysis", "performance-analyzer")
    

    MCP Tool:

    mcp__performance-profiler__analyze({
      project_path: "./app",
      profiling_mode: "standard",
      targets: ["cpu", "memory", "io"],
      generate_optimizations: true
    })
    

    Architecture

    Phase 1: Baseline Measurement

    1. Establish current performance metrics
    2. Define performance budgets
    3. Set up monitoring infrastructure
    4. Capture baseline snapshots

    Phase 2: Bottleneck Detection

    1. CPU profiling (sampling or instrumentation)
    2. Memory profiling (heap analysis)
    3. I/O profiling (syscall tracing)
    4. Network profiling (packet analysis)
    5. Database profiling (query logs)

    Phase 3: Root Cause Analysis

    1. Correlate metrics across dimensions
    2. Identify causal relationships
    3. Calculate performance impact
    4. Prioritize issues by severity

    Phase 4: Optimization Generation

    1. Algorithmic improvements
    2. Caching strategies
    3. Parallelization opportunities
    4. Database query optimization
    5. Memory optimization
    6. Network optimization

    Phase 5: Implementation

    1. Generate optimized code with coder agent
    2. Apply database optimizations
    3. Configure caching layers
    4. Implement parallelization

    Phase 6: Validation

    1. Run benchmark suite
    2. Compare before/after metrics
    3. Verify no regressions
    4. Generate performance report
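    The before/after comparison in this phase can be as simple as timing both variants of a function. A minimal Python sketch (the `before`/`after` functions are illustrative stand-ins for the unoptimized and optimized code paths):

```python
import timeit

def before(n=500):
    # Quadratic: repeated membership tests scan a growing list.
    seen = []
    for i in range(n):
        if i not in seen:
            seen.append(i)
    return len(seen)

def after(n=500):
    # Linear: set membership is O(1) on average.
    seen = set()
    for i in range(n):
        seen.add(i)
    return len(seen)

# Run each variant several times and keep the best (least noisy) timing.
t_before = min(timeit.repeat(before, number=20, repeat=3))
t_after = min(timeit.repeat(after, number=20, repeat=3))

# Verify no regression in behavior before reporting the speedup.
assert before() == after()
speedup = t_before / t_after
```

    Taking the minimum of several repeats, rather than the mean, reduces noise from other processes on the machine.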

    Output Formats

    Performance Report:

    {
      "project": "my-app",
      "profiling_mode": "standard",
      "duration_seconds": 300,
      "baseline": {
        "requests_per_second": 1247,
        "avg_response_time_ms": 123,
        "p95_response_time_ms": 456,
        "p99_response_time_ms": 789,
        "cpu_usage_percent": 67,
        "memory_usage_mb": 512,
        "error_rate_percent": 0.1
      },
      "bottlenecks": [
        {
          "type": "cpu",
          "severity": "high",
          "function": "processData",
          "time_percent": 34.5,
          "calls": 123456,
          "avg_time_ms": 2.3,
          "recommendation": "Optimize algorithm complexity from O(n²) to O(n log n)"
        }
      ],
      "optimizations": [...],
      "estimated_improvement": {
        "throughput_increase": "3.2x",
        "latency_reduction": "68%",
        "memory_reduction": "45%"
      }
    }
    

    Flame Graph:

    Interactive SVG flame graph showing call stack with time proportions

    Heap Snapshot:

    Memory allocation breakdown with retention paths

    Optimization Report:

    Prioritized list of actionable improvements with code examples

    Examples

    Example 1: Quick CPU Profiling

    /profile ./my-app --mode quick --target cpu
    

    Example 2: Deep Memory Analysis

    /profile ./my-app --mode deep --target memory --detect-leaks
    

    Example 3: Full Stack Optimization

    /profile ./my-app --mode standard --target all --optimize --benchmark
    

    Example 4: Database Query Optimization

    /profile ./my-app --mode standard --target io --database --explain-queries
    

    Integration with Claude-Flow

    Coordination Pattern:

    // Step 1: Initialize profiling swarm
    mcp__claude-flow__swarm_init({ topology: "star", maxAgents: 5 })
    
    // Step 2: Spawn specialized agents
    [Parallel Execution]:
      Task("CPU Profiler", "Profile CPU usage and identify hot paths in ./app", "performance-analyzer")
      Task("Memory Profiler", "Analyze heap usage and detect memory leaks", "performance-analyzer")
      Task("I/O Profiler", "Profile file system and database operations", "performance-analyzer")
      Task("Network Profiler", "Analyze network requests and identify slow endpoints", "performance-analyzer")
      Task("Optimizer", "Generate optimization recommendations based on profiling data", "optimizer")
    
    // Step 3: Implementation agent applies optimizations
    [Sequential Execution]:
      Task("Coder", "Implement recommended optimizations from profiling analysis", "coder")
      Task("Benchmarker", "Run benchmark suite and validate improvements", "performance-benchmarker")
    

    Configuration

    Default Settings:

    {
      "profiling": {
        "sampling_rate_hz": 99,
        "stack_depth": 128,
        "include_native_code": false,
        "track_allocations": true
      },
      "thresholds": {
        "cpu_hot_path_percent": 10,
        "memory_leak_growth_mb": 10,
        "slow_query_ms": 100,
        "slow_request_ms": 1000
      },
      "optimization": {
        "auto_apply": false,
        "require_approval": true,
        "run_tests_before": true,
        "run_benchmarks_after": true
      },
      "output": {
        "flame_graph": true,
        "heap_snapshot": true,
        "call_tree": true,
        "recommendations": true
      }
    }
    

    Profiling Techniques

    CPU Profiling:

    • Sampling: Periodic stack sampling (low overhead)
    • Instrumentation: Function entry/exit hooks (accurate but higher overhead)
    • Tracing: Event-based profiling
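    As one concrete example of the instrumentation approach, Python's built-in `cProfile` is a deterministic profiler that hooks every function entry and exit. A minimal sketch with hypothetical workload functions:

```python
import cProfile
import io
import pstats

def hot_path():
    # Deliberately expensive inner loop: the bottleneck to find.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def cold_path():
    return sum(range(100))

def workload():
    cold_path()
    return hot_path()

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Rank functions by cumulative time; the hot path rises to the top.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

    For production workloads, a sampling profiler (e.g. py-spy for Python) is usually preferred because instrumentation overhead can distort the very timings being measured.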

    Memory Profiling:

    • Heap Snapshots: Point-in-time memory state
    • Allocation Tracking: Record all allocations
    • Leak Detection: Compare snapshots over time
    • GC Analysis: Garbage collection patterns
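    Snapshot-based leak detection is directly supported by Python's stdlib `tracemalloc`: take a snapshot, run the workload, take another, and diff them. A minimal sketch with a simulated leaky request handler (`handle_request` and the module-level `leak` list are illustrative):

```python
import tracemalloc

leak = []  # simulated leak: module-level list that keeps growing

def handle_request():
    leak.append(bytearray(10_000))  # ~10 KB retained per "request"

tracemalloc.start()
first = tracemalloc.take_snapshot()

for _ in range(100):
    handle_request()

second = tracemalloc.take_snapshot()

# Diff the snapshots: growing allocation sites float to the top.
top = second.compare_to(first, "lineno")[0]
growth_kb = top.size_diff / 1024  # ~1000 KB for 100 leaked requests
```

    The same compare-two-snapshots technique applies in other runtimes (e.g. Chrome DevTools heap snapshots for Node.js).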

    I/O Profiling:

    • Syscall Tracing: Track system calls (strace, DTrace)
    • File System: Monitor read/write operations
    • Database: Query logging and EXPLAIN ANALYZE
    • Network: Packet capture and request timing

    Concurrency Profiling:

    • Thread Analysis: CPU utilization per thread
    • Lock Contention: Identify blocking operations
    • Async Operations: Promise/callback timing
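    Lock contention can be measured directly by timing how long each thread waits to acquire a lock. A minimal Python sketch in which four threads serialize on a single lock (the 0.05 s sleep stands in for work done inside the critical section):

```python
import threading
import time

lock = threading.Lock()
wait_times = []

def worker():
    t0 = time.perf_counter()
    with lock:
        # Time spent blocked before entering the critical section.
        wait_times.append(time.perf_counter() - t0)
        time.sleep(0.05)  # simulated work while holding the lock

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With 4 threads serialized on one lock, the last waits ~0.15 s.
total_wait = sum(wait_times)
```

    Large wait times relative to hold times indicate the lock is a bottleneck; the usual fixes are finer-grained locking or lock-free data structures.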

    Performance Optimization Strategies

    Algorithmic:

    • Reduce time complexity (O(n²) → O(n log n))
    • Use appropriate data structures
    • Eliminate unnecessary work
    • Memoization and dynamic programming
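    Memoization is often the cheapest of these wins. The classic illustration is naive recursive Fibonacci (exponential time) versus the same function with a cache (linear number of distinct calls), sketched here with Python's `functools.lru_cache`:

```python
from functools import lru_cache

def fib_naive(n):
    # Exponential time: recomputes the same subproblems repeatedly.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Each distinct argument is computed once, then served from cache.
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

result = fib_memo(60)  # returns instantly; the naive version would not
```

    The same idea generalizes: whenever a pure function is called repeatedly with the same arguments, caching trades memory for time.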

    Caching:

    • In-memory caching (Redis, Memcached)
    • CDN for static assets
    • HTTP caching headers
    • Query result caching

    Parallelization:

    • Multi-threading
    • Worker pools
    • Async I/O
    • Batching operations
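    A worker pool pays off most for blocking I/O, where threads spend their time waiting rather than computing. A minimal Python sketch using `concurrent.futures` (the `fetch` function simulates a blocking network call with a sleep):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    time.sleep(0.05)  # stand-in for a blocking I/O call
    return i * 2

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, range(16)))
elapsed = time.perf_counter() - t0
# 16 tasks across 8 workers finish in ~2 batches (~0.1 s)
# instead of ~0.8 s sequentially.
```

    For CPU-bound work in Python, a process pool (`ProcessPoolExecutor`) is the analogous tool, since threads contend on the GIL.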

    Database:

    • Add missing indexes
    • Optimize queries
    • Reduce N+1 queries
    • Connection pooling
    • Read replicas
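    The N+1 pattern and its fix can be shown end to end with an in-memory SQLite database (schema and data here are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

# N+1 pattern: one query for authors, then one query per author.
n_plus_1 = []
for author_id, name in db.execute("SELECT id, name FROM authors"):
    for (title,) in db.execute(
        "SELECT title FROM posts WHERE author_id = ?", (author_id,)
    ):
        n_plus_1.append((name, title))

# Single JOIN: the same rows in one round trip.
joined = list(db.execute("""
    SELECT a.name, p.title
    FROM authors a JOIN posts p ON p.author_id = a.id
    ORDER BY p.id
"""))
```

    With N authors the first approach issues N+1 queries; the JOIN issues one, which matters most when each query carries network latency to a remote database.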

    Memory:

    • Object pooling
    • Reduce allocations
    • Stream processing
    • Compression

    Network:

    • Connection keep-alive
    • HTTP/2 or HTTP/3
    • Compression
    • Request batching
    • Rate limiting

    Performance Budgets

    Frontend:

    • Time to First Byte (TTFB): < 200ms
    • First Contentful Paint (FCP): < 1.8s
    • Largest Contentful Paint (LCP): < 2.5s
    • Time to Interactive (TTI): < 3.8s
    • Total Blocking Time (TBT): < 200ms
    • Cumulative Layout Shift (CLS): < 0.1

    Backend:

    • API Response Time (p50): < 100ms
    • API Response Time (p95): < 500ms
    • API Response Time (p99): < 1000ms
    • Throughput: > 1000 req/s
    • Error Rate: < 0.1%
    • CPU Usage: < 70%
    • Memory Usage: < 80%

    Database:

    • Query Time (p50): < 10ms
    • Query Time (p95): < 50ms
    • Query Time (p99): < 100ms
    • Connection Pool Utilization: < 80%
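    Budgets like these are checked against percentiles of raw latency samples, not averages. A minimal Python sketch using the stdlib `statistics` module (the long-tailed sample distribution is synthetic):

```python
import statistics

# Simulated request latencies in ms: mostly fast, with a slow tail.
latencies = [20] * 900 + [80] * 90 + [400] * 10  # 1000 samples

cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
p50 = statistics.median(latencies)
p95 = cuts[94]  # 95th percentile
p99 = cuts[98]  # 99th percentile
avg = statistics.fmean(latencies)
# avg (~29 ms) hides the tail; p99 (~400 ms) exposes it.
```

    This is why the Best Practices below insist on p95/p99: one slow request in a hundred is invisible in the mean but dominates the worst user experiences.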

    Best Practices

    1. Profile production workloads when possible
    2. Use production-like data volumes
    3. Profile under realistic load
    4. Measure multiple times for consistency
    5. Focus on p95/p99, not just averages
    6. Optimize bottlenecks in order of impact
    7. Always benchmark before and after
    8. Monitor for regressions in CI/CD
    9. Set up continuous profiling
    10. Track performance over time

    Troubleshooting

    Issue: High CPU usage but no obvious hot path

    Solution: Time may be spread thinly across many small, fast function calls; increase the sampling rate or switch to instrumentation-based profiling to attribute it accurately

    Issue: Memory grows continuously

    Solution: Run heap snapshot comparison to identify leak sources

    Issue: Slow database queries

    Solution: Use EXPLAIN ANALYZE, check for missing indexes, analyze query plans

    Issue: High latency but low CPU

    Solution: Profile I/O operations and check for blocking synchronous calls; the process is likely waiting rather than computing

    See Also

    • PROCESS.md - Detailed step-by-step profiling workflow
    • README.md - Quick start guide
    • subagent-performance-profiler.md - Agent implementation details
    • slash-command-profile.sh - Command-line interface
    • mcp-performance-profiler.json - MCP tool schema