Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Accelerating the Agent Economy

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    MadAppGang

    data-extraction-patterns

    MadAppGang/data-extraction-patterns
    Data & Analytics
    219
    1 installs

    About

    SKILL.md

    Install

    Install via Skills CLI

    or add to your agent
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    ├─
    ├─
    └─

    About

    Common patterns for extracting analytics data from GA4 and GSC with API handling

    SKILL.md

    plugin: seo updated: 2026-01-20

    Data Extraction Patterns

    When to Use

    • Setting up analytics data pipelines
    • Combining data from multiple sources
    • Handling API rate limits and errors
    • Caching frequently accessed data
    • Building data collection workflows

    API Reference

    Google Analytics 4 (GA4)

    MCP Server: mcp-server-google-analytics

    Key Operations:

    get_report({
      propertyId: "properties/123456789",
      dateRange: { startDate: "30daysAgo", endDate: "today" },
      dimensions: ["pagePath", "date"],
      metrics: ["screenPageViews", "averageSessionDuration", "bounceRate"]
    })
    

    Useful Metrics:

    Metric Description Use Case
    screenPageViews Total page views Traffic volume
    sessions User sessions Visitor count
    averageSessionDuration Avg time in session Engagement
    bounceRate Single-page visits Content quality
    engagementRate Engaged sessions % True engagement
    scrolledUsers Users who scrolled Content consumption

    Useful Dimensions:

    Dimension Description
    pagePath URL path
    date Date (for trending)
    sessionSource Traffic source
    deviceCategory Desktop/mobile/tablet

    Google Search Console (GSC)

    MCP Server: mcp-server-gsc

    Key Operations:

    search_analytics({
      siteUrl: "https://example.com",
      startDate: "2025-11-27",
      endDate: "2025-12-27",
      dimensions: ["query", "page"],
      rowLimit: 1000
    })
    
    get_url_inspection({
      siteUrl: "https://example.com",
      inspectionUrl: "https://example.com/page"
    })
    

    Available Metrics:

    Metric Description Use Case
    clicks Total clicks from search Traffic from Google
    impressions Times shown in results Visibility
    ctr Click-through rate Snippet effectiveness
    position Average ranking SEO success

    Dimensions:

    Dimension Description
    query Search query
    page Landing page URL
    country User country
    device Desktop/mobile/tablet
    date Date (for trending)

    Parallel Execution Pattern

    Optimal Data Fetch (All Sources)

    ## Parallel Data Fetch Pattern
    
    When fetching from multiple sources, issue all requests in a SINGLE message
    for parallel execution:
    
    ┌─────────────────────────────────────────────────────────────────┐
    │  MESSAGE 1: Parallel Data Requests                              │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                  │
    │  [MCP Call 1]: google-analytics.get_report(...)                 │
    │  [MCP Call 2]: google-search-console.search_analytics(...)      │
    │                                                                  │
    │  → All execute simultaneously                                    │
    │  → Results return when all complete                              │
    │  → ~2x faster than sequential                                    │
    │                                                                  │
    └─────────────────────────────────────────────────────────────────┘
    

    Sequential (When Needed)

    Some operations require sequential execution:

    ## Sequential Pattern (Dependencies)
    
    When one request depends on another's result:
    
    ┌─────────────────────────────────────────────────────────────────┐
    │  MESSAGE 1: Get list of pages                                   │
    │  → Returns: ["/page1", "/page2", "/page3"]                      │
    ├─────────────────────────────────────────────────────────────────┤
    │  MESSAGE 2: Get details for each page                           │
    │  → Uses page list from Message 1                                │
    │  → Can parallelize within this message                          │
    └─────────────────────────────────────────────────────────────────┘
    

    Rate Limiting

    API Rate Limits

    API Limit Strategy
    GA4 10 QPS per property Batch dimensions
    GSC 1,200 requests/min Paginate large exports

    Retry Pattern

    #!/bin/bash
    # Retry with exponential backoff
    
    MAX_RETRIES=3
    RETRY_DELAY=5
    
    fetch_with_retry() {
        local url="$1"
        local attempt=1
    
        while [ $attempt -le $MAX_RETRIES ]; do
            response=$(curl -s -w "%{http_code}" -o /tmp/response.json "$url")
            http_code="${response: -3}"
    
            if [ "$http_code" = "200" ]; then
                cat /tmp/response.json
                return 0
            elif [ "$http_code" = "429" ]; then
                echo "Rate limited, waiting ${RETRY_DELAY}s..." >&2
                sleep $RETRY_DELAY
                RETRY_DELAY=$((RETRY_DELAY * 2))
            else
                echo "Error: HTTP $http_code" >&2
                return 1
            fi
    
            attempt=$((attempt + 1))
        done
    
        echo "Max retries exceeded" >&2
        return 1
    }
    

    Caching Pattern

    Session-Based Cache

    # Cache structure
    SESSION_PATH="/tmp/seo-performance-20251227-143000-example"
    CACHE_DIR="${SESSION_PATH}/cache"
    CACHE_TTL=3600  # 1 hour in seconds
    
    mkdir -p "$CACHE_DIR"
    
    # Cache key generation
    cache_key() {
        echo "$1" | md5sum | cut -d' ' -f1
    }
    
    # Check cache
    get_cached() {
        local key=$(cache_key "$1")
        local cache_file="${CACHE_DIR}/${key}.json"
    
        if [ -f "$cache_file" ]; then
            local age=$(($(date +%s) - $(stat -f%m "$cache_file" 2>/dev/null || stat -c%Y "$cache_file")))
            if [ $age -lt $CACHE_TTL ]; then
                cat "$cache_file"
                return 0
            fi
        fi
        return 1
    }
    
    # Save to cache
    save_cache() {
        local key=$(cache_key "$1")
        local cache_file="${CACHE_DIR}/${key}.json"
        cat > "$cache_file"
    }
    
    # Usage
    CACHE_KEY="ga4_${URL}_${DATE_RANGE}"
    if ! RESULT=$(get_cached "$CACHE_KEY"); then
        RESULT=$(fetch_from_api)
        echo "$RESULT" | save_cache "$CACHE_KEY"
    fi
    

    Date Range Standardization

    Common Date Ranges

    # Standard date range calculations
    TODAY=$(date +%Y-%m-%d)
    
    case "$RANGE" in
        "7d")
            START_DATE=$(date -v-7d +%Y-%m-%d 2>/dev/null || date -d "7 days ago" +%Y-%m-%d)
            ;;
        "30d")
            START_DATE=$(date -v-30d +%Y-%m-%d 2>/dev/null || date -d "30 days ago" +%Y-%m-%d)
            ;;
        "90d")
            START_DATE=$(date -v-90d +%Y-%m-%d 2>/dev/null || date -d "90 days ago" +%Y-%m-%d)
            ;;
        "mtd")
            START_DATE=$(date +%Y-%m-01)
            ;;
        "ytd")
            START_DATE=$(date +%Y-01-01)
            ;;
    esac
    
    END_DATE="$TODAY"
    

    API-Specific Formats

    API Format Example
    GA4 Relative or ISO "30daysAgo", "2025-12-01"
    GSC ISO 8601 "2025-12-01"

    Graceful Degradation

    Data Source Fallback

    ## Fallback Strategy
    
    When a data source is unavailable:
    
    ┌─────────────────────────────────────────────────────────────────┐
    │  PRIMARY SOURCE      │  FALLBACK           │  LAST RESORT       │
    ├──────────────────────┼─────────────────────┼────────────────────┤
    │  GA4 traffic data    │  GSC clicks         │  Estimate from GSC │
    │  GSC search perf     │  Manual SERP check  │  WebSearch SERP    │
    │  CWV (CrUX)          │  PageSpeed API      │  Lighthouse CLI    │
    └──────────────────────┴─────────────────────┴────────────────────┘
    

    Partial Data Output

    ## Analysis Report (Partial Data)
    
    ### Data Availability
    
    | Source | Status | Impact |
    |--------|--------|--------|
    | GA4 | NOT CONFIGURED | Missing engagement metrics |
    | GSC | AVAILABLE | Full search data |
    
    ### Analysis Notes
    
    This analysis is based on limited data sources:
    - Search performance metrics are complete (GSC)
    - Engagement metrics unavailable (no GA4)
    
    **Recommendation**: Configure GA4 for complete analysis.
    Run `/setup-analytics` to add Google Analytics.
    

    Unified Data Model

    Combined Output Structure

    {
      "metadata": {
        "url": "https://example.com/page",
        "fetchedAt": "2025-12-27T14:30:00Z",
        "dateRange": {
          "start": "2025-11-27",
          "end": "2025-12-27"
        }
      },
      "sources": {
        "ga4": {
          "available": true,
          "metrics": {
            "pageViews": 2450,
            "avgTimeOnPage": 222,
            "bounceRate": 38.2,
            "engagementRate": 64.5
          }
        },
        "gsc": {
          "available": true,
          "metrics": {
            "impressions": 15200,
            "clicks": 428,
            "ctr": 2.82,
            "avgPosition": 4.2
          },
          "topQueries": [
            {"query": "seo guide", "clicks": 156, "position": 4}
          ]
        }
      },
      "computed": {
        "healthScore": 72,
        "status": "GOOD"
      }
    }
    

    Error Handling

    Common Errors

    Error Cause Resolution
    401 Unauthorized Invalid/expired credentials Re-run /setup-analytics
    403 Forbidden Missing permissions Check API access in console
    429 Too Many Requests Rate limit Wait and retry with backoff
    404 Not Found Invalid property/site Verify IDs in configuration
    500 Server Error API issue Retry later, check status page

    Error Output Pattern

    ## Data Fetch Error
    
    **Source**: Google Analytics 4
    **Error**: 403 Forbidden
    **Message**: "User does not have sufficient permissions for this property"
    
    ### Troubleshooting Steps
    
    1. Verify Service Account email in GA4 Admin
    2. Ensure "Viewer" role is granted
    3. Check Analytics Data API is enabled
    4. Wait 5 minutes for permission propagation
    
    ### Workaround
    
    Proceeding with available data sources (GSC).
    GA4 engagement metrics will not be included in this analysis.
    
    Recommended Servers
    Nimble MCP Server
    Nimble MCP Server
    Vercel Grep
    Vercel Grep
    ScrapeGraph AI Integration Server
    ScrapeGraph AI Integration Server
    Repository
    madappgang/claude-code
    Files