Smithery Logo
MCPsSkillsDocsPricing
Login
NewFlame, an assistant that learns and improves. Available onTelegramSlack
    jeremylongshore

    exa-load-scale

    jeremylongshore/exa-load-scale
    DevOps
    1,221

    About

    SKILL.md

    Install

    • Telegram
      Telegram
    • Slack
      Slack
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    • Download skill
    ├─
    ├─
    └─
    Smithery Logo

    Give agents more agency

    Resources

    DocumentationPrivacy PolicySystem Status

    Company

    PricingAboutBlog

    Connect

    © 2026 Smithery. All rights reserved.

    About

    Implement Exa load testing, auto-scaling, and capacity planning strategies. Use when running performance tests, configuring horizontal scaling, or planning capacity for Exa integrations. Trigger with...

    SKILL.md

    Exa Load & Scale

    Overview

    Load testing and capacity planning for Exa integrations. Key constraint: Exa's default rate limit is 10 QPS. Scaling strategies focus on caching, request queuing, parallel processing within rate limits, and search type selection for latency budgets.

    Prerequisites

    • k6 load testing tool installed
    • Test environment Exa API key (separate from production)
    • Redis for result caching

    Capacity Reference

    Search Type Typical Latency Max Throughput (10 QPS)
    instant < 150ms 10 req/s (600/min)
    fast < 425ms 10 req/s (600/min)
    auto 300-1500ms 10 req/s (600/min)
    neural 500-2000ms 10 req/s (600/min)
    deep 2-5s 10 req/s (600/min)

    With caching (50% hit rate): Effective throughput doubles to 20 req/s equivalent.

    Instructions

    Step 1: k6 Load Test Against Your Wrapper

    // exa-load-test.js
    import http from "k6/http";
    import { check, sleep } from "k6";
    
    export const options = {
      stages: [
        { duration: "1m", target: 5 },    // Ramp up to 5 VUs
        { duration: "3m", target: 5 },    // Steady state
        { duration: "1m", target: 10 },   // Push toward rate limit
        { duration: "2m", target: 10 },   // Stress test
        { duration: "1m", target: 0 },    // Ramp down
      ],
      thresholds: {
        http_req_duration: ["p(95)<3000"],  // 3s P95 for neural search
        http_req_failed: ["rate<0.05"],     // < 5% error rate
      },
    };
    
    const queries = [
      "best practices for building RAG systems",
      "transformer architecture improvements 2025",
      "TypeScript 5.5 new features",
      "vector database comparison guide",
      "AI safety alignment research",
    ];
    
    export default function () {
      const query = queries[Math.floor(Math.random() * queries.length)];
    
      const response = http.post(
        `${__ENV.APP_URL}/api/search`,
        JSON.stringify({ query, numResults: 3 }),
        {
          headers: { "Content-Type": "application/json" },
          timeout: "10s",
        }
      );
    
      check(response, {
        "status 200": (r) => r.status === 200,
        "has results": (r) => JSON.parse(r.body).results?.length > 0,
        "latency < 3s": (r) => r.timings.duration < 3000,
      });
    
      sleep(0.5 + Math.random()); // 0.5-1.5s between requests
    }
    
    # Run load test
    k6 run --env APP_URL=http://localhost:3000 exa-load-test.js
    

    Step 2: Throughput Maximizer with Request Queue

    import Exa from "exa-js";
    import PQueue from "p-queue";
    
    const exa = new Exa(process.env.EXA_API_KEY);
    
    // Stay under 10 QPS rate limit
    const searchQueue = new PQueue({
      concurrency: 8,        // max concurrent requests
      interval: 1000,        // per second
      intervalCap: 10,       // Exa's QPS limit
    });
    
    async function highThroughputSearch(queries: string[]) {
      const results = [];
    
      for (const query of queries) {
        const promise = searchQueue.add(async () => {
          const result = await exa.searchAndContents(query, {
            type: "auto",
            numResults: 3,
            text: { maxCharacters: 500 },
          });
          return { query, results: result.results };
        });
        results.push(promise);
      }
    
      return Promise.all(results);
    }
    
    // Process 100 queries respecting rate limits
    const queries = Array.from({ length: 100 }, (_, i) => `research topic ${i}`);
    console.time("batch");
    const results = await highThroughputSearch(queries);
    console.timeEnd("batch");
    // Expected: ~10-12 seconds (100 queries / 10 QPS)
    

    Step 3: Caching for Scale

    import { LRUCache } from "lru-cache";
    
    // Cache eliminates repeat queries entirely
    const cache = new LRUCache<string, any>({
      max: 10000,
      ttl: 3600 * 1000, // 1-hour TTL
    });
    
    async function scalableSearch(query: string, opts: any) {
      const key = `${query.toLowerCase().trim()}:${opts.type}:${opts.numResults}`;
      const cached = cache.get(key);
      if (cached) return cached;
    
      const result = await searchQueue.add(() =>
        exa.searchAndContents(query, opts)
      );
      cache.set(key, result);
      return result;
    }
    
    // With 50% cache hit rate:
    // 100 unique queries → 50 API calls → 5 seconds instead of 10
    

    Step 4: Capacity Planning Calculator

    interface CapacityEstimate {
      dailySearches: number;
      peakQPS: number;
      cacheHitRate: number;
      effectiveQPS: number;
      withinLimits: boolean;
      recommendation: string;
    }
    
    function estimateCapacity(
      dailySearches: number,
      peakMultiplier = 3,
      expectedCacheHitRate = 0.5
    ): CapacityEstimate {
      const avgQPS = dailySearches / (24 * 3600);
      const peakQPS = avgQPS * peakMultiplier;
      const effectiveQPS = peakQPS * (1 - expectedCacheHitRate);
      const withinLimits = effectiveQPS <= 10; // Default Exa limit
    
      let recommendation = "Within default limits";
      if (effectiveQPS > 10 && effectiveQPS <= 50) {
        recommendation = "Contact hello@exa.ai for Enterprise rate limits";
      } else if (effectiveQPS > 50) {
        recommendation = "Requires Enterprise plan + aggressive caching + request queue";
      }
    
      return { dailySearches, peakQPS, cacheHitRate: expectedCacheHitRate, effectiveQPS, withinLimits, recommendation };
    }
    
    // Example: 50,000 searches/day
    const estimate = estimateCapacity(50000);
    console.log(estimate);
    // { effectiveQPS: ~0.87, withinLimits: true, recommendation: "Within default limits" }
    

    Benchmark Results Template

    ## Exa Performance Benchmark
    **Date:** YYYY-MM-DD | **SDK:** exa-js X.Y.Z
    
    | Metric | Value |
    |--------|-------|
    | Total Requests | N |
    | Success Rate | X% |
    | Cache Hit Rate | X% |
    | P50 Latency | Xms |
    | P95 Latency | Xms |
    | Peak QPS (actual API calls) | X |
    | 429 Rate Limit Errors | N |
    

    Error Handling

    Issue Cause Solution
    429 errors in load test Exceeding 10 QPS Reduce concurrency, add cache
    Inconsistent latency Different search types Standardize on one type per test
    Timeout errors Deep search under load Use fast or auto for load tests
    Cache miss rate high Unique queries per request Use a fixed query pool

    Resources

    • Exa Rate Limits
    • k6 Documentation
    • p-queue

    Next Steps

    For reliability patterns, see exa-reliability-patterns.

    Recommended Servers
    GroundRoute: Web Search for AI Agents across 6 Engines ($10 free credit)
    GroundRoute: Web Search for AI Agents across 6 Engines ($10 free credit)
    DC Hub — Data Center & Energy Intelligence
    DC Hub — Data Center & Energy Intelligence
    PlanetScale
    PlanetScale
    Repository
    jeremylongshore/claude-code-plugins-plus-skills
    Files