    melodic-software/gemini-token-optimization

    About

    Optimize token usage when delegating to Gemini CLI. Covers token caching, batch queries, model selection (Flash vs Pro), and cost tracking. Use when planning bulk Gemini operations.

    SKILL.md

    Gemini Token Optimization

    🚨 MANDATORY: Invoke gemini-cli-docs First

    STOP - Before providing ANY response about Gemini token usage:

    1. INVOKE gemini-cli-docs skill
    2. QUERY for the specific token or pricing topic
    3. BASE all responses EXCLUSIVELY on official documentation loaded

    Overview

    This skill covers optimizing cost and token usage when delegating to Gemini CLI. It is most useful for bulk operations and cost-conscious workflows.

    When to Use This Skill

    Keywords: token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens

    Use this skill when:

    • Planning bulk Gemini operations
    • Optimizing costs for large-scale analysis
    • Choosing between Flash and Pro models
    • Understanding token caching benefits
    • Tracking usage across sessions

    Token Caching

    Gemini CLI automatically caches context to reduce costs by reusing previously processed content.

    Availability

    | Auth Method | Caching Available |
    |---|---|
    | API key (Gemini API) | YES |
    | Vertex AI | YES |
    | OAuth (personal/enterprise) | NO |
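
Because caching depends on the auth method, a bulk script can check up front whether cache savings are plausible. The following is a heuristic sketch, not part of the skill: it assumes authentication is configured through the `GEMINI_API_KEY` or `GOOGLE_GENAI_USE_VERTEXAI` environment variables, and the function name `caching_likely_available` is hypothetical. The CLI can also pick up auth from its own settings, which this check does not inspect.

```shell
# Heuristic only: looks at environment variables, not the CLI's settings files.
caching_likely_available() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "yes (Gemini API key)"
  elif [ "${GOOGLE_GENAI_USE_VERTEXAI:-}" = "true" ]; then
    echo "yes (Vertex AI)"
  else
    # OAuth logins and anything this sketch cannot detect end up here.
    echo "no (OAuth or unknown auth)"
  fi
}
```

Calling `caching_likely_available` before a large run gives a quick hint about whether the cache-oriented advice in this section is likely to apply.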

    How It Works

    • System instructions and repeated context are cached
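    • Cached tokens are typically billed at a reduced rate rather than the full input rate; confirm the current discount in the official pricing documentation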
    • View savings via /stats command or JSON output

    Maximizing Cache Hits

    1. Use consistent system prompts - Same prefix increases cache reuse
    2. Batch similar queries - Group related analysis together
    3. Reuse context files - Same files in same order
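
The three habits above amount to keeping the start of every prompt byte-identical and putting the part that varies last. A minimal sketch under stated assumptions: `SYSTEM_PREFIX` and `build_prompt` are hypothetical names, the `=== path ===` header format is arbitrary, and file paths are assumed not to contain newlines.

```shell
# Hypothetical fixed instruction prefix; it must stay identical across calls.
SYSTEM_PREFIX="You are reviewing a TypeScript codebase. Answer concisely."

# build_prompt QUESTION FILE...
# Emits the fixed prefix, then the files in sorted order, then the question last.
build_prompt() {
  question=$1
  shift
  printf '%s\n\n' "$SYSTEM_PREFIX"
  # Sort paths so the same set of files always appears in the same order.
  printf '%s\n' "$@" | LC_ALL=C sort | while IFS= read -r path; do
    [ -n "$path" ] || continue
    printf '=== %s ===\n' "$path"
    cat "$path"
  done
  printf '\nQuestion: %s\n' "$question"
}
```

A call such as `build_prompt "List all exports" src/*.ts | gemini --output-format json` would then send the same leading bytes for every question asked about the same files.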

    Monitoring Cache Usage

    result=$(gemini "query" --output-format json)
    # Sum token counts across every model that served the request
    total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
    cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
    uncached=$((total - cached))
    # Guard against division by zero when no stats were returned
    if [ "$total" -gt 0 ]; then savings=$((cached * 100 / total)); else savings=0; fi

    echo "Total: $total tokens"
    echo "Cached: $cached tokens ($savings% of total)"
    echo "Uncached: $uncached tokens"
    

    Model Selection

    Model Comparison

    | Model | Context Window | Speed | Cost | Quality |
    |---|---|---|---|---|
    | gemini-2.5-flash | ~1M tokens | Fast | Lower | Good |
    | gemini-2.5-pro | ~1M tokens | Slower | Higher | Best |

    Selection Criteria

    Use Flash (-m gemini-2.5-flash) when:

    • Processing large files (bulk analysis)
    • Simple extraction tasks
    • Cost is a primary concern
    • Speed is critical
    • Task is straightforward

    Use Pro (-m gemini-2.5-pro) when:

    • Complex reasoning required
    • Quality is critical
    • Nuanced analysis needed
    • Task requires deep understanding
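    • Long context needs careful reasoning across the whole input (both 2.5 models accept roughly 1M input tokens, so larger inputs must be split for either model)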
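
The criteria above can be folded into a small helper so that scripts pick a model the same way every time. This is only a sketch: `pick_model` and the task labels it accepts are hypothetical, and the mapping should be adjusted to your own workload.

```shell
# pick_model TASK_KIND -> prints a model name suitable for the -m flag.
# The task kinds are made-up labels for this sketch.
pick_model() {
  case "$1" in
    bulk|extraction|listing)      echo "gemini-2.5-flash" ;;
    security|architecture|review) echo "gemini-2.5-pro" ;;
    *)                            echo "gemini-2.5-flash" ;;  # unknown tasks default to the cheaper model
  esac
}
```

For example, `gemini "List all exports" -m "$(pick_model extraction)" --output-format json < file.ts` would run on Flash, while `pick_model security` selects Pro.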

    Model Selection Examples

    # Bulk file analysis - use Flash
    for file in src/*.ts; do
      gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
    done
    
    # Security audit - use Pro for quality
    gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts
    
    # Cost tracking with model info
    result=$(gemini "query" --output-format json)
    model=$(echo "$result" | jq -r '.stats.models | keys[0]')
    tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
    echo "Used $model: $tokens tokens"
    

    Batching Strategy

    Why Batch?

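    • Reduces per-call overhead (fewer process launches and API round trips)
    • Increases cache hit rate, since consecutive calls share the same prefix
    • Gives every question the same context, so answers stay consistent with each other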

    Batching Patterns

    Pattern 1: Concatenate Files

    # Instead of N separate calls
    # Do one call with all files
    cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json
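
Plain `cat` drops file names, so the model cannot say which file a finding came from. One possible refinement, sketched here with a hypothetical `bundle_files` helper and an arbitrary header format, labels each file before concatenating:

```shell
# bundle_files FILE...
# Prints each file preceded by a header line naming it.
bundle_files() {
  for path in "$@"; do
    printf '=== FILE: %s ===\n' "$path"
    cat "$path"
    printf '\n'
  done
}
```

It would be used in place of `cat`, for example `bundle_files src/*.ts | gemini "For each file, list its exports" --output-format json`.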
    

    Pattern 2: Batch Prompts

    # Combine related questions
    gemini "Answer these questions about the codebase:
    1. What is the main architecture pattern?
    2. How is authentication handled?
    3. What database is used?" --output-format json
    

    Pattern 3: Staged Analysis

    # First pass: Quick overview with Flash
    overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)
    
    # Second pass: Deep dive critical areas with Pro
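    # read -r keeps backslashes literal; </dev/null stops gemini from consuming the rest of the module list on stdin
    echo "$overview" | jq -r '.response' | grep -E "auth|security" | while IFS= read -r module; do
      gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json < /dev/null
    done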
    

    Cost Tracking

    Per-Query Tracking

    result=$(gemini "query" --output-format json)
    
    # Extract all cost-relevant stats
    total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
    cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
    models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")')
    tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
    latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0')
    
    echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log
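
Once `usage.log` has a few entries, the totals can be recovered without re-running anything. The sketch below assumes the exact `tokens=` and `cached=` field names written by the logging line above; `summarize_usage` is a hypothetical helper name.

```shell
# summarize_usage LOGFILE
# Sums the tokens= and cached= fields across all non-empty log lines.
summarize_usage() {
  awk '
    NF == 0 { next }
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^tokens=/) total  += substr($i, 8)
        if ($i ~ /^cached=/) cached += substr($i, 8)
      }
      calls++
    }
    END {
      printf "calls=%d tokens=%d cached=%d uncached=%d\n", calls, total, cached, total - cached
    }
  ' "$1"
}
```

Running `summarize_usage usage.log` prints a single line with the number of logged calls and the summed token counts.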
    

    Session Tracking

    # Track cumulative usage across a session
    total_session_tokens=0
    total_session_cached=0
    total_session_calls=0
    
    track_usage() {
      local result="$1"
      local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
      local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
    
      total_session_tokens=$((total_session_tokens + tokens))
      total_session_cached=$((total_session_cached + cached))
      total_session_calls=$((total_session_calls + 1))
    }
    
    # Use in workflow
    result=$(gemini "query 1" --output-format json)
    track_usage "$result"
    
    result=$(gemini "query 2" --output-format json)
    track_usage "$result"
    
    echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"
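
Session counters become more useful when something acts on them. As one possible extension, the sketch below adds a token ceiling; `SESSION_TOKEN_BUDGET` and `budget_ok` are hypothetical names, and the value 200000 is an arbitrary placeholder rather than a recommended limit.

```shell
# Hypothetical per-session ceiling; pick a value that fits your quota.
SESSION_TOKEN_BUDGET=200000

# budget_ok USED_TOKENS -> succeeds while usage is within the budget.
budget_ok() {
  [ "$1" -le "$SESSION_TOKEN_BUDGET" ]
}
```

Inside a loop, a line such as `budget_ok "$total_session_tokens" || break` placed before each `gemini` call would stop the run once the ceiling is passed.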
    

    Optimization Checklist

    Before Large Operations

    • Choose appropriate model (Flash vs Pro)
    • Check if caching is available (API key or Vertex)
    • Plan batching strategy
    • Set up usage tracking

    During Operations

    • Monitor cache hit rates
    • Track per-query costs
    • Adjust model if quality insufficient
    • Batch similar queries

    After Operations

    • Review total usage
    • Calculate effective cost
    • Identify optimization opportunities
    • Document learnings
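
For the "calculate effective cost" step, the arithmetic can be sketched as below. Everything about the pricing here is an assumption: the rates are placeholders in USD per 1M tokens, not real prices, and the formula assumes the prompt token count includes the cached tokens. `effective_cost` is a hypothetical helper; take actual rates from the official pricing documentation.

```shell
# effective_cost PROMPT CACHED OUTPUT RATE_IN RATE_CACHED RATE_OUT
# Token arguments are raw counts; rates are per 1M tokens (placeholders, not real prices).
effective_cost() {
  LC_ALL=C awk -v p="$1" -v c="$2" -v o="$3" -v ri="$4" -v rc="$5" -v ro="$6" 'BEGIN {
    # Uncached prompt tokens at the input rate, cached ones at the cached rate,
    # output tokens at the output rate.
    cost = ((p - c) * ri + c * rc + o * ro) / 1000000
    printf "%.6f\n", cost
  }'
}
```

With the placeholder rates 1.00, 0.25 and 4.00, `effective_cost 25000 20000 500 1.00 0.25 4.00` works out to (5000 × 1.00 + 20000 × 0.25 + 500 × 4.00) / 1,000,000 = 0.012.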

    Quick Reference

    Cost-Saving Commands

    # Use Flash for bulk
    gemini "query" -m gemini-2.5-flash --output-format json
    
    # Check cache effectiveness
    gemini "query" --output-format json | jq '{total: (.stats.models | to_entries | map(.value.tokens.total) | add), cached: (.stats.models | to_entries | map(.value.tokens.cached) | add)}'
    
    # Minimal output (fewer output tokens)
    gemini "Answer in one sentence: {question}" --output-format json
    

    Cost Estimation

    Rough token estimates:

    • 1 token ~ 4 characters (English)
    • 1 page of code ~ 500-1000 tokens
    • Typical source file ~ 200-2000 tokens
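
The 4-characters-per-token rule of thumb is easy to apply before sending anything. A rough sketch, with `estimate_tokens` as a hypothetical helper: it counts bytes rather than characters, which is the same thing for ASCII source, and real tokenizer counts will differ.

```shell
# estimate_tokens FILE... -> rough token count at about 4 bytes per token.
estimate_tokens() {
  bytes=$(cat "$@" | wc -c | tr -d ' ')
  echo $(( (bytes + 3) / 4 ))   # integer division, rounded up
}
```

For instance, `estimate_tokens src/*.ts` gives a ballpark figure to compare against a budget or a model's context window before batching files together.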

    Keyword Registry (Delegates to gemini-cli-docs)

    | Topic | Query Keywords |
    |---|---|
    | Caching | token caching, cached tokens, /stats |
    | Model selection | model routing, flash vs pro, -m flag |
    | Costs | quota pricing, token usage, billing |
    | Output control | output format, json output |

    Test Scenarios

    Scenario 1: Check Token Usage

    Query: "How do I see how many tokens Gemini used?"

    Expected Behavior:

    • Skill activates on "token usage" or "gemini cost"
    • Provides JSON stats extraction pattern

    Success Criteria: User receives jq commands to extract token counts

    Scenario 2: Reduce Costs

    Query: "How do I reduce Gemini CLI costs for bulk analysis?"

    Expected Behavior:

    • Skill activates on "cost optimization" or "reduce tokens"
    • Recommends Flash model and batching

    Success Criteria: User receives cost optimization strategies

    Scenario 3: Model Selection

    Query: "Should I use Flash or Pro for this task?"

    Expected Behavior:

    • Skill activates on "flash vs pro" or "model selection"
    • Provides decision criteria table

    Success Criteria: User receives model comparison and recommendation

    References

    Query gemini-cli-docs for official documentation on:

    • "token caching"
    • "model selection"
    • "quota and pricing"

    Version History

    • v1.1.0 (2025-12-01): Added Test Scenarios section
    • v1.0.0 (2025-11-25): Initial release
    Repository
    melodic-software/claude-code-plugins