Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Accelerating the Agent Economy

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    secondsky

    google-gemini-embeddings

    secondsky/google-gemini-embeddings
    AI & ML
    38

    About

    SKILL.md

    Install

    Install via Skills CLI

    or add to your agent
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    ├─
    ├─
    └─

    About

    Google Gemini embeddings API (gemini-embedding-001) for RAG and semantic search. Use for vector search, Vectorize integration, or encountering dimension mismatches, rate limits, text truncation.

    SKILL.md


    name: google-gemini-embeddings description: Google Gemini embeddings API (gemini-embedding-001) for RAG and semantic search. Use for vector search, Vectorize integration, or encountering dimension mismatches, rate limits, text truncation.

    Keywords: gemini embeddings, gemini-embedding-001, google embeddings, semantic search, RAG, vector search, document clustering, similarity search, retrieval augmented generation, vectorize integration, cloudflare vectorize embeddings, 768 dimensions, embed content gemini, batch embeddings, embeddings api, cosine similarity, vector normalization, retrieval query, retrieval document, task types, dimension mismatch, embeddings rate limit, text truncation, @google/genai license: MIT metadata: version: 1.0.0 last_updated: 2025-11-21 tested_package_version: "@google/genai@1.27.0" target_audience: "Developers building RAG, semantic search, or vector-based applications" complexity: intermediate estimated_reading_time: "8 minutes" tokens_saved: "~60%" errors_prevented: 8 production_tested: true

    Google Gemini Embeddings

    Complete production-ready guide for Google Gemini embeddings API

    This skill provides comprehensive coverage of the gemini-embedding-001 model for generating text embeddings, including SDK usage, REST API patterns, batch processing, RAG integration with Cloudflare Vectorize, and advanced use cases like semantic search and document clustering.


    Table of Contents

    1. Quick Start
    2. gemini-embedding-001 Model
    3. Basic Embeddings
    4. Batch Embeddings
    5. Task Types
    6. Top 5 Errors
    7. Best Practices
    8. When to Load References

    1. Quick Start

    Installation

    Install the Google Generative AI SDK:

    bun add @google/genai@^1.27.0
    

    For TypeScript projects:

    bun add -d typescript@^5.0.0
    

    Environment Setup

    Set your Gemini API key as an environment variable:

    export GEMINI_API_KEY="your-api-key-here"
    

    Get your API key from: https://aistudio.google.com/apikey

    First Embedding Example

    import { GoogleGenAI } from "@google/genai";
    
    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
    
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: 'What is the meaning of life?',
      config: {
        taskType: 'RETRIEVAL_QUERY',
        outputDimensionality: 768
      }
    });
    
    console.log(response.embedding.values); // [0.012, -0.034, ...]
    console.log(response.embedding.values.length); // 768
    

    Result: A 768-dimension embedding vector representing the semantic meaning of the text.


    2. gemini-embedding-001 Model

    Model Specifications

    Current Model: gemini-embedding-001 (stable, production-ready)

    • Status: Stable
    • Experimental: gemini-embedding-exp-03-07 (deprecated October 2025, do not use)

    Dimensions

    The model supports flexible output dimensionality using Matryoshka Representation Learning:

    Dimension Use Case Storage Performance
    768 Recommended for most use cases Low Fast
    1536 Balance between accuracy and efficiency Medium Medium
    3072 Maximum accuracy (default) High Slower

    Default: 3072 dimensions Recommended: 768 dimensions for most RAG applications

    Load references/dimension-guide.md when you need detailed comparisons of storage costs, accuracy trade-offs, or migration strategies between dimensions.

    Load references/model-comparison.md when comparing Gemini embeddings with OpenAI (text-embedding-3-small/large) or Cloudflare Workers AI (BGE).

    Rate Limits

    Tier RPM TPM RPD
    Free 100 30,000 1,000
    Tier 1 3,000 1,000,000 -

    RPM = Requests Per Minute, TPM = Tokens Per Minute, RPD = Requests Per Day

    Context Window

    • Input Limit: 2,048 tokens per text
    • Input Type: Text only (no images, audio, or video)

    3. Basic Embeddings

    SDK Approach (Node.js)

    Single text embedding:

    import { GoogleGenAI } from "@google/genai";
    
    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
    
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: 'The quick brown fox jumps over the lazy dog',
      config: {
        taskType: 'SEMANTIC_SIMILARITY',
        outputDimensionality: 768
      }
    });
    
    console.log(response.embedding.values);
    // [0.00388, -0.00762, 0.01543, ...]
    

    Fetch Approach (Cloudflare Workers)

    For Workers/edge environments without SDK support:

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        const apiKey = env.GEMINI_API_KEY;
        const text = "What is the meaning of life?";
    
        const response = await fetch(
          'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
          {
            method: 'POST',
            headers: {
              'x-goog-api-key': apiKey,
              'Content-Type': 'application/json'
            },
            body: JSON.stringify({
              content: {
                parts: [{ text }]
              },
              taskType: 'RETRIEVAL_QUERY',
              outputDimensionality: 768
            })
          }
        );
    
        const data = await response.json();
    
        // Response format:
        // {
        //   embedding: {
        //     values: [0.012, -0.034, ...]
        //   }
        // }
    
        return new Response(JSON.stringify(data), {
          headers: { 'Content-Type': 'application/json' }
        });
      }
    };
    

    Response Parsing

    interface EmbeddingResponse {
      embedding: {
        values: number[];
      };
    }
    
    const response: EmbeddingResponse = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: 'Sample text',
      config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
    });
    
    const embedding: number[] = response.embedding.values;
    const dimensions: number = embedding.length; // 768
    

    4. Batch Embeddings

    Multiple Texts in One Request (SDK)

    Generate embeddings for multiple texts simultaneously:

    import { GoogleGenAI } from "@google/genai";
    
    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
    
    const texts = [
      "What is the meaning of life?",
      "How does photosynthesis work?",
      "Tell me about the history of the internet."
    ];
    
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: texts, // Array of strings
      config: {
        taskType: 'RETRIEVAL_DOCUMENT',
        outputDimensionality: 768
      }
    });
    
    // Process each embedding
    response.embeddings.forEach((embedding, index) => {
      console.log(`Text ${index}: ${texts[index]}`);
      console.log(`Embedding: ${embedding.values.slice(0, 5)}...`);
      console.log(`Dimensions: ${embedding.values.length}`);
    });
    

    Chunking for Rate Limits

    When processing large datasets, chunk requests to stay within rate limits:

    async function batchEmbedWithRateLimit(
      texts: string[],
      batchSize: number = 100, // Free tier: 100 RPM
      delayMs: number = 60000 // 1 minute delay between batches
    ): Promise<number[][]> {
      const allEmbeddings: number[][] = [];
    
      for (let i = 0; i < texts.length; i += batchSize) {
        const batch = texts.slice(i, i + batchSize);
    
        console.log(`Processing batch ${i / batchSize + 1} (${batch.length} texts)`);
    
        const response = await ai.models.embedContent({
          model: 'gemini-embedding-001',
          contents: batch,
          config: {
            taskType: 'RETRIEVAL_DOCUMENT',
            outputDimensionality: 768
          }
        });
    
        allEmbeddings.push(...response.embeddings.map(e => e.values));
    
        // Wait before next batch (except last batch)
        if (i + batchSize < texts.length) {
          await new Promise(resolve => setTimeout(resolve, delayMs));
        }
      }
    
      return allEmbeddings;
    }
    
    // Usage
    const embeddings = await batchEmbedWithRateLimit(documents, 100);
    

    5. Task Types

    The taskType parameter optimizes embeddings for specific use cases. Always specify a task type for best results.

    Available Task Types (8 total)

    Task Type Use Case Example
    RETRIEVAL_QUERY User search queries "How do I fix a flat tire?"
    RETRIEVAL_DOCUMENT Documents to be indexed/searched Product descriptions, articles
    SEMANTIC_SIMILARITY Comparing text similarity Duplicate detection, clustering
    CLASSIFICATION Categorizing texts Spam detection, sentiment analysis
    CLUSTERING Grouping similar texts Topic modeling, content organization
    CODE_RETRIEVAL_QUERY Code search queries "function to sort array"
    QUESTION_ANSWERING Questions seeking answers FAQ matching
    FACT_VERIFICATION Verifying claims with evidence Fact-checking systems

    RAG Systems (Most Common)

    // When embedding user queries
    const queryEmbedding = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: userQuery,
      config: {
        taskType: 'RETRIEVAL_QUERY', // ← Use RETRIEVAL_QUERY
        outputDimensionality: 768
      }
    });
    
    // When embedding documents for indexing
    const docEmbedding = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: documentText,
      config: {
        taskType: 'RETRIEVAL_DOCUMENT', // ← Use RETRIEVAL_DOCUMENT
        outputDimensionality: 768
      }
    });
    

    Impact: Using correct task type improves search relevance by 10-30%.


    6. Top 5 Errors

    Error 1: Dimension Mismatch

    Error: Vector dimensions do not match. Expected 768, got 3072

    Cause: Not specifying outputDimensionality parameter (defaults to 3072).

    Fix:

    // ❌ BAD: No outputDimensionality (defaults to 3072)
    const embedding = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: text
    });
    
    // ✅ GOOD: Match Vectorize index dimensions
    const embedding = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: text,
      config: { outputDimensionality: 768 } // ← Match your index
    });
    

    Error 2: Rate Limiting (429 Too Many Requests)

    Error: 429 Too Many Requests - Rate limit exceeded

    Cause: Exceeded 100 requests per minute (free tier).

    Fix:

    // ✅ GOOD: Exponential backoff
    async function embedWithRetry(text: string, maxRetries = 3) {
      for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
          return await ai.models.embedContent({
            model: 'gemini-embedding-001',
            content: text,
            config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
          });
        } catch (error: any) {
          if (error.status === 429 && attempt < maxRetries - 1) {
            const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
            await new Promise(resolve => setTimeout(resolve, delay));
            continue;
          }
          throw error;
        }
      }
    }
    

    Error 3: Text Truncation (Silent)

    Error: No error! Text is silently truncated at 2,048 tokens.

    Cause: Input text exceeds 2,048 token limit.

    Fix: Chunk long texts before embedding:

    function chunkText(text: string, maxTokens = 2000): string[] {
      const words = text.split(/\s+/);
      const chunks: string[] = [];
      let currentChunk: string[] = [];
    
      for (const word of words) {
        currentChunk.push(word);
    
        // Rough estimate: 1 token ≈ 0.75 words
        if (currentChunk.length * 0.75 >= maxTokens) {
          chunks.push(currentChunk.join(' '));
          currentChunk = [];
        }
      }
    
      if (currentChunk.length > 0) {
        chunks.push(currentChunk.join(' '));
      }
    
      return chunks;
    }
    

    Error 4: Incorrect Task Type

    Error: No error, but search quality is poor (10-30% worse).

    Cause: Using wrong task type (e.g., RETRIEVAL_DOCUMENT for queries).

    Fix:

    // ❌ BAD: Wrong task type for RAG query
    const queryEmbedding = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: userQuery,
      config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← Wrong!
    });
    
    // ✅ GOOD: Correct task types
    const queryEmbedding = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: userQuery,
      config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
    });
    

    Error 5: Cosine Similarity Calculation Errors

    Error: Similarity values out of range (-1.5 to 1.2)

    Cause: Using dot product instead of proper cosine similarity formula.

    Fix:

    // ✅ GOOD: Proper cosine similarity
    function cosineSimilarity(a: number[], b: number[]): number {
      if (a.length !== b.length) {
        throw new Error('Vector dimensions must match');
      }
    
      let dotProduct = 0;
      let magnitudeA = 0;
      let magnitudeB = 0;
    
      for (let i = 0; i < a.length; i++) {
        dotProduct += a[i] * b[i];
        magnitudeA += a[i] * a[i];
        magnitudeB += b[i] * b[i];
      }
    
      if (magnitudeA === 0 || magnitudeB === 0) {
        return 0; // Handle zero vectors
      }
    
      return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
    }
    

    Load references/top-errors.md for all 8 errors with detailed solutions, including batch size limits, vector storage precision loss, and model version confusion.


    7. Best Practices

    Always Do

    ✅ Specify Task Type

    const embedding = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: text,
      config: { taskType: 'RETRIEVAL_QUERY' } // ← Always specify
    });
    

    ✅ Match Dimensions with Vectorize

    const embedding = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: text,
      config: { outputDimensionality: 768 } // ← Match index
    });
    

    ✅ Implement Rate Limiting

    // Use exponential backoff for 429 errors (see Error 2)
    

    ✅ Cache Embeddings

    const cache = new Map<string, number[]>();
    
    async function getCachedEmbedding(text: string): Promise<number[]> {
      if (cache.has(text)) {
        return cache.get(text)!;
      }
    
      const response = await ai.models.embedContent({
        model: 'gemini-embedding-001',
        content: text,
        config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
      });
    
      const embedding = response.embedding.values;
      cache.set(text, embedding);
      return embedding;
    }
    

    ✅ Use Batch API for Multiple Texts

    // Single batch request vs multiple individual requests
    const embeddings = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: texts, // Array of texts
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    

    Never Do

    ❌ Don't Skip Task Type - Reduces quality by 10-30% ❌ Don't Mix Different Dimensions - Can't compare embeddings ❌ Don't Use Wrong Task Type for RAG - Reduces search quality ❌ Don't Exceed 2,048 Tokens - Text will be silently truncated ❌ Don't Ignore Rate Limits - Will hit 429 errors


    8. When to Load References

    Load references/rag-patterns.md when:

    • Building a RAG (Retrieval Augmented Generation) system
    • Need document ingestion pipeline with chunking strategies
    • Implementing semantic search with cosine similarity
    • Building conversational RAG with history
    • Need citation RAG or multi-query RAG patterns
    • Want complete examples of filtered RAG, streaming RAG, or hybrid search
    • Need document clustering with K-means implementation

    Load references/vectorize-integration.md when:

    • Setting up Cloudflare Vectorize index for embeddings
    • Need complete RAG example with Vectorize insert/query patterns
    • Configuring dimension/metric settings for Vectorize
    • Implementing metadata best practices
    • Troubleshooting dimension mismatch errors with Vectorize
    • Need index management commands (create/delete/list)

    Load references/dimension-guide.md when:

    • Deciding between 768, 1536, or 3072 dimensions
    • Need storage cost analysis (100k vs 1M vectors)
    • Understanding accuracy trade-offs (MTEB benchmarks)
    • Migrating between different dimensions
    • Want query performance comparisons
    • Testing methodology for optimal dimension selection

    Load references/model-comparison.md when:

    • Comparing Gemini vs OpenAI (text-embedding-3-small/large)
    • Comparing Gemini vs Cloudflare Workers AI (BGE)
    • Need MTEB benchmark scores
    • Deciding which embedding model to use
    • Migrating from OpenAI to Gemini
    • Understanding cost differences between providers

    Load references/top-errors.md when:

    • Encountering any of the 8 documented errors
    • Need detailed root cause analysis
    • Want production-tested solutions with code examples
    • Building error handling for production systems
    • Need verification checklist before deployment

    Using Bundled Resources

    Templates (templates/)

    • package.json - Package configuration with verified versions
    • basic-embeddings.ts - Single text embedding with SDK
    • embeddings-fetch.ts - Fetch-based for Cloudflare Workers
    • batch-embeddings.ts - Batch processing with rate limiting
    • rag-with-vectorize.ts - Complete RAG implementation with Vectorize
    • semantic-search.ts - Cosine similarity and top-K search
    • clustering.ts - K-means clustering implementation

    References (references/)

    • model-comparison.md - Compare Gemini vs OpenAI vs Workers AI embeddings
    • vectorize-integration.md - Cloudflare Vectorize setup and patterns
    • rag-patterns.md - Complete RAG implementation strategies
    • dimension-guide.md - Choosing the right dimensions (768 vs 1536 vs 3072)
    • top-errors.md - 8 common errors and detailed solutions

    Scripts (scripts/)

    • check-versions.sh - Verify @google/genai package version is current

    Official Documentation

    • Embeddings Guide: https://ai.google.dev/gemini-api/docs/embeddings
    • Model Spec: https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001
    • Rate Limits: https://ai.google.dev/gemini-api/docs/rate-limits
    • SDK Reference: https://www.npmjs.com/package/@google/genai
    • Context7 Library ID: /websites/ai_google_dev_gemini-api

    Related Skills

    • google-gemini-api - Main Gemini API for text/image generation
    • cloudflare-vectorize - Vector database for storing embeddings
    • cloudflare-workers-ai - Workers AI embeddings (BGE models)

    Success Metrics

    Token Savings: ~60% compared to manual implementation Errors Prevented: 8 documented errors with solutions Production Tested: ✅ Verified in RAG applications Package Version: @google/genai@1.27.0 Last Updated: 2025-11-21


    License

    MIT License - Free to use in personal and commercial projects.


    Questions or Issues?

    • GitHub: https://github.com/secondsky/claude-skills
    • Email: maintainers@example.com
    Repository
    secondsky/claude-skills
    Files