Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Accelerating the Agent Economy

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    chekos

    gemini-imagegen

    chekos/gemini-imagegen
    Design
    1
    1 installs

    About

    SKILL.md

    Install

    Install via Skills CLI

    or add to your agent
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    ├─
    ├─
    └─

    About

    Low-level Gemini API skill for image generation.

    SKILL.md

    Image Generator

    This skill generates and edits images using Google's Gemini Nano Banana Pro model (gemini-3-pro-image-preview).

    IMPORTANT: Setup Required

    Before using this skill, the user must set the GEMINI_API_KEY environment variable:

    1. Get a free API key from Google AI Studio
    2. Export the key in your shell profile (~/.zshrc, ~/.bashrc, etc.):
      export GEMINI_API_KEY="your_api_key_here"
      
    3. Restart your terminal or run source ~/.zshrc (or ~/.bashrc)

    The skill will not work without this configuration.

    Pre-flight Check

    Before making any API call, verify the key is set:

    if [ -z "$GEMINI_API_KEY" ]; then
      echo "ERROR: GEMINI_API_KEY is not set. Please export it in your shell profile."
      exit 1
    fi
    

    If the key is missing, stop and tell the user to set it using the instructions above.

    Configuration

    Model: gemini-3-pro-image-preview

    API Key: Read from the GEMINI_API_KEY environment variable

    Iterating on User-Provided Images

    When the user provides a path to an image they want to edit or iterate on, use this workflow:

    Step 1: Read and encode the image to base64

    # Get the image path from user
    IMG_PATH="/path/to/user/image.png"
    
    # Detect mime type
    if [[ "$IMG_PATH" == *.png ]]; then
        MIME_TYPE="image/png"
    elif [[ "$IMG_PATH" == *.jpg ]] || [[ "$IMG_PATH" == *.jpeg ]]; then
        MIME_TYPE="image/jpeg"
    elif [[ "$IMG_PATH" == *.webp ]]; then
        MIME_TYPE="image/webp"
    else
        MIME_TYPE="image/png"
    fi
    
    # Encode to base64 (works on both macOS and Linux)
    if [[ "$(uname)" == "Darwin" ]]; then
        IMG_BASE64=$(base64 -i "$IMG_PATH")
    else
        IMG_BASE64=$(base64 -w0 "$IMG_PATH")
    fi
    

    Step 2: Send image with edit prompt (File-Based Approach)

    IMPORTANT: Always use a file-based approach for the request body. Base64-encoded images are too large for command-line arguments and will cause "argument list too long" errors.

    # User's edit request
    EDIT_PROMPT="Add a santa hat to the person in this image"
    
    # Write request to a JSON file (avoids command line length limits)
    cat > /tmp/gemini_request.json << JSONEOF
    {
      "contents": [{
        "parts": [
          {"text": "$EDIT_PROMPT"},
          {
            "inline_data": {
              "mime_type": "$MIME_TYPE",
              "data": "$IMG_BASE64"
            }
          }
        ]
      }],
      "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"]
      }
    }
    JSONEOF
    
    # Call the API using the file
    curl -s -X POST \
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -H "Content-Type: application/json" \
      -d @/tmp/gemini_request.json > /tmp/gemini_response.json
    

    Step 3: Extract and save the edited image

    # Extract image from response and save
    python3 -c "
    import json
    import base64
    
    with open('/tmp/gemini_response.json') as f:
        data = json.load(f)
    
    for part in data['candidates'][0]['content']['parts']:
        if 'inlineData' in part:
            img_data = part['inlineData']['data']
            mime = part['inlineData']['mimeType']
            ext = 'png' if 'png' in mime else 'jpg'
            with open('edited_image.' + ext, 'wb') as out:
                out.write(base64.b64decode(img_data))
            print(f'Saved: edited_image.{ext}')
        elif 'text' in part:
            print(part['text'])
    "
    

    Complete Example (File-Based)

    For iterating on images, always use file-based requests:

    # Variables
    IMG_PATH="/path/to/image.png"
    EDIT_PROMPT="Make the background a sunset beach"
    OUTPUT_PATH="edited_output.png"
    # Detect mime type and encode
    MIME_TYPE=$([[ "$IMG_PATH" == *.png ]] && echo "image/png" || echo "image/jpeg")
    IMG_BASE64=$(base64 -i "$IMG_PATH" 2>/dev/null || base64 -w0 "$IMG_PATH")
    
    # Write request to file (required - base64 images are too large for command line)
    cat > /tmp/gemini_request.json << JSONEOF
    {
      "contents": [{
        "parts": [
          {"text": "$EDIT_PROMPT"},
          {"inline_data": {"mime_type": "$MIME_TYPE", "data": "$IMG_BASE64"}}
        ]
      }],
      "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"]
      }
    }
    JSONEOF
    
    # Call API and extract image
    curl -s -X POST \
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -H "Content-Type: application/json" \
      -d @/tmp/gemini_request.json > /tmp/gemini_response.json
    
    # Save the output image
    python3 -c "
    import json, base64
    with open('/tmp/gemini_response.json') as f:
        data = json.load(f)
    for part in data.get('candidates', [{}])[0].get('content', {}).get('parts', []):
        if 'inlineData' in part:
            with open('$OUTPUT_PATH', 'wb') as f:
                f.write(base64.b64decode(part['inlineData']['data']))
            print('Saved: $OUTPUT_PATH')
    "
    

    Multi-Image Input (Combine/Compose)

    To combine elements from multiple images (also uses file-based approach):

    IMG1_PATH="/path/to/image1.png"
    IMG2_PATH="/path/to/image2.png"
    PROMPT="Put the dress from the first image on the person in the second image"
    IMG1_BASE64=$(base64 -i "$IMG1_PATH" 2>/dev/null || base64 -w0 "$IMG1_PATH")
    IMG2_BASE64=$(base64 -i "$IMG2_PATH" 2>/dev/null || base64 -w0 "$IMG2_PATH")
    
    # Write request to file
    cat > /tmp/gemini_request.json << JSONEOF
    {
      "contents": [{
        "parts": [
          {"text": "$PROMPT"},
          {"inline_data": {"mime_type": "image/png", "data": "$IMG1_BASE64"}},
          {"inline_data": {"mime_type": "image/png", "data": "$IMG2_BASE64"}}
        ]
      }],
      "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
    }
    JSONEOF
    
    curl -s -X POST \
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -H "Content-Type: application/json" \
      -d @/tmp/gemini_request.json > /tmp/gemini_response.json
    

    Capabilities

    Text-to-Image Generation

    • Generate high-quality images from text descriptions
    • Support for photorealistic, stylized, and artistic outputs
    • Accurate text rendering in images (logos, infographics, diagrams)

    Image Editing

    • Add or remove elements from images
    • Inpainting with semantic masking (edit specific parts)
    • Style transfer (apply artistic styles to photos)
    • Multi-image composition (combine elements from multiple images)

    Advanced Features

    • High Resolution: 1K, 2K, or 4K output
    • Aspect Ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
    • Google Search Grounding: Generate images based on real-time data
    • Multi-turn Editing: Iteratively refine images through conversation
    • Up to 14 Reference Images: Combine multiple inputs for complex compositions

    API Usage

    Basic Text-to-Image (Python)

    from google import genai
    from google.genai import types
    
    client = genai.Client()
    
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=["Your prompt here"],
        config=types.GenerateContentConfig(
            response_modalities=['TEXT', 'IMAGE'],
            image_config=types.ImageConfig(
                aspect_ratio="16:9",  # Optional
                image_size="2K"       # Optional: "1K", "2K", "4K"
            )
        )
    )
    
    for part in response.parts:
        if part.text is not None:
            print(part.text)
        elif part.inline_data is not None:
            image = part.as_image()
            image.save("generated_image.png")
    

    Basic Text-to-Image (JavaScript)

    import { GoogleGenAI } from "@google/genai";
    import * as fs from "node:fs";
    
    const ai = new GoogleGenAI({});
    
    const response = await ai.models.generateContent({
        model: "gemini-3-pro-image-preview",
        contents: "Your prompt here",
        config: {
            responseModalities: ['TEXT', 'IMAGE'],
            imageConfig: {
                aspectRatio: "16:9",
                imageSize: "2K"
            }
        }
    });
    
    for (const part of response.candidates[0].content.parts) {
        if (part.text) {
            console.log(part.text);
        } else if (part.inlineData) {
            const buffer = Buffer.from(part.inlineData.data, "base64");
            fs.writeFileSync("generated_image.png", buffer);
        }
    }
    

    REST API (curl)

    curl -s -X POST \
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{
          "parts": [{"text": "Your prompt here"}]
        }],
        "generationConfig": {
          "responseModalities": ["TEXT", "IMAGE"],
          "imageConfig": {
            "aspectRatio": "16:9",
            "imageSize": "2K"
          }
        }
      }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
    

    Image Editing (with input image)

    from google import genai
    from google.genai import types
    from PIL import Image
    
    client = genai.Client()
    
    input_image = Image.open('input.png')
    prompt = "Add a wizard hat to the cat in this image"
    
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=[prompt, input_image],
        config=types.GenerateContentConfig(
            response_modalities=['TEXT', 'IMAGE']
        )
    )
    
    for part in response.parts:
        if part.inline_data is not None:
            image = part.as_image()
            image.save("edited_image.png")
    

    Multi-Image Composition

    from google import genai
    from google.genai import types
    from PIL import Image
    
    client = genai.Client()
    
    image1 = Image.open('dress.png')
    image2 = Image.open('model.png')
    prompt = "Put the dress from the first image on the model from the second image"
    
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=[image1, image2, prompt],
        config=types.GenerateContentConfig(
            response_modalities=['TEXT', 'IMAGE'],
            image_config=types.ImageConfig(
                aspect_ratio="3:4",
                image_size="2K"
            )
        )
    )
    

    With Google Search Grounding

    from google import genai
    from google.genai import types
    
    client = genai.Client()
    
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents="Visualize the current weather forecast for San Francisco",
        config=types.GenerateContentConfig(
            response_modalities=['TEXT', 'IMAGE'],
            image_config=types.ImageConfig(aspect_ratio="16:9"),
            tools=[{"google_search": {}}]
        )
    )
    

    Prompting Best Practices

    1. Be Descriptive, Not Keyword-Based

    Instead of: cat, wizard hat, cute Write: A fluffy orange cat wearing a small knitted wizard hat, sitting on a wooden floor with soft natural lighting from a window

    2. Specify Style and Mood

    • Photography terms: "shot with 85mm lens", "soft bokeh background", "golden hour lighting"
    • Artistic styles: "in the style of Van Gogh", "minimalist illustration", "photorealistic"
    • Mood: "warm and cozy atmosphere", "dramatic noir lighting"

    3. For Text in Images

    Be explicit about:

    • The exact text to render
    • Font style (descriptively): "clean, bold, sans-serif font"
    • Placement and size

    4. For Editing

    • Describe what to change and what to preserve
    • Use "keep everything else unchanged"
    • Reference specific elements clearly

    5. For Product/Commercial Images

    Mention:

    • Lighting setup: "three-point softbox lighting"
    • Background: "clean white studio background"
    • Camera angle: "slightly elevated 45-degree shot"

    Resolution and Aspect Ratio Reference

    Aspect Ratio 1K Resolution 2K Resolution 4K Resolution
    1:1 1024x1024 2048x2048 4096x4096
    16:9 1376x768 2752x1536 5504x3072
    9:16 768x1376 1536x2752 3072x5504
    3:2 1264x848 2528x1696 5056x3392
    2:3 848x1264 1696x2528 3392x5056

    Common Use Cases

    Logo Creation

    Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'.
    The text should be in a clean, bold, sans-serif font.
    Black and white color scheme. Put the logo in a circle.
    

    Product Photography

    A high-resolution, studio-lit product photograph of a minimalist ceramic
    coffee mug in matte black on a polished concrete surface. Three-point
    softbox lighting with soft, diffused highlights. Slightly elevated
    45-degree camera angle. Sharp focus on steam rising from the coffee.
    

    Style Transfer

    Transform this photograph of a city street at night into Vincent van Gogh's
    'Starry Night' style. Preserve the composition but render with swirling,
    impasto brushstrokes and deep blues with bright yellows.
    

    Infographic

    Create a vibrant infographic explaining photosynthesis as a recipe.
    Show "ingredients" (sunlight, water, CO2) and "finished dish" (sugar/energy).
    Style like a colorful kids' cookbook, suitable for 4th graders.
    

    Error Handling

    Common issues:

    • No image returned: Check that response_modalities includes 'IMAGE'
    • Safety filters: Some prompts may be blocked; try rephrasing
    • Rate limits: Implement exponential backoff for retries
    • Large images: For 4K, ensure sufficient timeout settings

    Dependencies

    To use the Python SDK:

    pip install google-genai pillow
    

    For JavaScript:

    npm install @google/genai
    

    Important Notes

    • All generated images include a SynthID watermark
    • The model uses a "thinking" process for complex prompts
    • For best text rendering, generate text first, then request image with that text
    • Images are not stored by the API - save outputs locally
    Repository
    chekos/bns-marketplace
    Files