    michaelboeding

    voice-generation

    michaelboeding/voice-generation
    AI & ML
    6
    2 installs

    About

    Use this skill for AI text-to-speech generation.

    SKILL.md

    Voice Generation Skill

    Generate realistic speech using AI (Google Gemini TTS, ElevenLabs, OpenAI TTS).

    Prerequisites

    At least one API key is required:

    • GOOGLE_API_KEY - For Google Gemini TTS (same key as video/image/music) ✅
    • ELEVENLABS_API_KEY - For ElevenLabs high-quality voice synthesis
    • OPENAI_API_KEY - For OpenAI TTS voices
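The key check above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill's scripts: the function name `available_providers` and the mapping `KEY_TO_PROVIDER` are hypothetical, and only the three environment variable names come from the list above.

```python
import os

# Environment variable -> provider, in the order listed in Prerequisites.
# The mapping and the function below are illustrative, not part of the skill.
KEY_TO_PROVIDER = {
    "GOOGLE_API_KEY": "Gemini TTS",
    "ELEVENLABS_API_KEY": "ElevenLabs",
    "OPENAI_API_KEY": "OpenAI TTS",
}


def available_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        provider
        for key, provider in KEY_TO_PROVIDER.items()
        if env.get(key, "").strip()
    ]


if __name__ == "__main__":
    found = available_providers()
    if found:
        print("Usable providers:", ", ".join(found))
    else:
        print("No provider available: set at least one API key.")
```

Passing a plain dict instead of reading `os.environ` makes the check easy to exercise without touching the real environment.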

    Available APIs

    Google Gemini TTS (Recommended - Same API Key)

    • Best for: Podcasts, dialogues, audiobooks with style control
    • Voices: 30 voices with natural language style control
    • Multi-speaker: Up to 2 speakers for dialogues ✅
    • Languages: 24 languages (auto-detected)
    • Features: Control style, accent, pace via prompts
    • Output: 24kHz WAV
    • API Key: Same GOOGLE_API_KEY as video/image/music ✅

    ElevenLabs (Best Quality)

    • Best for: Natural-sounding voices, voice cloning, long-form content
    • Voices: 100+ pre-made voices + custom voice cloning
    • Languages: 29+ languages
    • Models: Eleven Multilingual v2, Eleven Turbo v2

    OpenAI TTS (Simplest)

    • Best for: Quick, reliable text-to-speech with consistent quality
    • Voices: alloy, echo, fable, onyx, nova, shimmer
    • Models: tts-1 (fast), tts-1-hd (high quality)
    • Output: MP3, Opus, AAC, FLAC

    Workflow

    Step 1: Understand the Request

    Parse the user's voice request for:

    • Text content: What should be spoken?
    • Voice type: Male, female, specific character?
    • Tone: Professional, casual, dramatic, cheerful?
    • Use case: Narration, voiceover, audiobook, notification?
    • Language: English, Spanish, other?
    • Speed: Normal, slow, fast?

    Step 2: Select Voice and API

    Choose based on requirements:

    | Use Case | Recommended API | Reason |
    | --- | --- | --- |
    | Default / Same key as video | Gemini TTS | Same GOOGLE_API_KEY ✅ |
    | Multi-speaker dialogue | Gemini TTS | Up to 2 speakers built-in |
    | Style/accent control | Gemini TTS | Natural language prompts |
    | Voice cloning | ElevenLabs | Only API with cloning |
    | 100+ voice options | ElevenLabs | Widest selection |
    | Audiobook/podcast | ElevenLabs or Gemini | Both excellent for long content |
    | Quick narration | OpenAI TTS | Fast, reliable |
    | Budget-conscious | OpenAI TTS | Lower cost |
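One way to encode this selection table is a small decision function. The sketch below is an assumption-laden illustration: the function name, its parameters, and the priority order between rules (cloning first, then speaker count, then style control) are hypothetical choices, since the table itself does not say how to resolve conflicting requirements.

```python
def choose_api(*, needs_cloning=False, speakers=1, style_control=False,
               quick_or_budget=False):
    """Map request traits to an API name, following the selection table.

    The rule order is an illustrative assumption, not defined by the skill.
    """
    if speakers < 1:
        raise ValueError("speakers must be at least 1")
    if needs_cloning:
        return "ElevenLabs"   # only API with voice cloning
    if speakers > 2:
        return "ElevenLabs"   # Gemini TTS handles at most 2 speakers
    if speakers == 2 or style_control:
        return "Gemini TTS"   # built-in dialogue, natural-language style prompts
    if quick_or_budget:
        return "OpenAI TTS"   # fast, lower cost
    return "Gemini TTS"       # default: same GOOGLE_API_KEY


if __name__ == "__main__":
    print(choose_api(speakers=2))            # Gemini TTS
    print(choose_api(needs_cloning=True))    # ElevenLabs
    print(choose_api(quick_or_budget=True))  # OpenAI TTS
```

The result still needs to be checked against the API keys that are actually configured before a script is run.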

    Step 3: Prepare the Text

    Optimize text for speech:

    1. Add pauses: Use commas, periods for natural rhythm
    2. Spell out numbers: "1,234" → "one thousand two hundred thirty-four" (if needed)
    3. Handle acronyms: Write "NASA" when it should be read as a word, or "N.A.S.A." when it should be spelled out letter by letter
    4. Mark emphasis: Some APIs support emphasis markers

    Example transformation:

    • Original: "The Q4 2024 results show a 15% YoY increase."
    • Optimized: "The Q4 2024 results show a fifteen percent year-over-year increase."
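The transformation above can be approximated with a small normalizer. This is a rough sketch under stated assumptions: `number_to_words`, `prepare_text`, and the `ABBREVIATIONS` table are hypothetical helpers, only percentages and listed abbreviations are rewritten, and numbers above 999,999 are rejected rather than spelled out.

```python
import re

# Hypothetical abbreviation table; extend it for your own content.
ABBREVIATIONS = {"YoY": "year-over-year", "QoQ": "quarter-over-quarter"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 999,999 in American English."""
    if not 0 <= n <= 999_999:
        raise ValueError("only 0..999,999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def prepare_text(text):
    """Rewrite percentages and known abbreviations into spoken form."""
    def spell_percent(match):
        value = int(match.group(1).replace(",", ""))
        return number_to_words(value) + " percent"

    # "15%" or "1,234%" -> words followed by "percent".
    text = re.sub(r"(\d{1,3}(?:,\d{3})+|\d+)%", spell_percent, text)
    for short, spoken in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", spoken, text)
    return text


if __name__ == "__main__":
    print(prepare_text("The Q4 2024 results show a 15% YoY increase."))
    print(number_to_words(1234))
```

Years and labels such as "Q4 2024" are deliberately left alone, because whether they need rewriting depends on how the chosen voice reads them.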

    Step 4: Generate the Audio

    Execute the appropriate script from ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/:

    For Google Gemini TTS (single speaker):

    python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
      --text "Welcome to our podcast!" \
      --voice "Charon"
    

    Gemini TTS with style direction:

    python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
      --text "Have a wonderful day!" \
      --voice "Puck" \
      --style "Say cheerfully with a British accent:"
    

    Gemini TTS multi-speaker (dialogue):

    python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
      --multi \
      --speaker "Host:Charon" \
      --speaker "Guest:Aoede" \
      --text "Host: Welcome to the show!
    Guest: Thanks for having me!"
    

    For ElevenLabs:

    python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/elevenlabs.py \
      --text "Your text here" \
      --voice "Rachel" \
      --model "eleven_multilingual_v2"
    

    For OpenAI TTS:

    python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/openai_tts.py \
      --text "Your text here" \
      --voice "nova" \
      --model "tts-1-hd"
    

    List Gemini voices:

    python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py --list-voices
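When these commands are driven from Python rather than typed into a shell, passing the arguments as a list avoids quoting problems with text that contains quotes or newlines. The sketch below is illustrative only: the helper names are hypothetical, the fallback to the current directory when `CLAUDE_PLUGIN_ROOT` is unset is an assumption, and it uses only the `--text`, `--voice`, and `--style` flags shown above. What the script prints on success is not documented here, so `run` simply returns whatever it writes to standard output.

```python
import os
import subprocess
import sys
from pathlib import Path


def scripts_dir():
    """Locate the skill's scripts directory (fallback to '.' is an assumption)."""
    root = os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    return Path(root) / "skills" / "voice-generation" / "scripts"


def build_gemini_command(text, voice, style=None):
    """Build the argument list for gemini_tts.py using the flags shown above."""
    cmd = [
        sys.executable,                        # current interpreter instead of "python3"
        str(scripts_dir() / "gemini_tts.py"),
        "--text", text,
        "--voice", voice,
    ]
    if style:
        cmd += ["--style", style]
    return cmd


def run(cmd):
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    command = build_gemini_command(
        "Have a wonderful day!", "Puck",
        style="Say cheerfully with a British accent:",
    )
    print(command)  # inspect before calling run(command)
```

Printing the command first makes it easy to confirm the script path and arguments before any API call is made.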
    

    Step 5: Deliver the Result

    1. Provide the generated audio file path
    2. Mention the voice and settings used
    3. Offer to:
      • Try a different voice
      • Adjust speed or tone
      • Use a different API
      • Generate in a different format

    Error Handling

    Missing API key: Inform the user which key is needed:

    • Gemini TTS: Same GOOGLE_API_KEY as video/image - https://aistudio.google.com/apikey
    • ElevenLabs: https://elevenlabs.io
    • OpenAI: https://platform.openai.com/api-keys

    Missing dependency: Gemini TTS requires the google-genai package. Install it with: pip install google-genai

    Text too long: Split the text into chunks, generate audio for each chunk, and concatenate the results, or suggest shorter text.

    Rate limit: Suggest waiting or trying a different API.

    Unsupported language: Suggest an alternative API that supports the language.

    Multi-speaker limit: Gemini TTS supports at most 2 speakers. For more speakers, use ElevenLabs with one call per speaker.
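For the "text too long" case, splitting at sentence boundaries keeps each chunk natural to listen to. The sketch below is one possible approach, not the skill's own implementation: `chunk_text` is a hypothetical helper, the default limit of 4,096 characters is taken from the OpenAI TTS figure in the API Comparison section, and joining the resulting audio files is left out.

```python
import re


def chunk_text(text, max_chars=4096):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    # Break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for part in chunk_text("One. Two. Three.", max_chars=9):
        print(repr(part))
```

Each chunk can then be passed to the chosen script as a separate `--text` value, using 5,000 as the limit for ElevenLabs.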

    Voice Selection Guide

    Google Gemini TTS Voices (30 voices)

    | Style | Voices | Best For |
    | --- | --- | --- |
    | Bright/Upbeat | Zephyr, Puck, Aoede, Laomedeia | Marketing, cheerful content |
    | Firm/Informative | Charon, Kore, Orus, Rasalgethi | News, tutorials, professional |
    | Soft/Warm | Achernar, Sulafat, Vindemiatrix | Meditation, gentle narration |
    | Smooth | Algieba, Despina, Callirrhoe | Audiobooks, storytelling |
    | Clear | Erinome, Iapetus, Pulcherrima | Instructions, clarity |
    | Character | Fenrir (excitable), Enceladus (breathy), Algenib (gravelly), Gacrux (mature) | Character voices, drama |
    | Friendly | Achird, Zubenelgenubi (casual) | Casual, conversational |

    Gemini TTS Style Tips:

    • Use natural language: --style "Say angrily:" or --style "Whisper mysteriously:"
    • Specify accents: --style "Speak with a British accent from London:"
    • Control pace: --style "Speak slowly and deliberately:"
    • Combine: --style "Say excitedly with a Southern US accent:"
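Style strings like the ones above can also be assembled programmatically when the manner and accent come from separate user choices. This is a small illustrative helper, not part of the skill: the name `build_style`, its parameters, and the "a/an" heuristic are assumptions, and the output is simply a string in the same shape as the examples.

```python
def build_style(manner="", accent="", verb="Say"):
    """Compose a --style string, e.g. 'Say excitedly with a Southern US accent:'."""
    parts = [verb]
    if manner:
        parts.append(manner)
    if accent:
        # Simple heuristic for the article; adjust by hand for edge cases.
        article = "an" if accent[0].lower() in "aeiou" else "a"
        parts.append(f"with {article} {accent} accent")
    if len(parts) == 1:
        raise ValueError("give a manner, an accent, or both")
    return " ".join(parts) + ":"


if __name__ == "__main__":
    print(build_style("excitedly", "Southern US"))
    print(build_style("slowly and deliberately", verb="Speak"))
    print(build_style(accent="Irish", verb="Speak"))
```

The returned string is intended to be passed unchanged as the `--style` argument.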

    OpenAI TTS Voices

    | Voice | Description | Best For |
    | --- | --- | --- |
    | alloy | Neutral, balanced | General purpose |
    | echo | Warm, conversational | Podcasts, casual |
    | fable | Expressive, British | Storytelling |
    | onyx | Deep, authoritative | Narration, professional |
    | nova | Friendly, upbeat | Marketing, tutorials |
    | shimmer | Soft, gentle | Meditation, ASMR |

    ElevenLabs Popular Voices

    | Voice | Description | Best For |
    | --- | --- | --- |
    | Rachel | Young female, American | Narration, audiobooks |
    | Domi | Young female, energetic | Marketing, ads |
    | Bella | Young female, soft | Storytelling |
    | Antoni | Young male, well-rounded | Narration |
    | Josh | Young male, deep | Audiobooks |
    | Arnold | Mature male, authoritative | Documentary |
    | Adam | Middle-aged male, deep | Narration |
    | Sam | Young male, raspy | Character voices |

    Best Practices

    For Narration

    • Use a consistent voice throughout
    • Add natural pauses between paragraphs
    • Consider pacing for the content type

    For Dialogue

    • Use different voices for different characters
    • Match voice characteristics to character descriptions
    • Adjust speed for emotional scenes

    For Accessibility

    • Use clear, well-paced speech
    • Avoid overly stylized voices
    • Test with screen readers if applicable

    API Comparison

    | Feature | Gemini TTS | ElevenLabs | OpenAI TTS |
    | --- | --- | --- | --- |
    | API Key | GOOGLE_API_KEY ✅ | ELEVENLABS_API_KEY | OPENAI_API_KEY |
    | Voice quality | Excellent | Excellent | Very good |
    | Voice variety | 30 voices | 100+ voices | 6 voices |
    | Multi-speaker | ✅ Up to 2 | ❌ No | ❌ No |
    | Style control | ✅ Natural language | Limited | ❌ No |
    | Voice cloning | ❌ No | ✅ Yes | ❌ No |
    | Languages | 24 | 29+ | 50+ |
    | Speed control | Via prompts | Yes | Yes (0.25-4x) |
    | Max length | 32k tokens | 5,000 chars | 4,096 chars |
    | Output format | WAV (24kHz) | MP3, WAV | MP3, Opus, AAC, FLAC |
    | Same key as video/image | ✅ Yes | ❌ No | ❌ No |
    Repository
    michaelboeding/skills