Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Give agents more agency

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    digitalsamba

    elevenlabs

    digitalsamba/elevenlabs
    Design
    30

    About

    SKILL.md

    Install

    • Telegram
      Telegram
    • Slack
      Slack
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    • Download skill
    ├─
    ├─
    └─

    About

    Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games...

    SKILL.md

    ElevenLabs Audio Generation

    Requires ELEVENLABS_API_KEY in .env.

    Text-to-Speech

    from elevenlabs.client import ElevenLabs
    from elevenlabs import save, VoiceSettings
    import os
    
    client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
    
    audio = client.text_to_speech.convert(
        text="Welcome to my video!",
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_multilingual_v2",
        voice_settings=VoiceSettings(
            stability=0.5,
            similarity_boost=0.75,
            style=0.5,
            speed=1.0
        )
    )
    save(audio, "voiceover.mp3")
    

    Models

    Model Quality SSML Support Notes
    eleven_multilingual_v2 Highest consistency None Stable, production-ready, 29 languages
    eleven_flash_v2_5 Good <break>, <phoneme> Fast, supports pause/pronunciation tags
    eleven_turbo_v2_5 Good <break>, <phoneme> Fastest latency
    eleven_v3 Most expressive None Alpha — unreliable, needs prompt engineering

    Choose: multilingual_v2 for reliability, flash/turbo for SSML control, v3 for maximum expressiveness (expect retakes).

    Voice Settings by Style

    Style stability similarity style speed
    Natural/professional 0.75-0.85 0.9 0.0-0.1 1.0
    Conversational 0.5-0.6 0.85 0.3-0.4 0.9-1.0
    Energetic/YouTuber 0.3-0.5 0.75 0.5-0.7 1.0-1.1

    Pauses Between Sections

    With flash/turbo models: Use SSML break tags inline:

    ...end of section. <break time="1.5s" /> Start of next...
    

    Max 3 seconds per break. Excessive breaks can cause speed artifacts.

    With multilingual_v2 / v3: No SSML support. Options:

    • Paragraph breaks (blank lines) — creates ~0.3-0.5s natural pause
    • Post-process with ffmpeg: split audio and insert silence

    WARNING: ... (ellipsis) is NOT a reliable pause — it can be vocalized as a word/sound. Do not use ellipsis as a pause mechanism.

    Pronunciation Control

    Phonetic spelling (any model): Write words as you want them pronounced:

    • Janus → Jan-us
    • nginx → engine-x
    • Use dashes, capitals, apostrophes to guide pronunciation

    SSML phoneme tags (flash/turbo only):

    <phoneme alphabet="ipa" ph="ˈdʒeɪnəs">Janus</phoneme>
    

    Iterative Workflow

    1. Generate → listen → identify pronunciation/pacing issues
    2. Adjust: phonetic spellings, break tags, voice settings
    3. Regenerate. If pauses aren't precise enough, add silence in post with ffmpeg rather than fighting the TTS engine.

    Voice Cloning

    Instant Voice Clone

    with open("sample.mp3", "rb") as f:
        voice = client.voices.ivc.create(
            name="My Voice",
            files=[f],
            remove_background_noise=True
        )
    print(f"Voice ID: {voice.voice_id}")
    
    • Use client.voices.ivc.create() (not client.voices.clone())
    • Pass file handles in binary mode ("rb"), not paths
    • Convert m4a first: ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3
    • Multiple samples (2-3 clips) improve accuracy
    • Save voice ID for reuse

    Professional Voice Clone: Requires Creator plan+, 30+ min audio. See reference.md.

    Sound Effects

    Max 22 seconds per generation.

    result = client.text_to_sound_effects.convert(
        text="Thunder rumbling followed by heavy rain",
        duration_seconds=10,
        prompt_influence=0.3
    )
    with open("thunder.mp3", "wb") as f:
        for chunk in result:
            f.write(chunk)
    

    Prompt tips: Be specific — "Heavy footsteps on wooden floorboards, slow and deliberate, with creaking"

    Music Generation

    10 seconds to 5 minutes. Use client.music.compose() (not .generate()).

    result = client.music.compose(
        prompt="Upbeat indie rock, catchy guitar riff, energetic drums, travel vlog",
        music_length_ms=60000,
        force_instrumental=True
    )
    with open("music.mp3", "wb") as f:
        for chunk in result:
            f.write(chunk)
    

    Prompt structure: Genre, mood, instruments, tempo, use case. Add "no vocals" or use force_instrumental=True for background music.

    Remotion Integration

    Complete Workflow: Script to Synchronized Scene

    VOICEOVER-SCRIPT.md → voiceover.py → public/audio/ → Remotion composition
            ↓                  ↓               ↓                 ↓
      Scene narration    Generate MP3    Audio files     <Audio> component
      with durations     per scene       with timing     synced to scenes
    

    Step 1: Generate Per-Scene Audio

    Use the toolkit's voiceover tool to generate audio for each scene:

    # Generate voiceover files for each scene
    python tools/voiceover.py --scene-dir public/audio/scenes --json
    
    # Output:
    # public/audio/scenes/
    #   ├── scene-01-title.mp3
    #   ├── scene-02-problem.mp3
    #   ├── scene-03-solution.mp3
    #   └── manifest.json  (durations for each file)
    

    The manifest.json contains timing info:

    {
      "scenes": [
        { "file": "scene-01-title.mp3", "duration": 4.2 },
        { "file": "scene-02-problem.mp3", "duration": 12.8 },
        { "file": "scene-03-solution.mp3", "duration": 15.3 }
      ],
      "totalDuration": 32.3
    }
    

    Step 2: Use Audio in Remotion Composition

    // src/Composition.tsx
    import { Audio, staticFile, Series, useVideoConfig } from 'remotion';
    
    // Import scene components
    import { TitleSlide } from './scenes/TitleSlide';
    import { ProblemSlide } from './scenes/ProblemSlide';
    import { SolutionSlide } from './scenes/SolutionSlide';
    
    // Scene durations (from manifest.json, converted to frames at 30fps)
    const SCENE_DURATIONS = {
      title: Math.ceil(4.2 * 30),      // 126 frames
      problem: Math.ceil(12.8 * 30),   // 384 frames
      solution: Math.ceil(15.3 * 30),  // 459 frames
    };
    
    export const MainComposition: React.FC = () => {
      return (
        <>
          {/* Scene sequence */}
          <Series>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
              <TitleSlide />
            </Series.Sequence>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.problem}>
              <ProblemSlide />
            </Series.Sequence>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.solution}>
              <SolutionSlide />
            </Series.Sequence>
          </Series>
    
          {/* Audio track - plays continuously across all scenes */}
          <Audio src={staticFile('audio/voiceover.mp3')} volume={1} />
    
          {/* Optional: Background music at lower volume */}
          <Audio src={staticFile('audio/music.mp3')} volume={0.15} />
        </>
      );
    };
    

    Step 3: Per-Scene Audio (Alternative)

    For more control, add audio to each scene individually:

    // src/scenes/ProblemSlide.tsx
    import { Audio, staticFile, useCurrentFrame } from 'remotion';
    
    export const ProblemSlide: React.FC = () => {
      const frame = useCurrentFrame();
    
      return (
        <div style={{ /* slide styles */ }}>
          <h1>The Problem</h1>
          {/* Scene content */}
    
          {/* Audio starts when this scene starts (frame 0 of this sequence) */}
          <Audio src={staticFile('audio/scenes/scene-02-problem.mp3')} />
        </div>
      );
    };
    

    Syncing Visuals to Voiceover

    Calculate scene duration from audio, not the other way around:

    // src/config/timing.ts
    import manifest from '../../public/audio/scenes/manifest.json';
    
    const FPS = 30;
    
    // Convert audio durations to frame counts
    export const sceneDurations = manifest.scenes.reduce((acc, scene) => {
      const name = scene.file.replace(/^scene-\d+-/, '').replace('.mp3', '');
      acc[name] = Math.ceil(scene.duration * FPS);
      return acc;
    }, {} as Record<string, number>);
    
    // Usage in composition:
    // <Series.Sequence durationInFrames={sceneDurations.title}>
    

    Audio Timing Patterns

    import { Audio, Sequence, interpolate, useCurrentFrame } from 'remotion';
    
    // Fade in audio
    export const FadeInAudio: React.FC<{ src: string; fadeFrames?: number }> = ({
      src,
      fadeFrames = 30
    }) => {
      const frame = useCurrentFrame();
      const volume = interpolate(frame, [0, fadeFrames], [0, 1], {
        extrapolateRight: 'clamp',
      });
      return <Audio src={src} volume={volume} />;
    };
    
    // Delayed audio start
    export const DelayedAudio: React.FC<{ src: string; delayFrames: number }> = ({
      src,
      delayFrames
    }) => (
      <Sequence from={delayFrames}>
        <Audio src={src} />
      </Sequence>
    );
    
    // Usage:
    // <FadeInAudio src={staticFile('audio/music.mp3')} fadeFrames={60} />
    // <DelayedAudio src={staticFile('audio/sfx/whoosh.mp3')} delayFrames={45} />
    

    Voiceover + Demo Video Sync

    When a scene has both voiceover and demo video:

    import { Audio, OffthreadVideo, staticFile, useVideoConfig } from 'remotion';
    
    export const DemoScene: React.FC = () => {
      const { durationInFrames, fps } = useVideoConfig();
    
      // Calculate playback rate to fit demo into voiceover duration
      const demoDuration = 45; // seconds (original demo length)
      const sceneDuration = durationInFrames / fps; // seconds (from voiceover)
      const playbackRate = demoDuration / sceneDuration;
    
      return (
        <>
          <OffthreadVideo
            src={staticFile('demos/feature-demo.mp4')}
            playbackRate={playbackRate}
          />
          <Audio src={staticFile('audio/scenes/scene-04-demo.mp3')} />
        </>
      );
    };
    

    Error Handling

    import { Audio, staticFile, delayRender, continueRender } from 'remotion';
    import { useEffect, useState } from 'react';
    
    export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
      const [handle] = useState(() => delayRender());
      const [audioReady, setAudioReady] = useState(false);
    
      useEffect(() => {
        const audio = new window.Audio(src);
        audio.oncanplaythrough = () => {
          setAudioReady(true);
          continueRender(handle);
        };
        audio.onerror = () => {
          console.error(`Failed to load audio: ${src}`);
          continueRender(handle); // Continue without audio rather than hang
        };
      }, [src, handle]);
    
      if (!audioReady) return null;
      return <Audio src={src} />;
    };
    

    Toolkit Command: /generate-voiceover

    The /generate-voiceover command handles the full workflow:

    /generate-voiceover
    
    1. Reads VOICEOVER-SCRIPT.md
    2. Extracts narration for each scene
    3. Generates audio via ElevenLabs API
    4. Saves to public/audio/scenes/
    5. Creates manifest.json with durations
    6. Updates project.json with timing info
    

    Popular Voices

    • George: JBFqnCBsd6RMkjVDRZzb (warm narrator)
    • Rachel: 21m00Tcm4TlvDq8ikWAM (clear female)
    • Adam: pNInz6obpgDQGcFmaJgB (professional male)

    List all: client.voices.get_all()

    For full API docs, see reference.md.

    Recommended Servers
    JigsawStack Image Generation
    JigsawStack Image Generation
    Nanobanana
    Nanobanana
    DataForB2B
    DataForB2B
    Repository
    digitalsamba/claude-code-video-toolkit
    Files