
    different-ai/video-subtitle-cutter

    About

    Transcribe video, analyze subtitles with AI, and cut video by removing filler words, pauses, and mistakes


    What I Do

    Automate video editing by:

    1. Transcribing video to timestamped subtitles (Whisper)
    2. Analyzing transcript with AI to identify cuts (filler words, pauses, mistakes)
    3. Generating FFmpeg commands to cut and concatenate clean segments
    4. Generating subtitles (SRT) for the final video

    CRITICAL: Always Re-encode (Never Use -c copy)

    The #1 mistake is using -c copy for cutting. This causes:

    • Frozen frames at cut points (1-8 seconds of freeze)
    • Audio/video sync issues
    • Glitchy playback

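    Why? H.264 video is built from groups of frames that start with a keyframe (I-frame), typically every 2-10 seconds. With -c copy, FFmpeg cannot start a segment between keyframes: it starts at the preceding keyframe and keeps the extra frames up to the requested time, which play back as frozen video or drift out of sync with the audio.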

    Solution: Always re-encode segments with quality settings:

    # WRONG - causes freeze frames
    ffmpeg -ss 10 -i video.mp4 -t 5 -c copy segment.mp4
    
    # CORRECT - smooth cuts at any timestamp
    ffmpeg -ss 10 -i video.mp4 -t 5 \
      -c:v libx264 -preset fast -crf 18 \
      -c:a aac -b:a 192k \
      -avoid_negative_ts make_zero \
      segment.mp4
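The FFmpeg documentation for -ss states that when stream copying, the span between the preceding seek point and the requested position is preserved instead of being decoded and discarded. A minimal sketch of that arithmetic, assuming you already have a sorted list of keyframe times (for example from `ffprobe -skip_frame nokey -select_streams v:0 -show_entries frame=pts_time -of csv=p=0 video.mp4`); `copy_cut_overhang` is a hypothetical helper name:

```python
from bisect import bisect_right

def copy_cut_overhang(keyframes: list[float], start: float) -> tuple[float, float]:
    """Return (keyframe_used, extra_seconds) for a -c copy cut requested at `start`.

    Stream copy can only begin at a keyframe, so output starts at the last
    keyframe at or before `start`; the frames in between are kept.
    """
    if not keyframes or start < keyframes[0]:
        raise ValueError("start must be at or after the first keyframe")
    idx = bisect_right(keyframes, start) - 1  # index of last keyframe <= start
    kf = keyframes[idx]
    return kf, start - kf

# Assumed 4-second keyframe interval, for illustration only
kfs = [0.0, 4.0, 8.0, 12.0]
print(copy_cut_overhang(kfs, 10.0))  # (8.0, 2.0)
```

Here a cut at 10 s would carry 2 s of unwanted video; re-encoding avoids this because FFmpeg decodes from the keyframe and discards those frames.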
    

    Quality presets (CRF = Constant Rate Factor; libx264 accepts 0-51 with a default of 23, and lower values mean higher quality and larger files):

    • crf 15-17 = Near lossless (large files)
    • crf 18-20 = High quality (recommended)
    • crf 21-23 = Good quality (smaller files)
    • crf 24-28 = Medium quality (much smaller)
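When scripting cuts, the CORRECT command above can be generated per segment. A sketch that builds the argument list with a CRF check; `build_cut_command` is a hypothetical name, and the 0-51 bound assumes 8-bit libx264:

```python
def build_cut_command(src: str, start: float, end: float, dst: str,
                      crf: int = 18, preset: str = "fast") -> list[str]:
    """Build an ffmpeg argv that re-encodes one segment (same flags as the CORRECT example)."""
    if not 0 <= crf <= 51:  # libx264 (8-bit) CRF range
        raise ValueError("crf must be between 0 and 51")
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek on the input (placed before -i)
        "-i", src,
        "-t", f"{end - start:.3f}",   # duration of the segment, not its end time
        "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
        "-c:a", "aac", "-b:a", "192k",
        "-avoid_negative_ts", "make_zero",
        dst,
    ]

print(" ".join(build_cut_command("video.mp4", 10, 15, "segment.mp4")))
```

Returning a list (rather than a shell string) lets you pass it straight to subprocess.run without quoting problems.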

    Prerequisites

    # Install Whisper (choose one)
    pip install openai-whisper          # Local (requires Python 3.9+)
    # OR use OpenAI API (no local install needed)
    
    # Install FFmpeg
    brew install ffmpeg                  # macOS
    sudo apt install ffmpeg              # Linux
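A quick way to confirm the tools are installed is to check that they are on PATH. A small sketch using Python's shutil.which; the default tool list is an assumption (ffprobe normally ships with FFmpeg and is used by the complete workflow script; whisper is only needed for local transcription):

```python
import shutil

def missing_tools(tools=("ffmpeg", "ffprobe", "whisper")) -> list[str]:
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

missing = missing_tools()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All prerequisites found")
```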
    

    Quick Start

    Step 1: Transcribe Video

    Option A: Local Whisper (free, slower)

    whisper video.mp4 --model medium --output_format json --output_dir ./
    

    Option B: OpenAI Whisper API (fast, paid)

    curl https://api.openai.com/v1/audio/transcriptions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -F file="@video.mp4" \
      -F model="whisper-1" \
      -F response_format="verbose_json" \
      -F "timestamp_granularities[]=segment" \
      > transcript.json
    

    Option C: Extract audio with FFmpeg first (for large files; the OpenAI API limits uploads to 25 MB)

    # Extract audio (much smaller file to upload)
    ffmpeg -i video.mp4 -vn -acodec libmp3lame -q:a 2 audio.mp3
    
    # Then transcribe the audio
    whisper audio.mp3 --model medium --output_format json
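Whisper names its JSON after the input file, so audio.mp3 produces audio.json. A sketch that loads the segments (the CLI output and the API's verbose_json both contain a segments list with start, end and text) and lists the long pauses the Step 2 prompt looks for; the function names and the 1.5 s default are illustrative:

```python
import json

def load_segments(path: str) -> list[dict]:
    """Read a Whisper JSON transcript and keep only start, end and text."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [{"start": s["start"], "end": s["end"], "text": s["text"].strip()}
            for s in data["segments"]]

def long_pauses(segments: list[dict], min_gap: float = 1.5) -> list[tuple[float, float]]:
    """Return (gap_start, gap_end) for silences longer than min_gap between segments."""
    return [(a["end"], b["start"])
            for a, b in zip(segments, segments[1:])
            if b["start"] - a["end"] > min_gap]
```

Usage: `long_pauses(load_segments("audio.json"))`. Pre-computing the gaps gives you something concrete to check the model's suggested cuts against.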
    

    Step 2: Analyze Transcript for Cuts

    Feed the transcript to the AI with this prompt:

    Analyze this video transcript and identify segments to CUT (remove).
    
    TRANSCRIPT:
    {paste transcript.json segments here}
    
    Identify these issues:
    1. FILLER WORDS: "um", "uh", "like", "you know", "basically", "actually", "so", "right"
    2. FALSE STARTS: Incomplete sentences that restart ("I think— actually, let me...")
    3. LONG PAUSES: Gaps > 1.5 seconds between segments
    4. REPETITIONS: Same word/phrase repeated ("really really really")
    5. CORRECTIONS: "Wait, I meant...", "Sorry, let me rephrase..."
    6. TANGENTS: Off-topic rambling (use judgment)
    
    Return a JSON array of segments to KEEP (not cut):
    [
      {"start": 0.0, "end": 2.5, "text": "Welcome to this video"},
      {"start": 3.1, "end": 8.4, "text": "Today we're going to cover..."},
      ...
    ]
    
    Rules:
    - Merge adjacent keep segments if gap < 0.3s
    - Ensure cuts don't happen mid-word (check word boundaries)
    - Preserve natural speech rhythm (don't over-cut)
    - When in doubt, keep the segment
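Model replies are not always well-formed, so it can help to validate the returned KEEP list before cutting. A sketch, assuming the reply is bare JSON (no surrounding prose or code fences): it clamps times to the video length, drops empty ranges, sorts, and applies the "merge if gap < 0.3s" rule from the prompt. `clean_keep_segments` is a hypothetical helper:

```python
import json

def clean_keep_segments(raw: str, duration: float, max_gap: float = 0.3) -> list[dict]:
    """Parse the model's JSON reply, drop invalid entries, and merge near-adjacent keeps."""
    segs = []
    for s in json.loads(raw):
        start = max(0.0, float(s["start"]))
        end = min(duration, float(s["end"]))
        if end > start:  # skip empty or inverted ranges
            segs.append({"start": start, "end": end, "text": s.get("text", "")})
    segs.sort(key=lambda s: s["start"])

    merged: list[dict] = []
    for s in segs:
        if merged and s["start"] - merged[-1]["end"] < max_gap:
            prev = merged[-1]
            prev["end"] = max(prev["end"], s["end"])  # also absorbs overlaps
            prev["text"] = f'{prev["text"]} {s["text"]}'.strip()
        else:
            merged.append(dict(s))
    return merged
```

Write the result to keep_segments.json with json.dump for the Step 3 script.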
    

    Step 3: Generate FFmpeg Commands (High Quality)

    Once you have the keep segments, use this Python script for smooth cuts:

    import json
    import subprocess
    import os
    
    VIDEO_INPUT = "video.mp4"
    VIDEO_OUTPUT = "video_clean.mp4"
    SEGMENTS_FILE = "keep_segments.json"
    
    with open(SEGMENTS_FILE) as f:
        segments = json.load(f)
    
    segment_files = []
    for i, seg in enumerate(segments):
        outfile = f"temp_seg_{i:04d}.mp4"
        segment_files.append(outfile)
    
        # MUST re-encode for smooth cuts (no -c copy!)
        cmd = [
            'ffmpeg', '-y',
            '-ss', str(seg['start']),      # Seek BEFORE input (fast)
            '-i', VIDEO_INPUT,
            '-t', str(seg['end'] - seg['start']),  # Duration
            '-c:v', 'libx264',
            '-preset', 'fast',              # fast/medium/slow
            '-crf', '18',                   # Quality (lower = better, 15-23 recommended)
            '-c:a', 'aac',
            '-b:a', '192k',
            '-avoid_negative_ts', 'make_zero',  # Fix timestamp issues
            '-async', '1',                  # Sync audio
            outfile
        ]
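        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:  # Don't silently continue with a missing segment
            raise RuntimeError(f"ffmpeg failed on segment {i+1}:\n{result.stderr[-500:]}")
        print(f"✓ Segment {i+1}/{len(segments)}")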
    
    # Create concat file
    with open('temp_concat.txt', 'w') as f:
        for sf in segment_files:
            f.write(f"file '{sf}'\n")
    
    # Concatenate (can use -c copy here since all segments match)
    subprocess.run([
        'ffmpeg', '-y', '-f', 'concat', '-safe', '0',
        '-i', 'temp_concat.txt',
        '-c', 'copy',
        VIDEO_OUTPUT
    ], check=True)  # Raise if concatenation fails
    
    # Cleanup
    for sf in segment_files:
        os.remove(sf)
    os.remove('temp_concat.txt')
    print(f"✓ Created: {VIDEO_OUTPUT}")
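The concat list above wraps each file name in single quotes, which is safe for the generated temp_seg_*.mp4 names. If you adapt the script to arbitrary names, a quote inside a name breaks the list. FFmpeg's quoting rules do not allow a quote inside '...', so the usual escape is to close the quote, emit \' and reopen. A sketch; `write_concat_list` is a hypothetical helper:

```python
from pathlib import Path

def write_concat_list(paths: list[str], list_file: str) -> str:
    """Write an FFmpeg concat-demuxer list, quoting each path safely."""
    lines = []
    for p in paths:
        # Close the quote, add an escaped quote, reopen: it's -> 'it'\''s'
        quoted = "'" + p.replace("'", "'\\''") + "'"
        lines.append(f"file {quoted}\n")
    text = "".join(lines)
    Path(list_file).write_text(text, encoding="utf-8")
    return text
```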
    

    Key flags explained:

    • -ss before -i: Fast input seek (jumps close to the position instead of decoding the whole video; when re-encoding, the cut is still frame-accurate)
    • -t: Duration of segment (not end time)
    • -crf 18: High quality encoding
    • -avoid_negative_ts make_zero: Fixes concat timestamp issues
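    • -async 1: Corrects audio timestamps at the start of each segment (deprecated in recent FFmpeg, whose docs point to the aresample filter instead, e.g. -af aresample=async=1)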

    Step 4: Generate Subtitles

    After creating the final video, generate fresh subtitles with Whisper:

    # Generate SRT subtitles for the cleaned video
    whisper video_clean.mp4 --model medium --output_format srt --output_dir ./
    
    # For higher accuracy (slower):
    whisper video_clean.mp4 --model large --output_format srt --language en
    
    # Output: video_clean.srt
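If re-running Whisper is too slow, an alternative is to build the SRT directly from the keep segments by laying them end to end on the new timeline. This is a sketch under two assumptions: each keep segment's text matches what remains after the cut, and encoded segment lengths equal end - start (small drift is possible). Timing is segment-level, so it is coarser than a fresh transcription. The function names are illustrative:

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def keep_segments_to_srt(segments: list[dict]) -> str:
    """Build SRT cues on the cut timeline: each kept segment starts where the previous one ended."""
    cues, offset = [], 0.0
    for i, seg in enumerate(segments, start=1):
        length = seg["end"] - seg["start"]
        cues.append(f"{i}\n{srt_time(offset)} --> {srt_time(offset + length)}\n{seg['text']}\n")
        offset += length
    return "\n".join(cues)  # blank line between cues
```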
    

    Burn subtitles into video (optional):

    # Embed subtitles permanently (this re-encodes the video, so set the quality explicitly)
    ffmpeg -i video_clean.mp4 -vf "subtitles=video_clean.srt:force_style='FontSize=24,FontName=Arial,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2'" -c:v libx264 -crf 18 -c:a copy video_with_subs.mp4
    

    Subtitle styling options:

    • FontSize=24 - Text size
    • FontName=Arial - Font face
    • PrimaryColour=&HFFFFFF - White text (BGR format)
    • OutlineColour=&H000000 - Black outline
    • Outline=2 - Outline thickness
    • MarginV=50 - Distance from bottom
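Because the colour values are in BGR order, a common mistake is to paste an RGB hex code directly. A sketch that converts #RRGGBB to the &HBBGGRR form and joins options into a force_style value; the helper names are hypothetical, and the optional leading alpha byte is left out:

```python
def ass_colour(rgb_hex: str) -> str:
    """Convert '#RRGGBB' to the &HBBGGRR form used by force_style colours."""
    h = rgb_hex.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected #RRGGBB")
    r, g, b = h[0:2], h[2:4], h[4:6]
    return f"&H{b}{g}{r}".upper()

def force_style(**opts) -> str:
    """Join style options into the comma-separated force_style value."""
    return ",".join(f"{k}={v}" for k, v in opts.items())

style = force_style(FontSize=24, PrimaryColour=ass_colour("#FFD700"), MarginV=50)
print(style)  # FontSize=24,PrimaryColour=&H00D7FF,MarginV=50
```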

    Complete Workflow Script (High Quality)

    #!/usr/bin/env python3
    """
    video_clean.py - Clean up video by removing filler words/pauses
    Uses re-encoding for smooth cuts (no freeze frames)
    """
    
    import json
    import subprocess
    import os
    import sys
    
    def get_duration(filepath):
        """Get video duration in seconds"""
        result = subprocess.run([
            'ffprobe', '-v', 'quiet', '-print_format', 'json', '-show_format', filepath
        ], capture_output=True, text=True)
        return float(json.loads(result.stdout)['format']['duration'])
    
    def extract_segment(input_file, start, end, output_file, crf=18, preset='fast'):
        """Extract a segment with re-encoding for smooth cuts"""
        cmd = [
            'ffmpeg', '-y',
            '-ss', str(start),
            '-i', input_file,
            '-t', str(end - start),
            '-c:v', 'libx264',
            '-preset', preset,
            '-crf', str(crf),
            '-c:a', 'aac',
            '-b:a', '192k',
            '-avoid_negative_ts', 'make_zero',
            '-async', '1',
            output_file
        ]
        return subprocess.run(cmd, capture_output=True, text=True)
    
    def concatenate_segments(segment_files, output_file):
        """Concatenate segments into final video"""
        with open('temp_concat.txt', 'w') as f:
            for sf in segment_files:
                f.write(f"file '{sf}'\n")
    
        subprocess.run([
            'ffmpeg', '-y', '-f', 'concat', '-safe', '0',
            '-i', 'temp_concat.txt',
            '-c', 'copy',
            output_file
        ], capture_output=True, check=True)
    
        os.remove('temp_concat.txt')
    
    def generate_subtitles(video_file, model='medium'):
        """Generate SRT subtitles using Whisper"""
        subprocess.run([
            'whisper', video_file,
            '--model', model,
            '--output_format', 'srt',
            '--output_dir', os.path.dirname(os.path.abspath(video_file))  # SRT next to the video
        ], check=True)
    
    def main(video_input, segments, output_name, crf=18):
        """Main workflow"""
        segment_files = []
    
        print(f"\n{'='*50}")
        print(f"Processing: {video_input}")
        print(f"Quality: CRF {crf} (lower=better, 15-23 recommended)")
        print(f"{'='*50}\n")
    
        # Extract segments with re-encoding
        for i, seg in enumerate(segments):
            outfile = f"temp_seg_{i:04d}.mp4"
            segment_files.append(outfile)
    
            result = extract_segment(video_input, seg['start'], seg['end'], outfile, crf)
            if result.returncode == 0:
                duration = seg['end'] - seg['start']
                print(f"✓ Segment {i+1}/{len(segments)}: {duration:.1f}s")
            else:
                print(f"✗ Error on segment {i+1}")
                print(result.stderr[-500:])
                sys.exit(1)  # Stop: a missing segment would break the concat step
    
        # Concatenate
        print("\nConcatenating segments...")
        concatenate_segments(segment_files, output_name)
    
        # Cleanup temp segments
        for sf in segment_files:
            os.remove(sf)
    
        # Generate subtitles
        print("\nGenerating subtitles...")
        generate_subtitles(output_name)
    
        # Stats
        orig_duration = get_duration(video_input)
        new_duration = get_duration(output_name)
        orig_size = os.path.getsize(video_input) / (1024*1024)
        new_size = os.path.getsize(output_name) / (1024*1024)
    
        print(f"\n{'='*50}")
        print(f"COMPLETE")
        print(f"{'='*50}")
        print(f"Original:  {orig_duration:.0f}s | {orig_size:.1f} MB")
        print(f"Output:    {new_duration:.0f}s | {new_size:.1f} MB")
        print(f"Removed:   {orig_duration - new_duration:.0f}s ({((orig_duration - new_duration)/orig_duration)*100:.0f}%)")
        print(f"Video:     {output_name}")
        print(f"Subtitles: {output_name.replace('.mp4', '.srt')}")
    
    if __name__ == '__main__':
        # Example usage
        VIDEO = "input.mp4"
        SEGMENTS = [
            {"start": 0.0, "end": 10.5},
            {"start": 12.3, "end": 25.0},
            # ... add your segments
        ]
        main(VIDEO, SEGMENTS, "output_clean.mp4", crf=18)
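The example __main__ block hardcodes the input and segments. A sketch of a command-line front end that reads keep_segments.json instead; the argument layout is an assumption, not part of the original script:

```python
import argparse
import json

def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical CLI: video_clean.py INPUT SEGMENTS_JSON OUTPUT [--crf N]."""
    p = argparse.ArgumentParser(description="Cut a video down to the given keep segments")
    p.add_argument("input", help="source video, e.g. input.mp4")
    p.add_argument("segments", help="keep_segments.json with start/end entries")
    p.add_argument("output", help="output video, e.g. output_clean.mp4")
    p.add_argument("--crf", type=int, default=18, help="libx264 CRF (lower = better)")
    return p.parse_args(argv)

def load_keep_segments(path: str) -> list[dict]:
    """Load the KEEP list produced in Step 2."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# In video_clean.py, the example __main__ block could then become:
# if __name__ == '__main__':
#     args = parse_args()
#     main(args.input, load_keep_segments(args.segments), args.output, crf=args.crf)
```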
    

    AI Analysis Prompt Templates

    Basic Cleanup (Filler Words Only)

    Remove filler words from this transcript. Return segments to KEEP.
    
    Filler words to remove: um, uh, like, you know, basically, actually, so, right, I mean
    
    TRANSCRIPT SEGMENTS:
    {segments}
    
    Return JSON: [{"start": float, "end": float, "text": "cleaned text"}, ...]
    

    Aggressive Cleanup (Podcast/Interview)

    Clean this podcast transcript for a tight, professional edit.
    
    REMOVE:
    - All filler words (um, uh, like, you know, basically, so, right)
    - False starts and restarts
    - Pauses longer than 1 second
    - Repetitions
    - Off-topic tangents
    - "That's a great question" type filler responses
    - Excessive laughter/reactions (keep some for naturalness)
    
    KEEP:
    - Core content and insights
    - Natural transitions
    - Important reactions that add context
    
    TRANSCRIPT:
    {segments}
    
    Return JSON array of segments to KEEP with cleaned text.
    

    Light Cleanup (Preserve Natural Feel)

    Lightly clean this transcript while preserving natural speech patterns.
    
    ONLY REMOVE:
    - "Um" and "uh" when standalone (not part of thinking pause)
    - Obvious mistakes followed by corrections
    - Technical issues (coughs, phone rings, etc.)
    
    PRESERVE:
    - Natural "like" and "you know" that add personality
    - Thinking pauses that feel authentic
    - Personality quirks
    
    TRANSCRIPT:
    {segments}
    
    Return JSON array of segments to KEEP.
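These templates contain literal JSON braces (for example {"start": float, ...}), so filling them with Python's str.format would fail on those braces. A sketch that substitutes only the {segments} placeholder with plain string replacement; `fill_prompt` is a hypothetical helper:

```python
import json

def fill_prompt(template: str, segments: list[dict]) -> str:
    """Insert transcript segments at the {segments} placeholder.

    str.replace is used instead of str.format because the templates contain
    literal JSON braces that str.format would try to parse as fields.
    """
    rendered = "\n".join(
        json.dumps({"start": s["start"], "end": s["end"], "text": s["text"]})
        for s in segments
    )
    return template.replace("{segments}", rendered)
```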
    

    Transcript Format Reference

    Whisper JSON Output

    {
      "text": "Full transcript text...",
      "segments": [
        {
          "id": 0,
          "start": 0.0,
          "end": 2.5,
          "text": " Welcome to this video.",
          "tokens": [50364, 5765, ...],
          "temperature": 0.0,
          "avg_logprob": -0.25,
          "compression_ratio": 1.2,
          "no_speech_prob": 0.01
        },
        {
          "id": 1,
          "start": 2.5,
          "end": 5.8,
          "text": " Um, so today we're going to...",
          ...
        }
      ],
      "language": "en"
    }
    

    Keep Segments Format (for FFmpeg)

    [
      { "start": 0.0, "end": 2.5, "text": "Welcome to this video." },
      { "start": 3.2, "end": 5.8, "text": "Today we're going to..." }
    ]
    

    Advanced: Word-Level Timestamps

    For precise filler word removal, use word-level timestamps:

    # Whisper with word timestamps
    whisper video.mp4 --model medium --word_timestamps True --output_format json
    

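    This adds a words list to each segment (abridged here; actual output also includes a per-word probability, and words usually carry a leading space):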

    {
      "segments": [
        {
          "start": 0.0,
          "end": 2.5,
          "text": "Um welcome to this video",
          "words": [
            { "word": "Um", "start": 0.0, "end": 0.3 },
            { "word": "welcome", "start": 0.5, "end": 0.9 },
            { "word": "to", "start": 0.9, "end": 1.0 },
            { "word": "this", "start": 1.0, "end": 1.2 },
            { "word": "video", "start": 1.2, "end": 1.6 }
          ]
        }
      ]
    }
    

    Now you can cut precisely around "Um" (0.0-0.3) and keep "welcome to this video" (0.5-1.6).
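A sketch of turning word timestamps into keep segments: filler words are dropped, and a new segment starts after every removed filler or whenever the silence between kept words exceeds a threshold. The filler set and the 0.3 s threshold are assumptions; words like "so" or "like" need context, so they are left out here. Note that a gap-based merge applied afterwards (such as the 0.3s rule in the Step 2 prompt) would pull short fillers back in, so keep the segments split at fillers.

```python
import re

# Assumption: only unambiguous fillers; extend with care
FILLERS = {"um", "uh"}

def keep_from_words(words: list[dict], fillers=FILLERS, max_gap: float = 0.3) -> list[dict]:
    """Drop filler words and group the remaining words into keep segments."""
    segments: list[dict] = []
    new_segment = True  # start fresh at the beginning and right after a filler
    for w in words:
        token = re.sub(r"[^\w']", "", w["word"]).lower()  # " Um," -> "um"
        if token in fillers:
            new_segment = True  # never bridge a removed filler
            continue
        if not new_segment and w["start"] - segments[-1]["end"] <= max_gap:
            segments[-1]["end"] = w["end"]
            segments[-1]["text"] += " " + w["word"].strip()
        else:
            segments.append({"start": w["start"], "end": w["end"], "text": w["word"].strip()})
        new_segment = False
    return segments
```

With the example above, this yields one keep segment from 0.5 to 1.6 with the text "welcome to this video".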


    Troubleshooting

    Frozen Frames at Cut Points (MOST COMMON)

    Cause: Using -c copy which can only cut at keyframes.

    Solution: Always re-encode with -c:v libx264 -crf 18 (see examples above).

    Audio/Video Sync Issues

    Add these flags when extracting segments:

    # -avoid_negative_ts make_zero fixes negative timestamps; -async 1 syncs audio to video
    ffmpeg -ss 10 -i video.mp4 -t 5 \
      -c:v libx264 -crf 18 \
      -c:a aac -b:a 192k \
      -avoid_negative_ts make_zero \
      -async 1 \
      segment.mp4
    

    Cuts Sound Abrupt

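    Add a short audio fade in/out to each segment. The fade-out start (st) is the segment duration minus the fade length, here 5 - 0.05 = 4.95 for a 5-second segment: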

    ffmpeg -ss 10 -i video.mp4 -t 5 \
      -c:v libx264 -crf 18 \
      -af "afade=t=in:st=0:d=0.05,afade=t=out:st=4.95:d=0.05" \
      -c:a aac segment.mp4
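Since the fade-out start depends on each segment's length, it is easier to compute the filter string per segment. A sketch that reproduces the example above; it assumes segment timestamps start at 0 (as they do with -ss before -i), and `fade_filter` is a hypothetical name:

```python
def fade_filter(duration: float, fade: float = 0.05) -> str:
    """Build an -af value: fade in at 0 and fade out ending at the segment's end."""
    if duration <= 2 * fade:
        raise ValueError("segment too short for both fades")
    out_start = round(duration - fade, 3)
    return f"afade=t=in:st=0:d={fade:g},afade=t=out:st={out_start:g}:d={fade:g}"

print(fade_filter(5.0))  # afade=t=in:st=0:d=0.05,afade=t=out:st=4.95:d=0.05
```

Pass the result as the -af value for each segment, using that segment's end - start as the duration.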
    

    Large Files Take Forever

    1. Use -preset fast or -preset veryfast (trades compression efficiency for speed: files get larger at the same CRF)
    2. Extract audio first for transcription (much smaller)
    3. Use the Whisper API instead of a local model
    4. Process in parallel (multiple segments at once)
    
    # Faster encoding (slightly lower quality)
    ffmpeg ... -preset veryfast -crf 20 ...
    
    # Even faster for previews
    ffmpeg ... -preset ultrafast -crf 23 ...
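For item 4, segment extractions are independent, so they can run concurrently. A sketch using a thread pool; threads suffice because each worker only waits on an external process. libx264 is already multithreaded, so gains from running many encodes at once may be modest; starting with 2-3 workers is an assumption worth tuning.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_parallel(commands: list[list[str]], max_workers: int = 3) -> list[int]:
    """Run independent commands (e.g. one ffmpeg cut per segment) concurrently.

    Returns exit codes in the same order as `commands`.
    """
    def run(cmd: list[str]) -> int:
        return subprocess.run(cmd, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

Check that every exit code is 0 before concatenating.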
    

    Whisper Misses Words

    • Use --model large for better accuracy
    • Use --language en to force English
    • Normalize audio first:
    ffmpeg -i video.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -c:v copy normalized.mp4
    

    File Size Too Large After Re-encoding

    Increase CRF value (higher = smaller file, lower quality):

    # High quality (large)
    -crf 18
    
    # Good quality (medium)
    -crf 22
    
    # Acceptable quality (small)
    -crf 26
    

    Integration with OpenCode

    When using this skill in OpenCode:

    1. Extract audio (faster transcription):

      ffmpeg -i video.mp4 -vn -acodec libmp3lame -q:a 2 temp_audio.mp3 -y
      
    2. Transcribe with Whisper:

      whisper temp_audio.mp3 --model medium --output_format json --output_dir ./
      
    3. Read the JSON transcript (Whisper names it after the input, here temp_audio.json) and analyze segments

    4. Identify segments to KEEP based on:

      • Removing filler words (um, uh, like, you know)
      • Removing long pauses (>1.5s gaps)
      • Removing false starts and repetitions
      • For "shorts style": Keep only hook + key points + CTA
    5. Re-encode and concatenate (MUST re-encode, never -c copy):

      # Use the Python script above with crf=18 for quality
      
    6. Generate subtitles for final video:

      whisper output.mp4 --model medium --output_format srt
      
    7. Report results with before/after stats

    Quality Settings Reference

    Use Case         CRF     Preset     Notes
    Archive/Master   15-17   slow       Near lossless, large files
    YouTube/Vimeo    18-20   medium     High quality, recommended
    Social Media     21-23   fast       Good quality, smaller
    Preview/Draft    24-28   veryfast   Quick renders

    Anti-Patterns (DO NOT DO)

    # WRONG: -c copy causes freeze frames
    ffmpeg -ss 10 -i video.mp4 -t 5 -c copy segment.mp4
    
    # WRONG: -to instead of -t with -ss before -i
    ffmpeg -ss 10 -i video.mp4 -to 15 ...  # timestamps restart at 0 after -ss before -i, so this cuts 10-25 s, not 10-15 s
    
    # WRONG: Missing timestamp fix flags
    ffmpeg ... -c:v libx264 ...  # Missing -avoid_negative_ts
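These anti-patterns can also be caught automatically before running a batch. A sketch that inspects an ffmpeg argument list used for cutting; the checks mirror the three cases above, and `lint_cut_command` is a hypothetical helper. Commands without -ss (such as the concat step, which legitimately uses -c copy) are skipped.

```python
def lint_cut_command(argv: list[str]) -> list[str]:
    """Flag the cutting anti-patterns listed above in an ffmpeg argv."""
    problems: list[str] = []
    if "-ss" not in argv:
        return problems  # not a cut (e.g. the concat step may use -c copy)
    copy_flags = {("-c", "copy"), ("-c:v", "copy"), ("-vcodec", "copy")}
    if set(zip(argv, argv[1:])) & copy_flags:
        problems.append("stream copy while cutting: re-encode instead")
    if "-i" in argv and argv.index("-ss") < argv.index("-i"):
        if "-to" in argv[argv.index("-i"):]:
            problems.append("-to with -ss before -i: use -t for the segment duration")
    if "-avoid_negative_ts" not in argv:
        problems.append("missing -avoid_negative_ts make_zero")
    return problems
```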
    
    Repository
    different-ai/agent-bank