video-subtitle-cutter

different-ai/video-subtitle-cutter

AI & ML

189

3 installs

About

Transcribe video, analyze subtitles with AI, and cut video by removing filler words, pauses, and mistakes

SKILL.md

What I Do

Automate video editing by:

Transcribing video to timestamped subtitles (Whisper)
Analyzing transcript with AI to identify cuts (filler words, pauses, mistakes)
Generating FFmpeg commands to cut and concatenate clean segments
Generating subtitles (SRT) for the final video

CRITICAL: Always Re-encode (Never Use `-c copy`)

The #1 mistake is using -c copy for cutting. This causes:

Frozen frames at cut points (1-8 seconds of freeze)
Audio/video sync issues
Glitchy playback

Why? H.264 video typically has a keyframe (I-frame) only every 2-10 seconds. With -c copy, FFmpeg can only start a segment on a keyframe, so a cut at any other timestamp pulls in extra frames around the cut point, and players show them as a freeze.

Solution: Always re-encode segments with quality settings:

# WRONG - causes freeze frames
ffmpeg -ss 10 -i video.mp4 -t 5 -c copy segment.mp4

# CORRECT - smooth cuts at any timestamp
ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -preset fast -crf 18 \
  -c:a aac -b:a 192k \
  -avoid_negative_ts make_zero \
  segment.mp4

Quality presets (CRF = Constant Rate Factor):

crf 15-17 = Near lossless (large files)
crf 18-20 = High quality (recommended)
crf 21-23 = Good quality (smaller files)
crf 24-28 = Medium quality (much smaller)

Prerequisites

# Install Whisper (choose one)
pip install openai-whisper          # Local (requires Python 3.9+)
# OR use OpenAI API (no local install needed)

# Install FFmpeg
brew install ffmpeg                  # macOS
sudo apt install ffmpeg              # Linux

Quick Start

Step 1: Transcribe Video

Option A: Local Whisper (free, slower)

whisper video.mp4 --model medium --output_format json --output_dir ./

Option B: OpenAI Whisper API (fast, paid)

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file="@video.mp4" \
  -F model="whisper-1" \
  -F response_format="verbose_json" \
  -F "timestamp_granularities[]=segment" \
  > transcript.json

Option C: Use ffmpeg to extract audio first (for large files)

# Extract audio (much smaller file to upload)
ffmpeg -i video.mp4 -vn -acodec libmp3lame -q:a 2 audio.mp3

# Then transcribe the audio
whisper audio.mp3 --model medium --output_format json

Step 2: Analyze Transcript for Cuts

Feed the transcript to the AI with this prompt:

Analyze this video transcript and identify segments to CUT (remove).

TRANSCRIPT:
{paste transcript.json segments here}

Identify these issues:
1. FILLER WORDS: "um", "uh", "like", "you know", "basically", "actually", "so", "right"
2. FALSE STARTS: Incomplete sentences that restart ("I think— actually, let me...")
3. LONG PAUSES: Gaps > 1.5 seconds between segments
4. REPETITIONS: Same word/phrase repeated ("really really really")
5. CORRECTIONS: "Wait, I meant...", "Sorry, let me rephrase..."
6. TANGENTS: Off-topic rambling (use judgment)

Return a JSON array of segments to KEEP (not cut):
[
  {"start": 0.0, "end": 2.5, "text": "Welcome to this video"},
  {"start": 3.1, "end": 8.4, "text": "Today we're going to cover..."},
  ...
]

Rules:
- Merge adjacent keep segments if gap < 0.3s
- Ensure cuts don't happen mid-word (check word boundaries)
- Preserve natural speech rhythm (don't over-cut)
- When in doubt, keep the segment
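
The "merge adjacent keep segments" rule can also be enforced in code after the AI responds, in case the model skips it. A minimal sketch, assuming the keep segments use the start/end/text keys shown in the prompt; the function name merge_keep_segments is illustrative, not part of any library:

```python
def merge_keep_segments(segments, max_gap=0.3):
    """Merge keep segments whose gap is smaller than max_gap seconds."""
    merged = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        if merged and seg["start"] - merged[-1]["end"] < max_gap:
            prev = merged[-1]
            prev["end"] = max(prev["end"], seg["end"])
            # Join the text so the merged segment still describes its content
            prev["text"] = (prev.get("text", "") + " " + seg.get("text", "")).strip()
        else:
            merged.append(dict(seg))  # copy so the caller's list is not mutated
    return merged


keep = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to this video"},
    {"start": 2.6, "end": 4.0, "text": "and thanks for watching"},
    {"start": 5.0, "end": 6.0, "text": "Today we cover cutting"},
]
print(merge_keep_segments(keep))
```

Fewer, longer segments also mean fewer re-encode jobs and fewer audible joins in the final video.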

Step 3: Generate FFmpeg Commands (High Quality)

Once you have the keep segments, use this Python script for smooth cuts:

import json
import subprocess
import os

VIDEO_INPUT = "video.mp4"
VIDEO_OUTPUT = "video_clean.mp4"
SEGMENTS_FILE = "keep_segments.json"

with open(SEGMENTS_FILE) as f:
    segments = json.load(f)

segment_files = []
for i, seg in enumerate(segments):
    outfile = f"temp_seg_{i:04d}.mp4"
    segment_files.append(outfile)

    # MUST re-encode for smooth cuts (no -c copy!)
    cmd = [
        'ffmpeg', '-y',
        '-ss', str(seg['start']),      # Seek BEFORE input (fast)
        '-i', VIDEO_INPUT,
        '-t', str(seg['end'] - seg['start']),  # Duration
        '-c:v', 'libx264',
        '-preset', 'fast',              # fast/medium/slow
        '-crf', '18',                   # Quality (lower = better, 15-23 recommended)
        '-c:a', 'aac',
        '-b:a', '192k',
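        '-avoid_negative_ts', 'make_zero',  # Fix timestamp issues
        '-af', 'aresample=async=1',     # Sync audio (replaces the deprecated -async 1)
        outfile
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Stop early: a missing segment would break the concat step
        raise RuntimeError(f"ffmpeg failed on segment {i+1}:\n{result.stderr[-500:]}")
    print(f"✓ Segment {i+1}/{len(segments)}")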

# Create concat file
with open('temp_concat.txt', 'w') as f:
    for sf in segment_files:
        f.write(f"file '{sf}'\n")

# Concatenate (can use -c copy here since all segments share the same codec settings)
subprocess.run([
    'ffmpeg', '-y', '-f', 'concat', '-safe', '0',
    '-i', 'temp_concat.txt',
    '-c', 'copy',
    VIDEO_OUTPUT
], check=True)  # check=True raises if the concat fails

# Cleanup
for sf in segment_files:
    os.remove(sf)
os.remove('temp_concat.txt')
print(f"✓ Created: {VIDEO_OUTPUT}")

Key flags explained:

-ss before -i: Fast seek (doesn't decode entire video)
-t: Duration of segment (not end time)
-crf 18: High quality encoding
-avoid_negative_ts make_zero: Fixes concat timestamp issues
-async 1: Keeps audio in sync (deprecated in current FFmpeg; -af aresample=async=1 is the documented replacement)
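
Because -t takes a duration while keep segments store absolute start and end times, the conversion is an easy place to slip. The sketch below shows one way to build the argument list with that conversion and a basic sanity check; build_cut_command is a hypothetical helper name, and the encoder flags are simply the ones used elsewhere in this document:

```python
def build_cut_command(src, start, end, dst, crf=18, preset="fast"):
    """Build an ffmpeg argument list that re-encodes src[start:end] into dst."""
    if end <= start:
        raise ValueError(f"segment end ({end}) must be after start ({start})")
    duration = round(end - start, 3)  # avoid float noise such as 12.699999999999999
    return [
        "ffmpeg", "-y",
        "-ss", str(start),       # input option: seek before decoding
        "-i", src,
        "-t", str(duration),     # output option: duration, not end time
        "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
        "-c:a", "aac", "-b:a", "192k",
        "-avoid_negative_ts", "make_zero",
        dst,
    ]


print(" ".join(build_cut_command("video.mp4", 12.3, 25.0, "temp_seg_0001.mp4")))
```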

Step 4: Generate Subtitles

After creating the final video, generate fresh subtitles with Whisper:

# Generate SRT subtitles for the cleaned video
whisper video_clean.mp4 --model medium --output_format srt --output_dir ./

# For higher accuracy (slower):
whisper video_clean.mp4 --model large --output_format srt --language en

# Output: video_clean.srt

Burn subtitles into video (optional):

# Embed subtitles permanently
ffmpeg -i video_clean.mp4 -vf "subtitles=video_clean.srt:force_style='FontSize=24,FontName=Arial,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2'" -c:a copy video_with_subs.mp4

Subtitle styling options:

FontSize=24 - Text size
FontName=Arial - Font face
PrimaryColour=&HFFFFFF - White text (BGR format)
OutlineColour=&H000000 - Black outline
Outline=2 - Outline thickness
MarginV=50 - Distance from bottom
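
When the style needs to change per video, the filter string can be assembled from a dictionary instead of edited by hand. A small sketch, assuming the SRT path contains no characters that need escaping in an FFmpeg filter (colons, quotes, backslashes); subtitles_filter is an illustrative name:

```python
def subtitles_filter(srt_path, style):
    """Build the -vf value for burning subtitles with the given ASS style fields."""
    force_style = ",".join(f"{key}={value}" for key, value in style.items())
    return f"subtitles={srt_path}:force_style='{force_style}'"


style = {
    "FontSize": 24,
    "FontName": "Arial",
    "PrimaryColour": "&HFFFFFF",
    "OutlineColour": "&H000000",
    "Outline": 2,
    "MarginV": 50,
}
print(subtitles_filter("video_clean.srt", style))
```

The result is passed as a single argument after -vf, which avoids shell quoting problems when the command is run through subprocess with a list.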

Complete Workflow Script (High Quality)

#!/usr/bin/env python3
"""
video_clean.py - Clean up video by removing filler words/pauses
Uses re-encoding for smooth cuts (no freeze frames)
"""

import json
import subprocess
import os
import sys

def get_duration(filepath):
    """Get video duration in seconds"""
    result = subprocess.run([
        'ffprobe', '-v', 'quiet', '-print_format', 'json', '-show_format', filepath
    ], capture_output=True, text=True)
    return float(json.loads(result.stdout)['format']['duration'])

def extract_segment(input_file, start, end, output_file, crf=18, preset='fast'):
    """Extract a segment with re-encoding for smooth cuts"""
    cmd = [
        'ffmpeg', '-y',
        '-ss', str(start),
        '-i', input_file,
        '-t', str(end - start),
        '-c:v', 'libx264',
        '-preset', preset,
        '-crf', str(crf),
        '-c:a', 'aac',
        '-b:a', '192k',
        '-avoid_negative_ts', 'make_zero',
        '-af', 'aresample=async=1',  # Sync audio (replaces the deprecated -async 1)
        output_file
    ]
    return subprocess.run(cmd, capture_output=True, text=True)

def concatenate_segments(segment_files, output_file):
    """Concatenate segments into final video"""
    with open('temp_concat.txt', 'w') as f:
        for sf in segment_files:
            f.write(f"file '{sf}'\n")

    subprocess.run([
        'ffmpeg', '-y', '-f', 'concat', '-safe', '0',
        '-i', 'temp_concat.txt',
        '-c', 'copy',
        output_file
    ], capture_output=True, check=True)  # check=True raises if the concat fails

    os.remove('temp_concat.txt')

def generate_subtitles(video_file, model='medium'):
    """Generate SRT subtitles using Whisper"""
    subprocess.run([
        'whisper', video_file,
        '--model', model,
        '--output_format', 'srt',
        '--output_dir', './'
    ])

def main(video_input, segments, output_name, crf=18):
    """Main workflow"""
    segment_files = []

    print(f"\n{'='*50}")
    print(f"Processing: {video_input}")
    print(f"Quality: CRF {crf} (lower=better, 15-23 recommended)")
    print(f"{'='*50}\n")

    # Extract segments with re-encoding
    for i, seg in enumerate(segments):
        outfile = f"temp_seg_{i:04d}.mp4"
        segment_files.append(outfile)

        result = extract_segment(video_input, seg['start'], seg['end'], outfile, crf)
        if result.returncode == 0:
            duration = seg['end'] - seg['start']
            print(f"✓ Segment {i+1}/{len(segments)}: {duration:.1f}s")
        else:
            print(f"✗ Error on segment {i+1}")
            print(result.stderr[-500:])
            sys.exit(1)  # Stop here: concatenating with a missing segment would fail

    # Concatenate
    print("\nConcatenating segments...")
    concatenate_segments(segment_files, output_name)

    # Cleanup temp segments
    for sf in segment_files:
        os.remove(sf)

    # Generate subtitles
    print("\nGenerating subtitles...")
    generate_subtitles(output_name)

    # Stats
    orig_duration = get_duration(video_input)
    new_duration = get_duration(output_name)
    orig_size = os.path.getsize(video_input) / (1024*1024)
    new_size = os.path.getsize(output_name) / (1024*1024)

    print(f"\n{'='*50}")
    print(f"COMPLETE")
    print(f"{'='*50}")
    print(f"Original:  {orig_duration:.0f}s | {orig_size:.1f} MB")
    print(f"Output:    {new_duration:.0f}s | {new_size:.1f} MB")
    print(f"Removed:   {orig_duration - new_duration:.0f}s ({((orig_duration - new_duration)/orig_duration)*100:.0f}%)")
    print(f"Video:     {output_name}")
    print(f"Subtitles: {output_name.replace('.mp4', '.srt')}")

if __name__ == '__main__':
    # Example usage
    VIDEO = "input.mp4"
    SEGMENTS = [
        {"start": 0.0, "end": 10.5},
        {"start": 12.3, "end": 25.0},
        # ... add your segments
    ]
    main(VIDEO, SEGMENTS, "output_clean.mp4", crf=18)

AI Analysis Prompt Templates

Basic Cleanup (Filler Words Only)

Remove filler words from this transcript. Return segments to KEEP.

Filler words to remove: um, uh, like, you know, basically, actually, so, right, I mean

TRANSCRIPT SEGMENTS:
{segments}

Return JSON: [{"start": float, "end": float, "text": "cleaned text"}, ...]

Aggressive Cleanup (Podcast/Interview)

Clean this podcast transcript for a tight, professional edit.

REMOVE:
- All filler words (um, uh, like, you know, basically, so, right)
- False starts and restarts
- Pauses longer than 1 second
- Repetitions
- Off-topic tangents
- "That's a great question" type filler responses
- Excessive laughter/reactions (keep some for naturalness)

KEEP:
- Core content and insights
- Natural transitions
- Important reactions that add context

TRANSCRIPT:
{segments}

Return JSON array of segments to KEEP with cleaned text.

Light Cleanup (Preserve Natural Feel)

Lightly clean this transcript while preserving natural speech patterns.

ONLY REMOVE:
- "Um" and "uh" when standalone (not part of thinking pause)
- Obvious mistakes followed by corrections
- Technical issues (coughs, phone rings, etc.)

PRESERVE:
- Natural "like" and "you know" that add personality
- Thinking pauses that feel authentic
- Personality quirks

TRANSCRIPT:
{segments}

Return JSON array of segments to KEEP.

Transcript Format Reference

Whisper JSON Output

{
  "text": "Full transcript text...",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": " Welcome to this video.",
      "tokens": [50364, 5765, ...],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 2.5,
      "end": 5.8,
      "text": " Um, so today we're going to...",
      ...
    }
  ],
  "language": "en"
}
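
Long pauses between segments can be found directly from this structure, without asking the AI. A minimal sketch, assuming the segments list has the start and end fields shown above; find_long_pauses and load_segments are illustrative names, and the 1.5 second default mirrors the threshold used in the analysis prompt:

```python
import json


def load_segments(path):
    """Read the segments list from a Whisper JSON transcript."""
    with open(path) as f:
        return json.load(f)["segments"]


def find_long_pauses(segments, min_gap=1.5):
    """Return (gap_start, gap_end) pairs where consecutive segments are more than min_gap apart."""
    pauses = []
    for prev, nxt in zip(segments, segments[1:]):
        if nxt["start"] - prev["end"] > min_gap:
            pauses.append((prev["end"], nxt["start"]))
    return pauses


example = [
    {"start": 0.0, "end": 2.5},
    {"start": 2.5, "end": 5.8},
    {"start": 8.0, "end": 9.0},
]
print(find_long_pauses(example))
```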

Keep Segments Format (for FFmpeg)

[
  { "start": 0.0, "end": 2.5, "text": "Welcome to this video." },
  { "start": 3.2, "end": 5.8, "text": "Today we're going to..." }
]
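
Since the keep list comes from an AI response, it is worth checking before any encoding starts. The following is a sketch of such a check, not an exhaustive validator; validate_keep_segments is a hypothetical helper, and video_duration is an optional value you would obtain yourself (for example from ffprobe):

```python
def validate_keep_segments(segments, video_duration=None):
    """Return a list of problems found; an empty list means the segments look usable."""
    problems = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        start, end = seg["start"], seg["end"]
        if start < 0:
            problems.append(f"segment {i}: negative start {start}")
        if end <= start:
            problems.append(f"segment {i}: end {end} is not after start {start}")
        if start < prev_end:
            problems.append(f"segment {i}: starts at {start}, before previous end {prev_end}")
        if video_duration is not None and end > video_duration:
            problems.append(f"segment {i}: end {end} is past the video duration {video_duration}")
        prev_end = max(prev_end, end)
    return problems


keep = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to this video."},
    {"start": 3.2, "end": 5.8, "text": "Today we're going to..."},
]
print(validate_keep_segments(keep) or "segments look fine")
```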

Advanced: Word-Level Timestamps

For precise filler word removal, use word-level timestamps:

# Whisper with word timestamps
whisper video.mp4 --model medium --word_timestamps True --output_format json

This gives you:

{
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Um welcome to this video",
      "words": [
        { "word": "Um", "start": 0.0, "end": 0.3 },
        { "word": "welcome", "start": 0.5, "end": 0.9 },
        { "word": "to", "start": 0.9, "end": 1.0 },
        { "word": "this", "start": 1.0, "end": 1.2 },
        { "word": "video", "start": 1.2, "end": 1.6 }
      ]
    }
  ]
}

Now you can cut precisely around "Um" (0.0-0.3) and keep "welcome to this video" (0.5-1.6).
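
As a rough illustration of that idea, the sketch below turns a word list into keep ranges by starting a new range after every dropped filler. It assumes the word/start/end fields shown above, handles only single-word fillers, and uses illustrative names (FILLERS, keep_ranges_from_words):

```python
FILLERS = {"um", "uh"}  # illustrative; extend to match your prompt template


def keep_ranges_from_words(words, fillers=FILLERS):
    """Group consecutive non-filler words into keep ranges, cutting at each filler."""
    ranges = []
    start_new = True  # the first kept word always opens a range
    for w in words:
        token = w["word"].strip().strip(".,!?").lower()
        if token in fillers:
            start_new = True  # the next kept word begins after the cut
            continue
        if start_new:
            ranges.append({"start": w["start"], "end": w["end"]})
            start_new = False
        else:
            ranges[-1]["end"] = w["end"]
    return ranges


words = [
    {"word": "Um", "start": 0.0, "end": 0.3},
    {"word": "welcome", "start": 0.5, "end": 0.9},
    {"word": "to", "start": 0.9, "end": 1.0},
    {"word": "this", "start": 1.0, "end": 1.2},
    {"word": "video", "start": 1.2, "end": 1.6},
]
print(keep_ranges_from_words(words))
```

Words such as "like" or "so" are often meaningful, so a plain word list is only safe for unambiguous fillers; the AI prompt remains the better tool for context-dependent ones.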

Troubleshooting

Frozen Frames at Cut Points (MOST COMMON)

Cause: Using -c copy which can only cut at keyframes.

Solution: Always re-encode with -c:v libx264 -crf 18 (see examples above).

Audio/Video Sync Issues

Add these flags when extracting segments:

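# -avoid_negative_ts make_zero fixes negative timestamps
# -af aresample=async=1 syncs audio to video (replaces the deprecated -async 1)
ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -crf 18 \
  -c:a aac -b:a 192k \
  -avoid_negative_ts make_zero \
  -af aresample=async=1 \
  segment.mp4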

Cuts Sound Abrupt

Add audio fade in/out to each segment:

ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -crf 18 \
  -af "afade=t=in:st=0:d=0.05,afade=t=out:st=4.95:d=0.05" \
  -c:a aac segment.mp4
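
The fade-out start time depends on each segment's length (4.95 above is 5 seconds minus a 0.05 second fade), so it has to be computed per segment when scripting. A minimal sketch, with fade_filter as an illustrative helper name:

```python
def fade_filter(duration, fade=0.05):
    """Build an -af value that fades audio in at the start and out at the end of a segment."""
    if duration <= 2 * fade:
        raise ValueError("segment is too short for the requested fades")
    fade_out_start = duration - fade
    return (
        f"afade=t=in:st=0:d={fade:.3f},"
        f"afade=t=out:st={fade_out_start:.3f}:d={fade:.3f}"
    )


# For a keep segment from 12.3s to 25.0s:
print(fade_filter(25.0 - 12.3))
```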

Large Files Take Forever

Use -preset fast or -preset veryfast (trades compression efficiency for speed)
Extract audio first for transcription (much smaller)
Use Whisper API instead of local model
Process in parallel (multiple segments at once)

# Faster encoding (slightly lower quality)
ffmpeg ... -preset veryfast -crf 20 ...

# Even faster for previews
ffmpeg ... -preset ultrafast -crf 23 ...
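
Processing segments in parallel can be sketched with a thread pool, since each ffmpeg call is a separate process. This is an illustration under assumptions: extract_all and build_cmd are hypothetical names, the run parameter exists only so the function can be exercised without ffmpeg, and because libx264 is already multi-threaded the gain from extra workers may be modest.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_cmd(src, seg, dst, crf=18, preset="fast"):
    """Re-encode one keep segment, using the same flags as the sequential script."""
    return [
        "ffmpeg", "-y", "-ss", str(seg["start"]), "-i", src,
        "-t", str(seg["end"] - seg["start"]),
        "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
        "-c:a", "aac", "-b:a", "192k",
        "-avoid_negative_ts", "make_zero",
        dst,
    ]


def extract_all(src, segments, workers=2, run=subprocess.run):
    """Extract all segments with up to `workers` ffmpeg processes; return outputs in order."""
    outputs = [f"temp_seg_{i:04d}.mp4" for i in range(len(segments))]

    def job(pair):
        seg, dst = pair
        return run(build_cmd(src, seg, dst), capture_output=True, text=True)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(job, zip(segments, outputs)))

    failed = [dst for dst, res in zip(outputs, results) if res.returncode != 0]
    if failed:
        raise RuntimeError(f"ffmpeg failed for: {', '.join(failed)}")
    return outputs
```

The returned list keeps segment order, so it can be written to the concat file as-is.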

Whisper Misses Words

Use --model large for better accuracy
Use --language en to force English
Normalize audio first:

ffmpeg -i video.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -c:v copy normalized.mp4

File Size Too Large After Re-encoding

Increase CRF value (higher = smaller file, lower quality):

# High quality (large)
-crf 18

# Good quality (medium)
-crf 22

# Acceptable quality (small)
-crf 26

Integration with OpenCode

When using this skill in OpenCode:

Extract audio (faster transcription):

ffmpeg -i video.mp4 -vn -acodec libmp3lame -q:a 2 temp_audio.mp3 -y

Transcribe with Whisper:

whisper temp_audio.mp3 --model medium --output_format json --output_dir ./

Read the transcript JSON (the Whisper CLI names it after the input file, here temp_audio.json) and analyze segments
Identify segments to KEEP based on:
- Removing filler words (um, uh, like, you know)
- Removing long pauses (>1.5s gaps)
- Removing false starts and repetitions
- For "shorts style": Keep only hook + key points + CTA
Re-encode and concatenate (MUST re-encode, never -c copy):
```
# Use the Python script above with crf=18 for quality
```

Generate subtitles for final video:

whisper output.mp4 --model medium --output_format srt

Report results with before/after stats

Quality Settings Reference

Use Case	CRF	Preset	Notes
Archive/Master	15-17	slow	Near lossless, large files
YouTube/Vimeo	18-20	medium	High quality, recommended
Social Media	21-23	fast	Good quality, smaller
Preview/Draft	24-28	veryfast	Quick renders
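
In a script, the table can be expressed as a lookup so the use case is chosen by name. A small sketch: the keys and the single CRF value picked from each range are my own choices for illustration, and encoder_args is a hypothetical helper:

```python
QUALITY_PRESETS = {
    "archive": {"crf": 16, "preset": "slow"},      # Archive/Master, 15-17
    "youtube": {"crf": 18, "preset": "medium"},    # YouTube/Vimeo, 18-20
    "social":  {"crf": 22, "preset": "fast"},      # Social Media, 21-23
    "preview": {"crf": 26, "preset": "veryfast"},  # Preview/Draft, 24-28
}


def encoder_args(use_case):
    """Return the libx264 arguments for one of the use cases in the table."""
    if use_case not in QUALITY_PRESETS:
        raise ValueError(
            f"unknown use case {use_case!r}; choose from {sorted(QUALITY_PRESETS)}"
        )
    q = QUALITY_PRESETS[use_case]
    return ["-c:v", "libx264", "-preset", q["preset"], "-crf", str(q["crf"])]


print(encoder_args("youtube"))
```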

Anti-Patterns (DO NOT DO)

# WRONG: -c copy causes freeze frames
ffmpeg -ss 10 -i video.mp4 -t 5 -c copy segment.mp4

# WRONG: -to instead of -t with -ss before -i
ffmpeg -ss 10 -i video.mp4 -to 15 ...  # timestamps restart at 0 after input seeking, so this yields a 15 s clip, not 10-15 s

# WRONG: Missing timestamp fix flags
ffmpeg ... -c:v libx264 ...  # Missing -avoid_negative_ts
