ffmpeg-captions-subtitles

JosiahSiegel/ffmpeg-captions-subtitles

name: ffmpeg-captions-subtitles
description: Complete subtitle and caption system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) Burning subtitles (hardcoding SRT/ASS/VTT), (2) Adding soft subtitle tracks, (3) Extracting subtitles from video, (4) Subtitle format conversion, (5) Styled captions (font, color, outline, shadow), (6) Subtitle positioning and alignment, (7) CEA-608/708 closed captions, (8) Text overlays with drawtext, (9) Whisper AI automatic transcription (FFmpeg 8.0+ with VAD, multi-language, GPU), (10) Batch subtitle processing. Provides: Format reference tables, styling parameter guide, position alignment charts, Whisper model comparison, VAD configuration, dynamic text examples, accessibility best practices. Ensures: Professional captions with proper styling and accessibility compliance.

CRITICAL GUIDELINES

Windows File Path Requirements

MANDATORY: Always Use Backslashes on Windows for File Paths

When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).

Documentation Guidelines

NEVER create new documentation files unless explicitly requested by the user.

Quick Reference

Task	Command
Burn SRT	`ffmpeg -i video.mp4 -vf "subtitles=subs.srt" output.mp4`
Burn ASS	`ffmpeg -i video.mp4 -vf "ass=subs.ass" output.mp4`
Add soft sub	`ffmpeg -i video.mp4 -i subs.srt -c copy -c:s mov_text output.mp4`
Extract sub	`ffmpeg -i video.mkv -map 0:s:0 output.srt`
Style subs	`-vf "subtitles=s.srt:force_style='FontSize=24,PrimaryColour=&HFFFFFF'"`
Text overlay	`-vf "drawtext=text='Hello':x=10:y=10:fontsize=24:fontcolor=white"`

Format	Extension	Best For
SRT	.srt	Simple, universal
ASS	.ass	Styled, animated, anime
VTT	.vtt	Web/HTML5 video

When to Use This Skill

Use for subtitle and caption operations:

Hardcoding (burning) subtitles into video
Adding soft subtitle tracks to containers
Extracting subtitles from MKV/MP4
Styling captions (font, color, position)
Dynamic text overlays

FFmpeg Captions and Subtitles (2025)

Complete guide to working with subtitles, closed captions, and text overlays using FFmpeg.

Subtitle Format Reference

Supported Formats

Format	Extension	Features	Use Case
SubRip	.srt	Simple timing + text	Universal, web
Advanced SubStation Alpha	.ass/.ssa	Rich styling, positioning, effects	Anime, styled subs
WebVTT	.vtt	Web standard, cues, styling	HTML5 video
TTML/DFXP	.ttml/.dfxp	Broadcast, accessibility	Streaming services
MOV Text	(embedded in .mp4/.mov)	QuickTime native text	MP4 files, Apple ecosystem
DVB Subtitle	(embedded)	Bitmap-based	European broadcast
PGS	.sup	Blu-ray bitmap subtitles	Blu-ray
CEA-608/708	(embedded)	Closed captions	US broadcast, streaming

Format Characteristics

SRT (SubRip):
- Simple text-based format
- Supports basic HTML tags (<b>, <i>, <u>)
- Widely compatible
- No positioning or advanced styling

ASS/SSA:
- Advanced styling (fonts, colors, outlines)
- Precise positioning anywhere on screen
- Animation and effects support
- Karaoke timing

WebVTT:
- HTML5 standard format
- CSS-like styling
- Cue settings for positioning
- Speaker identification

Burning Subtitles (Hardcoding)

Basic Subtitle Burn-in

# Burn SRT subtitles
ffmpeg -i video.mp4 -vf "subtitles=subs.srt" output.mp4

# Burn ASS/SSA subtitles (preserves styling)
ffmpeg -i video.mp4 -vf "ass=subs.ass" output.mp4

# Burn subtitles from MKV container
ffmpeg -i video.mkv -vf "subtitles=video.mkv" output.mp4

# Burn a specific subtitle track (si counts subtitle streams only: 0 = first)
ffmpeg -i video.mkv -vf "subtitles=video.mkv:si=0" output.mp4

# stream_index is the long form of si (1 = second subtitle track)
ffmpeg -i video.mkv -vf "subtitles=video.mkv:stream_index=1" output.mp4

Styled Subtitle Burn-in

# Force style (overrides subtitle styling)
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1'" \
  output.mp4

# Yellow subtitles with black outline
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=28,PrimaryColour=&H00FFFF,OutlineColour=&H000000,Outline=3'" \
  output.mp4

# Larger font for accessibility
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=36,Bold=1'" \
  output.mp4

ASS Style Parameters

Parameter	Description	Example
FontName	Font family	`FontName=Arial`
FontSize	Font height in script pixels, scaled to the video (384x288 default canvas for SRT input)	`FontSize=24`
PrimaryColour	Text color (AABBGGRR)	`PrimaryColour=&HFFFFFF`
SecondaryColour	Karaoke color	`SecondaryColour=&H00FFFF`
OutlineColour	Outline/border color	`OutlineColour=&H000000`
BackColour	Shadow/background color	`BackColour=&H80000000`
Bold	Bold text (0/1)	`Bold=1`
Italic	Italic text (0/1)	`Italic=1`
Underline	Underlined text (0/1)	`Underline=1`
Outline	Outline width	`Outline=2`
Shadow	Shadow depth	`Shadow=1`
Alignment	Position (numpad style)	`Alignment=2`
MarginL/R/V	Margins in script pixels (scaled like FontSize)	`MarginV=50`

Color Format (ASS)

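ASS uses &HAABBGGRR format (Alpha, Blue, Green, Red). Alpha 00 is fully opaque and FF fully transparent; when the alpha byte is omitted the colour is opaque: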
- White: &HFFFFFF or &H00FFFFFF
- Black: &H000000 or &H00000000
- Yellow: &H00FFFF (00-Blue, FF-Green, FF-Red)
- Red: &H0000FF
- Blue: &HFF0000
- 50% transparent black: &H80000000
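Because ASS stores the colour bytes in reverse order, it is easy to swap red and blue when copying a web colour. A minimal bash sketch, assuming a hypothetical helper named `ass_colour`, that reorders `RRGGBB` plus an optional alpha byte into `&HAABBGGRR`:

```shell
#!/usr/bin/env bash
# ass_colour RRGGBB [AA]
# Hypothetical helper: converts a web-style colour (e.g. FFFF00 = yellow) and an
# optional alpha byte (00 = opaque, FF = fully transparent) to ASS &HAABBGGRR.
ass_colour() {
    local rgb=${1#\#} alpha=${2:-00}   # accept "#FFFF00" as well as "FFFF00"
    case $rgb in
        [0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]) ;;
        *) echo "ass_colour: expected RRGGBB, got '$1'" >&2; return 1 ;;
    esac
    case $alpha in
        [0-9A-Fa-f][0-9A-Fa-f]) ;;
        *) echo "ass_colour: expected AA, got '$alpha'" >&2; return 1 ;;
    esac
    # Swap RR GG BB into BB GG RR, prepend the alpha byte, upper-case the hex digits
    printf '&H%s%s\n' "$alpha" "$(printf '%s' "$rgb" | sed 's/\(..\)\(..\)\(..\)/\3\2\1/')" |
        tr 'a-f' 'A-F'
}
```

Usage: `ffmpeg -i video.mp4 -vf "subtitles=subs.srt:force_style='PrimaryColour=$(ass_colour FFFF00)'" output.mp4` renders yellow text.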

Adding Subtitle Tracks (Soft Subs)

Embed SRT as Track

# Add SRT to MP4 (MOV text)
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s mov_text \
  output.mp4

# Add SRT to MKV
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s srt \
  output.mkv

# Add SRT to WebM (WebVTT)
ffmpeg -i video.webm -i subs.srt \
  -c copy -c:s webvtt \
  output.webm

Multiple Subtitle Tracks

# Add multiple languages
ffmpeg -i video.mp4 -i subs_en.srt -i subs_es.srt -i subs_fr.srt \
  -map 0:v -map 0:a -map 1 -map 2 -map 3 \
  -c copy -c:s mov_text \
  -metadata:s:s:0 language=eng -metadata:s:s:0 title="English" \
  -metadata:s:s:1 language=spa -metadata:s:s:1 title="Spanish" \
  -metadata:s:s:2 language=fra -metadata:s:s:2 title="French" \
  output.mp4

# Add ASS subtitles to MKV (preserves styling)
ffmpeg -i video.mp4 -i styled.ass \
  -map 0 -map 1 \
  -c copy -c:s ass \
  output.mkv

Set Default Subtitle Track

# Set subtitle as default
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s mov_text \
  -disposition:s:0 default \
  output.mp4

# Set forced subtitles (always display)
ffmpeg -i video.mp4 -i forced.srt \
  -c copy -c:s mov_text \
  -disposition:s:0 forced \
  output.mp4

Extracting Subtitles

Extract to External File

# Extract first subtitle track to SRT (text tracks only; bitmap PGS/DVB/VobSub tracks need OCR)
ffmpeg -i video.mkv -map 0:s:0 output.srt

# Extract specific subtitle stream
ffmpeg -i video.mkv -map 0:s:1 output.srt

# Extract to ASS format
ffmpeg -i video.mkv -map 0:s:0 output.ass

# Extract to WebVTT
ffmpeg -i video.mkv -map 0:s:0 output.vtt

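# Extract all subtitle tracks (one file per track; an SRT file holds a single stream)
n=$(ffprobe -v error -select_streams s -show_entries stream=index -of csv=p=0 video.mkv | wc -l)
for ((i = 0; i < n; i++)); do
  ffmpeg -i video.mkv -map "0:s:$i" "subs_$i.srt"
done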

List Available Subtitle Tracks

# Show all streams including subtitles
ffprobe -v error -show_entries stream=index,codec_name,codec_type:stream_tags=language,title \
  -of csv=p=0 video.mkv

# Show only subtitle streams
ffprobe -v error -select_streams s \
  -show_entries stream=index,codec_name:stream_tags=language,title \
  -of csv=p=0 video.mkv

Convert Subtitle Formats

# SRT to ASS
ffmpeg -i subs.srt subs.ass

# ASS to SRT (loses styling)
ffmpeg -i subs.ass subs.srt

# SRT to WebVTT
ffmpeg -i subs.srt subs.vtt

# WebVTT to SRT
ffmpeg -i subs.vtt subs.srt

Text Overlays (drawtext)

Basic Text Overlay

# Simple text overlay
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Hello World':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Centered text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Centered Text':x=(w-tw)/2:y=(h-th)/2:fontsize=48:fontcolor=white" \
  output.mp4

# Text at bottom center
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Bottom Text':x=(w-tw)/2:y=h-th-20:fontsize=32:fontcolor=white" \
  output.mp4

Styled Text Overlays

# Text with background box
ffmpeg -i video.mp4 \
  -vf "drawtext=text='With Background':x=10:y=10:fontsize=24:fontcolor=white:box=1:boxcolor=black@0.5:boxborderw=5" \
  output.mp4

# Text with outline (border)
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Outlined Text':x=10:y=10:fontsize=48:fontcolor=white:borderw=3:bordercolor=black" \
  output.mp4

# Shadow effect
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Shadow Text':x=10:y=10:fontsize=36:fontcolor=white:shadowcolor=black:shadowx=3:shadowy=3" \
  output.mp4

# Custom font
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Custom Font':fontfile=/path/to/font.ttf:fontsize=24:fontcolor=white" \
  output.mp4

Dynamic Text

# Display timestamp
ffmpeg -i video.mp4 \
  -vf "drawtext=text='%{pts\:hms}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Display frame number
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Frame\: %{n}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Display current date/time
ffmpeg -i video.mp4 \
  -vf "drawtext=text='%{localtime\:%Y-%m-%d %H\\\:%M\\\:%S}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Text from file
ffmpeg -i video.mp4 \
  -vf "drawtext=textfile=message.txt:x=10:y=10:fontsize=24:fontcolor=white:reload=1" \
  output.mp4

Animated Text

# Scrolling text (credits)
ffmpeg -i video.mp4 \
  -vf "drawtext=textfile=credits.txt:x=(w-tw)/2:y=h-t*50:fontsize=24:fontcolor=white" \
  output.mp4

# Fade in text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Fade In':x=(w-tw)/2:y=(h-th)/2:fontsize=48:fontcolor=white:alpha='if(lt(t,1),t,1)'" \
  output.mp4

# Text appears at specific time (3 seconds)
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Appears at 3s':x=10:y=10:fontsize=24:fontcolor=white:enable='gte(t,3)'" \
  output.mp4

# Text visible between 2-5 seconds
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Visible 2-5s':x=10:y=10:fontsize=24:fontcolor=white:enable='between(t,2,5)'" \
  output.mp4

# Blinking text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='BLINK':x=10:y=10:fontsize=36:fontcolor=red:alpha='if(mod(floor(t*2),2),1,0)'" \
  output.mp4

Whisper AI Integration (FFmpeg 8.0+)

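FFmpeg 8.0 integrates OpenAI's Whisper speech recognition model via whisper.cpp, enabling fully local, offline transcription and subtitle generation. The filter exists only in builds configured with --enable-whisper against an installed whisper.cpp library; `ffmpeg -filters | grep whisper` shows whether your build has it.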

Features

Speech to Text: High accuracy transcription
Multi-language: 99 languages supported
Flexible Output: SRT subtitles, JSON, or plain text
GPU Acceleration: Uses system GPU by default
Voice Activity Detection: Silero VAD support for better segmentation
Real-time: Works with live audio streams

Generate SRT Subtitles

# Generate subtitles from video using Whisper
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=auto:queue=3:destination=output.srt:format=srt" \
  -f null -

# Specify language for better accuracy
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:destination=english.srt:format=srt" \
  -f null -

# Use larger model for higher quality
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-medium.bin:language=auto:destination=output.srt:format=srt" \
  -f null -

Live Transcription

# Live transcription from microphone (Linux PulseAudio)
ffmpeg -loglevel warning -f pulse -i default \
  -af "highpass=f=200,lowpass=f=3000,whisper=model=ggml-medium-q5_0.bin:language=en:queue=10:destination=-:format=json:vad_model=ggml-silero-v5.1.2.bin" \
  -f null -

# Live transcription from microphone (Windows; list device names with: ffmpeg -list_devices true -f dshow -i dummy)
ffmpeg -loglevel warning -f dshow -i audio="Microphone" \
  -af "whisper=model=ggml-base.bin:language=en:queue=10:destination=-:format=text" \
  -f null -

# Live transcription from microphone (macOS)
ffmpeg -loglevel warning -f avfoundation -i ":0" \
  -af "whisper=model=ggml-base.bin:language=en:queue=10:destination=-:format=text" \
  -f null -

Burn Whisper Subtitles onto Video

# The whisper filter sets lavfi.whisper.text metadata on audio frames, which a
# drawtext filter in the separate -vf chain cannot see. Transcribe first, then burn.

# Pass 1: generate an SRT from the audio
ffmpeg -i input.mp4 -vn \
  -af "whisper=model=ggml-base.en.bin:language=en:destination=whisper.srt:format=srt" \
  -f null -

# Pass 2: burn the generated subtitles into the video
ffmpeg -i input.mp4 \
  -vf "subtitles=whisper.srt:force_style='FontSize=24,Outline=2'" \
  -c:a copy output_with_subtitles.mp4

JSON Output for Post-Processing

# Generate JSON for custom processing
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=auto:destination=output.json:format=json" \
  -f null -

With Voice Activity Detection (VAD)

# Use Silero VAD for better speech detection
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:queue=20:destination=output.srt:format=srt:vad_model=ggml-silero-v5.1.2.bin:vad_threshold=0.5" \
  -f null -

# VAD parameters for noisy audio
ffmpeg -i noisy_video.mp4 -vn \
  -af "whisper=model=ggml-medium.bin:language=en:queue=20:destination=output.srt:format=srt:vad_model=ggml-silero-v5.1.2.bin:vad_threshold=0.6:vad_min_speech_duration=0.2:vad_min_silence_duration=0.3" \
  -f null -

Disable GPU (CPU-only processing)

# Force CPU processing (slower but works without GPU)
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:use_gpu=false:destination=output.srt:format=srt" \
  -f null -

Whisper Models

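Model	Parameters	Speed	Quality	VRAM (OpenAI reference)	Recommended For
tiny	39 M	Fastest	Basic	~1 GB	Quick previews, low-resource
base	74 M	Fast	Good	~1 GB	General use, balanced
small	244 M	Medium	Better	~2 GB	Higher accuracy
medium	769 M	Slow	High	~5 GB	Quality-critical
large	1550 M	Slowest	Best	~10 GB	Maximum accuracy

Unquantized GGML files take roughly 2 bytes per parameter (ggml-base.bin is about 142 MiB); quantized variants such as ggml-medium-q5_0.bin are smaller, and large models are published with version suffixes (e.g. ggml-large-v3.bin).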

Whisper Filter Parameters

Parameter	Description	Default
`model`	Path to GGML model file	Required
`language`	Language code or "auto"	"auto"
`format`	Output format: "text", "srt", "json"	"text"
`destination`	Output file or URL, or "-" for stdout; if unset, the text is only logged	None (log)
`queue`	Seconds of audio buffered per Whisper pass (larger = more accurate, more latency)	3
`vad_model`	Path to Silero VAD model	None
`vad_threshold`	VAD sensitivity (0-1)	0.5
`vad_min_speech_duration`	Min speech duration (seconds)	0.1
`vad_min_silence_duration`	Min silence duration (seconds)	0.5
`use_gpu`	Enable GPU acceleration	true
`gpu_device`	GPU device index	0

Model Download

Download GGML models from the whisper.cpp project:

# Download base model
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

# Download medium model for better quality
curl -L -o ggml-medium.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
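Both URLs follow one pattern, `https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-<model>.bin`. A small bash sketch (the function names `whisper_model_url` and `fetch_whisper_model` are hypothetical) that builds the URL for any model name and skips files already present:

```shell
#!/usr/bin/env bash
# Base URL taken from the download commands above
WHISPER_BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

# whisper_model_url NAME  ->  prints the download URL for ggml-NAME.bin
whisper_model_url() {
    printf '%s/ggml-%s.bin\n' "$WHISPER_BASE_URL" "$1"
}

# fetch_whisper_model NAME  ->  downloads ggml-NAME.bin unless a non-empty copy exists
fetch_whisper_model() {
    local file="ggml-$1.bin"
    if [ -s "$file" ]; then
        echo "$file already present, skipping" >&2
        return 0
    fi
    # -f: fail on HTTP errors instead of saving an error page;
    # download to .part first so an interrupted transfer is not mistaken for a model
    curl -fL -o "$file.part" "$(whisper_model_url "$1")" && mv "$file.part" "$file"
}
```

Usage: `fetch_whisper_model base.en` or `fetch_whisper_model medium-q5_0` before running the filter.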

CEA-608/708 Closed Captions

Extract Closed Captions

# Extract CEA-608 captions from ATSC stream
ffmpeg -f lavfi -i "movie=broadcast.ts[out0+subcc]" -map 0:1 captions.srt

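# Check whether the video stream carries embedded CC (prints closed_captions=1 if so)
ffprobe -v error -select_streams v:0 -show_entries stream=closed_captions \
  -of default=nw=1 video_with_cc.mp4

# Extract embedded CC from an MP4 with the same lavfi movie/subcc source
ffmpeg -f lavfi -i "movie=video_with_cc.mp4[out0+subcc]" -map 0:1 captions.srt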

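Preserve CEA-608 Captions When Re-encoding

# libx264 copies A53 (CEA-608/708) caption data from the decoded input into the
# output stream; -a53cc is on by default and is shown here explicitly
ffmpeg -i video_with_cc.mp4 -c:v libx264 -a53cc 1 -c:a copy output.mp4

# Embedding an external .scc file as CEA-608 data in the video bitstream is
# generally done with dedicated caption tools rather than an FFmpeg filter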

Subtitle Positioning

ASS Alignment Values

7 (top-left)     8 (top-center)     9 (top-right)
4 (mid-left)     5 (mid-center)     6 (mid-right)
1 (bottom-left)  2 (bottom-center)  3 (bottom-right)
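The grid is a numeric keypad: the bottom row is 1-3, the middle 4-6 and the top 7-9, counting left to right, so the value is a row offset (0, 3 or 6) plus a column (1, 2 or 3). A tiny bash sketch of that rule; `ass_alignment` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# ass_alignment VERTICAL HORIZONTAL  ->  prints the ASS numpad Alignment value
# VERTICAL: top | middle | bottom    HORIZONTAL: left | center | right
ass_alignment() {
    local row col
    case $1 in
        bottom)     row=0 ;;
        middle|mid) row=3 ;;
        top)        row=6 ;;
        *) echo "ass_alignment: vertical must be top, middle or bottom" >&2; return 1 ;;
    esac
    case $2 in
        left)   col=1 ;;
        center) col=2 ;;
        right)  col=3 ;;
        *) echo "ass_alignment: horizontal must be left, center or right" >&2; return 1 ;;
    esac
    echo $((row + col))
}
```

Usage: `force_style='Alignment=$(ass_alignment top center),MarginV=20'` inside a double-quoted `-vf` string yields `Alignment=8`.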

Position Examples

# Top center subtitles
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=8,MarginV=20'" \
  output.mp4

# Left-aligned subtitles
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=1,MarginL=50,MarginV=30'" \
  output.mp4

# Right side for speaker identification
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=3,MarginR=50'" \
  output.mp4

Multiple Subtitle Positions

# Two subtitle files at different positions (two chained subtitles filters)
ffmpeg -i video.mp4 \
  -vf "subtitles=speaker1.srt:force_style='Alignment=1,MarginL=50',subtitles=speaker2.srt:force_style='Alignment=3,MarginR=50'" \
  output.mp4

Batch Processing

Burn Subtitles to Multiple Videos

#!/bin/bash
# burn_subs_batch.sh

mkdir -p output   # ffmpeg does not create missing output directories

for video in *.mp4; do
    base="${video%.mp4}"
    if [ -f "${base}.srt" ]; then
        ffmpeg -i "$video" \
          -vf "subtitles=${base}.srt:force_style='FontSize=24,Outline=2'" \
          -c:a copy \
          "output/${base}_subbed.mp4"
    fi
done
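The Performance notes recommend processing batches in parallel. A bash sketch of the loop above using `xargs -P`; `burn_one` and `burn_all` are hypothetical names, and the `FFMPEG` variable (default `ffmpeg`) exists only so the script can be dry-run with a stub:

```shell
#!/usr/bin/env bash
# Burn NAME.srt into NAME.mp4; videos without a matching .srt are skipped.
burn_one() {
    local video=$1
    local base=${video%.mp4}
    [ -f "$base.srt" ] || return 0
    # -nostdin stops parallel ffmpeg processes from competing for the terminal
    "$FFMPEG" -nostdin -y -i "$video" \
        -vf "subtitles=$base.srt:force_style='FontSize=24,Outline=2'" \
        -c:a copy "output/$(basename "$base")_subbed.mp4"
}

# Run burn_one for every .mp4 in the current directory, JOBS (default 4) at a time.
burn_all() {
    export FFMPEG=${FFMPEG:-ffmpeg}   # override with a stub for dry runs
    export -f burn_one                # make the function visible to child bash
    mkdir -p output
    find . -maxdepth 1 -name '*.mp4' -print0 |
        xargs -0 -n 1 -P "${JOBS:-4}" bash -c 'burn_one "$1"' _
}
```

Usage: source the file, then run `JOBS=8 burn_all`. A sensible JOBS value depends on the encoder; software x264 already uses several threads per process.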

Convert Subtitle Format Batch

#!/bin/bash
# convert_subs.sh

for srt in *.srt; do
    base="${srt%.srt}"
    ffmpeg -i "$srt" "${base}.vtt"
done

Troubleshooting

Common Issues

"Unable to find a suitable output format"

# Specify output format explicitly
ffmpeg -i video.mkv -map 0:s:0 -f srt output.srt

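Garbled characters in subtitles

# Tell the subtitles filter the file's actual encoding (e.g. CP1252, ISO-8859-1, GBK)
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:charenc=CP1252" \
  output.mp4

# For soft subs, set the input encoding before -i
ffmpeg -i video.mp4 -sub_charenc CP1252 -i subs.srt -c copy -c:s mov_text output.mp4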
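As an alternative to charenc, a legacy-encoded file can be converted once with iconv so every later step sees UTF-8. A bash sketch; `is_utf8` and `to_utf8` are hypothetical helpers, and you must still supply the source encoding yourself:

```shell
#!/usr/bin/env bash
# is_utf8 FILE: succeeds if FILE decodes cleanly as UTF-8
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# to_utf8 FILE.srt FROM_ENCODING
# Writes FILE.utf8.srt (converted from FROM_ENCODING unless FILE is already
# valid UTF-8) and prints the new file name.
to_utf8() {
    local src=$1 from=$2
    local dst=${src%.srt}.utf8.srt
    if is_utf8 "$src"; then
        cp "$src" "$dst"
    else
        iconv -f "$from" -t UTF-8 "$src" > "$dst" || return 1
    fi
    printf '%s\n' "$dst"
}
```

Usage: `ffmpeg -i video.mp4 -vf "subtitles=$(to_utf8 subs.srt CP1252)" output.mp4`.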

Font not found

# Specify fonts directory
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:fontsdir=/path/to/fonts" \
  output.mp4

# List available fonts
fc-list : family | sort | uniq

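Subtitle timing offset

# The subtitles filter has no offset option, so shift the file first.
# -itsoffset is an input option and must precede -i; -copyts keeps the
# shifted cue times as they are (delay by 2 seconds)
ffmpeg -copyts -itsoffset 2 -i subs.srt -c copy delayed.srt

# Then burn the shifted file
ffmpeg -i video.mp4 -vf "subtitles=delayed.srt" output.mp4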
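Shifting can also be done purely as text, without relying on FFmpeg's timestamp handling. A minimal awk sketch inside a hypothetical bash function `srt_shift` that adds OFFSET_MS milliseconds to every timing line and clamps negative results to zero:

```shell
#!/usr/bin/env bash
# srt_shift FILE OFFSET_MS
# Prints FILE with every cue shifted by OFFSET_MS milliseconds
# (negative values move cues earlier; results clamp at 00:00:00,000).
srt_shift() {
    awk -v off="$2" '
    # Convert HH:MM:SS,mmm (or HH:MM:SS.mmm) to milliseconds
    function to_ms(ts,   p) {
        split(ts, p, "[,.:]")
        return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # Convert milliseconds back to HH:MM:SS,mmm
    function to_ts(ms,   h, m, s) {
        ms = int(ms)
        if (ms < 0) ms = 0
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        s = int(ms / 1000);    ms -= s * 1000
        return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    { sub(/\r$/, "") }                 # normalise CRLF line endings
    $2 == "-->" {                      # timing line: start --> end
        $1 = to_ts(to_ms($1) + off)
        $3 = to_ts(to_ms($3) + off)
    }
    { print }
    ' "$1"
}
```

Usage: `srt_shift subs.srt 2000 > delayed.srt`, then burn `delayed.srt` as above.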

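Subtitles too large or too small on high-res video

# For SRT input, FontSize and margins are measured on FFmpeg's default 384x288
# script canvas and scaled to the video, so a given FontSize covers the same
# fraction of the frame at 1080p and 4K; tune it by eye, not by resolution
ffmpeg -i video_4k.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=24,Outline=2'" \
  output.mp4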

Verification Commands

# Check if subtitles are present
ffprobe -v error -select_streams s -show_entries stream=codec_name -of default=nw=1 video.mp4

# Count subtitle cues (one "-->" timing line per cue)
grep -c -- '-->' subs.srt

# Validate SRT format
ffmpeg -i subs.srt -f null -

Best Practices

Accessibility

Use high contrast colors (white on dark, yellow on dark)
Minimum font size of 24pt for standard video
Include speaker identification for multiple speakers
Position subtitles to avoid obscuring important visual content
Keep lines short (42 characters max per line)
Display duration: minimum 1 second, maximum 7 seconds per caption
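The line-length and duration guidelines above can be checked automatically. A bash/awk sketch, assuming a hypothetical `check_srt` helper with the 42-character and 1-7 second limits as overridable defaults; note that `length()` counts bytes rather than characters in some awk implementations, so lines with non-ASCII text may be over-reported:

```shell
#!/usr/bin/env bash
# check_srt FILE [MAX_CHARS] [MIN_SEC] [MAX_SEC]
# Reports text lines longer than MAX_CHARS (default 42) and cue durations
# outside MIN_SEC..MAX_SEC (default 1..7). Exits 1 if anything is reported.
check_srt() {
    awk -v maxc="${2:-42}" -v mins="${3:-1}" -v maxs="${4:-7}" '
    BEGIN { maxc += 0; mins += 0; maxs += 0 }
    # HH:MM:SS,mmm -> seconds
    function to_sec(ts,   p) {
        split(ts, p, "[,.:]")
        return p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
    }
    { sub(/\r$/, "") }                          # tolerate CRLF files
    $2 == "-->" {                               # timing line starts a cue
        cue++
        dur = to_sec($3) - to_sec($1)
        if (dur < mins || dur > maxs) {
            printf "cue %d: duration %.3fs outside %s-%ss\n", cue, dur, mins, maxs
            bad++
        }
        intext = 1
        next
    }
    /^$/ { intext = 0; next }                   # blank line ends the cue
    intext && length($0) > maxc {
        printf "cue %d: line has %d chars (max %d)\n", cue, length($0), maxc
        bad++
    }
    END { exit (bad > 0) }
    ' "$1"
}
```

Usage: `check_srt subs.srt || echo "fix the cues listed above"`.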

Quality

Use ASS format for styled subtitles (anime, music videos)
Use SRT for simple dialogue
Use WebVTT for web delivery
Preserve original styling when possible
Test on target devices before distribution

Performance

Burn subtitles only when necessary (streaming, compatibility)
Prefer soft subs for archival and flexibility
Use hardware encoding when burning subtitles to large files
Process in parallel for batch operations

This guide covers FFmpeg subtitle and caption operations. For text overlays with shapes and graphics, see the shapes-graphics skill.

About

SKILL.md

About

Complete subtitle and caption system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20)...

SKILL.md

name: ffmpeg-captions-subtitles description: Complete subtitle and caption system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) Burning subtitles (hardcoding SRT/ASS/VTT), (2) Adding soft subtitle tracks, (3) Extracting subtitles from video, (4) Subtitle format conversion, (5) Styled captions (font, color, outline, shadow), (6) Subtitle positioning and alignment, (7) CEA-608/708 closed captions, (8) Text overlays with drawtext, (9) Whisper AI automatic transcription (FFmpeg 8.0+ with VAD, multi-language, GPU), (10) Batch subtitle processing. Provides: Format reference tables, styling parameter guide, position alignment charts, Whisper model comparison, VAD configuration, dynamic text examples, accessibility best practices. Ensures: Professional captions with proper styling and accessibility compliance.

CRITICAL GUIDELINES

Windows File Path Requirements

MANDATORY: Always Use Backslashes on Windows for File Paths

When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).

Documentation Guidelines

NEVER create new documentation files unless explicitly requested by the user.

Quick Reference

Task	Command
Burn SRT	`ffmpeg -i video.mp4 -vf "subtitles=subs.srt" output.mp4`
Burn ASS	`ffmpeg -i video.mp4 -vf "ass=subs.ass" output.mp4`
Add soft sub	`ffmpeg -i video.mp4 -i subs.srt -c copy -c:s mov_text output.mp4`
Extract sub	`ffmpeg -i video.mkv -map 0:s:0 output.srt`
Style subs	`-vf "subtitles=s.srt:force_style='FontSize=24,PrimaryColour=&HFFFFFF'"`
Text overlay	`-vf "drawtext=text='Hello':x=10:y=10:fontsize=24:fontcolor=white"`

Format	Extension	Best For
SRT	.srt	Simple, universal
ASS	.ass	Styled, animated, anime
VTT	.vtt	Web/HTML5 video

When to Use This Skill

Use for subtitle and caption operations:

Hardcoding (burning) subtitles into video
Adding soft subtitle tracks to containers
Extracting subtitles from MKV/MP4
Styling captions (font, color, position)
Dynamic text overlays

FFmpeg Captions and Subtitles (2025)

Complete guide to working with subtitles, closed captions, and text overlays using FFmpeg.

Subtitle Format Reference

Supported Formats

Format	Extension	Features	Use Case
SubRip	.srt	Simple timing + text	Universal, web
Advanced SubStation Alpha	.ass/.ssa	Rich styling, positioning, effects	Anime, styled subs
WebVTT	.vtt	Web standard, cues, styling	HTML5 video
TTML/DFXP	.ttml/.dfxp	Broadcast, accessibility	Streaming services
MOV Text	.mov (embedded)	QuickTime native	Apple ecosystem
DVB Subtitle	(embedded)	Bitmap-based	European broadcast
PGS	.sup	Blu-ray bitmap subtitles	Blu-ray
CEA-608/708	(embedded)	Closed captions	US broadcast, streaming

Format Characteristics

SRT (SubRip):
- Simple text-based format
- Supports basic HTML tags (<b>, <i>, <u>)
- Widely compatible
- No positioning or advanced styling

ASS/SSA:
- Advanced styling (fonts, colors, outlines)
- Precise positioning anywhere on screen
- Animation and effects support
- Karaoke timing

WebVTT:
- HTML5 standard format
- CSS-like styling
- Cue settings for positioning
- Speaker identification

Burning Subtitles (Hardcoding)

Basic Subtitle Burn-in

# Burn SRT subtitles
ffmpeg -i video.mp4 -vf "subtitles=subs.srt" output.mp4

# Burn ASS/SSA subtitles (preserves styling)
ffmpeg -i video.mp4 -vf "ass=subs.ass" output.mp4

# Burn subtitles from MKV container
ffmpeg -i video.mkv -vf "subtitles=video.mkv" output.mp4

# Burn specific subtitle track (index 0)
ffmpeg -i video.mkv -vf "subtitles=video.mkv:si=0" output.mp4

# Burn subtitles with stream index
ffmpeg -i video.mkv -vf "subtitles=video.mkv:stream_index=1" output.mp4

Styled Subtitle Burn-in

# Force style (overrides subtitle styling)
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1'" \
  output.mp4

# Yellow subtitles with black outline
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=28,PrimaryColour=&H00FFFF,OutlineColour=&H000000,Outline=3'" \
  output.mp4

# Larger font for accessibility
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=36,Bold=1'" \
  output.mp4

ASS Style Parameters

Parameter	Description	Example
FontName	Font family	`FontName=Arial`
FontSize	Size in points	`FontSize=24`
PrimaryColour	Text color (AABBGGRR)	`PrimaryColour=&HFFFFFF`
SecondaryColour	Karaoke color	`SecondaryColour=&H00FFFF`
OutlineColour	Outline/border color	`OutlineColour=&H000000`
BackColour	Shadow/background color	`BackColour=&H80000000`
Bold	Bold text (0/1)	`Bold=1`
Italic	Italic text (0/1)	`Italic=1`
Underline	Underlined text (0/1)	`Underline=1`
Outline	Outline width	`Outline=2`
Shadow	Shadow depth	`Shadow=1`
Alignment	Position (numpad style)	`Alignment=2`
MarginL/R/V	Margins in pixels	`MarginV=50`

Color Format (ASS)

ASS uses &HAABBGGRR format (Alpha, Blue, Green, Red):
- White: &HFFFFFF or &H00FFFFFF
- Black: &H000000 or &H00000000
- Yellow: &H00FFFF (00-Blue, FF-Green, FF-Red)
- Red: &H0000FF
- Blue: &HFF0000
- 50% transparent black: &H80000000

Adding Subtitle Tracks (Soft Subs)

Embed SRT as Track

# Add SRT to MP4 (MOV text)
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s mov_text \
  output.mp4

# Add SRT to MKV
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s srt \
  output.mkv

# Add SRT to WebM (WebVTT)
ffmpeg -i video.webm -i subs.srt \
  -c copy -c:s webvtt \
  output.webm

Multiple Subtitle Tracks

# Add multiple languages
ffmpeg -i video.mp4 -i subs_en.srt -i subs_es.srt -i subs_fr.srt \
  -map 0:v -map 0:a -map 1 -map 2 -map 3 \
  -c copy -c:s mov_text \
  -metadata:s:s:0 language=eng -metadata:s:s:0 title="English" \
  -metadata:s:s:1 language=spa -metadata:s:s:1 title="Spanish" \
  -metadata:s:s:2 language=fra -metadata:s:s:2 title="French" \
  output.mp4

# Add ASS subtitles to MKV (preserves styling)
ffmpeg -i video.mp4 -i styled.ass \
  -map 0 -map 1 \
  -c copy -c:s ass \
  output.mkv

Set Default Subtitle Track

# Set subtitle as default
ffmpeg -i video.mp4 -i subs.srt \
  -c copy -c:s mov_text \
  -disposition:s:0 default \
  output.mp4

# Set forced subtitles (always display)
ffmpeg -i video.mp4 -i forced.srt \
  -c copy -c:s mov_text \
  -disposition:s:0 forced \
  output.mp4

Extracting Subtitles

Extract to External File

# Extract first subtitle track to SRT
ffmpeg -i video.mkv -map 0:s:0 output.srt

# Extract specific subtitle stream
ffmpeg -i video.mkv -map 0:s:1 output.srt

# Extract to ASS format
ffmpeg -i video.mkv -map 0:s:0 output.ass

# Extract to WebVTT
ffmpeg -i video.mkv -map 0:s:0 output.vtt

# Extract all subtitle tracks
ffmpeg -i video.mkv -map 0:s subs_%d.srt

List Available Subtitle Tracks

# Show all streams including subtitles
ffprobe -v error -show_entries stream=index,codec_name,codec_type:stream_tags=language,title \
  -of csv=p=0 video.mkv

# Show only subtitle streams
ffprobe -v error -select_streams s \
  -show_entries stream=index,codec_name:stream_tags=language,title \
  -of csv=p=0 video.mkv

Convert Subtitle Formats

# SRT to ASS
ffmpeg -i subs.srt subs.ass

# ASS to SRT (loses styling)
ffmpeg -i subs.ass subs.srt

# SRT to WebVTT
ffmpeg -i subs.srt subs.vtt

# WebVTT to SRT
ffmpeg -i subs.vtt subs.srt

Text Overlays (drawtext)

Basic Text Overlay

# Simple text overlay
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Hello World':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Centered text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Centered Text':x=(w-tw)/2:y=(h-th)/2:fontsize=48:fontcolor=white" \
  output.mp4

# Text at bottom center
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Bottom Text':x=(w-tw)/2:y=h-th-20:fontsize=32:fontcolor=white" \
  output.mp4

Styled Text Overlays

# Text with background box
ffmpeg -i video.mp4 \
  -vf "drawtext=text='With Background':x=10:y=10:fontsize=24:fontcolor=white:box=1:boxcolor=black@0.5:boxborderw=5" \
  output.mp4

# Text with outline (border)
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Outlined Text':x=10:y=10:fontsize=48:fontcolor=white:borderw=3:bordercolor=black" \
  output.mp4

# Shadow effect
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Shadow Text':x=10:y=10:fontsize=36:fontcolor=white:shadowcolor=black:shadowx=3:shadowy=3" \
  output.mp4

# Custom font
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Custom Font':fontfile=/path/to/font.ttf:fontsize=24:fontcolor=white" \
  output.mp4

Dynamic Text

# Display timestamp
ffmpeg -i video.mp4 \
  -vf "drawtext=text='%{pts\:hms}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Display frame number
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Frame\: %{n}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Display current date/time
ffmpeg -i video.mp4 \
  -vf "drawtext=text='%{localtime\:%Y-%m-%d %H\\\:%M\\\:%S}':x=10:y=10:fontsize=24:fontcolor=white" \
  output.mp4

# Text from file
ffmpeg -i video.mp4 \
  -vf "drawtext=textfile=message.txt:x=10:y=10:fontsize=24:fontcolor=white:reload=1" \
  output.mp4

Animated Text

# Scrolling text (credits)
ffmpeg -i video.mp4 \
  -vf "drawtext=textfile=credits.txt:x=(w-tw)/2:y=h-t*50:fontsize=24:fontcolor=white" \
  output.mp4

# Fade in text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Fade In':x=(w-tw)/2:y=(h-th)/2:fontsize=48:fontcolor=white:alpha='if(lt(t,1),t,1)'" \
  output.mp4

# Text appears at specific time (3 seconds)
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Appears at 3s':x=10:y=10:fontsize=24:fontcolor=white:enable='gte(t,3)'" \
  output.mp4

# Text visible between 2-5 seconds
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Visible 2-5s':x=10:y=10:fontsize=24:fontcolor=white:enable='between(t,2,5)'" \
  output.mp4

# Blinking text
ffmpeg -i video.mp4 \
  -vf "drawtext=text='BLINK':x=10:y=10:fontsize=36:fontcolor=red:alpha='if(mod(floor(t*2),2),1,0)'" \
  output.mp4

Whisper AI Integration (FFmpeg 8.0+)

FFmpeg 8.0 integrates OpenAI's Whisper speech recognition model via whisper.cpp, enabling fully local and offline automatic transcription and subtitle generation.

Features

Speech to Text: High accuracy transcription
Multi-language: 99 languages supported
Flexible Output: SRT subtitles, JSON, or plain text
GPU Acceleration: Uses system GPU by default
Voice Activity Detection: Silero VAD support for better segmentation
Real-time: Works with live audio streams

Generate SRT Subtitles

# Generate subtitles from video using Whisper
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=auto:queue=3:destination=output.srt:format=srt" \
  -f null -

# Specify language for better accuracy
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:destination=english.srt:format=srt" \
  -f null -

# Use larger model for higher quality
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-medium.bin:language=auto:destination=output.srt:format=srt" \
  -f null -

Live Transcription

# Live transcription from microphone (Linux PulseAudio)
ffmpeg -loglevel warning -f pulse -i default \
  -af "highpass=f=200,lowpass=f=3000,whisper=model=ggml-medium-q5_0.bin:language=en:queue=10:destination=-:format=json:vad_model=for-tests-silero-v5.1.2-ggml.bin" \
  -f null -

# Live transcription from microphone (Windows)
ffmpeg -loglevel warning -f dshow -i audio="Microphone" \
  -af "whisper=model=ggml-base.bin:language=en:queue=10:destination=-:format=text" \
  -f null -

# Live transcription from microphone (macOS)
ffmpeg -loglevel warning -f avfoundation -i ":0" \
  -af "whisper=model=ggml-base.bin:language=en:queue=10:destination=-:format=text" \
  -f null -

Display Live Subtitles on Video

# Whisper writes to frame metadata, drawtext reads it
ffmpeg -i input.mp4 \
  -af "whisper=model=ggml-base.en.bin:language=en" \
  -vf "drawtext=text='%{metadata\:lavfi.whisper.text}':fontsize=24:fontcolor=white:x=10:y=h-th-10:box=1:boxcolor=black@0.5:boxborderw=5" \
  output_with_subtitles.mp4

# Centered subtitles with styling
ffmpeg -i input.mp4 \
  -af "whisper=model=ggml-base.bin:language=auto" \
  -vf "drawtext=text='%{metadata\:lavfi.whisper.text}':fontsize=32:fontcolor=white:x=(w-tw)/2:y=h-th-50:borderw=2:bordercolor=black" \
  output.mp4

JSON Output for Post-Processing

# Generate JSON for custom processing
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=auto:destination=output.json:format=json" \
  -f null -

With Voice Activity Detection (VAD)

# Use Silero VAD for better speech detection
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:queue=20:destination=output.srt:format=srt:vad_model=for-tests-silero-v5.1.2-ggml.bin:vad_threshold=0.5" \
  -f null -

# VAD parameters for noisy audio
ffmpeg -i noisy_video.mp4 -vn \
  -af "whisper=model=ggml-medium.bin:language=en:queue=20:destination=output.srt:format=srt:vad_model=silero.bin:vad_threshold=0.6:vad_min_speech_duration=0.2:vad_min_silence_duration=0.3" \
  -f null -

Disable GPU (CPU-only processing)

# Force CPU processing (slower but works without GPU)
ffmpeg -i video.mp4 -vn \
  -af "whisper=model=ggml-base.bin:language=en:use_gpu=false:destination=output.srt:format=srt" \
  -f null -

Whisper Models

Model	Parameters	GGML file (whisper.cpp)	Speed	Quality	VRAM (reference OpenAI Whisper)	Recommended For
tiny	39 M	75 MiB	Fastest	Basic	~1 GB	Quick previews, low-resource
base	74 M	142 MiB	Fast	Good	~1 GB	General use, balanced
small	244 M	466 MiB	Medium	Better	~2 GB	Higher accuracy
medium	769 M	1.5 GiB	Slow	High	~5 GB	Quality-critical
large	1550 M	2.9 GiB	Slowest	Best	~10 GB	Maximum accuracy

Whisper Filter Parameters

Parameter	Description	Default
`model`	Path to GGML model file	Required
`language`	Language code or "auto"	"auto"
`format`	Output format: "text", "srt", "json"	"text"
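`destination`	Output file or URL, or "-" for stdout; if unset, the text is only logged (it is always also set as `lavfi.whisper.text` frame metadata)	Unset
`queue`	Maximum audio, in seconds, buffered before each transcription run; larger values are more accurate but add latency	3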
`vad_model`	Path to Silero VAD model	None
`vad_threshold`	VAD sensitivity (0-1)	0.5
`vad_min_speech_duration`	Min speech duration (seconds)	0.1
`vad_min_silence_duration`	Min silence duration (seconds)	0.5
`use_gpu`	Enable GPU acceleration	true
`gpu_device`	GPU device index	0
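Because every option in the table is a `key=value` pair joined with `:`, the filter string is easy to assemble in a script. This is a minimal sketch with a hypothetical `whisper_af` helper. It only emits options from the table above and does not escape values, so paths containing `:`, `,` or quotes would still need FFmpeg filtergraph escaping.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): build a whisper filter string from positional options.
set -euo pipefail

# whisper_af MODEL [DESTINATION] [FORMAT] [LANGUAGE] [QUEUE] [VAD_MODEL]
whisper_af() {
  local model=$1 dest=${2:-} fmt=${3:-text} lang=${4:-auto} queue=${5:-3} vad=${6:-}
  # Only the three documented output formats are accepted
  case $fmt in
    text|srt|json) ;;
    *) echo "unsupported format: $fmt" >&2; return 1 ;;
  esac
  local f="whisper=model=${model}:language=${lang}:queue=${queue}:format=${fmt}"
  # Leave destination unset when empty, so the text is only logged
  if [ -n "$dest" ]; then f+=":destination=${dest}"; fi
  if [ -n "$vad" ]; then f+=":vad_model=${vad}"; fi
  printf '%s\n' "$f"
}
```

Usage: `ffmpeg -i video.mp4 -vn -af "$(whisper_af ggml-base.bin output.srt srt en 20 ggml-silero-v5.1.2.bin)" -f null -`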

Model Download

Download GGML models from the whisper.cpp project:

# Download base model
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

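# Download medium model for better quality
curl -L -o ggml-medium.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin

# Download the Silero VAD model used by vad_model
curl -L -o ggml-silero-v5.1.2.bin https://huggingface.co/ggml-org/whisper-vad/resolve/main/ggml-silero-v5.1.2.bin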
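When scripts fetch models repeatedly, it helps to skip files that already exist and avoid keeping partial downloads. The sketch below reuses the URL pattern from the curl commands above. `fetch_ggml_model` is a hypothetical helper, and NAME is assumed to match a file published as `ggml-NAME.bin` in that repository.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): download a whisper.cpp GGML model once and reuse it.
set -euo pipefail

# fetch_ggml_model NAME [DIR]   e.g. NAME = base, base.en, medium
fetch_ggml_model() {
  local name=$1 dir=${2:-.}
  local file="${dir}/ggml-${name}.bin"
  local url="https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-${name}.bin"
  mkdir -p "$dir"
  if [ -s "$file" ]; then
    echo "already present: $file" >&2
  else
    # -f fails on HTTP errors instead of saving an error page;
    # writing to .part first means an interrupted transfer never looks like a model
    curl -fL -o "${file}.part" "$url"
    mv "${file}.part" "$file"
  fi
  printf '%s\n' "$file"   # print the path so callers can capture it
}
```

Usage: `model=$(fetch_ggml_model base models)` followed by `-af "whisper=model=${model}:language=en"`.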

CEA-608/708 Closed Captions

Extract Closed Captions

# Extract CEA-608 captions from ATSC stream
ffmpeg -f lavfi -i "movie=broadcast.ts[out0+subcc]" -map 0:1 captions.srt

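# Check whether the video stream carries embedded CEA-608/708 (A/53) captions
# (prints 1 if the decoder detected them while probing)
ffprobe -v error -select_streams v:0 -show_entries stream=closed_captions \
  -of default=nw=1:nk=1 video_with_cc.mp4

# Extract them with the same movie/subcc technique
ffmpeg -f lavfi -i "movie=video_with_cc.mp4[out0+subcc]" -map 0:1 captions.srt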

Add CEA-608 Captions

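# An SCC file cannot be overlaid like an image. FFmpeg's scc demuxer reads it as an
# eia_608 stream, which the MOV muxer can store as a separate c608 caption track
# (this does not insert A/53 caption data into the video bitstream itself)
ffmpeg -i video.mp4 -i captions.scc \
  -map 0 -map 1 -c copy \
  output.mov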

Subtitle Positioning

ASS Alignment Values

7 (top-left)     8 (top-center)     9 (top-right)
4 (mid-left)     5 (mid-center)     6 (mid-right)
1 (bottom-left)  2 (bottom-center)  3 (bottom-right)
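The grid follows numeric-keypad layout, so scripts can map readable position names to `Alignment` values instead of hard-coding digits. In this sketch, the position names and the `ass_alignment` / `force_style_for` helpers are hypothetical labels.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): position name -> ASS Alignment value (numpad layout).
set -euo pipefail

ass_alignment() {
  case $1 in
    bottom-left) echo 1 ;;  bottom-center) echo 2 ;;  bottom-right) echo 3 ;;
    mid-left)    echo 4 ;;  mid-center)    echo 5 ;;  mid-right)    echo 6 ;;
    top-left)    echo 7 ;;  top-center)    echo 8 ;;  top-right)    echo 9 ;;
    *) echo "unknown position: $1" >&2; return 1 ;;
  esac
}

# force_style_for POSITION [MARGIN_V]: build a force_style value for the subtitles filter
force_style_for() {
  local a
  a=$(ass_alignment "$1") || return 1
  printf 'Alignment=%s,MarginV=%s\n' "$a" "${2:-20}"
}
```

Usage: `ffmpeg -i video.mp4 -vf "subtitles=subs.srt:force_style='$(force_style_for top-center 20)'" output.mp4`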

Position Examples

# Top center subtitles
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=8,MarginV=20'" \
  output.mp4

# Bottom-left subtitles
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=1,MarginL=50,MarginV=30'" \
  output.mp4

# Bottom-right subtitles (e.g. for speaker identification)
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:force_style='Alignment=3,MarginR=50'" \
  output.mp4

Multiple Subtitle Positions

# Burn two subtitle files at different positions by chaining two subtitles filters
ffmpeg -i video.mp4 \
  -vf "subtitles=speaker1.srt:force_style='Alignment=1,MarginL=50',\
subtitles=speaker2.srt:force_style='Alignment=3,MarginR=50'" \
  output.mp4

Batch Processing

Burn Subtitles to Multiple Videos

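#!/bin/bash
# burn_subs_batch.sh: burn <name>.srt into <name>.mp4 for every matching pair
shopt -s nullglob   # skip the loop when no .mp4 files exist
mkdir -p output     # ffmpeg does not create missing output directories

for video in *.mp4; do
    base="${video%.mp4}"
    if [ -f "${base}.srt" ]; then
        # -nostdin keeps ffmpeg from reading the terminal inside the loop;
        # quoting the filename protects spaces and commas in file names
        ffmpeg -nostdin -i "$video" \
          -vf "subtitles=filename='${base}.srt':force_style='FontSize=24,Outline=2'" \
          -c:a copy \
          "output/${base}_subbed.mp4"
    fi
done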

Convert Subtitle Format Batch

#!/bin/bash
# convert_subs.sh

for srt in *.srt; do
    base="${srt%.srt}"
    ffmpeg -i "$srt" "${base}.vtt"
done

Troubleshooting

Common Issues

"Unable to find a suitable output format"

# Specify output format explicitly
ffmpeg -i video.mkv -map 0:s:0 -f srt output.srt
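When a file carries several subtitle streams, the single `-map 0:s:0` extraction can be looped over every stream that ffprobe reports. In this sketch, `extract_all_subs` is a hypothetical helper. It assumes that bitmap subtitle codecs (PGS, DVD, DVB) are the ones to skip, since they cannot be converted to text SRT.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): export each text subtitle stream to NAME.INDEX.srt.
set -euo pipefail

extract_all_subs() {
  local input=$1 base=${1%.*}
  local idx codec
  # ffprobe prints one "index,codec_name" line per subtitle stream
  while IFS=, read -r idx codec; do
    case $codec in
      hdmv_pgs_subtitle|dvd_subtitle|dvb_subtitle)
        echo "skipping bitmap stream $idx ($codec)" >&2; continue ;;
    esac
    # -nostdin stops ffmpeg from consuming the loop's input
    ffmpeg -nostdin -v error -y -i "$input" -map "0:${idx}" -f srt "${base}.${idx}.srt"
  done < <(ffprobe -v error -select_streams s \
             -show_entries stream=index,codec_name -of csv=p=0 "$input")
}
```

Usage: `extract_all_subs video.mkv` writes files such as `video.2.srt` and `video.4.srt`, named after each stream's absolute index.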

Garbled characters in subtitles

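# charenc declares the subtitle file's actual encoding (e.g. CP1252, ISO-8859-1);
# setting it to UTF-8 only helps if the file really is UTF-8
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:charenc=CP1252" \
  output.mp4

# Or convert the file to UTF-8 once (-sub_charenc is an input option)
ffmpeg -sub_charenc CP1252 -i subs.srt subs_utf8.srt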

Font not found

# Specify fonts directory
ffmpeg -i video.mp4 \
  -vf "subtitles=subs.srt:fontsdir=/path/to/fonts" \
  output.mp4

# List available fonts
fc-list : family | sort | uniq

Subtitle timing offset

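# Delay subtitles by 2 seconds: the subtitles filter has no offset option, so shift
# the file first (-itsoffset is an input option and must come before -i)
ffmpeg -itsoffset 2 -i subs.srt delayed.srt
ffmpeg -i video.mp4 -vf "subtitles=delayed.srt" output.mp4

# Or in one pass: shift the timestamps the subtitles filter sees, then restore them
ffmpeg -i video.mp4 \
  -vf "setpts=PTS-2/TB,subtitles=subs.srt,setpts=PTS+2/TB" \
  output.mp4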
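The offset can also be applied to the SRT text directly, without FFmpeg, by rewriting each `start --> end` timing line. This is a minimal awk-based sketch. `srt_shift_ms` is a hypothetical helper. It clamps negative results at `00:00:00,000`, normalizes CRLF line endings to LF, and keeps any text that follows the end timestamp.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): shift all SRT timestamps by OFFSET_MS milliseconds.
set -euo pipefail

srt_shift_ms() {
  local offset_ms=$1 input=$2
  awk -v off="$offset_ms" '
    # "HH:MM:SS,mmm" -> milliseconds
    function to_ms(t,   p) {
      split(t, p, "[:,]")
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # milliseconds -> "HH:MM:SS,mmm", clamped at zero
    function to_ts(ms) {
      if (ms < 0) ms = 0
      return sprintf("%02d:%02d:%02d,%03d", int(ms / 3600000),
                     int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000)
    }
    { sub(/\r$/, "") }                 # normalize CRLF input
    / --> / {
      split($0, f, " --> ")
      rest = substr(f[2], 13)          # anything after the end timestamp
      print to_ts(to_ms(f[1]) + off) " --> " to_ts(to_ms(substr(f[2], 1, 12)) + off) rest
      next
    }
    { print }                          # index and text lines pass through
  ' "$input"
}
```

Usage: `srt_shift_ms 2000 subs.srt > delayed.srt` delays every cue by 2 seconds; a negative value moves cues earlier.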

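Subtitles too small or too large on high-resolution video

# force_style sizes are ASS script units that libass scales to the frame height
# (FFmpeg converts SRT using a 384x288 script resolution), so one FontSize renders at
# the same relative size at 1080p and 4K; raise it only to make text relatively larger
ffmpeg -i video_4k.mp4 \
  -vf "subtitles=subs.srt:force_style='FontSize=24,Outline=2'" \
  output.mp4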

Verification Commands

# Check if subtitles are present
ffprobe -v error -select_streams s -show_entries stream=codec_name -of default=nw=1 video.mp4

# Count subtitle cues (each SRT cue has exactly one "-->" timing line)
grep -c -- "-->" subs.srt

# Validate SRT format (prints only errors)
ffmpeg -v error -i subs.srt -f null -

Best Practices

Accessibility

Use high contrast colors (white on dark, yellow on dark)
Minimum font size of 24pt for standard video
Include speaker identification for multiple speakers
Position subtitles to avoid obscuring important visual content
Keep lines short (42 characters max per line)
Display duration: minimum 1 second, maximum 7 seconds per caption
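The line-length and duration limits above can be checked automatically before delivery. This sketch uses a hypothetical `srt_lint` helper that reports cues outside 1 to 7 seconds and text lines longer than 42 characters (adjustable), and exits non-zero when it finds any. It counts characters with awk `length()`, which some awk implementations measure in bytes, so non-ASCII text may be over-reported.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): flag SRT cues that break the accessibility limits.
set -euo pipefail

# srt_lint FILE [MAX_CHARS]
srt_lint() {
  awk -v max_chars="${2:-42}" '
    function to_ms(t,   p) {
      split(t, p, "[:,]")
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    { sub(/\r$/, "") }
    / --> / {
      split($0, f, " --> ")
      cue++
      dur = to_ms(substr(f[2], 1, 12)) - to_ms(f[1])
      if (dur < 1000 || dur > 7000) {
        printf "cue %d: duration %d ms\n", cue, dur
        bad++
      }
      in_text = 1                      # following lines are caption text
      next
    }
    /^$/ { in_text = 0; next }         # blank line ends the cue
    in_text && length($0) > max_chars + 0 {
      printf "cue %d: line of %d chars\n", cue, length($0)
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

Usage: `srt_lint subs.srt || echo "captions need review"`.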

Quality

Use ASS format for styled subtitles (anime, music videos)
Use SRT for simple dialogue
Use WebVTT for web delivery
Preserve original styling when possible
Test on target devices before distribution

Performance

Burn subtitles only when necessary (streaming, compatibility)
Prefer soft subs for archival and flexibility
Use hardware encoding when burning subtitles to large files
Process in parallel for batch operations
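The batch burn loop can be parallelized with plain bash job control, capping the number of concurrent ffmpeg processes. This is a sketch with hypothetical `burn_one` / `burn_all_parallel` helpers. It keeps the batch script's file naming and needs bash 4.3+ for `wait -n`, which the bash 3.2 shipped with macOS lacks. Each ffmpeg encode is already multithreaded, so a small job count is usually enough. File names containing a single quote would still break the quoted `filename=` value.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): burn subtitles for many files, N ffmpeg jobs at a time.
set -euo pipefail
shopt -s nullglob

burn_one() {
  local video=$1 base=${1%.mp4}
  ffmpeg -nostdin -v error -y -i "$video" \
    -vf "subtitles=filename='${base}.srt':force_style='FontSize=24,Outline=2'" \
    -c:a copy "output/${base}_subbed.mp4"
}

# burn_all_parallel [MAX_JOBS]
burn_all_parallel() {
  local max_jobs=${1:-4} running=0 video
  mkdir -p output
  for video in *.mp4; do
    [ -f "${video%.mp4}.srt" ] || continue   # only videos with a matching SRT
    burn_one "$video" &
    running=$((running + 1))
    # at the limit: wait for any one job to finish before starting another
    if [ "$running" -ge "$max_jobs" ]; then
      wait -n || echo "a burn job failed" >&2
      running=$((running - 1))
    fi
  done
  wait   # let the remaining jobs finish
}
```

Usage: run `burn_all_parallel 2` from the directory that holds the `.mp4`/`.srt` pairs.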

This guide covers FFmpeg subtitle and caption operations. For text overlays with shapes and graphics, see the shapes-graphics skill.