High-performance CLI for working with SQL dump files: split/merge by table, analyze contents, validate integrity, convert between MySQL/PostgreSQL/SQLite/MSSQL, create FK-safe samples, shard...
This skill helps you use sql-splitter to manipulate SQL dump files safely and efficiently.
Use sql-splitter when:
.sql, .sql.gz, .sql.bz2, .sql.xz, .sql.zst)Do not use sql-splitter when:
psql/mysql/sqlcmd directly)Split a dump into per-table files.
sql-splitter split dump.sql --output tables/ --progress
sql-splitter split dump.sql --tables users,orders --output tables/
sql-splitter split dump.sql --schema-only --output schema/
sql-splitter split dump.sql --data-only --output data/
Merge per-table files back into a single dump.
sql-splitter merge tables/ --output restored.sql
sql-splitter merge tables/ --output restored.sql --transaction
sql-splitter merge tables/ --exclude logs,cache --output restored.sql
Get statistics about a dump (read-only).
sql-splitter analyze dump.sql --progress
sql-splitter analyze "dumps/*.sql" --fail-fast
Convert between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs).
sql-splitter convert mysql.sql --to postgres --output pg.sql
sql-splitter convert pg_dump.sql --to mysql --output mysql.sql
sql-splitter convert dump.sql --from postgres --to sqlite --output sqlite.sql
sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql
sql-splitter convert mysql.sql --to mssql --output mssql.sql
sql-splitter convert mysql.sql --to postgres --output - | psql "$PG_CONN"
Check dump integrity (syntax, encoding, PK/FK).
sql-splitter validate dump.sql --strict --progress
sql-splitter validate "dumps/*.sql" --json --fail-fast
sql-splitter validate dump.sql --no-fk-checks --progress
Create reduced datasets with FK preservation.
sql-splitter sample dump.sql --output sampled.sql --percent 10 --preserve-relations
sql-splitter sample dump.sql --output sampled.sql --rows 1000 --preserve-relations
sql-splitter sample dump.sql --output sampled.sql --percent 10 --tables users,orders
sql-splitter sample dump.sql --output sampled.sql --percent 10 --seed 42
Extract tenant-specific data.
sql-splitter shard dump.sql --tenant-value 123 --tenant-column tenant_id --output tenant.sql
sql-splitter shard dump.sql --tenant-values "1,2,3" --tenant-column account_id --output shards/
Compare two SQL dumps for schema and data changes.
sql-splitter diff old.sql new.sql --progress
sql-splitter diff old.sql new.sql --schema-only
sql-splitter diff old.sql new.sql --data-only
sql-splitter diff old.sql new.sql --format json --output diff.json
sql-splitter diff old.sql new.sql --format sql --output migration.sql
sql-splitter diff old.sql new.sql --tables users,orders --progress
sql-splitter diff old.sql new.sql --verbose # Show sample PKs
sql-splitter diff old.sql new.sql --ignore-columns "*.updated_at" # Ignore columns
sql-splitter diff old.sql new.sql --primary-key logs:ts+msg # Override PK
sql-splitter diff old.sql new.sql --allow-no-pk # Tables without PK
Anonymize PII in SQL dumps by replacing sensitive data with fake, hashed, or null values.
# Using YAML config file
sql-splitter redact dump.sql --output safe.sql --config redact.yaml
# Using CLI flags
sql-splitter redact dump.sql --output safe.sql --null "*.ssn" --hash "*.email" --fake "*.name"
# Mask credit cards (keep last 4 digits)
sql-splitter redact dump.sql --output safe.sql --mask "****-****-****-XXXX=*.credit_card"
# Generate config by analyzing input file
sql-splitter redact dump.sql --generate-config --output redact.yaml
# Reproducible with seed
sql-splitter redact dump.sql --output safe.sql --config redact.yaml --seed 42
# Validate config only
sql-splitter redact dump.sql --config redact.yaml --validate
# With specific locale for fake data
sql-splitter redact dump.sql --output safe.sql --fake "*.name" --locale de_de
Strategies:
--null "pattern": Replace with NULL--hash "pattern": SHA256 hash (deterministic, preserves FK integrity)--fake "pattern": Generate realistic fake data--mask "pattern=column": Partial masking--constant "column=value": Fixed value replacementFake generators: email, name, first_name, last_name, phone, address, city, zip, company, ip, uuid, date, credit_card, ssn, lorem, and more.
Generate Entity-Relationship Diagrams (ERD) from SQL dumps.
# Interactive HTML ERD with dark/light mode and panzoom
sql-splitter graph dump.sql --output schema.html
# Graphviz DOT format with ERD-style tables
sql-splitter graph dump.sql --output schema.dot
# Mermaid erDiagram syntax (paste into GitHub/GitLab)
sql-splitter graph dump.sql --output schema.mmd --format mermaid
# JSON with full schema details
sql-splitter graph dump.sql --json
# Filter tables
sql-splitter graph dump.sql --tables "user*,order*" --exclude "log*"
# Show only circular dependencies
sql-splitter graph dump.sql --cycles-only
# Focus on specific table and its dependencies
sql-splitter graph dump.sql --table orders --transitive
# Show tables that depend on users
sql-splitter graph dump.sql --table users --reverse
Reorder SQL dump in topological FK order for safe imports.
# Rewrite in safe import order
sql-splitter order dump.sql --output ordered.sql
# Check for cycles without rewriting
sql-splitter order dump.sql --check
# Reverse order (for DROP operations)
sql-splitter order dump.sql --reverse --output drop_order.sql
Run SQL analytics on dump files using embedded DuckDB (no database required).
# Single query
sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
# Interactive REPL
sql-splitter query dump.sql --interactive
# Export to JSON/CSV
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100" -f json -o results.json
sql-splitter query dump.sql "SELECT * FROM users LIMIT 100" -f csv -o users.csv
# With caching (400x faster on repeated queries)
sql-splitter query dump.sql "SELECT ..." --cache
# Disk mode for large dumps (>2GB auto-enabled)
sql-splitter query huge.sql "SELECT ..." --disk
# Filter tables to import (faster startup)
sql-splitter query dump.sql "SELECT * FROM orders" --tables orders,users
# Memory limit
sql-splitter query dump.sql "SELECT ..." --memory-limit 4GB
# Cache management
sql-splitter query --list-cache
sql-splitter query --clear-cache
REPL commands:
.tables - List all tables.schema [table] - Show schema.describe <table> - Describe table.count <table> - Count rows.sample <table> [n] - Sample rows.format <fmt> - Set output format (table, json, csv, tsv).export <file> <query> - Export query results.exit - Exit REPLBefore using any dump from an external source:
Validate integrity
sql-splitter validate path/to/dump.sql.gz --strict --progress
If validation fails, check:
--dialect=postgres, --dialect=mysql, or --dialect=mssqlAnalyze structure
sql-splitter analyze path/to/dump.sql.gz --progress
For migrating between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs):
Validate source
sql-splitter validate source.sql.gz --strict --progress
Convert dialect
sql-splitter convert source.sql.gz --to postgres --output target.sql --strict
# or for MSSQL
sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql
Validate converted output
sql-splitter validate target.sql --dialect=postgres --strict
Or stream directly
sql-splitter convert source.sql.gz --to postgres --output - | psql "$PG_CONN"
For creating smaller realistic data for development:
Analyze to understand sizes
sql-splitter analyze prod.sql.zst --progress
Sample with FK preservation
sql-splitter sample prod.sql.zst \
--output dev_seed.sql \
--percent 10 \
--preserve-relations \
--progress
Restore to dev database
psql "$DEV_DB" < dev_seed.sql
For validating dumps in CI pipelines:
sql-splitter validate "dumps/*.sql.gz" --json --fail-fast --strict
Parse with jq:
sql-splitter validate "dumps/*.sql.gz" --json --fail-fast \
| jq '.results[] | select(.passed == false)'
When the user needs to edit specific tables:
Split
sql-splitter split dump.sql --output tables/ --progress
Edit the per-table files (tables/users.sql, etc.)
Merge back
sql-splitter merge tables/ --output updated.sql --transaction
For multi-tenant databases:
Identify tenant column (often tenant_id, account_id, company_id)
Extract tenant data
sql-splitter shard dump.sql \
--tenant-value 12345 \
--tenant-column tenant_id \
--output tenant_12345.sql \
--progress
For detecting schema or data changes between two versions:
Full comparison (schema + data)
sql-splitter diff old_dump.sql new_dump.sql --progress
Schema-only comparison (fast, no data parsing)
sql-splitter diff old.sql new.sql --schema-only
Generate migration script
sql-splitter diff old.sql new.sql --format sql --output migration.sql
JSON output for automation
sql-splitter diff old.sql new.sql --format json | jq '.summary'
For creating safe development/testing datasets:
Generate redaction config by analyzing dump
sql-splitter redact dump.sql --generate-config --output redact.yaml
Review and customize the generated config
Apply redaction
sql-splitter redact dump.sql --output safe.sql --config redact.yaml --progress
Or use inline patterns for quick redaction
sql-splitter redact dump.sql --output safe.sql \
--null "*.ssn,*.tax_id" \
--hash "*.email" \
--fake "*.name,*.phone"
Validate the redacted output
sql-splitter validate safe.sql --strict
For understanding complex database schemas:
Generate interactive ERD
sql-splitter graph dump.sql --output schema.html
# Opens in browser with dark/light mode, zoom/pan
For documentation (Mermaid)
sql-splitter graph dump.sql --output docs/schema.mmd --format mermaid
# Paste into GitHub/GitLab/Notion
Focus on specific area
# What does orders depend on?
sql-splitter graph dump.sql --table orders --transitive --output orders.html
# What depends on users?
sql-splitter graph dump.sql --table users --reverse --output users_deps.html
Find circular dependencies
sql-splitter graph dump.sql --cycles-only
For ensuring FK constraints don't fail during restore:
Check for cycles
sql-splitter order dump.sql --check
Reorder if needed
sql-splitter order dump.sql --output ordered.sql
For DROP operations (reverse order)
sql-splitter order dump.sql --reverse --output drop_order.sql
For running SQL queries on dump files without loading into a database:
Quick exploratory query
sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
Interactive exploration (REPL)
sql-splitter query dump.sql --interactive
# sql> .tables
# sql> SELECT * FROM orders LIMIT 10
# sql> .count users
Export analysis results
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 1000" -f csv -o big_orders.csv
Repeated queries with caching (400x speedup)
# First run imports and caches
sql-splitter query dump.sql "SELECT COUNT(*) FROM orders" --cache
# Subsequent runs use cache
sql-splitter query dump.sql "SELECT SUM(total) FROM orders" --cache
Complex analytics
sql-splitter query dump.sql "
SELECT u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.name
ORDER BY total_spent DESC
LIMIT 10
" -f json
| Goal | Flags |
|---|---|
| CI validation | --strict --fail-fast --json |
| Safe exploration | --dry-run --progress |
| Reproducible sampling | --seed 42 --preserve-relations |
| Fast progress feedback | --progress |
| Compressed output | Pipe to gzip -c or zstd -c |
If auto-detection fails, specify explicitly:
sql-splitter validate dump.sql --dialect=postgres
sql-splitter validate mssql_dump.sql --dialect=mssql
sql-splitter convert dump.sql --from mysql --to postgres --output out.sql
sql-splitter convert dump.sql --from mssql --to mysql --output out.sql
--json output for specific errors--no-fk-checks to skip expensive integrity checkssample before full operationsWhen using this skill:
.sql files or dump-related tasks--dry-run first, then --progress for feedback