Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Give agents more agency

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    HelgeSverre

    sql-splitter

    HelgeSverre/sql-splitter
    Data & Analytics
    1

    About

    SKILL.md

    Install

    • Telegram
      Telegram
    • Slack
      Slack
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    • Download skill
    ├─
    ├─
    └─

    About

    High-performance CLI for working with SQL dump files: split/merge by table, analyze contents, validate integrity, convert between MySQL/PostgreSQL/SQLite/MSSQL, create FK-safe samples, shard...

    SKILL.md

    sql-splitter Skill

    This skill helps you use sql-splitter to manipulate SQL dump files safely and efficiently.

    When to Use This Skill

    Use sql-splitter when:

    • The user mentions SQL dump files (.sql, .sql.gz, .sql.bz2, .sql.xz, .sql.zst)
    • The user wants to migrate, restore, or work with database dump files
    • The user needs to validate, analyze, split, merge, convert, sample, shard, or query dumps
    • Working with MySQL, PostgreSQL, SQLite, or MSSQL dump formats
    • The user wants to run SQL analytics on a dump file without loading it into a database

    When NOT to Use This Skill

    Do not use sql-splitter when:

    • Running complex ad-hoc SQL queries against a live database (use psql/mysql/sqlcmd directly)
    • No dump file exists; only a running database is available
    • The user needs interactive data editing rather than dump manipulation
    • Working with dialects beyond MySQL/PostgreSQL/SQLite/MSSQL
    • Working with MSSQL binary backup files (.bak) or DACPAC/BACPAC formats (only script-based .sql dumps are supported)

    Command Reference

    split

    Split a dump into per-table files.

    sql-splitter split dump.sql --output tables/ --progress
    sql-splitter split dump.sql --tables users,orders --output tables/
    sql-splitter split dump.sql --schema-only --output schema/
    sql-splitter split dump.sql --data-only --output data/
    

    merge

    Merge per-table files back into a single dump.

    sql-splitter merge tables/ --output restored.sql
    sql-splitter merge tables/ --output restored.sql --transaction
    sql-splitter merge tables/ --exclude logs,cache --output restored.sql
    

    analyze

    Get statistics about a dump (read-only).

    sql-splitter analyze dump.sql --progress
    sql-splitter analyze "dumps/*.sql" --fail-fast
    

    convert

    Convert between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs).

    sql-splitter convert mysql.sql --to postgres --output pg.sql
    sql-splitter convert pg_dump.sql --to mysql --output mysql.sql
    sql-splitter convert dump.sql --from postgres --to sqlite --output sqlite.sql
    sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql
    sql-splitter convert mysql.sql --to mssql --output mssql.sql
    sql-splitter convert mysql.sql --to postgres --output - | psql "$PG_CONN"
    

    validate

    Check dump integrity (syntax, encoding, PK/FK).

    sql-splitter validate dump.sql --strict --progress
    sql-splitter validate "dumps/*.sql" --json --fail-fast
    sql-splitter validate dump.sql --no-fk-checks --progress
    

    sample

    Create reduced datasets with FK preservation.

    sql-splitter sample dump.sql --output sampled.sql --percent 10 --preserve-relations
    sql-splitter sample dump.sql --output sampled.sql --rows 1000 --preserve-relations
    sql-splitter sample dump.sql --output sampled.sql --percent 10 --tables users,orders
    sql-splitter sample dump.sql --output sampled.sql --percent 10 --seed 42
    

    shard

    Extract tenant-specific data.

    sql-splitter shard dump.sql --tenant-value 123 --tenant-column tenant_id --output tenant.sql
    sql-splitter shard dump.sql --tenant-values "1,2,3" --tenant-column account_id --output shards/
    

    diff

    Compare two SQL dumps for schema and data changes.

    sql-splitter diff old.sql new.sql --progress
    sql-splitter diff old.sql new.sql --schema-only
    sql-splitter diff old.sql new.sql --data-only
    sql-splitter diff old.sql new.sql --format json --output diff.json
    sql-splitter diff old.sql new.sql --format sql --output migration.sql
    sql-splitter diff old.sql new.sql --tables users,orders --progress
    sql-splitter diff old.sql new.sql --verbose                           # Show sample PKs
    sql-splitter diff old.sql new.sql --ignore-columns "*.updated_at"     # Ignore columns
    sql-splitter diff old.sql new.sql --primary-key logs:ts+msg           # Override PK
    sql-splitter diff old.sql new.sql --allow-no-pk                       # Tables without PK
    

    redact

    Anonymize PII in SQL dumps by replacing sensitive data with fake, hashed, or null values.

    # Using YAML config file
    sql-splitter redact dump.sql --output safe.sql --config redact.yaml
    
    # Using CLI flags
    sql-splitter redact dump.sql --output safe.sql --null "*.ssn" --hash "*.email" --fake "*.name"
    
    # Mask credit cards (keep last 4 digits)
    sql-splitter redact dump.sql --output safe.sql --mask "****-****-****-XXXX=*.credit_card"
    
    # Generate config by analyzing input file
    sql-splitter redact dump.sql --generate-config --output redact.yaml
    
    # Reproducible with seed
    sql-splitter redact dump.sql --output safe.sql --config redact.yaml --seed 42
    
    # Validate config only
    sql-splitter redact dump.sql --config redact.yaml --validate
    
    # With specific locale for fake data
    sql-splitter redact dump.sql --output safe.sql --fake "*.name" --locale de_de
    

    Strategies:

    • --null "pattern": Replace with NULL
    • --hash "pattern": SHA256 hash (deterministic, preserves FK integrity)
    • --fake "pattern": Generate realistic fake data
    • --mask "pattern=column": Partial masking
    • --constant "column=value": Fixed value replacement

    Fake generators: email, name, first_name, last_name, phone, address, city, zip, company, ip, uuid, date, credit_card, ssn, lorem, and more.

    graph

    Generate Entity-Relationship Diagrams (ERD) from SQL dumps.

    # Interactive HTML ERD with dark/light mode and panzoom
    sql-splitter graph dump.sql --output schema.html
    
    # Graphviz DOT format with ERD-style tables
    sql-splitter graph dump.sql --output schema.dot
    
    # Mermaid erDiagram syntax (paste into GitHub/GitLab)
    sql-splitter graph dump.sql --output schema.mmd --format mermaid
    
    # JSON with full schema details
    sql-splitter graph dump.sql --json
    
    # Filter tables
    sql-splitter graph dump.sql --tables "user*,order*" --exclude "log*"
    
    # Show only circular dependencies
    sql-splitter graph dump.sql --cycles-only
    
    # Focus on specific table and its dependencies
    sql-splitter graph dump.sql --table orders --transitive
    
    # Show tables that depend on users
    sql-splitter graph dump.sql --table users --reverse
    

    order

    Reorder SQL dump in topological FK order for safe imports.

    # Rewrite in safe import order
    sql-splitter order dump.sql --output ordered.sql
    
    # Check for cycles without rewriting
    sql-splitter order dump.sql --check
    
    # Reverse order (for DROP operations)
    sql-splitter order dump.sql --reverse --output drop_order.sql
    

    query

    Run SQL analytics on dump files using embedded DuckDB (no database required).

    # Single query
    sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
    
    # Interactive REPL
    sql-splitter query dump.sql --interactive
    
    # Export to JSON/CSV
    sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100" -f json -o results.json
    sql-splitter query dump.sql "SELECT * FROM users LIMIT 100" -f csv -o users.csv
    
    # With caching (400x faster on repeated queries)
    sql-splitter query dump.sql "SELECT ..." --cache
    
    # Disk mode for large dumps (>2GB auto-enabled)
    sql-splitter query huge.sql "SELECT ..." --disk
    
    # Filter tables to import (faster startup)
    sql-splitter query dump.sql "SELECT * FROM orders" --tables orders,users
    
    # Memory limit
    sql-splitter query dump.sql "SELECT ..." --memory-limit 4GB
    
    # Cache management
    sql-splitter query --list-cache
    sql-splitter query --clear-cache
    

    REPL commands:

    • .tables - List all tables
    • .schema [table] - Show schema
    • .describe <table> - Describe table
    • .count <table> - Count rows
    • .sample <table> [n] - Sample rows
    • .format <fmt> - Set output format (table, json, csv, tsv)
    • .export <file> <query> - Export query results
    • .exit - Exit REPL

    Step-by-Step Patterns

    Pattern 1: Validate Before Use

    Before using any dump from an external source:

    1. Validate integrity

      sql-splitter validate path/to/dump.sql.gz --strict --progress
      
    2. If validation fails, check:

      • Incorrect dialect? Try --dialect=postgres, --dialect=mysql, or --dialect=mssql
      • Encoding issues? Report specific errors to user
      • Truncated file? Check file size and completeness
      • For MSSQL: Ensure GO batch separators are on their own lines
    3. Analyze structure

      sql-splitter analyze path/to/dump.sql.gz --progress
      

    Pattern 2: Database Migration

    For migrating between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs):

    1. Validate source

      sql-splitter validate source.sql.gz --strict --progress
      
    2. Convert dialect

      sql-splitter convert source.sql.gz --to postgres --output target.sql --strict
      # or for MSSQL
      sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql
      
    3. Validate converted output

      sql-splitter validate target.sql --dialect=postgres --strict
      
    4. Or stream directly

      sql-splitter convert source.sql.gz --to postgres --output - | psql "$PG_CONN"
      

    Pattern 3: Create Dev Dataset

    For creating smaller realistic data for development:

    1. Analyze to understand sizes

      sql-splitter analyze prod.sql.zst --progress
      
    2. Sample with FK preservation

      sql-splitter sample prod.sql.zst \
        --output dev_seed.sql \
        --percent 10 \
        --preserve-relations \
        --progress
      
    3. Restore to dev database

      psql "$DEV_DB" < dev_seed.sql
      

    Pattern 4: CI Validation Gate

    For validating dumps in CI pipelines:

    sql-splitter validate "dumps/*.sql.gz" --json --fail-fast --strict
    

    Parse with jq:

    sql-splitter validate "dumps/*.sql.gz" --json --fail-fast \
      | jq '.results[] | select(.passed == false)'
    

    Pattern 5: Per-Table Editing

    When the user needs to edit specific tables:

    1. Split

      sql-splitter split dump.sql --output tables/ --progress
      
    2. Edit the per-table files (tables/users.sql, etc.)

    3. Merge back

      sql-splitter merge tables/ --output updated.sql --transaction
      

    Pattern 6: Tenant Extraction

    For multi-tenant databases:

    1. Identify tenant column (often tenant_id, account_id, company_id)

    2. Extract tenant data

      sql-splitter shard dump.sql \
        --tenant-value 12345 \
        --tenant-column tenant_id \
        --output tenant_12345.sql \
        --progress
      

    Pattern 7: Comparing Dumps for Changes

    For detecting schema or data changes between two versions:

    1. Full comparison (schema + data)

      sql-splitter diff old_dump.sql new_dump.sql --progress
      
    2. Schema-only comparison (fast, no data parsing)

      sql-splitter diff old.sql new.sql --schema-only
      
    3. Generate migration script

      sql-splitter diff old.sql new.sql --format sql --output migration.sql
      
    4. JSON output for automation

      sql-splitter diff old.sql new.sql --format json | jq '.summary'
      

    Pattern 8: Data Anonymization

    For creating safe development/testing datasets:

    1. Generate redaction config by analyzing dump

      sql-splitter redact dump.sql --generate-config --output redact.yaml
      
    2. Review and customize the generated config

    3. Apply redaction

      sql-splitter redact dump.sql --output safe.sql --config redact.yaml --progress
      
    4. Or use inline patterns for quick redaction

      sql-splitter redact dump.sql --output safe.sql \
        --null "*.ssn,*.tax_id" \
        --hash "*.email" \
        --fake "*.name,*.phone"
      
    5. Validate the redacted output

      sql-splitter validate safe.sql --strict
      

    Pattern 9: Schema Visualization

    For understanding complex database schemas:

    1. Generate interactive ERD

      sql-splitter graph dump.sql --output schema.html
      # Opens in browser with dark/light mode, zoom/pan
      
    2. For documentation (Mermaid)

      sql-splitter graph dump.sql --output docs/schema.mmd --format mermaid
      # Paste into GitHub/GitLab/Notion
      
    3. Focus on specific area

      # What does orders depend on?
      sql-splitter graph dump.sql --table orders --transitive --output orders.html
      
      # What depends on users?
      sql-splitter graph dump.sql --table users --reverse --output users_deps.html
      
    4. Find circular dependencies

      sql-splitter graph dump.sql --cycles-only
      

    Pattern 10: Safe Import Order

    For ensuring FK constraints don't fail during restore:

    1. Check for cycles

      sql-splitter order dump.sql --check
      
    2. Reorder if needed

      sql-splitter order dump.sql --output ordered.sql
      
    3. For DROP operations (reverse order)

      sql-splitter order dump.sql --reverse --output drop_order.sql
      

    Pattern 11: Ad-hoc SQL Analytics

    For running SQL queries on dump files without loading into a database:

    1. Quick exploratory query

      sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
      
    2. Interactive exploration (REPL)

      sql-splitter query dump.sql --interactive
      # sql> .tables
      # sql> SELECT * FROM orders LIMIT 10
      # sql> .count users
      
    3. Export analysis results

      sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 1000" -f csv -o big_orders.csv
      
    4. Repeated queries with caching (400x speedup)

      # First run imports and caches
      sql-splitter query dump.sql "SELECT COUNT(*) FROM orders" --cache
      
      # Subsequent runs use cache
      sql-splitter query dump.sql "SELECT SUM(total) FROM orders" --cache
      
    5. Complex analytics

      sql-splitter query dump.sql "
        SELECT u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent
        FROM users u
        JOIN orders o ON u.id = o.user_id
        GROUP BY u.name
        ORDER BY total_spent DESC
        LIMIT 10
      " -f json
      

    Common Flag Combinations

    Goal Flags
    CI validation --strict --fail-fast --json
    Safe exploration --dry-run --progress
    Reproducible sampling --seed 42 --preserve-relations
    Fast progress feedback --progress
    Compressed output Pipe to gzip -c or zstd -c

    Error Handling

    Dialect Detection Issues

    If auto-detection fails, specify explicitly:

    sql-splitter validate dump.sql --dialect=postgres
    sql-splitter validate mssql_dump.sql --dialect=mssql
    sql-splitter convert dump.sql --from mysql --to postgres --output out.sql
    sql-splitter convert dump.sql --from mssql --to mysql --output out.sql
    

    Validation Failures

    • Parse --json output for specific errors
    • Check for encoding issues, missing FKs, duplicate PKs
    • Use --no-fk-checks to skip expensive integrity checks

    Large Files

    • sql-splitter uses constant ~50MB memory
    • Downstream tools may be bottlenecks
    • Test with sample before full operations

    Implementation Checklist

    When using this skill:

    1. Detect applicability: Check for .sql files or dump-related tasks
    2. Clarify intent: Validation? Conversion? Sampling? Splitting?
    3. Choose pattern: Map goal to one of the patterns above
    4. Propose plan: Explain steps before executing
    5. Use safe flags: --dry-run first, then --progress for feedback
    6. Summarize results: Report success/failure with key stats
    Recommended Servers
    ThinAir Data
    ThinAir Data
    PlanetScale
    PlanetScale
    Supabase
    Supabase
    Repository
    helgesverre/sql-splitter
    Files