PostgreSQL patterns for reviewing migrations and writing efficient queries. Use when reviewing Alembic migrations, optimizing queries, or debugging database issues.

PostgreSQL Patterns

Problem Statement

Alembic generates migrations but doesn't understand PostgreSQL performance implications. This skill covers reviewing migrations for PostgreSQL-specific issues and writing efficient queries.

Pattern: Index Review

When to Add Indexes

-- ✅ ADD INDEX: Foreign keys (almost always)
CREATE INDEX ix_assessments_user_id ON assessments (user_id);

-- ✅ ADD INDEX: Frequently filtered columns
CREATE INDEX ix_assessments_status ON assessments (status);

-- ✅ ADD INDEX: Columns in WHERE + ORDER BY together
CREATE INDEX ix_assessments_user_status ON assessments (user_id, status);

-- ✅ ADD INDEX: Columns used in JOIN conditions
CREATE INDEX ix_answers_question_id ON answers (question_id);

When NOT to Add Indexes

-- ❌ SKIP: Small tables (< 1000 rows)
-- ❌ SKIP: Write-heavy tables with rare reads
-- ❌ SKIP: Low cardinality columns alone (boolean, status with 3 values)
-- ❌ SKIP: Columns rarely used in WHERE/JOIN/ORDER BY

Index Column Order Matters

-- For query: WHERE user_id = ? AND status = ? ORDER BY created_at
-- ✅ CORRECT: Most selective first, ORDER BY column last
CREATE INDEX ix_assessments_user_status_created 
ON assessments (user_id, status, created_at);

-- ❌ WRONG: Order doesn't match query pattern
CREATE INDEX ix_assessments_created_status_user 
ON assessments (created_at, status, user_id);

Pattern: Partial Indexes

Problem: Full index on column where you only query subset of values.

-- Full index (indexes all rows)
CREATE INDEX ix_assessments_status ON assessments (status);

-- ✅ BETTER: Partial index (only active assessments)
CREATE INDEX ix_assessments_active 
ON assessments (user_id, created_at) 
WHERE status = 'active';

-- Use case: "Get user's active assessments sorted by date"
-- The partial index is smaller and faster

-- Common patterns:
-- WHERE deleted_at IS NULL (soft deletes)
-- WHERE status != 'archived'
-- WHERE is_active = true

In Alembic:

op.execute("""
    CREATE INDEX ix_assessments_active 
    ON assessments (user_id, created_at) 
    WHERE status = 'active'
""")

Pattern: JSONB Indexes

-- GIN index for @> (contains) queries
CREATE INDEX ix_settings_data ON user_settings USING GIN (data);

-- Query: Find users with specific setting
SELECT * FROM user_settings WHERE data @> '{"theme": "dark"}';

-- Expression index for specific JSON path
CREATE INDEX ix_settings_theme ON user_settings ((data->>'theme'));

-- Query: Find by specific key
SELECT * FROM user_settings WHERE data->>'theme' = 'dark';

Pattern: Concurrent Index Creation

Problem: CREATE INDEX locks the table. On large tables, this blocks writes.

-- ❌ BLOCKS WRITES during creation
CREATE INDEX ix_events_user_id ON events (user_id);

-- ✅ DOESN'T BLOCK (but slower to create)
CREATE INDEX CONCURRENTLY ix_events_user_id ON events (user_id);

In Alembic:

# Must disable transaction for CONCURRENTLY
def upgrade():
    op.execute("COMMIT")  # End current transaction
    op.execute(
        "CREATE INDEX CONCURRENTLY ix_events_user_id ON events (user_id)"
    )

Pattern: Query Performance Analysis

-- EXPLAIN ANALYZE shows actual execution
EXPLAIN ANALYZE 
SELECT * FROM assessments 
WHERE user_id = 'abc-123' AND status = 'active';

-- What to look for:
-- ✅ "Index Scan" or "Index Only Scan" - good
-- ❌ "Seq Scan" on large table - needs index
-- ❌ "Sort" with high cost - consider index on ORDER BY column
-- ❌ "Nested Loop" with many rows - might need different join strategy

Key metrics:

cost: Estimated units (lower is better)
rows: Estimated row count
actual time: Real milliseconds
loops: How many times executed

Pattern: UUID Performance

-- UUIDs as primary keys have tradeoffs
-- ❌ Random UUIDs (uuid4) cause index fragmentation
-- ✅ Time-ordered UUIDs (uuid7) maintain insertion order

-- If using uuid4, consider:
-- 1. BRIN index for time-ordered queries (if you have created_at)
-- 2. Covering indexes to avoid heap fetches
-- 3. Accept some fragmentation (usually fine under 10M rows)

Pattern: Constraint Review

-- ✅ GOOD: Named constraints (can be dropped/modified)
ALTER TABLE assessments 
ADD CONSTRAINT fk_assessments_user_id 
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;

-- ❌ BAD: Unnamed constraints (auto-generated names are ugly)
ALTER TABLE assessments 
ADD FOREIGN KEY (user_id) REFERENCES users(id);

-- ✅ GOOD: CHECK constraints for data integrity
ALTER TABLE assessments 
ADD CONSTRAINT chk_assessments_rating 
CHECK (rating >= 1.0 AND rating <= 5.5);

-- ✅ GOOD: Unique constraints with meaningful names
ALTER TABLE users 
ADD CONSTRAINT uq_users_email UNIQUE (email);

Pattern: Bulk Operations

-- ❌ SLOW: Row-by-row updates
UPDATE users SET role = 'member' WHERE id = 'id1';
UPDATE users SET role = 'member' WHERE id = 'id2';
-- ... thousands more

-- ✅ FAST: Batch update
UPDATE users SET role = 'member' 
WHERE id IN ('id1', 'id2', 'id3', ...);

-- ✅ FAST: Update with subquery
UPDATE users SET role = 'member'
WHERE id IN (
    SELECT user_id FROM legacy_members WHERE migrated = false
);

-- For very large updates, batch to avoid long locks:
UPDATE users SET role = 'member'
WHERE id IN (
    SELECT id FROM users 
    WHERE role IS NULL 
    LIMIT 10000
);
-- Run in loop until no rows affected

Pattern: Table Locking Awareness

Know what locks what:

Operation	Lock Type	Blocks
SELECT	AccessShare	Nothing
INSERT/UPDATE/DELETE	RowExclusive	Nothing (row-level)
CREATE INDEX	ShareLock	INSERT/UPDATE/DELETE
CREATE INDEX CONCURRENTLY	ShareUpdateExclusive	Other schema changes
ALTER TABLE (most)	AccessExclusive	Everything
DROP TABLE	AccessExclusive	Everything

Danger zone:

-- ❌ LOCKS ENTIRE TABLE
ALTER TABLE users ADD COLUMN bio TEXT NOT NULL DEFAULT '';

-- ✅ MINIMAL LOCKING (PostgreSQL 11+)
ALTER TABLE users ADD COLUMN bio TEXT;  -- Fast, nullable
-- Then backfill with UPDATE in batches
-- Then: ALTER TABLE users ALTER COLUMN bio SET NOT NULL;

Pattern: Connection Management

-- Check active connections
SELECT 
    datname,
    usename,
    application_name,
    state,
    query_start,
    query
FROM pg_stat_activity
WHERE datname = 'your_db';

-- Kill long-running query
SELECT pg_cancel_backend(pid);  -- Graceful
SELECT pg_terminate_backend(pid);  -- Force

-- Check for locks
SELECT 
    l.locktype,
    l.relation::regclass,
    l.mode,
    l.granted,
    a.usename,
    a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;

Pattern: Data Type Choices

Use Case	Type	Notes
Primary key	`UUID`	Use uuid7 for ordering if possible
Foreign key	Match parent type
Timestamps	`TIMESTAMPTZ`	Always with timezone
Money	`NUMERIC(12,2)`	Never FLOAT
JSON data	`JSONB`	Not JSON (JSONB is faster)
Short strings	`VARCHAR(n)`	With reasonable limit
Long text	`TEXT`	No length limit
Boolean	`BOOLEAN`	Not integer
Enum-like	`VARCHAR` or native ENUM	VARCHAR is more flexible

Migration Review Checklist (PostgreSQL-Specific)

Large table indexes use CONCURRENTLY
Foreign keys have ON DELETE behavior specified
Constraints have explicit names
Non-nullable columns on existing tables use 3-step process
Indexes match actual query patterns
Partial indexes considered for filtered queries
No unnecessary indexes on small tables
JSONB columns have appropriate GIN indexes if queried
UUIDs: aware of fragmentation implications
TIMESTAMPTZ used for all timestamps (not TIMESTAMP)

Useful Diagnostic Queries

-- Table sizes
SELECT 
    relname as table,
    pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Index usage
SELECT 
    indexrelname as index,
    idx_scan as times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;  -- Unused indexes at top

-- Slow queries (if pg_stat_statements enabled)
SELECT 
    query,
    calls,
    mean_exec_time,
    total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

PostgreSQL patterns for reviewing migrations and writing efficient queries. Use when reviewing Alembic migrations, optimizing queries, or debugging database issues.

PostgreSQL Patterns

Problem Statement

Alembic generates migrations but doesn't understand PostgreSQL performance implications. This skill covers reviewing migrations for PostgreSQL-specific issues and writing efficient queries.

Pattern: Index Review

When to Add Indexes

-- ✅ ADD INDEX: Foreign keys (almost always)
CREATE INDEX ix_assessments_user_id ON assessments (user_id);

-- ✅ ADD INDEX: Frequently filtered columns
CREATE INDEX ix_assessments_status ON assessments (status);

-- ✅ ADD INDEX: Columns in WHERE + ORDER BY together
CREATE INDEX ix_assessments_user_status ON assessments (user_id, status);

-- ✅ ADD INDEX: Columns used in JOIN conditions
CREATE INDEX ix_answers_question_id ON answers (question_id);

When NOT to Add Indexes

-- ❌ SKIP: Small tables (< 1000 rows)
-- ❌ SKIP: Write-heavy tables with rare reads
-- ❌ SKIP: Low cardinality columns alone (boolean, status with 3 values)
-- ❌ SKIP: Columns rarely used in WHERE/JOIN/ORDER BY

Index Column Order Matters

-- For query: WHERE user_id = ? AND status = ? ORDER BY created_at
-- ✅ CORRECT: Most selective first, ORDER BY column last
CREATE INDEX ix_assessments_user_status_created 
ON assessments (user_id, status, created_at);

-- ❌ WRONG: Order doesn't match query pattern
CREATE INDEX ix_assessments_created_status_user 
ON assessments (created_at, status, user_id);

Pattern: Partial Indexes

Problem: Full index on column where you only query subset of values.

-- Full index (indexes all rows)
CREATE INDEX ix_assessments_status ON assessments (status);

-- ✅ BETTER: Partial index (only active assessments)
CREATE INDEX ix_assessments_active 
ON assessments (user_id, created_at) 
WHERE status = 'active';

-- Use case: "Get user's active assessments sorted by date"
-- The partial index is smaller and faster

-- Common patterns:
-- WHERE deleted_at IS NULL (soft deletes)
-- WHERE status != 'archived'
-- WHERE is_active = true

In Alembic:

op.execute("""
    CREATE INDEX ix_assessments_active 
    ON assessments (user_id, created_at) 
    WHERE status = 'active'
""")

Pattern: JSONB Indexes

-- GIN index for @> (contains) queries
CREATE INDEX ix_settings_data ON user_settings USING GIN (data);

-- Query: Find users with specific setting
SELECT * FROM user_settings WHERE data @> '{"theme": "dark"}';

-- Expression index for specific JSON path
CREATE INDEX ix_settings_theme ON user_settings ((data->>'theme'));

-- Query: Find by specific key
SELECT * FROM user_settings WHERE data->>'theme' = 'dark';

Pattern: Concurrent Index Creation

Problem: CREATE INDEX locks the table. On large tables, this blocks writes.

-- ❌ BLOCKS WRITES during creation
CREATE INDEX ix_events_user_id ON events (user_id);

-- ✅ DOESN'T BLOCK (but slower to create)
CREATE INDEX CONCURRENTLY ix_events_user_id ON events (user_id);

In Alembic:

# Must disable transaction for CONCURRENTLY
def upgrade():
    op.execute("COMMIT")  # End current transaction
    op.execute(
        "CREATE INDEX CONCURRENTLY ix_events_user_id ON events (user_id)"
    )

Pattern: Query Performance Analysis

-- EXPLAIN ANALYZE shows actual execution
EXPLAIN ANALYZE 
SELECT * FROM assessments 
WHERE user_id = 'abc-123' AND status = 'active';

-- What to look for:
-- ✅ "Index Scan" or "Index Only Scan" - good
-- ❌ "Seq Scan" on large table - needs index
-- ❌ "Sort" with high cost - consider index on ORDER BY column
-- ❌ "Nested Loop" with many rows - might need different join strategy

Key metrics:

cost: Estimated units (lower is better)
rows: Estimated row count
actual time: Real milliseconds
loops: How many times executed

Pattern: UUID Performance

-- UUIDs as primary keys have tradeoffs
-- ❌ Random UUIDs (uuid4) cause index fragmentation
-- ✅ Time-ordered UUIDs (uuid7) maintain insertion order

-- If using uuid4, consider:
-- 1. BRIN index for time-ordered queries (if you have created_at)
-- 2. Covering indexes to avoid heap fetches
-- 3. Accept some fragmentation (usually fine under 10M rows)

Pattern: Constraint Review

-- ✅ GOOD: Named constraints (can be dropped/modified)
ALTER TABLE assessments 
ADD CONSTRAINT fk_assessments_user_id 
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;

-- ❌ BAD: Unnamed constraints (auto-generated names are ugly)
ALTER TABLE assessments 
ADD FOREIGN KEY (user_id) REFERENCES users(id);

-- ✅ GOOD: CHECK constraints for data integrity
ALTER TABLE assessments 
ADD CONSTRAINT chk_assessments_rating 
CHECK (rating >= 1.0 AND rating <= 5.5);

-- ✅ GOOD: Unique constraints with meaningful names
ALTER TABLE users 
ADD CONSTRAINT uq_users_email UNIQUE (email);

Pattern: Bulk Operations

-- ❌ SLOW: Row-by-row updates
UPDATE users SET role = 'member' WHERE id = 'id1';
UPDATE users SET role = 'member' WHERE id = 'id2';
-- ... thousands more

-- ✅ FAST: Batch update
UPDATE users SET role = 'member' 
WHERE id IN ('id1', 'id2', 'id3', ...);

-- ✅ FAST: Update with subquery
UPDATE users SET role = 'member'
WHERE id IN (
    SELECT user_id FROM legacy_members WHERE migrated = false
);

-- For very large updates, batch to avoid long locks:
UPDATE users SET role = 'member'
WHERE id IN (
    SELECT id FROM users 
    WHERE role IS NULL 
    LIMIT 10000
);
-- Run in loop until no rows affected

Pattern: Table Locking Awareness

Know what locks what:

Operation	Lock Type	Blocks
SELECT	AccessShare	Nothing
INSERT/UPDATE/DELETE	RowExclusive	Nothing (row-level)
CREATE INDEX	ShareLock	INSERT/UPDATE/DELETE
CREATE INDEX CONCURRENTLY	ShareUpdateExclusive	Other schema changes
ALTER TABLE (most)	AccessExclusive	Everything
DROP TABLE	AccessExclusive	Everything

Danger zone:

-- ❌ LOCKS ENTIRE TABLE
ALTER TABLE users ADD COLUMN bio TEXT NOT NULL DEFAULT '';

-- ✅ MINIMAL LOCKING (PostgreSQL 11+)
ALTER TABLE users ADD COLUMN bio TEXT;  -- Fast, nullable
-- Then backfill with UPDATE in batches
-- Then: ALTER TABLE users ALTER COLUMN bio SET NOT NULL;

Pattern: Connection Management

-- Check active connections
SELECT 
    datname,
    usename,
    application_name,
    state,
    query_start,
    query
FROM pg_stat_activity
WHERE datname = 'your_db';

-- Kill long-running query
SELECT pg_cancel_backend(pid);  -- Graceful
SELECT pg_terminate_backend(pid);  -- Force

-- Check for locks
SELECT 
    l.locktype,
    l.relation::regclass,
    l.mode,
    l.granted,
    a.usename,
    a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;

Pattern: Data Type Choices

Use Case	Type	Notes
Primary key	`UUID`	Use uuid7 for ordering if possible
Foreign key	Match parent type
Timestamps	`TIMESTAMPTZ`	Always with timezone
Money	`NUMERIC(12,2)`	Never FLOAT
JSON data	`JSONB`	Not JSON (JSONB is faster)
Short strings	`VARCHAR(n)`	With reasonable limit
Long text	`TEXT`	No length limit
Boolean	`BOOLEAN`	Not integer
Enum-like	`VARCHAR` or native ENUM	VARCHAR is more flexible

Migration Review Checklist (PostgreSQL-Specific)

Large table indexes use CONCURRENTLY
Foreign keys have ON DELETE behavior specified
Constraints have explicit names
Non-nullable columns on existing tables use 3-step process
Indexes match actual query patterns
Partial indexes considered for filtered queries
No unnecessary indexes on small tables
JSONB columns have appropriate GIN indexes if queried
UUIDs: aware of fragmentation implications
TIMESTAMPTZ used for all timestamps (not TIMESTAMP)

Useful Diagnostic Queries

-- Table sizes
SELECT 
    relname as table,
    pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Index usage
SELECT 
    indexrelname as index,
    idx_scan as times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;  -- Unused indexes at top

-- Slow queries (if pg_stat_statements enabled)
SELECT 
    query,
    calls,
    mean_exec_time,
    total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

postgres-patterns

About

SKILL.md

postgres-patterns

About

SKILL.md

PostgreSQL Patterns

Problem Statement

Pattern: Index Review

When to Add Indexes

When NOT to Add Indexes

Index Column Order Matters

Pattern: Partial Indexes

Pattern: JSONB Indexes

Pattern: Concurrent Index Creation

Pattern: Query Performance Analysis

Pattern: UUID Performance

Pattern: Constraint Review

Pattern: Bulk Operations

Pattern: Table Locking Awareness

Pattern: Connection Management

Pattern: Data Type Choices

Migration Review Checklist (PostgreSQL-Specific)

Useful Diagnostic Queries

About

SKILL.md

About

SKILL.md

PostgreSQL Patterns

Problem Statement

Pattern: Index Review

When to Add Indexes

When NOT to Add Indexes

Index Column Order Matters

Pattern: Partial Indexes

Pattern: JSONB Indexes

Pattern: Concurrent Index Creation

Pattern: Query Performance Analysis

Pattern: UUID Performance

Pattern: Constraint Review

Pattern: Bulk Operations

Pattern: Table Locking Awareness

Pattern: Connection Management

Pattern: Data Type Choices

Migration Review Checklist (PostgreSQL-Specific)

Useful Diagnostic Queries