
    elliotjlt/pre-mortem
    Planning

    SKILL.md

    Pre-mortem

    Elite engineers instinctively ask "how could this fail?" before writing code. Claude does the opposite: it jumps to implementation, discovers problems mid-way, and patches reactively. This skill encodes prospective hindsight: imagine failure, then prevent it.

    Why This Works

    Gary Klein's research on prospective hindsight shows teams who imagine failure identify 30% more risks than teams who just "plan carefully." The mental shift from "how do we succeed?" to "why did we fail?" unlocks different thinking.

    Instructions

    Activate before ANY task involving:

    • New feature implementation
    • External service integration (payments, auth, APIs)
    • Database migrations or schema changes
    • Refactoring across multiple files
    • Dependency upgrades
    • Infrastructure changes
    • Anything the user calls "significant" or "important"

    Step 1: Pause Before Implementing

    Do NOT start writing code. First, run analysis:

    python scripts/analyse_risk.py --task "<task description>" --path .
    

    Or manually assess by examining:

    • Files that will be touched
    • External dependencies involved
    • Data/state that could be corrupted
    • User-facing impact if broken
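    The internals of scripts/analyse_risk.py are not shown on this page; as a rough illustration only, the manual assessment above could be scripted like this (the keyword lists and risk areas here are this sketch's assumptions, not the actual script's logic):

```python
# Illustrative sketch: map keywords in a task description to the risk
# areas listed above. The real scripts/analyse_risk.py may work differently.
RISK_KEYWORDS = {
    "external dependency": ["stripe", "webhook", "oauth", "auth", "api"],
    "data/state corruption": ["migration", "schema", "delete", "alter"],
    "user-facing impact": ["checkout", "login", "signup", "payment"],
}

def assess_risk_surface(task: str) -> list[str]:
    """Return the risk areas a task description appears to touch."""
    task_lower = task.lower()
    return [
        area
        for area, keywords in RISK_KEYWORDS.items()
        if any(word in task_lower for word in keywords)
    ]

print(assess_risk_surface("Add Stripe payment webhook handler"))
```

    A hit in any area is a signal to run the full pre-mortem rather than start coding.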

    Step 2: Generate Failure Scenarios

    Imagine it's 2 weeks later. The task failed. Ask: "Why did it fail?"

    Generate 3-5 specific failure scenarios. For each:

    Component | Description
    --- | ---
    Scenario | Concrete failure description
    Likelihood | HIGH / MEDIUM / LOW
    Impact | What breaks if this happens
    Detection | How would we know it failed
    Mitigation | How to prevent or reduce risk
    Common failure patterns to consider:

    Integration failures:

    • Auth/credentials misconfigured
    • Webhook signatures not validated
    • Rate limits not handled
    • Timeout/retry logic missing
    • Test vs production environment confusion

    Data failures:

    • Migration corrupts existing data
    • Missing null checks
    • Race conditions on concurrent writes
    • No rollback path
    • Foreign key violations

    Logic failures:

    • Edge cases not handled
    • Off-by-one errors in loops/pagination
    • Timezone/locale assumptions
    • Currency/precision errors
    • State machine invalid transitions

    Operational failures:

    • No logging for debugging
    • Secrets committed to repo
    • No monitoring/alerting
    • Deployment breaks rollback
    • Missing feature flags
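    To keep the scenario fields from Step 2 consistent across tasks, one option is a small record type; this is a sketch (the class and field names are this example's invention, mirroring the table in Step 2):

```python
from dataclasses import dataclass

# One record per imagined failure, mirroring the Step 2 table.
@dataclass
class FailureScenario:
    scenario: str    # concrete failure description
    likelihood: str  # "HIGH" / "MEDIUM" / "LOW"
    impact: str      # what breaks if this happens
    detection: str   # how we would know it failed
    mitigation: str  # how to prevent or reduce risk

dup_charge = FailureScenario(
    scenario="No idempotency keys on payment requests",
    likelihood="HIGH",
    impact="Network retry causes duplicate charges",
    detection="Customer charged twice, refund requests",
    mitigation="Generate idempotency keys from the start",
)
```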

    Step 3: Present Pre-mortem

    Format output as:

    Pre-mortem Analysis: [Task Name]
    
    If this fails in 2 weeks, it's probably because:
    
    1. HIGH: [Scenario]
       Impact: [What breaks]
       Mitigation: [How to prevent]
    
    2. HIGH: [Scenario]
       Impact: [What breaks]
       Mitigation: [How to prevent]
    
    3. MED: [Scenario]
       Impact: [What breaks]
       Mitigation: [How to prevent]
    
    4. MED: [Scenario]
       Impact: [What breaks]
       Mitigation: [How to prevent]
    
    5. LOW: [Scenario]
       Impact: [What breaks]
       Mitigation: [How to prevent]
    
    Adjusted Plan:
    [Reordered implementation addressing HIGH risks first]
    
    Proceed with this plan?
    

    Step 4: Adjust Implementation Order

    Reorder the implementation plan to:

    1. Address HIGH risk mitigations first
    2. Build verification/detection early
    3. Create rollback path before risky changes
    4. Add logging before complex logic
    5. Then proceed with main implementation

    Step 5: Get Acknowledgment

    Do NOT proceed until user confirms. They may:

    • Accept the plan
    • Add risks you missed
    • Deprioritise certain mitigations
    • Ask for more detail on specific risks

    Risk Severity Guide

    HIGH - Must mitigate before proceeding:

    • Data loss or corruption possible
    • Security vulnerability
    • Money/billing affected
    • No rollback path
    • Silent failure (looks fine, actually broken)

    MEDIUM - Mitigate or explicitly accept risk:

    • Poor user experience
    • Performance degradation
    • Technical debt accumulation
    • Manual intervention needed if broken

    LOW - Note but proceed:

    • Cosmetic issues
    • Minor inconvenience
    • Easy to fix post-launch
    • Caught by existing tests
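    Read literally, the severity guide is a checklist; a minimal sketch of that reading (the boolean flag names are this example's invention, not part of the skill):

```python
def classify_severity(*, data_loss=False, security_hole=False,
                      billing_affected=False, no_rollback=False,
                      silent_failure=False, ux_degraded=False,
                      perf_degraded=False, manual_fix_needed=False):
    """Map the severity-guide criteria to HIGH / MEDIUM / LOW."""
    if any([data_loss, security_hole, billing_affected,
            no_rollback, silent_failure]):
        return "HIGH"    # must mitigate before proceeding
    if any([ux_degraded, perf_degraded, manual_fix_needed]):
        return "MEDIUM"  # mitigate or explicitly accept the risk
    return "LOW"         # note but proceed
```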

    NEVER

    • Start significant tasks without pre-mortem
    • Generate generic risks (be specific to THIS task)
    • Skip high-risk mitigations to save time
    • Assume "it probably won't happen"
    • Proceed without user acknowledgment

    ALWAYS

    • Be specific: "webhook signature validation" not "security issues"
    • Include detection method for each risk
    • Reorder plan based on risk findings
    • Surface risks even if they're awkward
    • Acknowledge when you're uncertain about a risk

    Examples

    Example 1: Payment Integration

    User: "Add Stripe payment integration"

    Pre-mortem Analysis: Stripe Payment Integration
    
    If this fails in 2 weeks, it's probably because:
    
    1. HIGH: Webhook signature validation wrong
       Impact: Payments process but our system never knows - orders stuck in limbo
       Detection: Customer complaints about "paid but not delivered"
       Mitigation: Test webhooks with Stripe CLI locally before writing business logic
    
    2. HIGH: No idempotency keys
       Impact: Network retry causes duplicate charges
       Detection: Customer charged twice, refund requests
       Mitigation: Implement idempotency keys from the start, not retrofitted
    
    3. HIGH: Test vs live key confusion
       Impact: Real charges in development, or test mode in production
       Detection: Angry customers or "payments not working" in prod
       Mitigation: Env validation on startup, assert key prefix matches environment
    
    4. MED: Missing error states in UI
       Impact: User sees spinner forever, retries, double charge
       Detection: Support tickets, rage clicks in analytics
       Mitigation: Design loading/error/success states before API integration
    
    5. MED: No payment logging
       Impact: Can't debug failed payments, refund disputes unwinnable
       Detection: "We have no record of this" situations
       Mitigation: Log every payment event with correlation ID
    
    Adjusted Plan:
    1. Set up Stripe CLI + webhook endpoint (test signature validation)
    2. Implement idempotency key generation
    3. Add environment validation (key prefix checks)
    4. Build payment service with full logging
    5. Create UI states (loading/error/success)
    6. Then: actual payment flow implementation
    7. End-to-end test with Stripe test cards
    
    Proceed with this plan?
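    Scenario 1's mitigation hinges on verifying the webhook signature before trusting the payload. Stripe's real scheme (the Stripe-Signature header) also signs a timestamp to defeat replay; the core HMAC comparison looks roughly like this generic sketch:

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, received_sig: str, secret: str) -> bool:
    """Generic HMAC-SHA256 webhook check. Stripe's actual scheme signs
    "{timestamp}.{payload}" and sends the digest in a structured header."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(expected, received_sig)
```

    Testing this locally against the Stripe CLI's forwarded events, before any business logic exists, is exactly the first item in the adjusted plan.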
    

    Example 2: Database Migration

    User: "Add soft delete to all models"

    Pre-mortem Analysis: Soft Delete Migration
    
    If this fails in 2 weeks, it's probably because:
    
    1. HIGH: Existing queries return "deleted" records
       Impact: Users see deleted data, privacy/legal issues
       Detection: "Why is this showing up? I deleted it"
       Mitigation: Add default scope, audit ALL existing queries before migration
    
    2. HIGH: Unique constraints now broken
       Impact: Can't recreate record with same email/username after soft delete
       Detection: "Email already exists" on new signup
       Mitigation: Make unique constraints exclude deleted records, or use different approach
    
    3. HIGH: No rollback path
       Impact: If migration fails midway, data integrity unknown
       Detection: 500 errors, inconsistent state
       Mitigation: Test on production clone first, have restore procedure ready
    
    4. MED: Cascade deletes still hard-delete related records
       Impact: Parent soft-deleted but children gone forever
       Detection: Missing related data after restore
       Mitigation: Audit all ON DELETE CASCADE, convert to soft delete
    
    5. MED: Analytics/reporting now wrong
       Impact: Counts include deleted records, metrics inflated
       Detection: "Numbers don't match the UI"
       Mitigation: Update all aggregate queries to filter deleted
    
    Adjusted Plan:
    1. Audit all existing queries (grep for affected tables)
    2. Audit unique constraints and cascades
    3. Design soft delete schema (deleted_at column approach)
    4. Create migration with rollback script
    5. Test on staging with production data clone
    6. Update queries one-by-one with tests
    7. Then: deploy migration
    8. Monitor for "deleted showing" errors
    
    Proceed with this plan?
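    Scenario 1's mitigation, a default scope that hides soft-deleted rows, can be sketched framework-agnostically; the class and function names here are made up for illustration:

```python
from datetime import datetime, timezone

class Record:
    """Minimal soft-delete sketch: rows are flagged, never removed."""
    def __init__(self, name):
        self.name = name
        self.deleted_at = None  # None means "live"

    def soft_delete(self):
        self.deleted_at = datetime.now(timezone.utc)

def live(records):
    """Default scope: every normal query should filter like this."""
    return [r for r in records if r.deleted_at is None]

rows = [Record("a"), Record("b")]
rows[0].soft_delete()
print([r.name for r in live(rows)])  # only "b" remains visible
```

    In SQL, the unique-constraint fix from scenario 2 is typically a partial unique index, e.g. Postgres's `CREATE UNIQUE INDEX ... ON users (email) WHERE deleted_at IS NULL`.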
    

    Example 3: Minor Task (Pre-mortem Not Needed)

    User: "Fix the typo in the error message"

    This is a trivial change (single string edit). Pre-mortem not needed.
    
    Proceeding directly.
    
    What DOESN'T work:
    • Generic risks: "Security issues" is useless. "Webhook signature validation wrong" is actionable.
    • Skipping for "simple" tasks: The "simple" migration that corrupts production data.
    • Listing risks without mitigations: Identifying problems without solutions is just anxiety.
    • Not reordering the plan: If you identify HIGH risks but still implement in original order, you've wasted the exercise.
    • Perfunctory pre-mortems: Going through motions without real imagination. Ask "how would this ACTUALLY fail?"
    • Only technical risks: Missing "team doesn't understand the change" or "deploy on Friday before vacation".
    • Assuming user will read carefully: Present the most critical risks prominently, not buried in a list.
    • Pre-mortem paralysis: 15 risks identified, nothing gets built. Focus on HIGH risks, accept some LOW risk.
    Repository: elliotjlt/claude-skill-potions