    Prove It

    Claude generates code by pattern-matching on training data. Something can look syntactically perfect, follow best practices, and still be wrong. The model optimizes for "looks right" not "works right." Verification is a separate cognitive step that must be explicitly triggered. This skill closes the loop between implementation and proof.

    Why This Matters (Technical Reality)

    Claude's limitations that this skill addresses:

    1. Generation vs Execution: I generate code but don't run it. I predict what it would do based on patterns. My confidence comes from "this looks like working code I've seen," not from "I executed this and observed the result."

    2. Training Signal Mismatch: My training optimizes for plausible next-token prediction, not outcome verification. Saying "Done!" feels natural. Verifying feels like extra work. But verification is where correctness actually lives.

    3. Pattern-Matching Blindness: Code that matches common patterns feels correct. But subtle bugs hide in the gaps between patterns. Off-by-one errors. Wrong variable names. Missing edge cases. These "look right" but aren't (see the sketch after this list).

    4. Confidence-Correctness Gap: High confidence in my output doesn't correlate with actual correctness. I'm often most confident when I'm most wrong, because the wrong answer pattern-matched strongly.

    5. No Feedback Loop: I generate sequentially. I don't naturally go back and check. Without explicit verification, errors compound silently.
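
    For example, a hypothetical helper (not from this repo): the function below follows a familiar chunking pattern and reads as correct, but the loop bound stops one chunk early. Only executing it exposes the bug.

    # Looks like a routine "split a list into chunks" helper
    def chunk(items, size):
        """Split items into consecutive chunks of length `size`."""
        chunks = []
        for i in range(0, len(items) - size, size):  # bug: should be range(0, len(items), size)
            chunks.append(items[i:i + size])
        return chunks

    # Running it is what reveals the off-by-one:
    print(chunk([1, 2, 3, 4, 5, 6], 2))  # [[1, 2], [3, 4]] - the [5, 6] chunk is silently dropped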

    When To Verify

    ALWAYS verify before declaring complete:

    Code Changes:

    • New functions or modules
    • Bug fixes
    • Refactoring
    • Configuration changes
    • Build/deploy scripts

    Fixes:

    • "Fixed the bug" - did you reproduce and confirm it's gone?
    • "Resolved the error" - did you trigger the error path again?
    • "Updated the config" - did you restart and test?

    Claims:

    • Factual statements that matter to the decision
    • "This will work because..." - did you prove it?
    • "The file contains..." - did you actually read it?

    Instructions

    Step 1: Catch The Victory Lap

    Before saying any of these:

    • "Done!"
    • "That should work"
    • "I've implemented..."
    • "The fix is..."
    • "Complete"

    STOP. You haven't verified yet.

    Step 2: Determine Verification Method

    Change Type        Verification
    New code           Run it with test input
    Bug fix            Reproduce original bug, confirm fixed
    Function change    Call the function, check output
    Config change      Restart service, test affected feature
    Build script       Run the build
    API endpoint       Make a request
    UI change          Describe what user should see, or screenshot

    Step 3: Actually Verify

    # Don't just write the test - run it
    python -m pytest tests/test_new_feature.py
    
    # Don't just fix the code - prove the fix
    python -c "from module import func; print(func(edge_case))"
    
    # Don't just update config - verify it loads
    node -e "console.log(require('./config.js'))"
    

    Step 4: Report With Evidence

    Verified:
    
    What I changed:
      - Added input validation to user_signup()
    
    How I verified:
      - Ran: python -c "from auth import user_signup; user_signup('')"
      - Expected: ValidationError
      - Got: ValidationError("Email required")
    
    Proof that it works. Done.
    

    Verification Patterns

    Pattern 1: The Smoke Test

    Minimal test that proves basic functionality:

    # After writing a new function
    python -c "from new_module import new_func; print(new_func('test'))"
    

    If this crashes, you're not done.

    Pattern 2: The Regression Check

    After fixing a bug, trigger the original failure:

    # Bug was: crash on empty input
    python -c "from module import func; func('')"
    # Should not crash anymore
    

    Pattern 3: The Build Gate

    Before claiming code is complete:

    # Does it at least compile/parse?
    python -m py_compile new_file.py
    npm run build
    cargo check
    

    Pattern 4: The Integration Smell Test

    After changes that affect multiple components:

    # Start the service
    npm run dev &
    
    # Hit the affected endpoint
    curl http://localhost:3000/affected-route
    
    # Check for expected response
    

    The Verification Checklist

    Before declaring done:

    [ ] Code runs without syntax errors
    [ ] Function produces expected output on happy path
    [ ] Edge case handled (empty, null, boundary)
    [ ] Original bug/issue no longer reproduces
    [ ] No new errors introduced (check logs)
    [ ] Build passes (if applicable)
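
    A minimal sketch of working through this checklist, continuing the earlier user_signup() illustration (the auth module, the ValidationError import path, and the test file name are assumptions for the example):

    # tests/test_user_signup.py - illustrative only
    import pytest
    from auth import user_signup, ValidationError  # assumed import path

    def test_happy_path():
        # Happy path: a valid email should not raise
        user_signup("alice@example.com")

    def test_empty_email_rejected():
        # Edge case: empty input raises a clear error instead of failing later
        with pytest.raises(ValidationError):
            user_signup("")

    Then run it; a test file that was never executed checks nothing:

    python -m pytest tests/test_user_signup.py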
    

    NEVER

    • Say "done" without running the code
    • Assume it works because it looks right
    • Skip verification because you're confident
    • Claim a bug is fixed without reproducing it first
    • Trust that generated code is correct
    • Let pattern-matching substitute for proof

    ALWAYS

    • Run new code at least once
    • Test the specific thing that was broken
    • Show verification output in your response
    • Admit when you can't verify (and say why)
    • Treat verification as required, not optional

    Examples

    Example 1: Bad (No Verification)

    User: "Fix the divide by zero error"
    
    *edits code to add check*
    
    Done! Added a check for zero before dividing. The function now returns
    None if the divisor is zero.
    

    This is a lie. I didn't verify anything.

    Example 1: Good (With Verification)

    User: "Fix the divide by zero error"
    
    *edits code to add check*
    
    Let me verify the fix:
    
    $ python -c "from math_utils import safe_divide; print(safe_divide(10, 0))"
    None
    
    $ python -c "from math_utils import safe_divide; print(safe_divide(10, 2))"
    5.0
    
    Verified:
    - Division by zero returns None (was: crashed)
    - Normal division still works
    - Fix confirmed.
    

    Example 2: Admitting Inability To Verify

    User: "Update the production config"
    
    I've updated config/production.json with the new API endpoint.
    
    Verification status: UNABLE TO VERIFY
    - This is production config
    - I can't safely test against production
    - Recommend: Test in staging first, or have ops verify after deploy
    
    Change is made but unverified. Proceed with caution.
    

    Example 3: Build Verification

    User: "Add TypeScript types to the utils"
    
    *adds types*
    
    Verifying:
    
    $ npm run typecheck
    src/utils.ts - 0 errors
    
    $ npm run build
    Build successful - dist/utils.js generated
    
    Verified: Types compile without errors, build passes.
    
    What DOESN'T work:
    • "I'll verify later": Later never comes. Verify immediately or it won't happen.
    • "It looks right": Pattern-matching is not verification. Code that looks perfect can be completely wrong.
    • "The tests will catch it": Tests you haven't run don't catch anything. Run them.
    • "It's a small change": Small changes cause big breaks. One-character typos crash production.
    • Verifying only the happy path: The bug is in the edge case you didn't test.
    • Reading the code as verification: Reading is not running. Execute it.
    • "It worked before": Before is not now. Dependencies change. Environments differ.
    • Trusting the type checker: Passing the type check doesn't mean the logic is correct.

    The Hard Truth

    When I say "Done!" without verifying, I'm not reporting a fact. I'm making a prediction based on pattern-matching. Sometimes that prediction is wrong.

    Verification converts prediction into observation. It's the difference between "this should work" and "this works."

    One is a guess. One is proof.

    Prove it.
