characterization-testing

spacholski1225/characterization-testing

Coding

About

SKILL.md

characterization-testing

spacholski1225/characterization-testing

Coding

About

Create tests that describe what legacy code actually does (not what it should do) as safety net before refactoring

SKILL.md

Characterization Testing

Overview

Characterization tests capture current behavior of legacy code, warts and all. They're a safety net before refactoring, not a specification of correctness.

Core principle: Document what IS, not what SHOULD BE. Fix behavior later, after safety net exists.

This is NOT unit testing. Unit tests specify desired behavior. Characterization tests document actual behavior.

When to Use

Use characterization testing when:

Legacy code has no automated tests
Unclear what code is supposed to do
Before refactoring risky/critical areas
Documentation doesn't match reality
Need safety net without understanding all edge cases

Don't use when:

Code already has comprehensive tests
You're implementing new features (use TDD instead)
Code is so simple testing is unnecessary

The Iron Law

NO REFACTORING WITHOUT CHARACTERIZATION TESTS FIRST

Refactoring without tests = gambling with production. Always create safety net first.

The Process

Step 1: Identify Target

Choose smallest meaningful unit to characterize:

Single function/method (best starting point)
Single class (if functions are tightly coupled)
Module (if class boundaries unclear)

Start small. You can always expand coverage later.

Step 2: Write Failing Test

Write test with unknown expectation:

test('processes user data', () => {
  const result = processUserData({ name: 'John', age: 30 });
  expect(result).toEqual(/* ??? what does it return? */);
});

Don't guess. Leave expectation blank or use placeholder.

Step 3: Run and Capture

Run the test. It will fail. Copy the actual output exactly:

$ npm test
FAIL: expected ???, received { fullName: 'John', isAdult: true, category: 'standard' }

This is the characterization: what the code actually does right now.

Step 4: Lock In Behavior

Update test with actual output:

test('processes user data', () => {
  const result = processUserData({ name: 'John', age: 30 });
  expect(result).toEqual({
    fullName: 'John',
    isAdult: true,
    category: 'standard'
  });
});

Run test again → should pass. You've characterized the behavior.

Step 5: Add Edge Cases

Find weird inputs and capture outputs:

test('handles missing age', () => {
  const result = processUserData({ name: 'John' });
  // Run test, see what happens, lock it in
  expect(result).toEqual({
    fullName: 'John',
    isAdult: false,
    category: 'unknown'
  });
});

test('handles negative age (current behavior - BUG)', () => {
  const result = processUserData({ name: 'John', age: -5 });
  // This is wrong but it's what code does now
  expect(result).toEqual({
    fullName: 'John',
    isAdult: true,  // BUG: negative age treated as adult!
    category: 'standard'
  });
});

test('handles empty name', () => {
  const result = processUserData({ name: '', age: 30 });
  expect(result).toEqual({
    fullName: '',
    isAdult: true,
    category: 'standard'
  });
});

test('handles null input', () => {
  // Might throw error, might return null - capture what happens
  expect(() => processUserData(null)).toThrow('Cannot read property');
});

Key insight: You're documenting bugs, not fixing them. Tests show what code does, including incorrect behavior.

Step 6: Document Known Issues

Mark tests for known bugs:

test.skip('FIXME: should reject negative age', () => {
  // This is what SHOULD happen (not what happens now)
  expect(() => processUserData({ name: 'John', age: -5 }))
    .toThrow('Invalid age: must be non-negative');
});

test('handles negative age (CURRENT BEHAVIOR - BUG)', () => {
  // This is what ACTUALLY happens now
  const result = processUserData({ name: 'John', age: -5 });
  expect(result.isAdult).toBe(true); // Wrong! But it's current behavior
});

Why both tests?

.skip test shows desired behavior (for future)
Active test locks in current behavior (prevents regressions during refactoring)

Step 7: Verify Coverage

Ensure main execution paths covered:

Happy path (valid inputs)
Edge cases (empty, null, undefined, zero, negative)
Boundary values (max/min for your domain)
Error cases (invalid inputs, external failures)

Not 100% code coverage. Focus on behavior coverage: scenarios that matter.

Checklist

Identified smallest testable unit
Wrote test with unknown expectation (???)
Ran test and captured actual output
Locked in current behavior (test passes)
Added edge cases (empty, null, invalid, boundary values)
Documented known bugs with comments
Created .skip tests for desired behavior (future fixes)
All tests pass (green for current behavior)
Tests cover main execution paths

Example: Full Workflow

Legacy code we need to refactor:

function calculateDiscount(user, cart) {
  let total = 0;
  for (let i = 0; i < cart.items.length; i++) {
    total += cart.items[i].price * cart.items[i].quantity;
  }

  if (user.isPremium) {
    total = total * 0.9;
  }

  if (cart.items.length > 5) {
    total = total * 0.95;
  }

  return Math.round(total * 100) / 100;
}

Characterization tests:

describe('calculateDiscount - characterization', () => {
  test('standard user, small cart', () => {
    const user = { isPremium: false };
    const cart = {
      items: [
        { price: 10, quantity: 2 },
        { price: 5, quantity: 1 }
      ]
    };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(25); // 10*2 + 5*1 = 25
  });

  test('premium user gets 10% discount', () => {
    const user = { isPremium: true };
    const cart = { items: [{ price: 100, quantity: 1 }] };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(90); // 100 * 0.9 = 90
  });

  test('more than 5 items gets additional 5% discount', () => {
    const user = { isPremium: false };
    const cart = {
      items: Array(6).fill({ price: 10, quantity: 1 })
    };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(57); // 60 * 0.95 = 57
  });

  test('premium + bulk discounts stack (CURRENT BEHAVIOR)', () => {
    const user = { isPremium: true };
    const cart = {
      items: Array(6).fill({ price: 10, quantity: 1 })
    };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(51.3); // 60 * 0.9 * 0.95 = 51.3
  });

  test('empty cart returns 0', () => {
    const user = { isPremium: false };
    const cart = { items: [] };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(0);
  });

  test('missing isPremium field (CURRENT BEHAVIOR - BUG?)', () => {
    const user = {}; // no isPremium field
    const cart = { items: [{ price: 100, quantity: 1 }] };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(100); // Falsy check treats missing as non-premium
  });

  test('null user throws error', () => {
    const cart = { items: [{ price: 100, quantity: 1 }] };

    expect(() => calculateDiscount(null, cart))
      .toThrow("Cannot read property 'isPremium' of null");
  });
});

Now safe to refactor! If refactoring breaks these tests, you've changed behavior (maybe accidentally).

Anti-Patterns

❌ Fixing Bugs While Characterizing

Bad:

test('negative price should be rejected', () => {
  expect(() => calculateDiscount(user, { items: [{ price: -10, quantity: 1 }] }))
    .toThrow('Invalid price');
});

This is what SHOULD happen, not what DOES happen. You're writing specification, not characterization.

Good:

test('negative price (CURRENT BEHAVIOR - BUG)', () => {
  const user = { isPremium: false };
  const cart = { items: [{ price: -10, quantity: 1 }] };

  const result = calculateDiscount(user, cart);
  expect(result).toBe(-10); // Bug: negative total! But this is current behavior
});

test.skip('FIXME: negative price should be rejected', () => {
  // This is desired future behavior
  expect(() => calculateDiscount(user, { items: [{ price: -10, quantity: 1 }] }))
    .toThrow('Invalid price');
});

❌ Refactoring Before Tests

Bad:

1. Look at legacy code
2. "This is messy, let me clean it up"
3. Refactor
4. Add tests

Good:

1. Look at legacy code
2. Add characterization tests
3. Verify tests pass
4. Refactor with confidence
5. Tests still pass → safe refactoring

❌ Mocking Everything

Bad:

test('calls database with correct params', () => {
  const mockDB = jest.fn();
  processUserData(mockDB, user);
  expect(mockDB).toHaveBeenCalledWith('users', { id: 123 });
});

This tests interactions, not behavior. You don't know what the function returns.

Good:

test('processes user data from database', () => {
  // Use real database or test database
  const result = processUserData({ id: 123 });
  expect(result).toEqual({ name: 'John', email: 'john@example.com' });
});

Characterization tests should test real behavior with real dependencies when possible.

❌ Skipping "Embarrassing" Bugs

Bad:

// I found this bug but I'm not going to test it because it's embarrassing

Good:

test('allows XSS in user input (CURRENT BEHAVIOR - SECURITY BUG)', () => {
  const result = renderUserProfile({ name: '<script>alert("xss")</script>' });
  expect(result).toContain('<script>alert("xss")</script>');
  // Bug exists! But test documents it so we can fix it later
});

Document all bugs, especially security issues. Better to know than to be surprised.

Common Rationalizations

Excuse	Reality
"Code is too complex to test"	Characterization tests don't need full understanding. Capture behavior empirically.
"I'll refactor, then add tests"	Refactoring without tests = hoping you didn't break anything. Tests first.
"Tests will take too long"	Hours of characterization vs days of debugging production. Tests are faster.
"I know what the code should do"	Great! But what does it actually do? They might differ.
"I'll just be careful"	You will miss edge cases. Tests catch what you forget.
"Bugs are embarrassing to test"	Documented bugs can be fixed. Hidden bugs cause incidents.

After Characterization

Now you have safety net. Next steps:

Refactor with confidence - Tests catch if you break something
Fix bugs one at a time - Update characterization test to desired behavior
Add unit tests - For new features, use TDD going forward
Remove characterization tests - Once you have proper unit tests covering behavior

Characterization tests are temporary. They're scaffolding for refactoring, not permanent test suite.

Integration with Other Skills

skills/analysis/code-archaeology - Understand code before characterizing
skills/refactoring/strangler-fig-pattern - Replace characterized code safely
skills/testing/test-driven-development - Add new features with TDD after characterization
skills/refactoring/seam-finding - Find boundaries for characterization
skills/safety/approval-testing - Alternative for complex outputs

Remember

Characterization tests document what IS, not what SHOULD BE
Run test → capture output → lock it in
Document bugs, don't fix them (yet)
Tests are safety net for refactoring
NO REFACTORING without characterization tests first
Characterization tests are temporary scaffolding

About

SKILL.md

About

Create tests that describe what legacy code actually does (not what it should do) as safety net before refactoring

SKILL.md

Characterization Testing

Overview

Characterization tests capture current behavior of legacy code, warts and all. They're a safety net before refactoring, not a specification of correctness.

Core principle: Document what IS, not what SHOULD BE. Fix behavior later, after safety net exists.

This is NOT unit testing. Unit tests specify desired behavior. Characterization tests document actual behavior.

When to Use

Use characterization testing when:

Legacy code has no automated tests
Unclear what code is supposed to do
Before refactoring risky/critical areas
Documentation doesn't match reality
Need safety net without understanding all edge cases

Don't use when:

Code already has comprehensive tests
You're implementing new features (use TDD instead)
Code is so simple testing is unnecessary

The Iron Law

NO REFACTORING WITHOUT CHARACTERIZATION TESTS FIRST

Refactoring without tests = gambling with production. Always create safety net first.

The Process

Step 1: Identify Target

Choose smallest meaningful unit to characterize:

Single function/method (best starting point)
Single class (if functions are tightly coupled)
Module (if class boundaries unclear)

Start small. You can always expand coverage later.

Step 2: Write Failing Test

Write test with unknown expectation:

test('processes user data', () => {
  const result = processUserData({ name: 'John', age: 30 });
  expect(result).toEqual(/* ??? what does it return? */);
});

Don't guess. Leave expectation blank or use placeholder.

Step 3: Run and Capture

Run the test. It will fail. Copy the actual output exactly:

$ npm test
FAIL: expected ???, received { fullName: 'John', isAdult: true, category: 'standard' }

This is the characterization: what the code actually does right now.

Step 4: Lock In Behavior

Update test with actual output:

test('processes user data', () => {
  const result = processUserData({ name: 'John', age: 30 });
  expect(result).toEqual({
    fullName: 'John',
    isAdult: true,
    category: 'standard'
  });
});

Run test again → should pass. You've characterized the behavior.

Step 5: Add Edge Cases

Find weird inputs and capture outputs:

test('handles missing age', () => {
  const result = processUserData({ name: 'John' });
  // Run test, see what happens, lock it in
  expect(result).toEqual({
    fullName: 'John',
    isAdult: false,
    category: 'unknown'
  });
});

test('handles negative age (current behavior - BUG)', () => {
  const result = processUserData({ name: 'John', age: -5 });
  // This is wrong but it's what code does now
  expect(result).toEqual({
    fullName: 'John',
    isAdult: true,  // BUG: negative age treated as adult!
    category: 'standard'
  });
});

test('handles empty name', () => {
  const result = processUserData({ name: '', age: 30 });
  expect(result).toEqual({
    fullName: '',
    isAdult: true,
    category: 'standard'
  });
});

test('handles null input', () => {
  // Might throw error, might return null - capture what happens
  expect(() => processUserData(null)).toThrow('Cannot read property');
});

Key insight: You're documenting bugs, not fixing them. Tests show what code does, including incorrect behavior.

Step 6: Document Known Issues

Mark tests for known bugs:

test.skip('FIXME: should reject negative age', () => {
  // This is what SHOULD happen (not what happens now)
  expect(() => processUserData({ name: 'John', age: -5 }))
    .toThrow('Invalid age: must be non-negative');
});

test('handles negative age (CURRENT BEHAVIOR - BUG)', () => {
  // This is what ACTUALLY happens now
  const result = processUserData({ name: 'John', age: -5 });
  expect(result.isAdult).toBe(true); // Wrong! But it's current behavior
});

Why both tests?

.skip test shows desired behavior (for future)
Active test locks in current behavior (prevents regressions during refactoring)

Step 7: Verify Coverage

Ensure main execution paths covered:

Happy path (valid inputs)
Edge cases (empty, null, undefined, zero, negative)
Boundary values (max/min for your domain)
Error cases (invalid inputs, external failures)

Not 100% code coverage. Focus on behavior coverage: scenarios that matter.

Checklist

Identified smallest testable unit
Wrote test with unknown expectation (???)
Ran test and captured actual output
Locked in current behavior (test passes)
Added edge cases (empty, null, invalid, boundary values)
Documented known bugs with comments
Created .skip tests for desired behavior (future fixes)
All tests pass (green for current behavior)
Tests cover main execution paths

Example: Full Workflow

Legacy code we need to refactor:

function calculateDiscount(user, cart) {
  let total = 0;
  for (let i = 0; i < cart.items.length; i++) {
    total += cart.items[i].price * cart.items[i].quantity;
  }

  if (user.isPremium) {
    total = total * 0.9;
  }

  if (cart.items.length > 5) {
    total = total * 0.95;
  }

  return Math.round(total * 100) / 100;
}

Characterization tests:

describe('calculateDiscount - characterization', () => {
  test('standard user, small cart', () => {
    const user = { isPremium: false };
    const cart = {
      items: [
        { price: 10, quantity: 2 },
        { price: 5, quantity: 1 }
      ]
    };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(25); // 10*2 + 5*1 = 25
  });

  test('premium user gets 10% discount', () => {
    const user = { isPremium: true };
    const cart = { items: [{ price: 100, quantity: 1 }] };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(90); // 100 * 0.9 = 90
  });

  test('more than 5 items gets additional 5% discount', () => {
    const user = { isPremium: false };
    const cart = {
      items: Array(6).fill({ price: 10, quantity: 1 })
    };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(57); // 60 * 0.95 = 57
  });

  test('premium + bulk discounts stack (CURRENT BEHAVIOR)', () => {
    const user = { isPremium: true };
    const cart = {
      items: Array(6).fill({ price: 10, quantity: 1 })
    };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(51.3); // 60 * 0.9 * 0.95 = 51.3
  });

  test('empty cart returns 0', () => {
    const user = { isPremium: false };
    const cart = { items: [] };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(0);
  });

  test('missing isPremium field (CURRENT BEHAVIOR - BUG?)', () => {
    const user = {}; // no isPremium field
    const cart = { items: [{ price: 100, quantity: 1 }] };

    const result = calculateDiscount(user, cart);
    expect(result).toBe(100); // Falsy check treats missing as non-premium
  });

  test('null user throws error', () => {
    const cart = { items: [{ price: 100, quantity: 1 }] };

    expect(() => calculateDiscount(null, cart))
      .toThrow("Cannot read property 'isPremium' of null");
  });
});

Now safe to refactor! If refactoring breaks these tests, you've changed behavior (maybe accidentally).

Anti-Patterns

❌ Fixing Bugs While Characterizing

Bad:

test('negative price should be rejected', () => {
  expect(() => calculateDiscount(user, { items: [{ price: -10, quantity: 1 }] }))
    .toThrow('Invalid price');
});

This is what SHOULD happen, not what DOES happen. You're writing specification, not characterization.

Good:

test('negative price (CURRENT BEHAVIOR - BUG)', () => {
  const user = { isPremium: false };
  const cart = { items: [{ price: -10, quantity: 1 }] };

  const result = calculateDiscount(user, cart);
  expect(result).toBe(-10); // Bug: negative total! But this is current behavior
});

test.skip('FIXME: negative price should be rejected', () => {
  // This is desired future behavior
  expect(() => calculateDiscount(user, { items: [{ price: -10, quantity: 1 }] }))
    .toThrow('Invalid price');
});

❌ Refactoring Before Tests

Bad:

1. Look at legacy code
2. "This is messy, let me clean it up"
3. Refactor
4. Add tests

Good:

1. Look at legacy code
2. Add characterization tests
3. Verify tests pass
4. Refactor with confidence
5. Tests still pass → safe refactoring

❌ Mocking Everything

Bad:

test('calls database with correct params', () => {
  const mockDB = jest.fn();
  processUserData(mockDB, user);
  expect(mockDB).toHaveBeenCalledWith('users', { id: 123 });
});

This tests interactions, not behavior. You don't know what the function returns.

Good:

test('processes user data from database', () => {
  // Use real database or test database
  const result = processUserData({ id: 123 });
  expect(result).toEqual({ name: 'John', email: 'john@example.com' });
});

Characterization tests should test real behavior with real dependencies when possible.

❌ Skipping "Embarrassing" Bugs

Bad:

// I found this bug but I'm not going to test it because it's embarrassing

Good:

test('allows XSS in user input (CURRENT BEHAVIOR - SECURITY BUG)', () => {
  const result = renderUserProfile({ name: '<script>alert("xss")</script>' });
  expect(result).toContain('<script>alert("xss")</script>');
  // Bug exists! But test documents it so we can fix it later
});

Document all bugs, especially security issues. Better to know than to be surprised.

Common Rationalizations

Excuse	Reality
"Code is too complex to test"	Characterization tests don't need full understanding. Capture behavior empirically.
"I'll refactor, then add tests"	Refactoring without tests = hoping you didn't break anything. Tests first.
"Tests will take too long"	Hours of characterization vs days of debugging production. Tests are faster.
"I know what the code should do"	Great! But what does it actually do? They might differ.
"I'll just be careful"	You will miss edge cases. Tests catch what you forget.
"Bugs are embarrassing to test"	Documented bugs can be fixed. Hidden bugs cause incidents.

After Characterization

Now you have safety net. Next steps:

Refactor with confidence - Tests catch if you break something
Fix bugs one at a time - Update characterization test to desired behavior
Add unit tests - For new features, use TDD going forward
Remove characterization tests - Once you have proper unit tests covering behavior

Characterization tests are temporary. They're scaffolding for refactoring, not permanent test suite.

Integration with Other Skills

skills/analysis/code-archaeology - Understand code before characterizing
skills/refactoring/strangler-fig-pattern - Replace characterized code safely
skills/testing/test-driven-development - Add new features with TDD after characterization
skills/refactoring/seam-finding - Find boundaries for characterization
skills/safety/approval-testing - Alternative for complex outputs

Remember

Characterization tests document what IS, not what SHOULD BE
Run test → capture output → lock it in
Document bugs, don't fix them (yet)
Tests are safety net for refactoring
NO REFACTORING without characterization tests first
Characterization tests are temporary scaffolding