Write behavior-focused tests following the Testing Trophy model with real dependencies, avoiding common anti-patterns such as asserting on mocks and polluting production code with test-only methods...
Core Philosophy: Test user-observable behavior with real dependencies. Tests should survive refactoring when behavior is unchanged.
Iron Laws:

Write tests in this priority order:

1. Integration tests: the default choice, with real dependencies.
2. E2E tests: reserved for critical user workflows.
3. Unit tests: only for pure utility functions.

Default to integration tests. Only drop to unit tests for pure utility functions.
BEFORE writing any tests, copy this checklist and track your progress:
Test Writing Progress:
- [ ] Step 1: Review project standards (check existing tests)
- [ ] Step 2: Understand behavior (what should it do? what can fail?)
- [ ] Step 3: Choose test type (Integration/E2E/Unit)
- [ ] Step 4: Identify dependencies (real vs mocked)
- [ ] Step 5: Write failing test first (TDD)
- [ ] Step 6: Implement minimal code to pass
- [ ] Step 7: Verify coverage (happy path, errors, edge cases)
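A sketch of step 5 in Python/pytest, where the `slug` module does not exist yet (names are hypothetical):

```python
# Red phase: written before the implementation, so it must fail first.
from slug import slugify  # fails with ImportError until implemented

def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"
```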
Before writing any tests, walk this decision tree:

- Is this a complete user workflow? → YES: E2E test
- Is this a pure function (no side effects/dependencies)? → YES: Unit test
- Everything else → Integration test (with real dependencies)
Default: Don't mock. Use real dependencies.
Why: Mocking internal dependencies creates brittle tests that break during refactoring.
describe("Feature Name", () => {
setup(initialState)
test("should produce expected output when action is performed", () => {
// Arrange: Set up preconditions
// Act: Perform the action being tested
// Assert: Verify observable output
})
})
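A concrete sketch of this skeleton in Python/pytest, assuming a hypothetical `cart` module:

```python
from cart import Cart  # hypothetical module under test

def test_adding_an_item_updates_the_total():
    # Arrange: set up preconditions
    cart = Cart()
    # Act: perform the action being tested
    cart.add_item(sku="ABC-1", price=10.00, quantity=2)
    # Assert: verify observable output
    assert cart.total == 20.00
```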
Key principles: test the behavior users and callers observe, not code structure, and keep the Arrange/Act/Assert phases explicit. For language-specific patterns, see the Language-Specific Patterns section.
When tests involve async operations, avoid arbitrary timeouts:

```python
# BAD: guessing at timing
sleep(0.5)   # hope half a second is enough
assert result == expected

# GOOD: wait for the actual condition
wait_for(lambda: result == expected)
```
When to use condition-based waiting: whenever you are tempted to reach for sleep, setTimeout, or other arbitrary delays.

Delegate to skill: when you encounter these patterns, invoke Skill(ce:condition-based-waiting) for detailed guidance on implementing proper condition polling and fixing flaky tests.
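For reference, a minimal sketch of such a polling helper in plain Python (the `wait_for` name matches the pseudocode above; the timeout and interval values are arbitrary defaults):

```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")
```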
The Golden Rule of Assertions: A test must fail if, and only if, the intention behind the system is not met.
This rule is bidirectional. A test is broken when it fails even though the intended behavior still works (brittle), or when it passes even though the intended behavior is broken (hollow).
Before merging any test, ask: "When will this test fail?" If the answer includes anything other than "when the behavior this test describes is broken," the test needs work.
| Context | Assert On | Avoid |
|---|---|---|
| UI | Visible text, accessibility roles, user-visible state | CSS classes, internal state, test IDs |
| API | Response body, status code, headers | Internal DB state directly |
| CLI | stdout/stderr, exit code | Internal variables |
| Library | Return values, documented side effects | Private methods, internal state |
Why: Tests that assert on implementation details break when you refactor, even if behavior is unchanged.
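For example, a CLI test stays behavior-focused by asserting only on what the user sees; a sketch, where `mytool` and its output strings are hypothetical:

```python
import subprocess

def test_dry_run_reports_success():
    # Run the tool exactly as a user would
    result = subprocess.run(
        ["mytool", "sync", "--dry-run"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0                 # exit code
    assert "Dry run complete" in result.stdout    # user-visible output
```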
A test that makes a real network request doesn't just test your function -- it tests DNS resolution, network connectivity, server uptime, and response timing. When any of those fail, your test fails even though your code is fine. That violates the Golden Rule.
Fix: Mock at the boundary of what you don't own or control. The function under test isn't responsible for the server's validity -- it's responsible for making the right request and handling the response correctly. Use API mocking (e.g., MSW, respx, httpmock) to make external interactions fixed, predictable givens.
This is not a contradiction with "default: don't mock." Internal modules stay real. External boundaries (network, third-party services) get mocked so your test only fails when your code's intention is broken.
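A sketch of boundary mocking with respx (one of the libraries named above), assuming the function under test, `fetch_user`, calls the API via httpx:

```python
import respx

@respx.mock
def test_fetch_user_handles_response():
    # Fix the external boundary: the API's response becomes a predictable given
    respx.get("https://api.example.com/users/42").respond(
        200, json={"id": 42, "name": "Ada"}
    )
    user = fetch_user(42)      # hypothetical function under test
    assert user.name == "Ada"  # assert on the outcome, not the mock
```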
Use source constants and fixtures, not hard-coded values:
```python
# Good: references the actual constant or fixture
expected_message = APP_MESSAGES.SUCCESS
assert response.message == expected_message

# Bad: hard-coded, breaks when copy changes
assert response.message == "Action completed successfully!"
```
Why: When product copy changes, you want one place to update, not every test file.
```python
# BAD: testing that the mock was called, not real behavior
mock_service.assert_called_once()

# GOOD: test the actual outcome
assert user.is_active
assert len(sent_emails) == 1
```
Gate: Before asserting on mock calls, ask "Am I testing real behavior or mock interactions?" If testing mocks → Stop, test the actual outcome instead.
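One way to keep the actual outcome observable is a hand-rolled fake at the boundary; a sketch, where `FakeMailer` and `activate_user` are hypothetical names:

```python
class FakeMailer:
    """Records sends so tests can assert on outcomes, not call counts."""
    def __init__(self):
        self.sent = []

    def send(self, to, subject, body):
        self.sent.append((to, subject, body))

def test_activation_sends_welcome_email():
    mailer = FakeMailer()
    user = activate_user("ada@example.com", mailer=mailer)  # hypothetical
    assert user.is_active
    assert len(mailer.sent) == 1
```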
```python
# BAD: destroy() only used in tests, pollutes production code
class Session:
    def destroy(self):  # only exists for test cleanup
        ...

# GOOD: test utilities handle cleanup
# In test_utils.py:
def cleanup_session(session):
    # Access internals here, not in production code
    ...
```
Gate: Before adding methods to production code, ask "Is this only for tests?" Yes → Put in test utilities.
```python
# BAD: mock prevents the side effect the test actually needs
mock(database.save)   # now duplicate detection won't work!
add_item(item)
add_item(item)        # should fail as duplicate, but won't

# GOOD: mock at the correct level
mock(external_api.validate)  # mock the slow external call only
add_item(item)               # DB save works, duplicate detected
add_item(item)               # fails correctly
```
```python
# BAD: partial mock, missing fields downstream code needs
mock_response = {
    "status": "success",
    "data": {...},
    # missing: metadata.request_id that downstream code uses
}

# GOOD: mirror the real API completely
mock_response = {
    "status": "success",
    "data": {...},
    "metadata": {"request_id": "...", "timestamp": ...},
}
```
Gate: Before creating mocks, check "What does the real thing return?" Include ALL fields.
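One way to enforce this gate is to build every mock payload from a single typed model, so a missing field fails loudly rather than silently; a sketch with hypothetical names:

```python
from dataclasses import asdict, dataclass, field

@dataclass
class Metadata:
    request_id: str
    timestamp: float

@dataclass
class ApiResponse:
    status: str
    data: dict = field(default_factory=dict)
    metadata: Metadata = field(
        default_factory=lambda: Metadata(request_id="req-123", timestamp=0.0)
    )

def make_response(**overrides):
    """Build a complete mock payload; override only what a test cares about."""
    return {**asdict(ApiResponse(status="success")), **overrides}
```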
If you find yourself testing mock behavior, you violated TDD: you added mocks without first watching the test fail against real code.
For detailed framework- and language-specific patterns:

- references/javascript-react.md for React Testing Library queries, Jest/Vitest setup, Playwright E2E, and component testing patterns
- references/python.md for pytest fixtures, polyfactory, respx mocking, testcontainers, and FastAPI testing
- references/go.md for table-driven tests, testify/go-cmp assertions, testcontainers-go, and interface fakes

Before completing tests, verify:
Test behavior users/callers observe, not code structure.
| Test Type | When | Dependencies |
|---|---|---|
| Integration | Default choice | Real (test DB, real modules) |
| E2E | Critical user workflows | Real (full stack) |
| Unit | Pure functions only | None |
| Anti-Pattern | Fix |
|---|---|
| Testing mock existence | Test actual outcome instead |
| Test-only methods in production | Move to test utilities |
| Mocking without understanding | Understand dependencies, mock minimally |
| Incomplete mocks | Mirror real API completely |
| Tests as afterthought | TDD - write tests first |
| Arbitrary timeouts/sleeps | Use condition-based waiting |