Software Architecture Review Protocol
REVIEW_FILE = specs/architecture/architecture-review.md
Recommended Tools: Make sure you've read ~/.liza/AGENT_TOOLS.md. `list_directory_tree` and `codebase_search` (fast, token-efficient semantic search) may be especially useful.
Architecture is about trade-offs, not truths. Raise questions, suggest directions, avoid astronautics.
Invoked for: implementation planning, code review (P3 supplement), or explicit architectural evaluation.
Discovery → Analysis → Recommendations
Templates anchor cognition. Complete Phase 1 before Phase 2. The skill is a lens to apply to what you found, not slots to fill.
| Mode | When | Scope |
|---|---|---|
| Full Review | Explicit architectural evaluation, major implementation planning | All phases, all sections |
| Code Review Supplement | Adding P3 architectural notes to a code review | Targeted analysis of changes only |
| Enrichment | Second+ pass to improve coverage of existing review | Independent analysis → merge → verify → update |
| Enrichment (no lens) | Avoid tunnel vision from lens focus | Same as Enrichment, but unconstrained exploration |
| Adversarial | Break out of convergent attention patterns | Randomized exploration → gap hunting → forced comparisons |
| Adversarial (targeted) | Suspect gaps in specific area | Specific file/component as entry, rest of Adversarial process |
| Adversarial (entry point N) | Force specific exploration pattern | Fixed entry point from table, rest of Adversarial process |
Phase applicability:
| Mode | Phase 1 (Discovery) | Phase 2 (Analysis) | Phase 3 (Recs) | Output |
|---|---|---|---|---|
| Full Review | ✓ Complete | ✓ Complete | ✓ Complete | New REVIEW_FILE |
| Code Review Supplement | — Skip | Targeted (relevant questions only) | — Skip | Inline notes in code review |
| Enrichment | ✓ Fresh (independent) | Update (add to existing) | Update (add to existing) | Revised review file |
| Adversarial | Own process (randomized) | Update (add to existing) | Update (add to existing) | Revised review file |
Mode selection (check in order — first match wins):
Full Review — Use the complete process: Phase 1 → Phase 2 → Phase 3 → Summary.
Appropriate for: new system reviews, periodic health checks, major refactoring decisions.
Default output: REVIEW_FILE (used when no output is specified and the file doesn't exist yet).
Time Budget: Discovery (Phase 1) should be at least as thorough as Analysis + Recommendations combined. Most missed findings come from rushed discovery — especially the Coverage Checkpoint. If you're tempted to skip ahead, you're probably under-investing in discovery.
Code Review Supplement — Skip full discovery; the code review already established context. Focus on:
```
Architectural notes:
- [smell/pattern observation relevant to this change]
- [dependency direction concern, if any]
- [trade-off worth discussing]
```
Tag as [concern] or [suggestion] per code review protocol.
Enrichment — For iterative refinement of an existing architecture review.
⚠️ This section covers both regular Enrichment AND "no lens" mode. No-lens skips lens rotation but follows the exact same protocol below.
Default file: REVIEW_FILE
First check: Verify REVIEW_FILE exists. If it doesn't exist, this is Full Review, not Enrichment — go to that section instead.
Header check (BEFORE discovery): Read ONLY the first 10 lines of REVIEW_FILE to extract the pass number and current lens.
When to use: Second or subsequent pass over the same codebase, typically after changes or to improve coverage. Each review naturally finds different things — enrichment accumulates findings rather than replacing them.
⚠️ CRITICAL: You MUST NOT read REVIEW_FILE findings until Step 2. Reading findings early causes anchoring — you'll confirm existing findings instead of discovering new ones. The header check above is allowed (pass number + lens only). Complete Phase 1 Discovery fully before reading findings.
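A minimal way to perform the header check without exposure to the findings (a sketch, assuming the default REVIEW_FILE path):

```bash
# Read ONLY the header — enough to recover pass number and lens,
# without seeing any findings below it
head -n 10 specs/architecture/architecture-review.md
```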
Process:
1. Independent Analysis (Phase 1 Discovery) — Complete the entire Phase 1 below as if no review exists. Explore the codebase fresh. Write your findings to a scratch area or hold them in memory. Do not read the existing review file.
2. Merge Phase — Only now read REVIEW_FILE. Compare your fresh findings against it.
3. Verification — For each finding in the existing review, explicitly verify:
4. Gap Analysis — Explicitly list:
5. Update — Revise REVIEW_FILE with:
   - New findings tagged *(pass N)* or *(pass N, [lens] lens)*
   - Header updated to Mode: Enrichment (pass N) — or Mode: Enrichment (pass N, no lens) if no-lens mode

Output: Updated review document, not a separate file.
Time Budget: Independent Analysis (step 1) should be at least as thorough as the merge + verification steps combined. Don't rush it to get to the merge. The value of enrichment comes from fresh eyes — if you read the existing review first, you're just proofreading.
Each enrichment pass uses a different primary lens to shift attention. Continue from the previous pass's lens — read the header to determine where you are in the rotation.
Lenses:
Rotation order (shift of 2): Complexity → Boundaries → Coverage → Duplication → Coupling → (wrap to Complexity)
Why shift-of-2? Covers the 3 highest-value lenses (Complexity, Boundaries, Coverage) as primary in just 3 passes — then Duplication and Coupling get primary focus in passes 4-5 if needed.
How to determine your lens:
Example: Previous was "Boundaries lens" → Your lens is Coverage
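A sketch of the rotation logic in Python (illustrative, not part of the skill's tooling — the lens names come from the rotation order above):

```python
# Shift-of-2 rotation order from above; advance one step per enrichment pass.
ROTATION = ["Complexity", "Boundaries", "Coverage", "Duplication", "Coupling"]

def next_lens(previous: str | None) -> str:
    """First pass starts at Complexity; later passes continue the rotation."""
    if previous is None:
        return ROTATION[0]
    return ROTATION[(ROTATION.index(previous) + 1) % len(ROTATION)]

assert next_lens("Boundaries") == "Coverage"  # matches the example above
```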
How to apply: During Phase 1 Discovery, start with your primary lens. Spend ~40% of discovery time on it before moving to the next. The leading lens gets deepest attention while context is fresh; later lenses get lighter coverage.
Complexity lens — systematic god script scan: When Complexity is your primary lens, START with a systematic LOC scan before any exploration:
```bash
find . -name "*.py" -type f ! -path "*/__pycache__/*" -exec wc -l {} + | sort -rn | head -20
```
Flag ALL files >500 LOC as potential god scripts. For each:
- Check whether it's already documented in the review with a targeted grep (e.g., `rg -i "detect_platforms.*god|god.*detect_platforms"`) — do NOT read the full review.

This prevents god scripts from escaping detection due to entry point randomization.
User can request enrichment without lens rotation (e.g., "review architecture, no lens"). Explore freely — let findings emerge from what catches attention rather than a predetermined focus. Use when:
⚠️ No-lens is Enrichment, not a shortcut. Full protocol required:
- Header: Mode: Enrichment (pass N, no lens)
- Finding tags: *(pass N, no lens)*

No-lens only skips lens selection. You MUST still: do independent discovery, merge with existing review, verify findings, and update the file.
Recommended iterations: Run enrichment 3 times for solid coverage. The rotation order (Complexity → Boundaries → Coverage) covers the highest-value lenses in the first 3 passes. Additional passes (Duplication, Coupling) provide diminishing returns unless actively extending the system.
⚠️ MANDATORY after 3+ passes: If pass number ≥ 3, you MUST present options before proceeding:
```
Pass [N] exists ([previous lens] lens). Per skill, 3 passes provide solid coverage.

Options:
1. Pass [N+1] Enrichment ([next lens in rotation] lens) — full independent discovery + merge
2. Adversarial mode — randomized exploration to break convergent patterns
3. Health check — quick verification existing findings still hold

Which approach? (1/2/3)
```
Do NOT proceed with enrichment without explicit user choice. The prompt exists because attention convergence makes additional standard passes less effective than Adversarial mode.
Adversarial — For breaking out of convergent attention patterns when enrichment plateaus.
Requires: Existing REVIEW_FILE
When to use: After 3+ enrichment passes when findings have plateaued but you suspect gaps remain. Adversarial mode disrupts the exploration patterns that cause repeated passes to converge on the same findings.
⚠️ CRITICAL: Do NOT read the existing review until Step 4. The goal is to find what the review missed — reading it first defeats the purpose.
Mindset: "This review is incomplete. A critical reviewer would find issues that aren't documented. What did we miss?"
Process:
1. Randomized Entry Point — Do NOT start with the directory tree. Pick using `1 + (current_minute mod 7)` or choose the entry point you're least naturally drawn to — the goal is to break your default exploration pattern:
| # | Entry Point |
|---|---|
| 1 | Start from tests — what are they testing? What's NOT tested? |
| 2 | Start from config — what's configurable? What's hardcoded that shouldn't be? |
| 3 | Start from a random mid-sized file (100-300 LOC) — trace its dependencies |
| 4 | Start from specs/ or docs/ — what's specified but not implemented? What's implemented but not specified? |
| 5 | Start from error handling — search for except, raise, error — how are failures handled? |
| 6 | Start from data flow — pick an input, trace it to output |
| 7 | Start from documented smells in existing review — investigate each for clustering issues |
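A one-liner sketch of the minute-based pick (bash; the `10#` prefix prevents minutes like "08" from being parsed as octal):

```bash
echo "Entry point: $((1 + 10#$(date +%M) % 7))"
```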
ADRs are historical records: Architecture Decision Records (specs/architecture/ADR/) capture decisions at a point in time, not current state. Path references, architectural patterns, and implementation details in ADRs may have evolved since writing. When using entry point #4 with ADRs:
God script override: If the review lists god scripts (>500 LOC files), consider starting from one NOT yet used as an adversarial entry point. God scripts cluster other smells — good ROI for finding new issues.
2. Contrarian Questions — For each component you examine, ask:
3. Forced Comparisons — Pick 2-3 pairs and explicitly compare:
   - `base.py` files — are they consistent?
4. Gap Hunting — Now read REVIEW_FILE. Three techniques:
Smell search: For each smell category in the Reference: Smell Catalog that has NO finding in the review, actively search for it:
Inverse grep: Pick 3 keywords that appear frequently in existing findings (e.g., config, async, test). Search for files that DON'T contain them — what's happening in the unlabeled parts? This catches architectural islands — code that exists outside the main patterns. (A sketch of both searches follows this list.)
Semantic search queries: Use natural language to find what regex misses:
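A sketch of the first two techniques as ripgrep invocations (the patterns and keywords are illustrative, not prescribed by this skill):

```bash
# Smell search — hunt a catalog category that has no finding yet
# (example: silent failures; -U enables multiline matching)
rg -U "except.*:\n\s*pass" --glob "*.py"

# Inverse grep — files that never mention a frequent finding keyword
rg --files-without-match "config" --glob "*.py" | head -20
```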
5. Second Look — Pick 3 files that appear in existing findings and re-read them with fresh eyes. What else is there?
6. Update — Add new findings to the review with tag: *(Adversarial pass)*
Note: Adversarial mode adds findings but does NOT verify/remove stale ones. This is intentional — Adversarial focuses on breaking attention patterns to find gaps. Use Enrichment mode periodically for comprehensive maintenance (verify + cleanup).
Output format in header:
```
**Mode:** Adversarial (after pass N)
```
Time Budget: Spend 60% on steps 1-3 (before reading existing review). The value comes from the different exploration path.
Success metric: Finding at least one issue not in the existing review. If you find nothing new, document what you searched and why — that's still valuable signal that the review is comprehensive.
Adversarial (targeted): Skip randomized entry point selection. Start from a specific file or component you suspect has gaps. Use when:
Specify in prompt: "Adversarial mode, starting from [file/component]"
Adversarial (entry point N): Force a specific entry point from the table above. Use when:
Specify in prompt: "Adversarial mode, entry point 5" (or whichever number)
Smell-driven entry (entry point 7): A particularly effective variant — use the documented smells in the existing review as your starting points. Problems cluster: a god script often has coupling issues, silent failures, hardcoded config. Each documented smell becomes a lead to investigate for adjacent issues.
This inverts standard Adversarial's logic:
Both are valid and address different failure modes.
Applies to Full Review and Enrichment modes. Code Review Supplement skips to targeted analysis. Adversarial mode uses its own process (see above).
Mandatory before analysis. Ensures nothing overlooked.
State the system's purpose and data flow. Use two-row format showing stages and artifacts:
```
Stage1  →  Stage2  →  Stage3  →  Stage4
   ↓          ↓          ↓          ↓
artifact1  artifact2  artifact3  artifact4
```
For each component:
```
### [Component Name] (`path/`)
**Purpose:** What it does
**Pattern:** How it's structured (if applicable)
**Observations:**
- Notable design decisions
- Interfaces with other components
- Potential concerns (feed Phase 2)
```
Quantitative signals to note:
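For instance, basic signals can be gathered mechanically (a sketch mirroring the god-script scan above; the globs assume a Python codebase):

```bash
# File count and total LOC
find . -name "*.py" ! -path "*/__pycache__/*" | wc -l
find . -name "*.py" ! -path "*/__pycache__/*" -exec wc -l {} + | tail -1

# Densest files by definition count (classes + functions)
rg -c "^\s*(class |def )" --glob "*.py" | sort -t: -k2 -rn | head
```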
ASCII tree showing what depends on what:
```
shared/ (stable)
├── models.py   ← Used by all components
├── utils.py    ← Common utilities
└── config.py   ← Centralized settings

application/ (volatile)
├── component_a/ ← Produces X
└── component_b/ ← Consumes X, produces Y
```
Annotate stability (stable/volatile) and data flow direction.
Time Budget: This checkpoint should take ~20% of Phase 1 time. Rushed checkpoints are the #1 source of missed findings.
Before moving to analysis, deliberately answer each question:
Enforcement: If you identify gaps, STOP. Go back and add them to the walkthrough. Don't carry forward "I should have looked at X" — look at it now.
Check README.md or equivalent for pointers to specs, design docs, or architectural decisions that might be orphaned.
Deep inspection triggers: If you noted any of these in the walkthrough, read the actual code:
Apply these lenses to what Phase 1 discovered.
Answer before making recommendations:
| # | Question | Assessment |
|---|---|---|
| 1 | What problem is being solved? | |
| 2 | What are the change vectors? (likely to change vs stable) | |
| 3 | What are the constraints? (team, timeline, existing patterns) | |
| 4 | What's the cost of being wrong? (reversibility) | |
| 5 | Where do errors get handled? What happens when things fail? (contained or propagated?) | |
| 6 | What's the expected lifespan? | |
| 7 | What's the concurrency model? | |
| 8 | Who owns the data and its invariants? | |
| 9 | Where are the boundaries? | |
| 10 | What are the runtime constraints? |
What's working well. Synthesize observations into architectural judgments — not "X exists" but "X is appropriate because...":
```
### [Architectural Judgment]
[What works, why it matters, what it enables]
```
Examples of synthesis:
Problems detected. Use smell vocabulary from reference table:
```
Smell: [name] in [location from Phase 1]
Signal: [what triggered detection]
Impact: [why it matters]
Direction: [refactoring suggestion, not prescription]
```
Patterns identified. Table format:
| Pattern | Where Used | Purpose |
|---|---|---|
| [Name] | [Component/file] | [What problem it solves here] |
Assess test structure and gaps. For detailed test analysis, use the testing skill — this section is architectural overview only.
Questions:
Red flags:
Note gaps as input to recommendations, not as prescriptive requirements. Reference testing skill for remediation guidance.
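One rough architectural-level scan (a sketch; assumes a `src/` + `tests/` layout with pytest-style `test_<module>.py` naming):

```bash
# Source modules with no same-named test file — candidate coverage gaps
comm -13 \
  <(find tests -name "test_*.py" -exec basename {} \; | sed 's/^test_//' | sort) \
  <(find src -name "*.py" ! -name "__init__.py" -exec basename {} \; | sort)
```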
Order by decreasing priority (High first, None last):
| Priority | Issue | Rationale | Action |
|---|---|---|---|
| High | | | |
| Medium | | | |
| Low | | | |
| None | | | |
"None" is valid — explicitly stating what's not worth doing.
One paragraph overall assessment:
| Component | Location |
|---|---|
| [Name] | path/to/directory/ |
Keep at directory level. Only list individual files for entry points or key abstractions.
These are lenses, not laws. Apply when they clarify; ignore when they obscure.
| Principle | Question It Raises |
|---|---|
| SRP | Does this unit have one reason to change? |
| OCP | Can we extend without modifying? Should we? |
| LSP | Can subtypes substitute without surprise? |
| ISP | Are clients forced to depend on methods they don't use? |
| DIP | Do high-level policies depend on low-level details? |
| YAGNI | Are we building for a future that may not arrive? |
| KISS | Is there a simpler way that still works? |
| DRY | Is this duplication, or coincidental similarity? |
Tension acknowledgment: YAGNI and OCP often conflict. DRY and clarity sometimes trade off. Name the tension, don't pretend it resolves.
Reversibility preference:
Domain (stable) ← Application ← Infrastructure (volatile)
Not dogma — pragmatic for codebases expecting change. For scripts or throwaway code, skip it.
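A minimal sketch of that direction in Python (illustrative names — the point is that the domain owns the interface and never imports infrastructure):

```python
from typing import Protocol

class ReviewStore(Protocol):              # domain-owned port (stable)
    def save(self, finding: str) -> None: ...

class MarkdownFileStore:                  # infrastructure adapter (volatile)
    def __init__(self, path: str) -> None:
        self.path = path
    def save(self, finding: str) -> None:
        with open(self.path, "a") as f:
            f.write(f"- {finding}\n")

def record_finding(store: ReviewStore, finding: str) -> None:
    store.save(finding)                   # application depends on the port only
```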
Patterns are solutions to recurring problems. Name them to communicate, not to impress:
Anti-pattern: Suggesting patterns without naming the problem they solve here.
| Smell | Signal | Direction |
|---|---|---|
| Shotgun surgery | One change touches many files | Missing abstraction |
| Divergent change | One file changes for unrelated reasons | Split responsibilities |
| Feature envy | Method uses another class's data extensively | Move method |
| Inappropriate intimacy | Classes know too much about each other | Introduce interface |
| God class/module | One unit does everything; >300 LOC or many unrelated methods | Extract cohesive pieces |
| Speculative generality | Abstractions for hypothetical futures | Remove until needed |
| Primitive obsession | Domain concepts as raw types | Introduce value objects |
| Leaky abstraction | Implementation details escape boundaries | Tighten interface |
| Untestable by design | Can't test without database/network/time | Inject dependencies, extract pure logic |
| Non-idempotent operations | Retries cause duplicates or corruption | Make operations safely repeatable |
| Unstable interface | Small changes ripple to many callers | Narrow or stabilize contract |
| Unobservable behavior | Can't tell what's happening in production | Add logging, metrics, tracing hooks |
| Hardcoded configuration | Tunable values hardcoded in source | Extract to config file or environment |
| N+1 queries | Loop making individual calls instead of batch | Batch operations, eager loading |
| Unbounded operations | No limits on collection size, query results, retries | Add pagination, limits, circuit breakers |
| Secrets in code | API keys, passwords, tokens in source | Move to environment variables, secret manager |
| Missing access control | No authorization checks on sensitive operations | Add permission gates at boundaries |
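As a concrete instance of the "Untestable by design" row, a before/after sketch (hypothetical code):

```python
import datetime

# Before — the clock is hardwired; tests must control real time:
def is_stale(mtime: datetime.datetime) -> bool:
    return (datetime.datetime.now() - mtime).days > 30

# After — the clock is injected; the logic is pure and trivially testable:
def is_stale_pure(mtime: datetime.datetime, now: datetime.datetime) -> bool:
    return (now - mtime).days > 30
```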
Calibration question: If I removed this abstraction and inlined the code, what would break? If the answer is "nothing concrete" — reconsider.
When invoked during code review, add architectural perspective after P0-P2 (security, correctness, data):
```
Architectural notes:
- [smell/pattern observation]
- [dependency direction concern, if any]
- [trade-off worth discussing]
```
Tag as [concern] or [suggestion] per code review protocol. Architectural disagreements are rarely [blocker] unless they create correctness/security issues.
ISSUES_FILE = specs/architecture/architectural-issues.md
Significant findings (smells, structural concerns, high-priority recommendations) should be persisted to ISSUES_FILE for long-term tracking.
What to persist:
What NOT to persist:
Each finding must include skill attribution:
```
### [Issue Title]
**Skill:** software-architecture-review
**Category:** [Smell name or RECOMMENDATION]
**Issue:** [Description]
**Implication:** [Why it matters]
**Direction:** [Suggested approach, if any]
```
The skill uses the full repo for context, but what to raise depends on mode:
Liza mode (multi-agent):
Pairing mode:
Pairing mode: Before saving findings to ISSUES_FILE, present the list and ask:
```
Found [N] architectural issues worth persisting:
1. [Issue title] — [one-line summary]
2. ...

Save to specs/architecture/architectural-issues.md? (y/n/select specific)
```
Wait for user confirmation before writing.
Liza mode (multi-agent): Save findings automatically after review completion. No confirmation required — the skill is invoked by agents operating autonomously.