The Incremental Improvement: Refactoring the Logistics Codebase Without Stopping the World
The logistics platform has 338 Java classes, 23 methods above the cognitive complexity threshold, 67 classes in one package, zero ArchUnit tests, and a God class with 3,400 lines. Fixing all of this at once requires a feature freeze, a dedicated team, and weeks of work. No business sponsor approves that. No engineering leader promises the team can refactor for three weeks without shipping features.
The alternative is incremental improvement: small, reversible changes delivered alongside feature work. Each change makes one thing better. No change makes anything worse. Over time, the cumulative effect is a codebase that is measurably easier to navigate, review, and modify.
This chapter describes the sequence that works. The sequence matters because some improvements enable others, and some improvements without the right foundation create more confusion than they resolve.
The Sequence
Phase 1: Establish the baseline (one sprint)
Set up the measurement tools before changing any code. Install SonarQube if it is not already running. Add ArchUnit to the test suite with zero rules. Run both. Record the numbers.
Baseline metrics for the logistics platform:
- Total classes: 338 (338 public, 0 package-private)
- Methods above cognitive complexity 15: 23
- Largest class by method count: ShipmentService (47 methods)
- Largest package by class count: service (67 classes)
- Circular package dependencies: 8 cycles
- Average dependencies per service class: 11
These numbers are the starting point. Every improvement is measured against them. Post them on the team dashboard. Do not editorialize. Let the numbers speak.
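One way to make "measured against them" concrete is to keep the baseline as data and compare each sprint's numbers to it. The following is a minimal sketch, assuming Java 16 or later for records. The MetricsSnapshot type and its field names are hypothetical; they are not part of SonarQube or ArchUnit, and the values would have to be copied in from whatever those tools report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of four of the Phase 1 metrics.
// For every field here, a lower number is better.
record MetricsSnapshot(int methodsAboveComplexity15,
                       int largestClassMethodCount,
                       int largestPackageClassCount,
                       int circularDependencies) {

    // Returns the names of the metrics that are worse than in the baseline.
    // An empty list means no metric regressed.
    List<String> regressionsAgainst(MetricsSnapshot baseline) {
        List<String> regressions = new ArrayList<>();
        if (methodsAboveComplexity15 > baseline.methodsAboveComplexity15) {
            regressions.add("methodsAboveComplexity15");
        }
        if (largestClassMethodCount > baseline.largestClassMethodCount) {
            regressions.add("largestClassMethodCount");
        }
        if (largestPackageClassCount > baseline.largestPackageClassCount) {
            regressions.add("largestPackageClassCount");
        }
        if (circularDependencies > baseline.circularDependencies) {
            regressions.add("circularDependencies");
        }
        return regressions;
    }
}
```

With the baseline above, new MetricsSnapshot(23, 47, 67, 8) is the starting point, and a later snapshot passes as long as regressionsAgainst returns an empty list.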
Phase 2: Add ArchUnit boundary tests (one sprint)
Before restructuring anything, add ArchUnit tests for the three most important boundaries:
- Services do not depend on controllers
- No circular dependencies between packages
- Repositories do not contain business logic
Run the tests. They will fail. Use ArchUnit’s freeze feature to baseline existing violations. New violations are blocked. Existing violations are tracked.
This phase costs nothing in terms of refactoring effort. It adds tests. It prevents new violations. It creates a ratchet: the codebase can only get better, never worse.
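The ratchet idea itself is small enough to show in plain Java. The sketch below is an illustration of the concept only: ArchUnit's freeze feature provides this behavior for real rules, and nothing here is ArchUnit API. The ViolationRatchet class and the violation strings are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Plain-Java illustration of a violation ratchet (not ArchUnit code).
final class ViolationRatchet {
    // Violations that existed when the baseline was taken and are still present.
    private final Set<String> frozen;

    ViolationRatchet(Set<String> baselineViolations) {
        this.frozen = new TreeSet<>(baselineViolations);
    }

    // Returns the violations that are not covered by the baseline.
    // A non-empty result should fail the build.
    // Baseline entries that no longer occur are dropped, so a fixed
    // violation counts as new if it is ever reintroduced.
    Set<String> check(Set<String> currentViolations) {
        Set<String> newViolations = new TreeSet<>(currentViolations);
        newViolations.removeAll(frozen);
        frozen.retainAll(currentViolations);
        return newViolations;
    }

    int frozenCount() {
        return frozen.size();
    }
}
```

The design choice that matters is the retainAll call: the baseline shrinks as violations are fixed and never grows, which is what makes the tracked count move in one direction only.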
Phase 3: Improve naming in the hotspot methods (continuous)
The 23 methods above the cognitive complexity threshold are the most-modified, most-reviewed, most-expensive methods in the codebase. Start with naming improvements, not structural changes. Rename variables from data to shipmentItems, from result to stockLevels, from flag to isInternational. These changes are low-risk, easy to review, and immediately improve readability for every developer who reads these methods.
Naming changes can be included in any pull request that touches the relevant file. No dedicated refactoring sprint required. A developer fixing a bug in calculateRate renames val to baseRate while they are there. The reviewer approves the rename alongside the bug fix.
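A before-and-after pair shows how little such a rename changes and how much it tells the reader. This is a hypothetical sketch, assuming Java 16 or later: the ShipmentItem type, the rate of 4.00 per kilogram, and the 1.5 international multiplier are invented for illustration, and totalCharge is an assumed replacement name for result in this method. Only the names data, flag, val, shipmentItems, isInternational, and baseRate come from the examples above.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical item type, used only to make the example compile.
record ShipmentItem(BigDecimal weightKg) {}

final class RateCalculator {

    // Before: the reader must trace every variable to learn what it holds.
    static BigDecimal calculateRateBefore(List<ShipmentItem> data, boolean flag) {
        BigDecimal val = new BigDecimal("4.00");
        BigDecimal result = BigDecimal.ZERO;
        for (ShipmentItem d : data) {
            result = result.add(d.weightKg().multiply(val));
        }
        return flag ? result.multiply(new BigDecimal("1.5")) : result;
    }

    // After: identical logic, but the names carry the meaning.
    static BigDecimal calculateRate(List<ShipmentItem> shipmentItems, boolean isInternational) {
        BigDecimal baseRate = new BigDecimal("4.00");
        BigDecimal totalCharge = BigDecimal.ZERO;
        for (ShipmentItem item : shipmentItems) {
            totalCharge = totalCharge.add(item.weightKg().multiply(baseRate));
        }
        return isInternational ? totalCharge.multiply(new BigDecimal("1.5")) : totalCharge;
    }
}
```

The diff for a change like this touches only identifiers, which is why a reviewer can approve it alongside the bug fix.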
Phase 4: Extract the God class clusters (planned, over two sprints)
This is the first change that modifies the code’s structure. It requires coordination but not a feature freeze. Use the dependency clustering technique from Chapter 7:
- Identify the five responsibility clusters in ShipmentService
- Extract one cluster per pull request
- Each pull request is independently reviewable and reversible
- Each extraction reduces the method count and dependency count of ShipmentService
After five pull requests, ShipmentService is gone. Five focused classes replace it. Each pull request was small enough to review in 30 minutes.
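A single extraction might look like the sketch below, which shows the intermediate state after the first pull request rather than the end state. The method names and the in-memory map are hypothetical stand-ins; the actual clusters depend on what the dependency analysis finds. ShipmentService keeps its existing method signatures and delegates, so callers do not change and the pull request can be reverted on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Extracted cluster: everything about tracking status lives here.
final class ShipmentTracker {
    // Hypothetical in-memory store standing in for the real persistence code.
    private final Map<String, String> statusByShipmentId = new HashMap<>();

    void recordStatus(String shipmentId, String status) {
        statusByShipmentId.put(shipmentId, status);
    }

    String currentStatus(String shipmentId) {
        return statusByShipmentId.getOrDefault(shipmentId, "UNKNOWN");
    }
}

// Intermediate state: same signatures as before, bodies now delegate.
final class ShipmentService {
    private final ShipmentTracker tracker;

    ShipmentService(ShipmentTracker tracker) {
        this.tracker = tracker;
    }

    void updateTrackingStatus(String shipmentId, String status) {
        tracker.recordStatus(shipmentId, status);
    }

    String getTrackingStatus(String shipmentId) {
        return tracker.currentStatus(shipmentId);
    }

    // The remaining clusters stay here until their own pull requests.
}
```

Once callers depend on ShipmentTracker directly, the delegating methods can be deleted in a follow-up change.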
Phase 5: Restructure packages by feature (planned, over two sprints)
Move classes from the layer-based structure to the feature-based structure established in Chapter 5. Move one feature at a time. Each move is a separate pull request. Update the ArchUnit tests to enforce the new boundaries.
Phase 6: Reduce public surface area (continuous)
After packages are feature-based, audit each package for unnecessary public visibility. Make implementation classes package-private. Add ArchUnit tests to enforce the public surface.
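The public-to-total ratio can be checked with plain reflection. The sketch below is one possible way to compute it, not a feature of ArchUnit or SonarQube. The three nested classes are hypothetical stand-ins for the contents of one feature package: one public entry point and two implementation classes with default (package-private) visibility.

```java
import java.lang.reflect.Modifier;
import java.util.List;

final class PublicSurfaceAudit {

    // Share of the given classes that are declared public, rounded to a whole percentage.
    static long publicPercentage(List<Class<?>> classes) {
        long publicCount = classes.stream()
                .filter(c -> Modifier.isPublic(c.getModifiers()))
                .count();
        return Math.round(100.0 * publicCount / classes.size());
    }

    // Hypothetical package contents: one public entry point...
    public static class ShipmentTracker {}

    // ...and two implementation classes hidden from other packages.
    static class TrackingEventParser {}
    static class CarrierStatusMapper {}
}
```

In a real audit the class list would come from a classpath scan of each feature package instead of a hand-written list.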
This diagram shows the logistics platform’s readability metrics improving over six months without a feature freeze. The top line tracks the number of methods above cognitive complexity 15, dropping from 23 to 4. The middle line tracks circular dependencies, dropping from 8 to 0. The bottom line tracks the ratio of public to total classes, dropping from 100% to 28%. Each data point corresponds to a sprint. The improvements are gradual but monotonic: no sprint makes the metrics worse.
Handling Resistance
Three forms of resistance appear in every incremental improvement effort:
“We don’t have time to refactor.” The answer: naming improvements take five minutes per pull request. They are included alongside feature work. ArchUnit tests take one afternoon to set up. They run automatically forever after. The question is not whether you have time to refactor. The question is whether you have time to keep paying the cognitive load tax on every feature, every review, and every debugging session.
“What if the refactoring breaks something?” The answer: each change is small, independently testable, and reversible. A variable rename applied with the IDE’s refactoring tool does not change behavior. An extraction that preserves the public API and passes the existing tests is unlikely to change behavior. An ArchUnit test runs only in the test suite and never touches production code. The risk of each individual change is near zero. The risk of not changing is that the codebase continues to degrade.
“The whole team needs to agree on the target architecture first.” The answer: no, they do not. The first three phases (baseline, ArchUnit tests, naming improvements) require no architectural decisions. They improve the current state without committing to a future state. The architectural decisions (package structure, module boundaries) can be made incrementally as the team gains experience with the improved codebase.
The End State
After six months of incremental improvement, the logistics platform’s metrics look like this:
| Metric | Before | After |
|---|---|---|
| Total classes | 338 | 312 |
| Public classes | 338 (100%) | 87 (28%) |
| Methods above complexity 15 | 23 | 4 |
| Largest class (methods) | 47 | 8 |
| Largest package (classes) | 67 | 11 |
| Circular dependencies | 8 | 0 |
| Avg dependencies per service | 11 | 4 |
| New developer onboarding time | 3 weeks | 5 days |
| Median PR review time | 45 min | 18 min |
The codebase is not perfect. Four methods still exceed the complexity threshold. Some naming could be better. Some packages could be smaller. But a new developer joining the team can find the shipment tracking code in the shipment.tracking package, understand the ShipmentTracker class without opening any other class, and submit a pull request that will be reviewed for design quality, not just compilation and tests.
The codebase is readable. Not because someone rewrote it. Because the team improved it, one decision at a time, in every pull request, for six months.
The techniques in this book are not a methodology. They are not a framework. They are not a transformation program. They are habits. A habit of naming variables so the next reader does not need to check the implementation. A habit of keeping classes small enough to fit in working memory. A habit of enforcing boundaries with tests instead of documentation. A habit of reviewing for design, not just correctness. A habit of measuring what matters and improving it incrementally.
The readable codebase is not a destination. It is what happens when a team decides that the next engineer’s reading experience matters and acts on that decision every day.