Why cleaning legacy code before migration matters
Upgrading platforms, frameworks, or infrastructure is more than a version bump. It’s an opportunity to pay down technical debt, reduce operational risk, and improve developer velocity. Migrating without first cleaning old code compounds issues: hidden dependencies, undefined behavior, brittle tests, and performance regressions. A focused pre-migration cleanup reduces surprises, shortens downtime, and gives you repeatable, auditable steps for a smoother cutover.
Prerequisites / Before You Start
Inventory and audit
- Create a complete inventory of repositories, modules, binaries, services, cron jobs, and scripts.
- Identify the runtime matrix: programming languages (e.g., Java 8, Python 3.7), frameworks (Spring, Django), build tools (Maven, Gradle, npm), databases (PostgreSQL, MySQL), messaging systems (Kafka, RabbitMQ).
- Note deployment targets: VMs, containers, Kubernetes, serverless.
- Map external integrations: third-party APIs, SSO/IdP (OAuth/OpenID), payment gateways, storage (S3, NFS).
- List data stores and schema versions; record migration tooling (Liquibase, Flyway).
- Document service dependencies and SLAs.
Environment and backups
- Ensure automated, tested backups for code, artifacts, and especially databases and object storage. Test a restore on a staging environment.
- Freeze production access rules and secrets management (Vault, AWS Secrets Manager). Plan secret rotation.
- Provision a staging environment that mirrors production topology and size where practical.
Versions and compatibility matrix
- Define target versions with reasoning (LTS, security support window, compatibility guarantees).
- For each component, capture:
  - Current version
  - Target version
  - Breaking-change notes
  - Migration path and tooling
  - Owner
Example compatibility table:
| Component | Current | Target | Notes |
|---|---|---|---|
| Java JDK | 8u201 | 17 LTS | Review removed modules; update GC flags |
| Spring Boot | 1.5 | 3.x | Jakarta namespace change; APIs removed |
| Python | 3.7 | 3.11 | Update asyncio APIs, typing changes |
| PostgreSQL | 10 | 14 | Reindex, check deprecated features |
| Node.js | 12 | 20 | ES modules behavior, OpenSSL changes |
Team and process readiness
- Establish a branching strategy (GitFlow or trunk-based with release branches).
- Enable CI/CD with reproducible builds. Add quality gates (linting, static analysis, test coverage).
- Agree on a rollback and roll-forward strategy with clear RTO/RPO.
- Assign owners for code areas and define review standards.
Step-by-step guide to prepare legacy code for migration
Step 1: Stabilize the baseline
- Freeze new features or guard them behind feature flags.
- Tag the current state and create a migration branch:

```bash
git checkout main
git pull --ff-only
git tag pre-migration-baseline
git checkout -b migration/cleanup
```

- Capture performance and error-rate baselines (APM metrics, p99 latency, CPU/memory) to compare later.
Step 2: Make builds deterministic
- Pin dependencies with lockfiles or exact versions:
  - npm: `npm ci` with package-lock.json
  - Python: `pip-compile` (pip-tools) to generate requirements.txt
  - Maven: use dependencyManagement with explicit versions
  - Gradle: version catalogs (libs.versions.toml)
- Cache artifacts in an internal registry (Nexus/Artifactory) to avoid upstream volatility.
- Record compiler flags and build environment in a build manifest.
Example (Maven dependencyManagement):
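A minimal sketch of centrally pinning a version via dependencyManagement; the artifact coordinates and version here are illustrative, not a recommendation:

```xml
<!-- pom.xml fragment: child modules inherit this version without declaring it -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.17.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```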
Step 3: Introduce or strengthen automated tests
- Start with characterization tests to lock in current behavior of legacy modules, especially around public APIs and domain logic.
- Add smoke tests and health checks for services.
- Increase unit/integration coverage around risky areas: date/time handling, concurrency, serialization, database transactions.
- Use tests to document “weird but required” behavior.
- Add test doubles or adapters for external systems to decouple tests from flaky dependencies.
Example (Python pytest characterization):

```python
def test_invoice_total_rounding_legacy():
    invoice = Invoice(lines=[Line(9.995, 1)])
    # Legacy behavior rounds down
    assert invoice.total() == 9.99
```
Step 4: Run static analysis and linters
- Enable tools such as SonarQube, ESLint, Pylint, Checkstyle, SpotBugs.
- Fail the build on severe issues (security hotspots, null dereferences, SQL injection, path traversal).
- Configure formatting and enforce automatically (Prettier, Black, gofmt). Consistent style reduces cognitive load.
Example (ESLint strict config):

```json
{
  "extends": ["eslint:recommended", "plugin:@typescript-eslint/strict"],
  "rules": {
    "no-implicit-globals": "error",
    "no-floating-decimal": "error"
  }
}
```
Step 5: Identify and remove dead code
- Use coverage reports and static call graphs. Search for unused classes, endpoints, and features.
- If needed, deprecate in one release and remove in the next. Keep change logs visible to stakeholders.
Commands:

```bash
# Java: list unused dependencies
mvn dependency:analyze

# Python: find unused code and imports
pip install vulture
vulture src/

# JS/TS: detect unused exports
npx ts-prune
```
Step 6: Isolate side effects and I/O
- Introduce ports-and-adapters (hexagonal architecture) boundaries.
- Wrap external systems (filesystems, HTTP clients, message brokers) behind interfaces so they can be mocked and swapped.
Example adapter (TypeScript):

```ts
export interface PaymentGateway {
  charge(amountCents: number, token: string): Promise<string>;
}

export class StripeGateway implements PaymentGateway {
  async charge(amountCents: number, token: string): Promise<string> {
    // Stripe specifics here
    return "txn_123";
  }
}
```
This allows migration to a new library or endpoint without touching domain logic.
Step 7: Deprecation and compatibility posture
- Define public API contracts with semantic versioning. Mark deprecated endpoints and headers.
- Add runtime deprecation warnings to logs and responses:

```http
Deprecation: true
Sunset: Tue, 01 Apr 2025 00:00:00 GMT
Link: </v2/resource>; rel="successor-version"
```

- Create compatibility shims if necessary (e.g., DTO adapters mapping old field names to new schema).
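One such shim can be sketched in Python; the field names (`status_code` as a legacy integer, replaced by a named `status` string) are hypothetical:

```python
# Hypothetical shim: translate a legacy DTO into the successor schema.
STATUS_NAMES = {0: "INACTIVE", 1: "ACTIVE"}

def to_v2(legacy: dict) -> dict:
    widget = dict(legacy)  # copy so the caller's DTO is never mutated
    code = widget.pop("status_code", None)
    widget["status"] = STATUS_NAMES.get(code, "UNKNOWN")
    return widget
```

Keeping the shim at the API boundary means domain code only ever sees the new shape.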
Step 8: Upgrade the build and runtime scaffolding
- Update Docker base images to supported versions:

```dockerfile
# Before: vulnerable and EOL
FROM openjdk:8u121-jre

# After: supported LTS
FROM eclipse-temurin:17-jre-alpine
```
- Add minimal health endpoints, readiness/liveness probes for orchestration systems (Kubernetes).
- Externalize configuration via environment variables, 12-factor style.
- Ensure logging uses structured JSON with correlation IDs.
Kubernetes example:

```yaml
livenessProbe:
  httpGet: { path: /health, port: 8080 }
  initialDelaySeconds: 20
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /ready, port: 8080 }
  initialDelaySeconds: 10
  periodSeconds: 5
```
Step 9: Refactor risky code paths gradually
- Target code smells that break on upgrade: reflection hacks, private API access, thread-unsafe singletons, custom serialization.
- Replace deprecated APIs early. For Java, watch for Java EE to Jakarta namespace changes. For Python, update asyncio and typing usages. For Node, move to ES modules if required.
- Apply strangler fig or branch by abstraction patterns to replace components in place while preserving behavior.
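A branch-by-abstraction seam can be sketched in Python; the renderer names and behaviors are hypothetical, the point is that callers depend only on the interface while the implementation is swapped underneath:

```python
import json

class ReportRenderer:
    """Abstraction seam: legacy and replacement renderers share one interface."""
    def render(self, data: dict) -> str:
        raise NotImplementedError

class LegacyRenderer(ReportRenderer):
    def render(self, data: dict) -> str:
        return ", ".join(f"{k}={v}" for k, v in data.items())

class NewRenderer(ReportRenderer):
    def render(self, data: dict) -> str:
        return json.dumps(data, sort_keys=True)

def make_renderer(use_new: bool) -> ReportRenderer:
    # The toggle lets you compare old and new behavior before removing the branch
    return NewRenderer() if use_new else LegacyRenderer()
```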
Example (Java: replacing legacy date APIs):

```java
// Before: java.util.Date with magic milliseconds
Date expiry = new Date(System.currentTimeMillis() + 86400000);

// After: java.time with explicit units
Instant expiry = Instant.now().plus(1, ChronoUnit.DAYS);
```
Step 10: Data model and schema readiness
- Introduce additive schema changes first (columns, tables). Keep backward compatibility:
  - Write to both old and new columns during the transition (dual-write pattern).
  - Backfill asynchronously with idempotent jobs.
  - Use feature flags to switch reads to the new schema after verification.
- Use migration tools:

```bash
# Flyway
flyway -url=jdbc:postgresql://db/app -user=app -password=*** migrate
```
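The dual-write step can be sketched in Python, with SQLite standing in for the real database; the `customers` table and the name-splitting logic are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id TEXT PRIMARY KEY, full_name TEXT,"
    " first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO customers (id, full_name) VALUES ('c1', 'Ada Lovelace')")

def dual_write_name(db, customer_id: str, full_name: str) -> None:
    # Expand phase: keep writing the legacy full_name column while also
    # populating the new first_name/last_name columns in the same statement.
    first, _, last = full_name.partition(" ")
    db.execute(
        "UPDATE customers SET full_name=?, first_name=?, last_name=? WHERE id=?",
        (full_name, first, last, customer_id),
    )

dual_write_name(conn, "c1", "Grace Hopper")
```

Reads stay on `full_name` until the backfill is verified, after which a flag flips them to the new columns.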
Step 11: Security, secrets, and compliance
- Replace hard-coded secrets with secret managers.
- Update cipher suites, TLS versions, and token lifetimes to meet compliance requirements.
- Run SCA (software composition analysis) and SAST scans; patch CVEs before migration.
- Enable audit logs for access and configuration changes.
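A minimal sketch of sourcing a secret from the environment (where the secret manager or orchestrator injects it) instead of hard-coding it; the variable name is an assumption:

```python
import os

def get_db_password() -> str:
    # Fail fast if the secret is missing rather than fall back to a default
    value = os.environ.get("DB_PASSWORD")
    if not value:
        raise RuntimeError("DB_PASSWORD not set; fetch it from your secret manager")
    return value
```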
Step 12: Performance and capacity hygiene
- Profile hotspots and memory usage under realistic load (k6, JMeter, Locust).
- Remove N+1 queries, enable connection pooling, add indices confirmed by EXPLAIN plans.
- Set timeouts, retries, and circuit breakers to prevent cascading failures.
Example (Resilience4j in Spring):

```yaml
resilience4j:
  circuitbreaker:
    instances:
      externalApi:
        slidingWindowSize: 20
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
```
Step 13: Observability and diagnostics
- Standardize metrics (Prometheus), tracing (OpenTelemetry), logs (structured).
- Propagate trace/context IDs. Add domain-specific business metrics.
- Pre-create dashboards and alerts for the migration window.
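A minimal sketch of structured JSON logging with a correlation ID, using Python's standard logging module; the field names are an assumption:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object carrying a correlation ID."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", "unknown"),
        })

logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# Callers attach the correlation ID via `extra`
logger.warning("payment retried", extra={"correlation_id": "req-42"})
```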
Step 14: Dry runs and rehearsal
- Use production-like data subsets (anonymized) for rehearsal.
- Practice:
- Backup → restore
- Schema migration → app deployment
- Rollback and rollforward
- Record timings, bottlenecks, and team responsibilities.
Step 15: Change management and communications
- Publish a migration plan including blackout windows, user-facing impacts, and rollback criteria.
- Notify downstream consumers of API changes with clear timelines.
- Keep a decision log (ADR) to document trade-offs and technical choices.
Practical examples
Example: Feature flags for risky changes
```python
if flags.is_enabled("use_new_pricing_engine"):
    price = pricing_v2.calculate(cart)
else:
    price = pricing_v1.calculate(cart)
```
- Roll out progressively using percentage rollouts or segment by tenant.
- Monitor error rates per path.
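Percentage rollouts need a stable per-user bucket so each user consistently sees the same path; a minimal sketch with a hypothetical hash-based helper:

```python
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    # Hash the user ID to a stable bucket in 0-99; users below the
    # threshold take the new code path.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```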
Example: Backward-compatible API response

```json
{
  "id": "123",
  "name": "Widget",
  "status": "ACTIVE",
  "status_code": 1,
  "_links": { "self": "/v2/widgets/123" }
}
```

Here `status_code` is deprecated and scheduled for removal after 2025-06-30; keep serving it until consumers migrate to `status`.
Example: Finding risky reflection in Java

```bash
grep -Rn "setAccessible(true)" src
```
Risks, common issues, and how to avoid them
- Hidden transitive dependency breakage
- Mitigation: Lock dependencies, use SBOMs, run dependency trees (mvn dependency:tree, npm ls). Prefer smallest set of upgrades at once.
- Behavior changes in runtime defaults (GC, timezones, TLS)
- Mitigation: Explicitly configure JVM flags, set TZ in containers, pin TLS versions and cipher suites. Add canary deployments to detect drift.
- Insufficient test coverage
- Mitigation: Characterization tests around critical flows; require coverage deltas on PRs; add contract tests for services.
- Data migration failures
- Mitigation: Practice migrations; design idempotent scripts; run with low lock time; use online schema change tools where needed.
- Performance regressions
- Mitigation: Baseline before changes; load test after; add p95/p99 SLO alerts; use blue-green or canary to validate under real traffic.
- Rollback complexity
- Mitigation: Backward-compatible schema (expand → migrate → contract). Keep old code path togglable via feature flags until stable.
- Security regressions due to new defaults
- Mitigation: Run security scans post-upgrade; revalidate auth flows; rotate secrets; verify CSP and HSTS headers.
Post-migration checklist and validation steps
- Functional validation
- Critical user journeys pass (login, checkout, CRUD, payments).
- Background jobs and schedulers run as expected.
- Data integrity
- Row counts, checksums, and sampled record comparisons match between old and new paths.
- No unexpected growth in dead letters or retry queues.
- Performance and reliability
- p95/p99 latency and error rates within SLO for 24–72 hours.
- Resource usage (CPU, memory, I/O) stable; no abnormal GC or leak patterns.
- Observability
- Dashboards and alerts are green and informative.
- Traces include all major spans; logs show structured fields and correlation IDs.
- Security and compliance
- Secrets sourced from correct store; key rotations tested.
- Vulnerability scans clean; dependencies at intended versions.
- Rollback readiness
- Point-in-time restore validated; rollback instructions current.
- Feature flags allow quick disable of new paths.
- Documentation and handover
- READMEs, runbooks, and ADRs updated.
- Post-mortem of the migration with action items, even if successful.
Mapping common legacy smells to remediation
| Smell / Risk | Remediation approach |
|---|---|
| God classes, long methods | Extract smaller services/modules; apply SRP |
| Direct DB calls scattered in code | Introduce repository/DAO layer |
| Global state, singletons | Dependency injection; make components stateless |
| Reflection hacks | Replace with public APIs; codegen if required |
| Tight coupling to vendor SDKs | Add abstraction interfaces and adapters |
| Ad-hoc scripts in production | Codify as jobs with proper logging and idempotency |
| Mixed concerns in controllers | Move business logic to domain services |
| Lack of error handling and retries | Add timeouts, retries with backoff, circuit breakers |
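The retry-with-backoff remediation from the table can be sketched in Python; the helper is hypothetical and the delays illustrative. The sleep function is injectable so tests (and callers with their own schedulers) need not actually wait:

```python
import random
import time

def retry_with_backoff(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn, retrying on exception with exponential backoff plus jitter;
    re-raise after the final failed attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Double the delay each attempt, with random jitter to avoid
            # synchronized retry storms across instances.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```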
Configuration snippets you can reuse
- GitHub Actions quality gate:

```yaml
name: CI
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with: { distribution: 'temurin', java-version: '17' }
      - run: ./mvnw -B -DskipTests=false verify
```

- Docker healthcheck:

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 CMD curl -f http://localhost:8080/health || exit 1
```

- OpenTelemetry bootstrap (Java):

```bash
java -javaagent:/otel/opentelemetry-javaagent.jar \
  -Dotel.service.name=orders \
  -Dotel.exporter.otlp.endpoint=http://otel-collector:4317 \
  -jar app.jar
```
Governance and documentation essentials
- Keep a living compatibility guide documenting removed/deprecated features, new requirements, and consumer impact.
- Maintain a migration playbook with timelines, responsibilities, and runbooks.
- Track decisions in short architecture decision records (ADRs).
- Publish release notes with clear upgrade steps and rollback notes.
Rollout strategies
- Blue-Green deployment: spin up the new version alongside the old, switch traffic atomically, retain immediate rollback.
- Canary releases: route a small percentage of traffic to the new version, expand progressively while monitoring.
- Shadow traffic: replay production requests to the new stack without affecting users to compare responses.
Choose based on risk tolerance, statefulness, and infrastructure.
Tooling checklist
- Static analysis: SonarQube, ESLint, Pylint, SpotBugs
- Dependency management and SCA: Dependabot/Renovate, OWASP Dependency-Check, Trivy
- Testing: JUnit/PyTest/Jest, Testcontainers, WireMock, Pact (contract testing)
- Observability: Prometheus/Grafana, OpenTelemetry, ELK/EFK
- Migration: Liquibase/Flyway, gh-ost/pt-online-schema-change
- Feature flags: LaunchDarkly, Unleash, OpenFeature
- CI/CD: GitHub Actions, GitLab CI, Jenkins, ArgoCD
Communication tips for stakeholders
- Share a high-level roadmap with milestones and acceptance criteria.
- Provide risk heatmaps and mitigation plans in plain language.
- Offer test environments and timelines for downstream integrators to validate against pre-release builds.
- Set expectations on sunset dates for deprecated APIs, with reminders.
Minimal day-by-day plan (example)
- Day 1–3: Baseline metrics, lock dependencies, enable linters.
- Day 4–7: Add characterization tests around critical flows; remove dead code.
- Day 8–12: Abstract external dependencies; add observability; refactor risky paths.
- Day 13–15: Schema prep and backfills; dual-write strategy; performance tuning.
- Day 16–18: Rehearsal migration in staging; load tests; fix findings.
- Day 19–20: Canary → blue-green rollout; validate; finalize documentation.
Adjust based on system size and complexity.
FAQ
How much test coverage do I need before starting migration?
Aim for coverage where it matters: critical business flows, serialization boundaries, and high-change modules. A practical target is 70–80% in those areas, with smoke and contract tests covering the rest. Characterization tests can rapidly raise confidence even if global coverage remains moderate.
Should I refactor everything before upgrading frameworks?
No. Focus on changes that materially reduce risk: remove dead code, replace deprecated APIs, isolate I/O, and add observability. Avoid large-scale rewrites; prefer incremental refactors supported by tests and feature flags.
What if I can’t have downtime for database changes?
Design online, backward-compatible migrations: expand (add columns/tables), backfill asynchronously, dual-write, then contract after verifying reads from the new schema. Use tools like gh-ost or pt-online-schema-change to alter large tables with minimal locking.
How do I decide between canary and blue-green?
Use canary when you need to validate under real traffic gradually or when capacity is limited. Choose blue-green when you require instant rollback and can duplicate the environment. Often, teams combine both: canary first, then blue-green switch.
How do I manage transitive dependencies that keep changing?
Generate a lockfile or BOM, use an internal artifact registry, and define an SBOM for traceability. Schedule periodic, small batch upgrades maintained by Renovate or Dependabot to avoid large, risky jumps.
