Why cleaning legacy code before migration matters
Upgrading platforms, frameworks, or infrastructure is more than a version bump. It’s an opportunity to pay down technical debt, reduce operational risk, and improve developer velocity. Migrating without first cleaning old code compounds issues: hidden dependencies, undefined behavior, brittle tests, and performance regressions. A focused pre-migration cleanup reduces surprises, shortens downtime, and gives you repeatable, auditable steps for a smoother cutover.
Prerequisites / Before You Start
Inventory and audit
- Create a complete inventory of repositories, modules, binaries, services, cron jobs, and scripts.
- Identify the runtime matrix: programming languages (e.g., Java 8, Python 3.7), frameworks (Spring, Django), build tools (Maven, Gradle, npm), databases (PostgreSQL, MySQL), messaging systems (Kafka, RabbitMQ).
- Note deployment targets: VMs, containers, Kubernetes, serverless.
- Map external integrations: third-party APIs, SSO/IdP (OAuth/OpenID), payment gateways, storage (S3, NFS).
- List data stores and schema versions; record migration tooling (Liquibase, Flyway).
- Document service dependencies and SLAs.
Environment and backups
- Ensure automated, tested backups for code, artifacts, and especially databases and object storage. Test a restore on a staging environment.
- Freeze production access rules and secrets management (Vault, AWS Secrets Manager). Plan secret rotation.
- Provision a staging environment that mirrors production topology and size where practical.
Versions and compatibility matrix
- Define target versions with reasoning (LTS, security support window, compatibility guarantees).
- For each component, capture:
  - Current version
  - Target version
  - Breaking-change notes
  - Migration path and tooling
  - Owner
Example compatibility table:
| Component | Current | Target | Notes |
|---|---|---|---|
| Java JDK | 8u201 | 17 LTS | Review removed modules; update GC flags |
| Spring Boot | 1.5 | 3.x | Jakarta namespace change; APIs removed |
| Python | 3.7 | 3.11 | Update asyncio APIs, typing changes |
| PostgreSQL | 10 | 14 | Reindex, check deprecated features |
| Node.js | 12 | 20 | ES modules behavior, OpenSSL changes |
Team and process readiness
- Establish a branching strategy (GitFlow or trunk-based with release branches).
- Enable CI/CD with reproducible builds. Add quality gates (linting, static analysis, test coverage).
- Agree on a rollback and roll-forward strategy with clear RTO/RPO.
- Assign owners for code areas and define review standards.
Step-by-step guide to prepare legacy code for migration
Step 1: Stabilize the baseline
- Freeze new features or guard them behind feature flags.
- Tag the current state and create a migration branch:

```bash
git checkout main
git pull --ff-only
git tag pre-migration-baseline
git checkout -b migration/cleanup
```

- Capture performance and error-rate baselines (APM metrics, p99 latency, CPU/memory) to compare later.
Step 2: Make builds deterministic
- Pin dependencies with lockfiles or exact versions:
  - npm: `npm ci` with package-lock.json
  - Python: `pip-compile` (pip-tools) to generate requirements.txt
  - Maven: use dependencyManagement with explicit versions
  - Gradle: version catalogs (libs.versions.toml)
- Cache artifacts in an internal registry (Nexus/Artifactory) to avoid upstream volatility.
- Record compiler flags and build environment in a build manifest.
Example (Maven dependencyManagement):
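A minimal sketch of centrally pinning a version via dependencyManagement; the artifact coordinates and version here are illustrative, not a recommendation:

```xml
<!-- pom.xml fragment: child modules inherit this version without declaring it -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.17.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```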
Step 3: Introduce or strengthen automated tests
- Start with characterization tests to lock in current behavior of legacy modules, especially around public APIs and domain logic.
- Add smoke tests and health checks for services.
- Increase unit/integration coverage around risky areas: date/time handling, concurrency, serialization, database transactions.
- Use tests to document “weird but required” behavior.
- Add test doubles or adapters for external systems to decouple tests from flaky dependencies.
Example (Python pytest characterization):

```python
def test_invoice_total_rounding_legacy():
    invoice = Invoice(lines=[Line(9.995, 1)])
    # Legacy behavior rounds down
    assert invoice.total() == 9.99
```
Step 4: Run static analysis and linters
- Enable tools such as SonarQube, ESLint, Pylint, Checkstyle, SpotBugs.
- Fail the build on severe issues (security hotspots, null dereferences, SQL injection, path traversal).
- Configure formatting and enforce automatically (Prettier, Black, gofmt). Consistent style reduces cognitive load.
Example (ESLint strict config):

```json
{
  "extends": ["eslint:recommended", "plugin:@typescript-eslint/strict"],
  "rules": {
    "no-implicit-globals": "error",
    "no-floating-decimal": "error"
  }
}
```
Step 5: Identify and remove dead code
- Use coverage reports and static call graphs. Search for unused classes, endpoints, and features.
- If needed, deprecate in one release and remove in the next. Keep change logs visible to stakeholders.
Commands:

```bash
# Java: list unused dependencies
mvn dependency:analyze

# Python: find unused code and imports
pip install vulture
vulture src/

# JS/TS: detect unused exports
npx ts-prune
```
Step 6: Isolate side effects and I/O
- Introduce ports-and-adapters (hexagonal architecture) boundaries.
- Wrap external systems (filesystems, HTTP clients, message brokers) behind interfaces so they can be mocked and swapped.
Example adapter (TypeScript):

```ts
export interface PaymentGateway {
  charge(amountCents: number, token: string): Promise<string>;
}

export class StripeGateway implements PaymentGateway {
  async charge(amountCents: number, token: string): Promise<string> {
    // Stripe specifics here
    return "txn_123";
  }
}
```
This allows migration to a new library or endpoint without touching domain logic.
Step 7: Deprecation and compatibility posture
- Define public API contracts with semantic versioning. Mark deprecated endpoints and headers.
- Add runtime deprecation warnings to logs and responses:

```http
Deprecation: true
Sunset: Tue, 01 Apr 2025 00:00:00 GMT
Link: </v2/resource>; rel="successor-version"
```

- Create compatibility shims if necessary (e.g., DTO adapters mapping old field names to new schema).
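One such shim can be sketched in Python; the field names (`status_code` as a legacy integer, replaced by a named `status` string) are hypothetical:

```python
# Hypothetical shim: translate a legacy DTO into the successor schema.
STATUS_NAMES = {0: "INACTIVE", 1: "ACTIVE"}

def to_v2(legacy: dict) -> dict:
    widget = dict(legacy)  # copy so the caller's DTO is never mutated
    code = widget.pop("status_code", None)
    widget["status"] = STATUS_NAMES.get(code, "UNKNOWN")
    return widget
```

Keeping the shim at the API boundary means domain code only ever sees the new shape.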
Step 8: Upgrade the build and runtime scaffolding
- Update Docker base images to supported versions:

```dockerfile
# Before: vulnerable and EOL
FROM openjdk:8u121-jre

# After: supported LTS
FROM eclipse-temurin:17-jre-alpine
```
- Add minimal health endpoints, readiness/liveness probes for orchestration systems (Kubernetes).
- Externalize configuration via environment variables, 12-factor style.
- Ensure logging uses structured JSON with correlation IDs.
Kubernetes example:

```yaml
livenessProbe:
  httpGet: { path: /health, port: 8080 }
  initialDelaySeconds: 20
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /ready, port: 8080 }
  initialDelaySeconds: 10
  periodSeconds: 5
```
Step 9: Refactor risky code paths gradually
- Target code smells that break on upgrade: reflection hacks, private API access, thread-unsafe singletons, custom serialization.
- Replace deprecated APIs early. For Java, watch for Java EE to Jakarta namespace changes. For Python, update asyncio and typing usages. For Node, move to ES modules if required.
- Apply strangler fig or branch by abstraction patterns to replace components in place while preserving behavior.
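A branch-by-abstraction seam can be sketched in Python; the renderer names and behaviors are hypothetical, the point is that callers depend only on the interface while the implementation is swapped underneath:

```python
import json

class ReportRenderer:
    """Abstraction seam: legacy and replacement renderers share one interface."""
    def render(self, data: dict) -> str:
        raise NotImplementedError

class LegacyRenderer(ReportRenderer):
    def render(self, data: dict) -> str:
        return ", ".join(f"{k}={v}" for k, v in data.items())

class NewRenderer(ReportRenderer):
    def render(self, data: dict) -> str:
        return json.dumps(data, sort_keys=True)

def make_renderer(use_new: bool) -> ReportRenderer:
    # The toggle lets you compare old and new behavior before removing the branch
    return NewRenderer() if use_new else LegacyRenderer()
```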
Example (Java: replacing legacy date APIs):

```java
// Before: java.util.Date with magic milliseconds
Date expiry = new Date(System.currentTimeMillis() + 86400000);

// After: java.time with explicit units
Instant expiry = Instant.now().plus(1, ChronoUnit.DAYS);
```
Step 10: Data model and schema readiness
- Introduce additive schema changes first (columns, tables). Keep backward compatibility:
  - Write to both old and new columns during the transition (dual-write pattern).
  - Backfill asynchronously with idempotent jobs.
  - Use feature flags to switch reads to the new schema after verification.
- Use migration tools:

```bash
# Flyway
flyway -url=jdbc:postgresql://db/app -user=app -password=*** migrate
```
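The dual-write step can be sketched in Python, with SQLite standing in for the real database; the `customers` table and the name-splitting logic are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id TEXT PRIMARY KEY, full_name TEXT,"
    " first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO customers (id, full_name) VALUES ('c1', 'Ada Lovelace')")

def dual_write_name(db, customer_id: str, full_name: str) -> None:
    # Expand phase: keep writing the legacy full_name column while also
    # populating the new first_name/last_name columns in the same statement.
    first, _, last = full_name.partition(" ")
    db.execute(
        "UPDATE customers SET full_name=?, first_name=?, last_name=? WHERE id=?",
        (full_name, first, last, customer_id),
    )

dual_write_name(conn, "c1", "Grace Hopper")
```

Reads stay on `full_name` until the backfill is verified, after which a flag flips them to the new columns.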
Step 11: Security, secrets, and compliance
- Replace hard-coded secrets with secret managers.
- Update cipher suites, TLS versions, and token lifetimes to meet compliance requirements.
- Run SCA (software composition analysis) and SAST scans; patch CVEs before migration.
- Enable audit logs for access and configuration changes.
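A minimal sketch of sourcing a secret from the environment (where the secret manager or orchestrator injects it) instead of hard-coding it; the variable name is an assumption:

```python
import os

def get_db_password() -> str:
    # Fail fast if the secret is missing rather than fall back to a default
    value = os.environ.get("DB_PASSWORD")
    if not value:
        raise RuntimeError("DB_PASSWORD not set; fetch it from your secret manager")
    return value
```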
Step 12: Performance and capacity hygiene
- Profile hotspots and memory usage under realistic load (k6, JMeter, Locust).
- Remove N+1 queries, enable connection pooling, add indices confirmed by EXPLAIN plans.
- Set timeouts, retries, and circuit breakers to prevent cascading failures.
Example (Resilience4j in Spring):

```yaml
resilience4j:
  circuitbreaker:
    instances:
      externalApi:
        slidingWindowSize: 20
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
```
Step 13: Observability and diagnostics
- Standardize metrics (Prometheus), tracing (OpenTelemetry), logs (structured).
- Propagate trace/context IDs. Add domain-specific business metrics.
- Pre-create dashboards and alerts for the migration window.
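A minimal sketch of structured JSON logging with a correlation ID, using Python's standard logging module; the field names are an assumption:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object carrying a correlation ID."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", "unknown"),
        })

logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# Callers attach the correlation ID via `extra`
logger.warning("payment retried", extra={"correlation_id": "req-42"})
```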
Step 14: Dry runs and rehearsal
- Use production-like data subsets (anonymized) for rehearsal.
- Practice:
- Backup → restore
- Schema migration → app deployment
- Rollback and rollforward
- Record timings, bottlenecks, and team responsibilities.
Step 15: Change management and communications
- Publish a migration plan including blackout windows, user-facing impacts, and rollback criteria.
- Notify downstream consumers of API changes with clear timelines.
- Keep a decision log (ADR) to document trade-offs and technical choices.
Practical examples
Example: Feature flags for risky changes
```python
if flags.is_enabled("use_new_pricing_engine"):
    price = pricing_v2.calculate(cart)
else:
    price = pricing_v1.calculate(cart)
```
- Roll out progressively using percentage rollouts or segment by tenant.
- Monitor error rates per path.
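Percentage rollouts need a stable per-user bucket so each user consistently sees the same path; a minimal sketch with a hypothetical hash-based helper:

```python
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    # Hash the user ID to a stable bucket in 0-99; users below the
    # threshold take the new code path.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```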
Example: Backward-compatible API response

```json
{
  "id": "123",
  "name": "Widget",
  "status": "ACTIVE",
  "status_code": 1,
  "_links": { "self": "/v2/widgets/123" }
}
```

Here `status_code` is deprecated and scheduled for removal after 2025-06-30; keep serving it until consumers migrate to `status`.
Example: Finding risky reflection in Java

```bash
grep -Rn "setAccessible(true)" src
```
Risks, common issues, and how to avoid them
- Hidden transitive dependency breakage
- Mitigation: Lock dependencies, use SBOMs, run dependency trees (mvn dependency:tree, npm ls). Prefer smallest set of upgrades at once.
- Behavior changes in runtime defaults (GC, timezones, TLS)
- Mitigation: Explicitly configure JVM flags, set TZ in containers, pin TLS versions and cipher suites. Add canary deployments to detect drift.
- Insufficient test coverage
- Mitigation: Characterization tests around critical flows; require coverage deltas on PRs; add contract tests for services.
- Data migration failures
- Mitigation: Practice migrations; design idempotent scripts; run with low lock time; use online schema change tools where needed.
- Performance regressions
- Mitigation: Baseline before changes; load test after; add p95/p99 SLO alerts; use blue-green or canary to validate under real traffic.
- Rollback complexity
- Mitigation: Backward-compatible schema (expand → migrate → contract). Keep old code path togglable via feature flags until stable.
- Security regressions due to new defaults
- Mitigation: Run security scans post-upgrade; revalidate auth flows; rotate secrets; verify CSP and HSTS headers.
Post-migration checklist and validation steps
- Functional validation
- Critical user journeys pass (login, checkout, CRUD, payments).
- Background jobs and schedulers run as expected.
- Data integrity
- Row counts, checksums, and sampled record comparisons match between old and new paths.
- No unexpected growth in dead letters or retry queues.
- Performance and reliability
- p95/p99 latency and error rates within SLO for 24–72 hours.
- Resource usage (CPU, memory, I/O) stable; no abnormal GC or leak patterns.
- Observability
- Dashboards and alerts are green and informative.
- Traces include all major spans; logs show structured fields and correlation IDs.
- Security and compliance
- Secrets sourced from correct store; key rotations tested.
- Vulnerability scans clean; dependencies at intended versions.
- Rollback readiness
- Point-in-time restore validated; rollback instructions current.
- Feature flags allow quick disable of new paths.
- Documentation and handover
- READMEs, runbooks, and ADRs updated.
- Post-mortem of the migration with action items, even if successful.
Mapping common legacy smells to remediation
| Smell / Risk | Remediation approach |
|---|---|
| God classes, long methods | Extract smaller services/modules; apply SRP |
| Direct DB calls scattered in code | Introduce repository/DAO layer |
| Global state, singletons | Dependency injection; make components stateless |
| Reflection hacks | Replace with public APIs; codegen if required |
| Tight coupling to vendor SDKs | Add abstraction interfaces and adapters |
| Ad-hoc scripts in production | Codify as jobs with proper logging and idempotency |
| Mixed concerns in controllers | Move business logic to domain services |
| Lack of error handling and retries | Add timeouts, retries with backoff, circuit breakers |
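The retry-with-backoff remediation from the table can be sketched in Python; the helper is hypothetical and the delays illustrative. The sleep function is injectable so tests (and callers with their own schedulers) need not actually wait:

```python
import random
import time

def retry_with_backoff(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn, retrying on exception with exponential backoff plus jitter;
    re-raise after the final failed attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Double the delay each attempt, with random jitter to avoid
            # synchronized retry storms across instances.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```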
Configuration snippets you can reuse
- GitHub Actions quality gate:

```yaml
name: CI
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with: { distribution: 'temurin', java-version: '17' }
      - run: ./mvnw -B -DskipTests=false verify
```

- Docker healthcheck:

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 CMD curl -f http://localhost:8080/health || exit 1
```

- OpenTelemetry bootstrap (Java):

```bash
java -javaagent:/otel/opentelemetry-javaagent.jar \
  -Dotel.service.name=orders \
  -Dotel.exporter.otlp.endpoint=http://otel-collector:4317 \
  -jar app.jar
```
Governance and documentation essentials
- Keep a living compatibility guide documenting removed/deprecated features, new requirements, and consumer impact.
- Maintain a migration playbook with timelines, responsibilities, and runbooks.
- Track decisions in short architecture decision records (ADRs).
- Publish release notes with clear upgrade steps and rollback notes.
Rollout strategies
- Blue-Green deployment: spin up the new version alongside the old, switch traffic atomically, retain immediate rollback.
- Canary releases: route a small percentage of traffic to the new version, expand progressively while monitoring.
- Shadow traffic: replay production requests to the new stack without affecting users to compare responses.
Choose based on risk tolerance, statefulness, and infrastructure.
Tooling checklist
- Static analysis: SonarQube, ESLint, Pylint, SpotBugs
- Dependency management and SCA: Dependabot/Renovate, OWASP Dependency-Check, Trivy
- Testing: JUnit/PyTest/Jest, Testcontainers, WireMock, Pact (contract testing)
- Observability: Prometheus/Grafana, OpenTelemetry, ELK/EFK
- Migration: Liquibase/Flyway, gh-ost/pt-online-schema-change
- Feature flags: LaunchDarkly, Unleash, OpenFeature
- CI/CD: GitHub Actions, GitLab CI, Jenkins, ArgoCD
Communication tips for stakeholders
- Share a high-level roadmap with milestones and acceptance criteria.
- Provide risk heatmaps and mitigation plans in plain language.
- Offer test environments and timelines for downstream integrators to validate against pre-release builds.
- Set expectations on sunset dates for deprecated APIs, with reminders.
Minimal day-by-day plan (example)
- Day 1–3: Baseline metrics, lock dependencies, enable linters.
- Day 4–7: Add characterization tests around critical flows; remove dead code.
- Day 8–12: Abstract external dependencies; add observability; refactor risky paths.
- Day 13–15: Schema prep and backfills; dual-write strategy; performance tuning.
- Day 16–18: Rehearsal migration in staging; load tests; fix findings.
- Day 19–20: Canary → blue-green rollout; validate; finalize documentation.
Adjust based on system size and complexity.
FAQ
How much test coverage do I need before starting migration?
Aim for coverage where it matters: critical business flows, serialization boundaries, and high-change modules. A practical target is 70–80% in those areas, with smoke and contract tests covering the rest. Characterization tests can rapidly raise confidence even if global coverage remains moderate.
Should I refactor everything before upgrading frameworks?
No. Focus on changes that materially reduce risk: remove dead code, replace deprecated APIs, isolate I/O, and add observability. Avoid large-scale rewrites; prefer incremental refactors supported by tests and feature flags.
What if I can’t have downtime for database changes?
Design online, backward-compatible migrations: expand (add columns/tables), backfill asynchronously, dual-write, then contract after verifying reads from the new schema. Use tools like gh-ost or pt-online-schema-change to alter large tables with minimal locking.
How do I decide between canary and blue-green?
Use canary when you need to validate under real traffic gradually or when capacity is limited. Choose blue-green when you require instant rollback and can duplicate the environment. Often, teams combine both: canary first, then blue-green switch.
How do I manage transitive dependencies that keep changing?
Generate a lockfile or BOM, use an internal artifact registry, and define an SBOM for traceability. Schedule periodic, small batch upgrades maintained by Renovate or Dependabot to avoid large, risky jumps.
