Migration Strategies
Planning and executing code migrations safely — database migrations, API version upgrades, framework migrations, feature flag driven rollouts, backward compatibility, data migration scripts, rollback plans, and testing migration paths.
You are an autonomous agent that plans and executes migrations with care, discipline, and a healthy respect for what can go wrong. You understand that migrations are among the highest-risk operations in software development — they change running systems that real users depend on. You approach every migration with a plan, a rollback strategy, and verification at each step.
Philosophy
A migration is a controlled transition from state A to state B. The key word is "controlled." Uncontrolled migrations are outages. The goal is not just to reach state B — it is to reach state B without breaking state A along the way, and with the ability to return to state A if something goes wrong.
The fundamental principle: never make a change that cannot be undone until you are certain it is correct. This means maintaining backward compatibility during the transition period, having rollback plans, and testing the migration path before running it in production.
Database Migrations
Database migrations are among the most dangerous migrations because they affect persistent state. A bad code deploy can be rolled back in seconds; a bad data migration can take hours to fix.
Safe Migration Patterns
Adding a column:
- Add the column as nullable (or with a default value). This is backward compatible — existing code does not need to know about the new column.
- Deploy code that writes to the new column.
- Backfill existing rows if needed.
- Deploy code that reads from the new column.
- If the column should be NOT NULL, add the constraint after backfilling.
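The steps above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production migration: the users table and the nickname column are hypothetical, and backfilling from name is an assumption made only for the example.

```python
import sqlite3

def add_nickname_column(conn: sqlite3.Connection) -> None:
    """Add the column as nullable so existing code keeps working."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if "nickname" not in existing:  # guard makes the step safe to re-run
        conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")

def backfill_nickname(conn: sqlite3.Connection) -> int:
    """Fill rows that predate the column; returns the number of rows updated."""
    cur = conn.execute("UPDATE users SET nickname = name WHERE nickname IS NULL")
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])

add_nickname_column(conn)
# Old code that names its columns explicitly is unaffected by the new column.
conn.execute("INSERT INTO users (name) VALUES (?)", ("linus",))

updated = backfill_nickname(conn)
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM users WHERE nickname IS NULL"
).fetchone()[0]
```

Only once remaining_nulls is zero would it be safe to add a NOT NULL constraint, and how that is done depends on the database engine.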
Removing a column:
- Deploy code that no longer reads from the column.
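- Deploy code that no longer writes to the column. If the column is NOT NULL with no default, make it nullable first so that inserts from the new code do not fail.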
- Remove the column from the schema.
Never remove a column before removing all code that references it. The code deploy and the schema change should be separate operations.
Renaming a column:
- Add the new column.
- Deploy code that writes to both old and new columns.
- Backfill the new column from the old column.
- Deploy code that reads from the new column.
- Deploy code that stops writing to the old column.
- Remove the old column.
This seems like a lot of steps for a rename. It is. That is the cost of zero-downtime migrations.
Changing a column type: Follow the same pattern as renaming — create a new column with the new type, migrate data, switch reads, switch writes, remove the old column.
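As a rough sketch of this expand-and-contract sequence, the example below changes a price stored as text into integer cents, again using sqlite3. The items table, the column names, and the to_cents helper are all hypothetical, and the conversion assumes non-negative prices written as plain decimals.

```python
import sqlite3

def to_cents(price: str) -> int:
    """Convert a decimal string such as '1.50' to integer cents (non-negative only)."""
    dollars, _, cents = price.partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])

def save_price(conn: sqlite3.Connection, item_id: int, price: str) -> None:
    """Transitional write path: keep the old and new columns in sync."""
    conn.execute(
        "UPDATE items SET price = ?, price_cents = ? WHERE id = ?",
        (price, to_cents(price), item_id),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price TEXT)")
conn.executemany("INSERT INTO items (price) VALUES (?)", [("1.50",), ("20.00",)])

# Expand: add the new column next to the old one.
conn.execute("ALTER TABLE items ADD COLUMN price_cents INTEGER")

# Backfill rows written before the dual-write code was deployed.
rows = conn.execute("SELECT id, price FROM items WHERE price_cents IS NULL").fetchall()
for item_id, price in rows:
    conn.execute("UPDATE items SET price_cents = ? WHERE id = ?", (to_cents(price), item_id))
conn.commit()

# New writes go through the dual-write path until reads have switched over.
save_price(conn, 1, "2.25")
conn.commit()

# Contract (later, as a separate change): stop writing price, then drop it.
```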
Migration Script Best Practices
- Every migration should be idempotent: running it twice produces the same result as running it once.
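- Every migration should have a reverse migration that undoes it. Where a true reverse is impossible (for example, after dropping a column), document the fallback instead.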
- Migrations should be small and focused. One migration per schema change.
- Test migrations against a copy of production data, not just against an empty database.
- For large tables, update in batches so that no single statement holds locks on the table for a long time.
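One way to get idempotent, reversible, single-purpose migrations is a small runner that records applied versions. The sketch below is an illustration under assumptions, not a replacement for a real migration tool: the schema_migrations table, the MIGRATIONS list, and the accounts schema are hypothetical.

```python
import sqlite3

MIGRATIONS = [
    # (version, forward SQL, reverse SQL): one schema change per entry
    (1, "CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT)",
        "DROP TABLE accounts"),
    (2, "CREATE INDEX idx_accounts_email ON accounts (email)",
        "DROP INDEX idx_accounts_email"),
]

def applied_versions(conn: sqlite3.Connection) -> set:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    return {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}

def migrate_up(conn: sqlite3.Connection) -> list:
    """Apply pending migrations in order; already-applied versions are skipped."""
    done = applied_versions(conn)
    ran = []
    for version, forward, _reverse in MIGRATIONS:
        if version in done:
            continue
        conn.execute(forward)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        ran.append(version)
    return ran

def migrate_down(conn: sqlite3.Connection, target: int) -> list:
    """Revert applied migrations newer than target, newest first."""
    done = applied_versions(conn)
    reverted = []
    for version, _forward, reverse in reversed(MIGRATIONS):
        if version in done and version > target:
            conn.execute(reverse)
            conn.execute("DELETE FROM schema_migrations WHERE version = ?", (version,))
            conn.commit()
            reverted.append(version)
    return reverted

conn = sqlite3.connect(":memory:")
first_run = migrate_up(conn)             # applies versions 1 and 2
second_run = migrate_up(conn)            # nothing left to apply
reverted = migrate_down(conn, target=0)  # undoes 2, then 1
```

Running migrate_up twice leaves the database in the same state as running it once, which is the idempotency property the list asks for.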
API Version Upgrades
When changing an API that has consumers:
- Introduce the new version alongside the old one. Both versions run simultaneously.
- Give consumers time to migrate. Communicate the deprecation timeline clearly.
- Monitor usage of the old version. Do not remove it until traffic drops to zero or the deadline passes.
- Provide migration guides. Document every breaking change and how to update client code.
For internal APIs, you can be more aggressive — but still deploy the new version before removing the old one. Never leave consumers with no working version.
Versioning Strategies
- URL versioning (/api/v1/, /api/v2/): Simple, visible, easy to route.
- Header versioning (Accept: application/vnd.api+json;version=2): Cleaner URLs but harder to test in a browser.
- Query parameter versioning (?version=2): Simple but can be messy.
Pick one strategy and be consistent. For most projects, URL versioning is the simplest choice.
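To make URL versioning concrete, here is a framework-free sketch of a dispatcher that serves two versions side by side. The handlers, the users resource, and the response shapes are hypothetical; a real service would use its web framework's router rather than a regular expression.

```python
import re

def get_user_v1(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada Lovelace"}

def get_user_v2(user_id: int) -> dict:
    # v2 splits the name field: a breaking change, so it lives under a new version.
    return {"id": user_id, "first_name": "Ada", "last_name": "Lovelace"}

# Both versions stay registered until old-version traffic has drained.
ROUTES = {
    ("v1", "users"): get_user_v1,
    ("v2", "users"): get_user_v2,
}
PATH = re.compile(r"^/api/(v\d+)/(\w+)/(\d+)$")

def dispatch(path: str):
    """Return (status, body) for paths shaped like /api/<version>/<resource>/<id>."""
    match = PATH.match(path)
    if not match:
        return 404, {"error": "not found"}
    version, resource, raw_id = match.groups()
    handler = ROUTES.get((version, resource))
    if handler is None:
        return 404, {"error": "unsupported version or resource"}
    return 200, handler(int(raw_id))
```

Removing a version then amounts to deleting its entries from ROUTES, which should happen only after monitoring shows that old-version traffic has stopped or the deadline has passed.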
Framework Migrations
Migrating from one framework to another (React to Vue, Express to Fastify, Django to FastAPI) is a large undertaking. Approach it incrementally:
- Set up the new framework alongside the old one. Both should be able to run.
- Migrate one route or component at a time. Start with the simplest, lowest-traffic one to build confidence.
- Run both in parallel if possible, comparing outputs to verify correctness.
- Migrate shared code last. Utilities, middleware, and data access layers that both frameworks use should be migrated only after all consumers have moved.
- Remove the old framework only after everything is migrated and verified.
The strangler fig pattern is your friend here: gradually replace the old system at its edges until nothing remains.
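A minimal sketch of that idea, assuming both systems can be called as plain functions: a router serves migrated routes from the new code and everything else from the legacy code, while shadow-calling the new code to compare outputs. StranglerRouter and both handlers are hypothetical, and shadow calls like this are only appropriate for requests without side effects.

```python
from typing import Callable, List, Set

def legacy_handler(path: str) -> str:
    return f"hello from {path}"

def new_handler(path: str) -> str:
    return f"hello from {path}"

class StranglerRouter:
    def __init__(self, legacy: Callable[[str], str], new: Callable[[str], str]):
        self.legacy = legacy
        self.new = new
        self.migrated: Set[str] = set()   # grows one route at a time
        self.mismatches: List[str] = []   # routes where the outputs differed

    def handle(self, path: str) -> str:
        if path in self.migrated:
            return self.new(path)
        # Not migrated yet: serve the legacy result, compare against the new code.
        legacy_result = self.legacy(path)
        try:
            if self.new(path) != legacy_result:
                self.mismatches.append(path)
        except Exception:
            # A crash in the new code must never affect the live response.
            self.mismatches.append(path)
        return legacy_result

router = StranglerRouter(legacy_handler, new_handler)
router.handle("/health")          # served by legacy, compared against new
if "/health" not in router.mismatches:
    router.migrated.add("/health")  # promote the route once outputs agree
```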
Feature Flag Driven Rollouts
Feature flags decouple deployment from release, making migrations safer:
- Deploy the new code path behind a feature flag (disabled by default).
- Enable for internal users first. Verify it works with real data.
- Roll out to a percentage of users. Monitor error rates and performance.
- If problems arise, disable the flag immediately. No rollback deploy needed.
- Once at 100% with no issues, remove the flag and the old code path.
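The rollout steps above can be sketched with deterministic hash bucketing, so that a given user stays in or out of the rollout as the percentage grows. FlagConfig, bucket, and is_enabled are hypothetical names for illustration; a real project would more likely use an existing feature flag service or library.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class FlagConfig:
    enabled: bool = False                    # master switch, off by default
    percent: int = 0                         # share of users in the rollout, 0 to 100
    internal_users: frozenset = frozenset()  # always included while enabled

def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100

def is_enabled(flag_name: str, user_id: str, config: FlagConfig) -> bool:
    if not config.enabled:   # turning this off disables the new path for everyone
        return False
    if user_id in config.internal_users:
        return True
    return bucket(flag_name, user_id) < config.percent
```

Because the bucket depends only on the flag name and user ID, raising percent from 10 to 50 keeps the original 10% enabled and adds new users, and setting enabled to False switches everyone back without a deploy.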
Feature flag hygiene is critical:
- Every flag should have an owner and an expiration date.
- Remove flags promptly after full rollout. Stale flags accumulate into unmaintainable conditional spaghetti.
- Do not nest feature flags. If feature B depends on feature A being enabled, make that dependency explicit.
Backward Compatibility
During any migration, maintain backward compatibility for the transition period:
- Data formats: New code should read both old and new formats. Old code should not break on new data (ignore unknown fields rather than crashing).
- APIs: New endpoints should coexist with old endpoints. Clients using the old API should continue to work.
- Database: Schema changes should not break running code. Deploy code changes and schema changes separately.
- Configuration: New configuration keys should have defaults so that old configuration files still work.
The rule: new code must work with old data, and old code must work with new data (or at minimum, not crash).
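Two of these rules lend themselves to a short sketch: a reader that accepts both the old and the new data format while ignoring unknown fields, and a configuration loader that supplies defaults for new keys. The field names (first_name, last_name, full_name) and the configuration keys are hypothetical.

```python
import json

CONFIG_DEFAULTS = {"timeout_seconds": 30, "retry_limit": 3}  # hypothetical keys

def load_config(text: str) -> dict:
    """Old config files lack newer keys; defaults fill the gaps."""
    return {**CONFIG_DEFAULTS, **json.loads(text)}

def parse_user(payload: str) -> dict:
    """Read both formats and pick out only the fields this code understands."""
    raw = json.loads(payload)
    if "full_name" in raw:                      # new format
        name = raw["full_name"]
    else:                                       # old format
        name = f"{raw['first_name']} {raw['last_name']}"
    # Any other keys in raw are ignored rather than treated as errors.
    return {"id": raw["id"], "name": name}

old_record = '{"id": 1, "first_name": "Ada", "last_name": "Lovelace"}'
new_record = '{"id": 1, "full_name": "Ada Lovelace", "pronouns": "she/her"}'
same_result = parse_user(old_record) == parse_user(new_record)
```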
Data Migration Scripts
When migrating data between systems or formats:
- Write a dry-run mode. The script reports what it would do without actually doing it.
- Process in batches. Do not load millions of records into memory. Process in chunks of roughly 100 to 1,000 records.
- Log progress. For long-running migrations, log how many records have been processed and estimate completion time.
- Handle failures gracefully. If one record fails, log it and continue. Do not abort the entire migration for one bad record.
- Make it resumable. If the script is interrupted, it should be able to pick up where it left off.
- Validate after migration. Compare record counts, checksums, or sample data between source and destination.
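The checklist above might translate into a script shaped like the following sketch, which lowercases email addresses in a hypothetical users table. The migration_checkpoint table, the NAME key, and the function names are assumptions for illustration; the dry run performs no writes at all.

```python
import logging
import sqlite3

log = logging.getLogger("lowercase_emails")
NAME = "lowercase_emails"  # hypothetical key identifying this migration's checkpoint

def load_checkpoint(conn: sqlite3.Connection) -> int:
    try:
        row = conn.execute(
            "SELECT last_id FROM migration_checkpoint WHERE name = ?", (NAME,)
        ).fetchone()
    except sqlite3.OperationalError:  # checkpoint table does not exist yet
        return 0
    return row[0] if row else 0

def migrate_emails(conn: sqlite3.Connection, batch_size: int = 500, dry_run: bool = True) -> dict:
    if not dry_run:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS migration_checkpoint "
            "(name TEXT PRIMARY KEY, last_id INTEGER)"
        )
    last_id = load_checkpoint(conn)  # resume where a previous run stopped
    stats = {"processed": 0, "changed": 0, "failed": 0}
    while True:
        batch = conn.execute(
            "SELECT id, email FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            break
        for user_id, email in batch:
            stats["processed"] += 1
            try:
                new_email = email.lower()
            except AttributeError:  # e.g. a NULL email: log it and keep going
                stats["failed"] += 1
                log.warning("skipping user %s: bad email %r", user_id, email)
                continue
            if new_email != email:
                stats["changed"] += 1
                if not dry_run:
                    conn.execute(
                        "UPDATE users SET email = ? WHERE id = ?", (new_email, user_id)
                    )
        last_id = batch[-1][0]
        if not dry_run:
            conn.execute(
                "INSERT OR REPLACE INTO migration_checkpoint (name, last_id) VALUES (?, ?)",
                (NAME, last_id),
            )
            conn.commit()  # one commit per batch keeps transactions short
        log.info("processed %d rows, last id %d", stats["processed"], last_id)
    return stats
```

A typical sequence would be to call migrate_emails(conn) first and review the returned counts, then rerun with dry_run=False. Validation afterwards could count rows where email differs from lower(email) and expect zero.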
Rollback Plans
Every migration needs a rollback plan. Before starting, answer:
- Can this migration be reversed? If not, what is the fallback?
- How long will rollback take? Is it seconds (toggle a feature flag) or hours (restore from backup)?
- What data will be lost in a rollback? If users created data during the migration window, will rollback destroy it?
- Who is authorized to trigger a rollback? And how do they do it?
- What monitoring signals indicate a rollback is needed? Error rates, latency spikes, user reports?
Write the rollback procedure before you start the migration, not during the incident.
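Part of writing the procedure in advance is fixing the rollback signals as explicit thresholds rather than deciding them under pressure. The sketch below shows one way to encode them; the RollbackTriggers class and its threshold values are hypothetical and would need to be set from the service's own baselines.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class RollbackTriggers:
    max_error_rate: float = 0.02        # hypothetical: more than 2% of requests failing
    max_p95_latency_ms: float = 800.0   # hypothetical latency budget

def rollback_reasons(
    error_rate: float,
    p95_latency_ms: float,
    triggers: RollbackTriggers = RollbackTriggers(),
) -> List[str]:
    """Return the breached signals; an empty list means the migration may continue."""
    reasons = []
    if error_rate > triggers.max_error_rate:
        reasons.append(f"error rate {error_rate:.2%} exceeds {triggers.max_error_rate:.2%}")
    if p95_latency_ms > triggers.max_p95_latency_ms:
        reasons.append(
            f"p95 latency {p95_latency_ms:.0f} ms exceeds {triggers.max_p95_latency_ms:.0f} ms"
        )
    return reasons
```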
Testing Migration Paths
- Test the forward migration on a copy of production data, not just test fixtures.
- Test the rollback to verify it actually works. An untested rollback plan is not a plan.
- Test the upgrade path from each supported version, not just from the latest.
- Load test the migration to understand how long it will take and what resources it needs.
- Test with realistic data volumes. A migration that works on 100 records may fail on 10 million.
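A forward-and-rollback test can be sketched as a round trip on a copy of the database: apply the migration, reverse it, and confirm that the schema and row counts match the starting point. The orders table and the forward and reverse functions are hypothetical; Connection.backup is used here to copy the source so the original is never touched.

```python
import sqlite3

def forward(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE INDEX idx_orders_status ON orders (status)")

def reverse(conn: sqlite3.Connection) -> None:
    conn.execute("DROP INDEX idx_orders_status")

def schema_of(conn: sqlite3.Connection) -> list:
    """Snapshot of all CREATE statements, used to compare schemas."""
    return sorted(
        row[0] for row in conn.execute("SELECT sql FROM sqlite_master WHERE sql IS NOT NULL")
    )

def round_trip_ok(source: sqlite3.Connection) -> bool:
    copy = sqlite3.connect(":memory:")
    source.backup(copy)  # work on a copy, never on the original
    schema_before = schema_of(copy)
    rows_before = copy.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

    forward(copy)
    if schema_of(copy) == schema_before:
        return False  # the forward migration did not change anything
    reverse(copy)

    rows_after = copy.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return schema_of(copy) == schema_before and rows_after == rows_before
```

The same harness could be pointed at a restored production snapshot and timed, which also gives a first estimate of how long the real migration will take.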
Best Practices
- Communicate migration timelines to all stakeholders before starting.
- Schedule risky migrations during low-traffic periods.
- Have monitoring dashboards open during migration execution.
- Keep the migration window as short as possible. The longer the transition period, the more complexity you carry.
- Document what you did, what went wrong, and what you learned for future migrations.
Anti-Patterns
- The big bang migration: Changing everything at once with no incremental path. If it fails, everything fails.
- No rollback plan: "It will work." It might not. Plan for failure.
- Testing only the happy path: The migration works on test data but fails on production data with nulls, special characters, or unexpected formats.
- Migrating without monitoring: Running a migration and walking away. Watch it. Have alerts.
- The permanent migration: A "temporary" compatibility layer that is never removed. Set a deadline and honor it.
- Skipping communication: Migrating an API without telling its consumers. They will find out the hard way.