
AWS Migration Checklist: 25 Steps for a Zero-Downtime Cloud Move

A comprehensive checklist for migrating to AWS, from assessment to cutover, covering networking, data, security, DNS, and rollback planning.

Migrating to AWS is not a weekend project. It touches every layer of your stack — compute, storage, networking, security, DNS, and the application layer itself. Get it right and your team barely notices the switch. Get it wrong and you are staring at downtime, data loss, or a bill that makes your CFO twitch.

After completing 200+ migrations to AWS — from single-server WordPress sites to distributed microservice architectures across multiple regions — the ZenoCloud team has distilled the process into 25 steps across six phases. Every item on this list exists because we have seen what happens when it gets skipped.

Print this checklist, pin it to your project board, and check off each step before moving to the next phase.


Phase 1: Assessment (Steps 1-5)

Before you touch a single AWS resource, you need to understand exactly what you are migrating, what it depends on, and what constraints you are operating under. The assessment phase is where most failed migrations actually fail; teams just do not realize it until cutover night.

Step 1: Build a Complete Infrastructure Inventory

Document every server, service, database, cron job, queue worker, and third-party integration in your current environment:

  • Compute. Every server (physical or virtual), its OS, CPU, RAM, disk, and average utilization.
  • Storage. Block volumes, object storage, NFS mounts, shared filesystems. Note which is ephemeral vs. persistent.
  • Databases. Every instance — engine, version, size, IOPS needs, replication topology, backup schedule. Include Redis, Memcached, and Elasticsearch.
  • Network. Load balancers, firewalls, VPN tunnels, DNS zones, SSL certificates, static IPs.
  • Scheduled jobs. Cron jobs, batch processes, queue workers. These are the most commonly forgotten components and the ones most likely to cause silent post-migration failures.
  • Third-party integrations. Payment gateways, email services, CDNs, CI/CD, and anything connected via IP allowlisting.

Step 2: Map Application Dependencies

Your inventory tells you what exists. Dependency mapping tells you what talks to what.

For each application, document upstream and downstream dependencies, database connections (including read/write splits), shared resources (filesystems, caches, queues), hardcoded IPs or hostnames, and service discovery mechanisms. Use network flow analysis to validate against actual traffic — engineers routinely forget about that one cron job that SSHs into production or the legacy service still calling the old internal endpoint.
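Validating the documented map against real traffic can be sketched in a few lines. This is a minimal illustration, assuming flow records have already been exported and resolved to service names as (source, destination, port) tuples; the service names and the record format are hypothetical, not something a particular tool produces.

```python
from collections import defaultdict


def build_dependency_map(flows):
    """Group observed (source, destination, port) flows by source service."""
    deps = defaultdict(set)
    for src, dst, port in flows:
        if src != dst:  # ignore a service talking to itself
            deps[src].add((dst, port))
    return dict(deps)


def undocumented_dependencies(observed, documented):
    """Return dependencies seen in traffic but missing from the documented map."""
    missing = {}
    for src, targets in observed.items():
        extra = targets - documented.get(src, set())
        if extra:
            missing[src] = extra
    return missing


# Hypothetical flow records and documentation
flows = [
    ("web", "api", 443),
    ("api", "postgres", 5432),
    ("cron-box", "api", 22),  # the forgotten cron job that SSHs into production
    ("api", "api", 8080),
]
documented = {"web": {("api", 443)}, "api": {("postgres", 5432)}}

observed = build_dependency_map(flows)
gaps = undocumented_dependencies(observed, documented)
print(gaps)
```

Anything that shows up in `gaps` is a dependency the team did not write down, which is exactly the kind of item that tends to break silently after cutover.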

Step 3: Identify Compliance and Regulatory Requirements

Before choosing AWS regions or encryption strategies, map your compliance landscape:

  • Data residency. Does data need to stay in a specific country? This dictates available regions.
  • Industry regulations. PCI DSS, HIPAA, SOC 2, RBI guidelines — each has specific encryption, access control, and audit requirements.
  • Data classification. Classify data (public, internal, confidential, restricted) and map levels to encryption and access policies.
  • Licensing. Some licenses are tied to hardware or on-premises deployments. Identify any needing renegotiation.

Step 4: Benchmark Current Performance

You cannot validate a migration without a baseline. Establish benchmarks for:

  • Response times. P50, P95, P99 latency for critical endpoints, measured at the application layer.
  • Database performance. Slow query logs, average execution time, IOPS utilization, peak connection counts.
  • Throughput. Requests per second, concurrent connections, bandwidth during normal and peak traffic.
  • Resource utilization. CPU, memory, disk I/O, and network patterns over at least two weeks.
  • Error rates. Baseline rates per service so you can distinguish migration-caused errors from pre-existing ones.

These benchmarks become your post-migration acceptance criteria. If P95 latency was 200ms before, it should be 200ms or better after.
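As a rough sketch of how the baseline can turn into an acceptance check, the snippet below computes P50/P95/P99 with the nearest-rank method and compares a post-migration sample against it. The sample data and the `tolerance` parameter are illustrative assumptions; real measurements would come from your APM or load-testing tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarize latency samples (milliseconds) as P50, P95, and P99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


def meets_baseline(baseline, candidate, tolerance=0.0):
    """True when every candidate percentile is no worse than baseline * (1 + tolerance)."""
    return all(candidate[k] <= baseline[k] * (1 + tolerance) for k in baseline)


# Illustrative data: 1..100 ms before, 10% faster after
before = latency_summary(list(range(1, 101)))
after = latency_summary([x * 0.9 for x in range(1, 101)])

print(before)
print(meets_baseline(before, after))
```

With `tolerance=0.0` the check enforces the "same or better" rule stated above; a small positive tolerance is a judgment call to make before cutover, not after.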

Step 5: Assess Migration Complexity and Risk

For each component, determine:

  • Strategy. AWS defines the 7 R’s: Rehost, Replatform, Repurchase, Refactor, Retire, Retain, Relocate. In our experience, most migrations land at roughly 70% rehost, 20% replatform, and 10% everything else.
  • Risk level. Stateless web servers are low risk. Primary databases with replication requirements are high risk.
  • Migration order. Low-risk, low-dependency workloads first. Production databases last.
  • Rollback feasibility. Some rollbacks are trivial (flip DNS). Others are nearly impossible (database with hours of new writes). Define rollback for each component.

Phase 2: Planning (Steps 6-9)

With a clear picture of what you are migrating, design the target architecture, estimate costs, and build a realistic timeline.

Step 6: Design the Target AWS Architecture

Translate your current infrastructure into AWS-native architecture:

  • VPC design. CIDR ranges, subnet strategy (public, private, isolated), AZ distribution, VPC peering or Transit Gateway.
  • Compute. Map server specs to EC2 instance families based on actual utilization — burstable, compute-optimized, or memory-optimized.
  • Databases. Which go to RDS, which to Aurora, which stay on EC2 (complex configs like Galera may need self-managed instances initially).
  • Storage. EBS (gp3, io2), S3 buckets with lifecycle policies, EFS mounts.
  • Load balancing. ALB for HTTP/HTTPS, NLB for TCP/UDP. Security groups, NACLs, WAF.
  • High availability. Multi-AZ databases, Auto Scaling Groups for compute. Define RTO and RPO per tier.

Step 7: Build a Detailed Cost Estimate

AWS pricing surprises teams that do not model carefully:

  • Compute. On-demand for migration, Reserved Instances or Savings Plans for steady-state.
  • Data transfer. The most underestimated line item. Inbound is free; outbound, inter-region, and inter-AZ are not. Model from actual traffic patterns.
  • Storage. EBS, S3 tiers, EFS, snapshot retention. Snapshots accumulate cost over time.
  • Database. RDS/Aurora pricing including I/O, Multi-AZ standby, read replicas, backup retention.
  • Hidden costs. NAT Gateway data processing, CloudWatch log ingestion, Elastic IP charges, Route 53 query volume.

Add a 20% buffer for the first 90 days.
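The buffer arithmetic is simple, but keeping it in a small script makes the estimate easy to rerun as line items change. The monthly figures below are placeholders invented for illustration, not AWS prices.

```python
def buffered_estimate(line_items, buffer=0.20):
    """Sum monthly line items and add a contingency buffer (20% by default)."""
    subtotal = sum(line_items.values())
    return {
        "subtotal": round(subtotal, 2),
        "buffer": round(subtotal * buffer, 2),
        "total": round(subtotal * (1 + buffer), 2),
    }


# Hypothetical monthly estimate in USD
monthly = {
    "compute": 4200.0,
    "data_transfer": 650.0,
    "storage": 900.0,
    "database": 1800.0,
    "hidden_costs": 450.0,  # NAT Gateway processing, log ingestion, and similar
}

estimate = buffered_estimate(monthly)
print(estimate)
```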

Step 8: Define the Migration Timeline

Build a realistic timeline with explicit milestones:

  • Preparation duration. Account setup, VPC provisioning, IAM — typically 1-2 weeks.
  • Migration waves. Group workloads by dependency and risk. Each wave gets its own testing window and go/no-go checkpoint.
  • Cutover window. Exact date and time during a low-traffic period.
  • Rollback window. 48-72 hours is standard.
  • Buffer. Add 30% to every estimate. Migrations always take longer than planned.
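One way to derive migration waves from the dependency map is a topological sort in which callers move before the services they call, so data stores naturally land in the last wave. The sketch below uses Python's standard `graphlib` module (3.9+); the service names are hypothetical, and a real plan would also weigh the risk levels from Step 5.

```python
from graphlib import TopologicalSorter


def plan_waves(calls):
    """Group workloads into waves so each service moves only after everything that calls it."""
    must_move_first = {svc: set() for svc in calls}
    for caller, callees in calls.items():
        for callee in callees:
            must_move_first.setdefault(callee, set()).add(caller)

    sorter = TopologicalSorter(must_move_first)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()
        waves.append(sorted(ready))
        sorter.done(*ready)
    return waves


# Hypothetical map of "service -> services it calls"
calls = {
    "web": {"api"},
    "admin": {"postgres"},
    "api": {"postgres", "redis"},
    "postgres": set(),
    "redis": set(),
}

waves = plan_waves(calls)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency surfaces as an exception here, which is useful in itself: those services usually have to move together in one wave.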

Step 9: Establish Communication and Escalation Plans

Define stakeholder notifications per phase, war room protocol during cutover, incident escalation paths (who calls a rollback?), and customer communications for any user-facing changes. Document this as a RACI matrix before execution begins.


Phase 3: Preparation (Steps 10-15)

Infrastructure buildout. Everything provisioned here should be tested before any production data moves.

Step 10: Set Up AWS Accounts and Organizations

Create an AWS Organization with separate accounts for production, staging, development, security/audit, and shared services. Apply Service Control Policies (SCPs) to enforce guardrails — restrict regions, block root usage, require encryption. Enable consolidated billing and configure IAM Identity Center for centralized access management.

Step 11: Provision VPC and Networking

  • VPC. CIDR ranges that do not overlap with on-premises (critical for VPN connectivity during migration).
  • Subnets. Public (load balancers), private (app servers), isolated (databases) — minimum two AZs.
  • NAT Gateways. One per AZ for outbound internet from private subnets.
  • VPN or Direct Connect. Site-to-site VPN is usually sufficient for migration. Direct Connect is for long-term hybrid setups.
  • VPC Flow Logs. Enable from day one for network troubleshooting.
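Checking for CIDR overlap and carving subnets per tier and AZ can be done with the standard `ipaddress` module before anything is provisioned. The ranges, tier names, and AZ labels below are assumptions chosen for illustration.

```python
import ipaddress


def overlapping_ranges(vpc_cidr, existing_cidrs):
    """Return the existing ranges (for example on-premises) that overlap the proposed VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [c for c in existing_cidrs if vpc.overlaps(ipaddress.ip_network(c))]


def carve_subnets(vpc_cidr, tiers, azs, new_prefix=24):
    """Assign one subnet per (tier, AZ) pair, taken in order from the VPC range."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    return {(tier, az): str(next(blocks)) for tier in tiers for az in azs}


on_prem = ["10.0.0.0/16", "192.168.0.0/24"]  # hypothetical on-premises ranges

print(overlapping_ranges("10.0.0.0/20", on_prem))   # conflicts with on-premises
print(overlapping_ranges("10.20.0.0/16", on_prem))  # safe to use over the VPN

plan = carve_subnets("10.20.0.0/16", ["public", "private", "isolated"], ["a", "b"])
for (tier, az), cidr in plan.items():
    print(f"{tier:<9} {az}  {cidr}")
```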

Step 12: Configure IAM Policies and Roles

Follow least privilege from the start. Create service roles for EC2, Lambda, and ECS. Configure cross-account roles with trust policies if you are using multiple accounts (you should be). Enforce MFA for all console users, especially root. Use instance profiles instead of long-lived access keys wherever possible.

Step 13: Define Security Groups and NACLs

Create security groups per tier (web, app, database, cache). Reference other security groups instead of IP ranges — this keeps rules clean as instances scale. Start with deny-all and open only what is needed. Deploy test instances and validate connectivity in every subnet before migration begins.

Step 14: Set Up Monitoring and Logging

Deploy monitoring before the workloads arrive:

  • CloudWatch. Dashboards, alarms for CPU/memory/disk, log groups.
  • CloudTrail. All regions, all accounts, logs to a centralized S3 bucket. Non-negotiable for audit.
  • Third-party. Datadog, Zabbix, Grafana — install agents and validate metric flow before workloads arrive.
  • Alerting. PagerDuty, Opsgenie, or Slack routing. Test with a synthetic alert.

Step 15: Reduce DNS TTL Values

Two to three weeks before cutover, lower the TTL to 60-300 seconds on every DNS record that will change. Default TTLs are often 3600 seconds. If you switch DNS with an hour-long TTL, some users hit your old infrastructure for an hour after cutover. Verify propagation with dig from multiple locations. Document original TTLs so you can restore them after stabilization.
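The timing logic behind this step can be sketched as follows. Resolvers that cached a record just before the TTL was lowered keep it for the full old TTL, so the change has to land at least one old-TTL period before cutover; the multi-week lead time adds margin on top of that. The cutover date is a made-up example.

```python
from datetime import datetime, timedelta


def latest_safe_ttl_change(cutover, old_ttl_seconds):
    """Latest moment the TTL can be lowered so every cache has expired the old TTL by cutover."""
    return cutover - timedelta(seconds=old_ttl_seconds)


def worst_case_stale_window(ttl_seconds):
    """Longest time a resolver may keep serving the old record after the DNS switch."""
    return timedelta(seconds=ttl_seconds)


cutover = datetime(2025, 3, 15, 2, 0)  # hypothetical cutover time

print(latest_safe_ttl_change(cutover, 3600))
print(worst_case_stale_window(3600))  # without TTL reduction
print(worst_case_stale_window(300))   # with a 300-second TTL
```

This is a lower bound only; some resolvers ignore or clamp TTLs, which is another reason to lower them well ahead of time.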


Phase 4: Migration (Steps 16-19)

Data and application moves, executed in waves from lowest risk to critical path.

Step 16: Execute Data Migration

Approach depends on volume and acceptable downtime:

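  • Databases under 100 GB. Native dump/restore plus replication. AWS DMS handles the ongoing replication; if you are changing engines, pair it with the AWS Schema Conversion Tool (or DMS Schema Conversion), because DMS on its own moves data rather than converting schemas and stored code.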
  • Databases over 100 GB. DMS full load plus CDC (change data capture). Start early — initial loads take hours or days.
  • Object storage. AWS DataSync or S3 Transfer Acceleration. For tens of terabytes, consider Snowball.
  • File systems. DataSync for NFS-to-EFS with validation.

After every transfer, validate row counts, checksums, and random-sample comparisons. Do not trust the transfer tool’s success message alone.
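A minimal sketch of that validation is shown below, using in-memory SQLite databases as stand-ins for the source and target. The `orders` table is hypothetical. When source and target run different engines, values should be normalized (types, float formatting, time zones) before hashing, which this example does not attempt.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) over all rows, ordered by the key column."""
    digest = hashlib.sha256()
    count = 0
    # table and key_column must be trusted identifiers; never interpolate user input here
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def make_db(rows):
    """Build a throwaway in-memory database standing in for a real source or target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


source = make_db([(1, 9.99), (2, 25.00), (3, 4.50)])
good_target = make_db([(3, 4.50), (1, 9.99), (2, 25.00)])  # same data, different insert order
bad_target = make_db([(1, 9.99), (2, 25.00), (3, 4.05)])   # same row count, one corrupted value

src = table_fingerprint(source, "orders", "id")
print(src == table_fingerprint(good_target, "orders", "id"))
print(src == table_fingerprint(bad_target, "orders", "id"))
```

The corrupted target has a matching row count but a different checksum, which is why row counts alone are not enough.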

Step 17: Migrate Application Workloads

Move in waves:

  1. Non-production. Staging and dev first — your dress rehearsal.
  2. Low-risk production. Internal tools, admin panels, non-customer-facing services.
  3. Customer-facing. Main app servers, APIs, frontends. Deploy in parallel and shift traffic gradually with weighted DNS or feature flags.
  4. Data stores. Primary databases and caches last, after dependent workloads are validated.

For each wave: deploy, configure secrets and connections, smoke test, run full tests, validate against Step 4 benchmarks, get sign-off.

Step 18: Migrate Database Workloads

Database migration carries the highest risk and deserves its own focus:

  • Set up replication from source to AWS target (RDS, Aurora, or EC2). DMS or native replication (binlog, logical replication) keeps the target synchronized.
  • Monitor replication lag continuously. Your cutover window depends on reaching zero lag.
  • Test failover by pointing a staging app at the AWS database and running your test suite.
  • Resolve compatibility issues during setup, not cutover — especially if changing engines.
  • Set up reverse replication from AWS back to source before cutover. This gives you a rollback path that preserves post-cutover writes. Gold standard for zero-downtime migration.

Step 19: Conduct Parallel Running

Run both environments simultaneously before cutover:

  • Use weighted DNS (Route 53) to send 5-10% of traffic to AWS, then increase gradually.
  • Compare responses, monitor error rates and latency against Step 4 benchmarks.
  • Run for at least 48-72 hours, covering one full business cycle.
  • Hold a formal go/no-go meeting with explicit pass/fail criteria: error rates, latency thresholds, data consistency.
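Writing the pass/fail criteria down as code before the meeting keeps the decision mechanical. The sketch below is one possible shape; the metric names and threshold defaults are assumptions to replace with your own Step 4 baselines.

```python
def go_no_go(baseline, observed, max_error_rate_increase=0.001, max_latency_regression=0.0):
    """Evaluate parallel-run metrics against the baseline; return (go, list_of_failed_criteria)."""
    failures = []
    if observed["error_rate"] > baseline["error_rate"] + max_error_rate_increase:
        failures.append("error_rate")
    if observed["p95_ms"] > baseline["p95_ms"] * (1 + max_latency_regression):
        failures.append("p95_ms")
    if observed["row_count_mismatches"] > 0:
        failures.append("data_consistency")
    return (not failures, failures)


# Hypothetical baseline from Step 4 and two parallel-run snapshots
baseline = {"error_rate": 0.002, "p95_ms": 200}
healthy = {"error_rate": 0.0025, "p95_ms": 195, "row_count_mismatches": 0}
degraded = {"error_rate": 0.002, "p95_ms": 250, "row_count_mismatches": 2}

print(go_no_go(baseline, healthy))
print(go_no_go(baseline, degraded))
```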

Phase 5: Cutover (Steps 20-22)

The cutover window should be short, well-rehearsed, and boring.

Step 20: Execute DNS Cutover

  • Final sync. Bring all databases and file systems to zero lag.
  • Stop writes. Briefly halt writes to source (or enter maintenance mode) to capture everything.
  • Switch DNS. Update A records, CNAMEs, and all relevant records. With the reduced TTLs from Step 15, propagation completes in minutes.
  • Verify propagation. Check from multiple locations with dig and online propagation tools.
  • Monitor traffic shift. Old infrastructure traffic should drop to near-zero within the TTL window.

Step 21: Validate SSL/TLS and Security

Immediately post-cutover:

  • Verify SSL certificates are serving correctly with complete chains and proper HTTPS redirects.
  • Confirm ACM certificates are associated with ALBs, CloudFront, or API Gateway.
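  • Check HSTS headers. Without the Strict-Transport-Security header, browsers stop enforcing HTTPS once any previously cached policy expires, which leaves users open to downgrade attacks.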
  • Port scan public infrastructure to confirm only intended ports are open.
  • Review WAF logs for false positives.

Step 22: Activate Full Monitoring and Alerting

Switch all alarms from observation to alerting mode. Update on-call rotations to include AWS infrastructure. Verify dashboards show complete metric flow. Activate synthetic health checks from external locations every 60 seconds.


Phase 6: Post-Migration (Steps 23-25)

The migration is complete when the new environment is validated, optimized, and the old environment is decommissioned.

Step 23: Validate Performance Against Benchmarks

Within 72 hours: rerun Step 4 performance tests, load test to validate Auto Scaling, have QA run end-to-end workflows, review database slow query logs, and confirm all cron jobs and batch processes complete on schedule.

Step 24: Optimize Costs

Two to four weeks post-cutover:

  • Right-size instances using AWS Compute Optimizer and real utilization data.
  • Purchase Savings Plans for 24/7 workloads (30-60% savings over on-demand).
  • Optimize storage. Move cold data to S3 Intelligent-Tiering or Glacier. Delete orphaned snapshots.
  • Audit data transfer. Check for excessive inter-AZ traffic, NAT Gateway charges, and CDN origin costs.
  • Enable Cost Anomaly Detection and implement a tagging strategy for cost allocation.

Step 25: Decommission Source Infrastructure

  • Keep source running (idle) for 2-4 weeks as a safety net.
  • Take a final backup before decommission. Store per compliance retention requirements.
  • Shut down in reverse order: app servers first, databases last.
  • Restore DNS TTLs to normal values.
  • Update all runbooks, architecture diagrams, and DR plans.
  • Hold a retrospective. Document what worked and what did not, even if the migration was clean.

Common Pitfalls

After 200+ migrations, these patterns surface repeatedly:

Underestimating data transfer time. A 500 GB database over a 100 Mbps link takes 11+ hours for the initial transfer alone, even at full line rate. Start replication early.
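The arithmetic behind that figure, as a small helper you can rerun with your own numbers. It uses decimal units (1 GB = 8,000 megabits); the `efficiency` parameter is an assumption for modeling protocol overhead and shared links, and 0.7 is an arbitrary illustrative value.

```python
def transfer_hours(size_gb, link_mbps, efficiency=1.0):
    """Estimate hours to move size_gb over a link_mbps link at the given efficiency (0-1]."""
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


print(round(transfer_hours(500, 100), 1))       # full line rate
print(round(transfer_hours(500, 100, 0.7), 1))  # assuming 70% effective throughput
```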

Ignoring DNS TTL. Switching DNS with a 3600-second TTL means some users hit your old infrastructure for an hour. Reduce TTLs weeks in advance.

Hardcoded IPs. Grep your codebase and configs for IP patterns before migration. This saves hours of post-cutover debugging.
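A scanner along these lines can be sketched with a regular expression plus `ipaddress` validation to discard obvious non-addresses. The sample config and the ignore list are hypothetical. Four-part version strings such as 1.2.3.4 will still match, so treat the output as a list to review rather than a verdict.

```python
import ipaddress
import re

IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_ips(text, ignore=("127.0.0.1", "0.0.0.0")):
    """Return (line_number, address) pairs for valid IPv4 literals found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in IPV4_CANDIDATE.findall(line):
            try:
                ipaddress.IPv4Address(match)
            except ValueError:
                continue  # not a real address, e.g. an octet above 255
            if match not in ignore:
                hits.append((lineno, match))
    return hits


# Hypothetical config file contents
sample_config = "\n".join([
    "DB_HOST=10.0.4.17",
    "LISTEN=0.0.0.0",
    "VERSION=1.2.3.999",
    "CACHE=redis.internal",
    "UPSTREAM=http://192.168.10.5:8080/health",
])

hits = find_hardcoded_ips(sample_config)
for lineno, address in hits:
    print(f"line {lineno}: {address}")
```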

Missing secrets. Applications work in staging but fail in production because of a missed environment variable. Automate provisioning with Secrets Manager or Parameter Store.
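A pre-deploy check for required settings is one cheap guard against this. The sketch below takes the environment as a plain mapping so it can be tested; in practice you might pass `os.environ` or values fetched from your secrets store. The setting names and values are hypothetical.

```python
def missing_settings(required, environment):
    """Return the required setting names that are absent or empty in the environment mapping."""
    return sorted(name for name in required if not environment.get(name))


REQUIRED = ["DATABASE_URL", "REDIS_URL", "PAYMENT_API_KEY"]  # hypothetical names

staging = {
    "DATABASE_URL": "postgres://staging.example/db",
    "REDIS_URL": "redis://staging.example",
    "PAYMENT_API_KEY": "test-key",
}
production = {
    "DATABASE_URL": "postgres://prod.example/db",
    "REDIS_URL": "",  # present but empty counts as missing
}

print(missing_settings(REQUIRED, staging))
print(missing_settings(REQUIRED, production))
```

Running this as a startup or pipeline gate fails the deployment early instead of at the first request that needs the secret.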

No rollback plan. Every step needs a documented rollback. Test it before you need it.

Skipping parallel running. The 48-72 hours of parallel running catches issues that no amount of staging testing surfaces. Do not skip this.


Checklist Summary

Phase           Steps   Key Deliverables
Assessment      1-5     Inventory, dependency map, compliance matrix, performance baseline, risk manifest
Planning        6-9     Architecture diagram, cost estimate, timeline, communication plan
Preparation     10-15   AWS accounts, VPC, IAM, security groups, monitoring, DNS TTL reduction
Migration       16-19   Data sync, app deployment, database replication, parallel running
Cutover         20-22   DNS switch, SSL validation, monitoring activation
Post-Migration  23-25   Performance validation, cost optimization, source decommission

Let ZenoCloud Handle Your Migration

You can run this checklist yourself. Or you can let a team that has done it 200+ times run it for you.

ZenoCloud handles the entire process — from assessment through post-migration optimization. We assign a dedicated migration engineer, build a custom runbook for your infrastructure, and execute every step targeting zero downtime and zero data loss.

What you get:

  • Dedicated migration engineer from day one
  • Complete infrastructure assessment and AWS architecture design
  • Full data migration with integrity validation at every step
  • Parallel running and cutover management with rollback at every stage
  • 30-day post-migration support including performance tuning and cost optimization
  • 24/7 monitoring during and after the migration window

Start with a free migration assessment. We will review your infrastructure, identify the optimal strategy, and provide a cost estimate and timeline — no commitment required.
