AWS Graviton ARM64 Migration Guide: Real Cost Savings and What Actually Breaks
Graviton3 instances give you 20-40% better cost/performance than equivalent x86 — but migrating isn't just flipping a flag. Here's what breaks on ARM64 and the exact strategy to migrate without downtime.

The cost case for Graviton is real, and the numbers are concrete. An m7g.large (Graviton3) costs $0.0798/hr in us-east-1. An equivalent m7i.large (Intel) costs $0.1008/hr. That's a 21% savings on compute alone before you factor in the performance differences. For memory-bound workloads, Graviton's memory bandwidth is noticeably better, which means you can often right-size to a smaller instance type.
In practice, teams migrating to Graviton report 20-40% reduction in EC2 costs for general-purpose workloads. For a cluster with $50K/month in EC2 spend, that's $10-20K/month in savings. At $500K/month, the math is obvious.
But the migration is not a one-liner. You're changing CPU architecture from x86_64 to aarch64. Binaries compiled for x86 won't run on ARM64. Container images targeting x86 won't run on Graviton nodes. This is a real migration that requires preparation.
What the Architecture Change Means
x86_64 and aarch64 are different instruction set architectures. When you run a container on a Graviton node, the container must contain ARM64 binaries. The OS kernel handles the ABI, but the user-space code — your application binary, its libraries, the language runtime — must be compiled for ARM64.
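To make the distinction concrete, here's a minimal sketch (the function name is mine) mapping the kernel architecture reported by `uname -m` to the Docker platform string of images that machine can run natively:

```shell
# Hypothetical helper: map a kernel arch (as reported by `uname -m`)
# to the Docker platform string a node of that arch runs natively.
arch_to_platform() {
  case "$1" in
    x86_64)        echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;  # Graviton nodes report aarch64
    *)             echo "unknown: $1" >&2; return 1 ;;
  esac
}

arch_to_platform "$(uname -m)"   # platform of the current machine
```

On a Graviton node this prints linux/arm64; an amd64-only image scheduled there pulls successfully but fails the moment its binary is executed.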
Most modern software ships multi-arch images that support both linux/amd64 and linux/arm64. When you docker pull nginx:latest on an ARM64 machine, Docker automatically pulls the ARM64 variant. The problem is with:
- Software that doesn't publish ARM64 images — older projects, proprietary tools, some enterprise software
- Native extensions and modules — Node.js addons compiled with node-gyp, Python packages with C extensions, JNI libraries
- AVX/SSE2 instruction dependencies — code that uses x86-specific SIMD instructions
- Statically compiled binaries included in images — build tools, database binaries, agent software bundled at build time
Let me work through each of these.
Auditing Your Images for ARM64 Compatibility
Before migrating a single node, audit every container image you run.
```shell
# Check if an image has an arm64 manifest
docker buildx imagetools inspect nginx:latest | grep -A2 "linux/arm64"

# For a list of images, use this script
for image in \
  nginx:latest \
  postgres:16 \
  redis:7 \
  your-registry/your-app:latest; do
  echo -n "$image: "
  docker buildx imagetools inspect $image 2>/dev/null | \
    grep -q "linux/arm64" && echo "arm64 supported" || echo "ARM64 NOT FOUND"
done
```

For ECR images:
```shell
# Confirm the image exists and when it was pushed
aws ecr describe-images \
  --repository-name my-repo \
  --image-ids imageTag=latest \
  --query 'imageDetails[].imagePushedAt' \
  --region us-east-1

# Inspect the manifest list for arm64 support
aws ecr batch-get-image \
  --repository-name my-repo \
  --image-ids imageTag=latest \
  --query 'images[].imageManifest' \
  --output text | python3 -m json.tool | grep architecture
```

Do this for every image you deploy. Make a spreadsheet. Column 1: image name. Column 2: arm64 available (yes/no). Column 3: alternative or action needed. Don't skip this step.
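Since the audit ends up in a spreadsheet anyway, a small wrapper can emit the first two columns as CSV directly. This is a sketch, assuming `docker buildx` is available locally; the function name `audit_csv` is mine:

```shell
# Sketch: emit the arm64 audit as CSV rows ready for a spreadsheet.
# Pass the images to check as arguments; requires docker buildx.
audit_csv() {
  echo "image,arm64"
  for image in "$@"; do
    if docker buildx imagetools inspect "$image" 2>/dev/null | grep -q "linux/arm64"; then
      echo "$image,yes"
    else
      echo "$image,no"
    fi
  done
}

# audit_csv nginx:latest postgres:16 redis:7 > arm64-audit.csv
```

Redirect the output to a `.csv` file and you have the first two columns of the spreadsheet; fill in the "action needed" column by hand.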
Building Multi-Arch Images with Docker Buildx
For images you build yourself, you need to produce multi-arch images. Docker Buildx with QEMU emulation lets you build ARM64 images on x86 machines:
```shell
# Set up QEMU for cross-compilation
docker run --privileged --rm tonistiigi/binfmt --install all

# Create a new buildx builder that supports multi-platform
docker buildx create --name multiarch --driver docker-container --use
docker buildx inspect --bootstrap

# Build and push a multi-arch image
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag my-registry/my-app:latest \
  --push \
  .
```

The --push flag is required for multi-arch builds — multi-platform images can't be loaded into the local Docker daemon, they must be pushed to a registry.
In CI (GitHub Actions):
```yaml
name: Build Multi-Arch Image

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to ECR
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ env.ECR_REGISTRY }}/my-app:${{ github.sha }}
            ${{ env.ECR_REGISTRY }}/my-app:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

Note: QEMU emulation is slow. Building an ARM64 image via QEMU on an x86 runner can take 3-5x longer than a native build. For large builds, consider GitHub's ARM64-hosted runners (e.g. `runs-on: ubuntu-24.04-arm`) or AWS CodeBuild with Graviton instances for native ARM64 builds.
What Actually Breaks on ARM64
Node.js Native Addons
Packages that use node-gyp to compile native addons must be built for ARM64. If a module ships prebuilt ARM64 binaries, it installs and works without changes; if it doesn't support ARM64 at all, the image build fails during npm install.
Common native packages and their ARM64 status: bcrypt (recent versions ship ARM64 prebuilds; bcryptjs is a pure-JS alternative that avoids native compilation entirely, at some performance cost), sharp (full ARM64 support since v0.29), canvas, sqlite3.
Check any native module by looking for ARM64 in its prebuilt binary list:
```shell
# Check if a package has ARM64 prebuilds
npm info <package> | grep -i arm
```

Python Packages with C Extensions
Most major data science and ML packages (numpy, scipy, pandas, Pillow) now ship ARM64 wheels. But some smaller packages still lack ARM64 wheels and require compilation from source, which means you need build tools in your base image:
```dockerfile
FROM python:3.12-slim

# Add build tools for packages that need compilation
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    python3-dev \
    libffi-dev \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```

This adds ~200MB to your image but ensures packages compile correctly. Alternatively, use a multi-stage build and copy only the compiled packages:
```dockerfile
FROM python:3.12 AS builder
COPY requirements.txt .
RUN pip install --no-cache-dir --target /install -r requirements.txt

FROM python:3.12-slim
COPY --from=builder /install /usr/local/lib/python3.12/site-packages
```

AVX/SSE2 Dependencies
Some software uses x86-specific SIMD instructions (SSE2, AVX2, AVX-512) for performance-critical paths. This is common in:
- Machine learning inference: TensorFlow and PyTorch have Graviton-optimized builds now, but older versions may fall back to slower paths
- Database binaries: some PostgreSQL extensions use AVX for performance
- Compression libraries: some configurations of zlib, lz4 have x86-specific optimizations
For ML workloads specifically, AWS provides Graviton-optimized builds:
```dockerfile
# Use AWS-provided Deep Learning Containers (DLCs) for ML workloads
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-graviton:2.1.0-cpu-py310-ubuntu20.04-ec2
```

Third-Party Agents and Sidecars
Monitoring agents, service mesh sidecars, security scanners bundled as init containers — all need ARM64 support. Common ones:
| Agent | ARM64 Support |
|---|---|
| Datadog Agent | Yes (since 7.34) |
| New Relic Infrastructure | Yes |
| Dynatrace OneAgent | Yes |
| Falco | Yes |
| Aqua Security | Yes |
| Wiz | Check current version |
| Lacework | Yes |
| AppDynamics | Partial — check your version |
Verify every agent before migrating. A sidecar that silently fails on ARM64 can cause init containers to fail or monitoring to disappear without clear error messages.
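To verify every agent you first need the full inventory of what's running, including initContainers, where injected sidecars often hide. Here's a sketch (the function name `extract_images` is mine) that pulls every image reference out of `kubectl get pods -A -o json`; reading from stdin keeps it testable without a cluster:

```shell
# Sketch: given `kubectl get pods -A -o json` on stdin, list every image
# the cluster references (containers + initContainers), deduped.
extract_images() {
  python3 -c '
import json, sys
pods = json.load(sys.stdin)["items"]
images = set()
for pod in pods:
    spec = pod["spec"]
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        images.add(c["image"])
print("\n".join(sorted(images)))
'
}

# kubectl get pods -A -o json | extract_images > images-to-audit.txt
```

Feed the resulting list into the audit spreadsheet from the compatibility section.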
EKS Blue-Green Node Group Migration
This is the migration strategy I use. Never in-place migrate existing nodes — you want a clear rollback path (see zero-downtime cluster upgrades).
Step 1: Create Graviton Node Group
```shell
# Create a new node group with Graviton instances
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name workers-graviton \
  --node-role arn:aws:iam::123456789:role/eks-node-role \
  --subnets subnet-abc123 subnet-def456 \
  --scaling-config minSize=2,maxSize=20,desiredSize=5 \
  --ami-type AL2_ARM_64 \
  --instance-types m7g.large \
  --kubernetes-version 1.30 \
  --labels '{"node-type":"graviton","arch":"arm64"}' \
  --region us-east-1

# Wait for node group to be active
aws eks wait nodegroup-active \
  --cluster-name my-cluster \
  --nodegroup-name workers-graviton
```

Note AL2_ARM_64 — this is the Graviton-compatible Amazon Linux 2 AMI. For AL2023 use AL2023_ARM_64_STANDARD.
Step 2: Taint Graviton Nodes Initially
To avoid workloads scheduling on Graviton nodes before you've verified compatibility, taint them:
```shell
# Taint all new Graviton nodes (tr splits the space-separated
# jsonpath output into one node name per line for xargs -I)
kubectl get nodes -l node-type=graviton \
  -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | \
  xargs -I {} kubectl taint nodes {} arch=arm64:NoSchedule
```

Now only pods with a toleration for arch=arm64:NoSchedule will schedule on Graviton nodes.
Step 3: Test a Non-Critical Workload First
Pick a stateless, non-critical service with multi-arch images confirmed. Add a toleration and node affinity:
```yaml
spec:
  tolerations:
    - key: arch
      operator: Equal
      value: arm64
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-type
                operator: In
                values:
                  - graviton
```

Deploy it, monitor it, confirm it works. Check that your monitoring agent is collecting metrics, your logs are flowing, and your application is behaving correctly.
Step 4: Remove Taints and Migrate Workloads
Once you're confident, remove the taints from Graviton nodes and taint the old x86 nodes:
```shell
# Remove taint from Graviton nodes
kubectl get nodes -l node-type=graviton \
  -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | \
  xargs -I {} kubectl taint nodes {} arch=arm64:NoSchedule-

# Taint old x86 nodes to prevent new scheduling
kubectl get nodes -l eks.amazonaws.com/nodegroup=workers-x86 \
  -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | \
  xargs -I {} kubectl taint nodes {} arch=x86:NoSchedule

# Cordon and drain old nodes
OLD_NODES=$(kubectl get nodes -l eks.amazonaws.com/nodegroup=workers-x86 \
  -o jsonpath='{.items[*].metadata.name}')

for node in $OLD_NODES; do
  kubectl cordon $node
  kubectl drain $node \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --grace-period=60 \
    --timeout=300s
  echo "Drained $node, waiting 30s..."
  sleep 30
done
```

Step 5: Delete Old Node Group
```shell
aws eks delete-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name workers-x86 \
  --region us-east-1
```

nodeSelector for Mixed Clusters
If you're running a mixed cluster (some x86, some Graviton) during migration or permanently (for x86-only workloads), use nodeSelector to be explicit:
```yaml
# Force x86 scheduling for a workload with x86-only images
spec:
  nodeSelector:
    kubernetes.io/arch: amd64

# Force ARM64 scheduling
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
```

Kubernetes automatically labels nodes with kubernetes.io/arch. No additional configuration needed.
Cost Comparison: Real Numbers
| Instance Type | vCPU | Memory | On-Demand (us-east-1) | Architecture |
|---|---|---|---|---|
| m7i.large | 2 | 8GB | $0.1008/hr | x86_64 |
| m7g.large | 2 | 8GB | $0.0798/hr | ARM64 (Graviton3) |
| m7i.xlarge | 4 | 16GB | $0.2016/hr | x86_64 |
| m7g.xlarge | 4 | 16GB | $0.1596/hr | ARM64 (Graviton3) |
| c7i.large | 2 | 4GB | $0.0893/hr | x86_64 |
| c7g.large | 2 | 4GB | $0.0725/hr | ARM64 (Graviton3) |
| r7i.large | 2 | 16GB | $0.1323/hr | x86_64 |
| r7g.large | 2 | 16GB | $0.1058/hr | ARM64 (Graviton3) |
Savings are consistent at 20-21% for on-demand. For Reserved Instances and Savings Plans, the absolute dollar savings are larger. If you're running 100 m7i.large equivalents at on-demand rates, switching to m7g.large saves ~$18K/year.
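To sanity-check that figure, the arithmetic is just (rate delta) × instance count × 8,760 hours per year. A throwaway helper (the function name is mine; rates are the us-east-1 on-demand prices from the table above):

```shell
# Annual on-demand savings: count * (x86_rate - graviton_rate) * 8760 hours.
annual_savings() {  # usage: annual_savings <count> <x86_rate> <graviton_rate>
  awk -v n="$1" -v x="$2" -v g="$3" 'BEGIN { printf "%.0f\n", n * (x - g) * 8760 }'
}

annual_savings 100 0.1008 0.0798   # 100 m7i.large -> m7g.large
```

For the table's m7i/m7g rates this lands at roughly $18.4K/year, matching the ~$18K figure above.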
Common Issues After Migration
Image pull errors: exec format error in pod events means you're running an x86 image on an ARM64 node. The image pull succeeds (the manifest exists), but the binary won't execute. Fix: rebuild with multi-arch support.
Missing arm64 manifest on DockerHub: Some images have a manifest list that claims arm64 support but the actual arm64 image is empty or wrong. Test by running docker run --platform linux/arm64 <image> <command> before migrating.
Performance regression: If Graviton is slower than expected, check whether your software is falling back to a generic, non-SIMD code path instead of NEON-optimized routines. This is rare with modern software but can happen with older or less-maintained packages.
Monitoring gaps: Your APM agent may not start correctly on ARM64. Check agent logs during the testing phase — silent failures are the worst case.
Running EKS and want to estimate your actual Graviton savings before committing to a migration? Talk to us at Coding Protocols. We audit your workloads for ARM64 compatibility and handle the migration with a zero-downtime blue-green strategy.


