Cloud Engineering
12 min read · March 24, 2026

AWS Graviton ARM64 Migration Guide: Real Cost Savings and What Actually Breaks

Graviton3 instances give you 20-40% better cost/performance than equivalent x86 — but migrating isn't just flipping a flag. Here's what breaks on ARM64 and the exact strategy to migrate without downtime.

Ajeet Yadav
Platform & Cloud Engineer

The cost case for Graviton is real, and the numbers are concrete. An m7g.large (Graviton3) costs $0.0798/hr in us-east-1. An equivalent m7i.large (Intel) costs $0.1008/hr. That's a 21% savings on compute alone before you factor in the performance differences. For memory-bound workloads, Graviton's memory bandwidth is noticeably better, which means you can often right-size to a smaller instance type.

In practice, teams migrating to Graviton report 20-40% reduction in EC2 costs for general-purpose workloads. For a cluster with $50K/month in EC2 spend, that's $10-20K/month in savings. At $500K/month, the math is obvious.

But the migration is not a one-liner. You're changing CPU architecture from x86_64 to aarch64. Binaries compiled for x86 won't run on ARM64. Container images targeting x86 won't run on Graviton nodes. This is a real migration that requires preparation.

What the Architecture Change Means

x86_64 and aarch64 are different instruction set architectures. When you run a container on a Graviton node, the container must contain ARM64 binaries. The OS kernel handles the ABI, but the user-space code — your application binary, its libraries, the language runtime — must be compiled for ARM64.
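
A quick way to see this for yourself (a minimal check; the image name is the placeholder used elsewhere in this post) is to ask both the host and a pulled image which architecture they target:

bash
# Architecture of the host
uname -m    # x86_64 on Intel/AMD, aarch64 on Graviton

# Architecture of the locally pulled variant of an image
docker image inspect --format '{{.Os}}/{{.Architecture}}' your-registry/your-app:latest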

Most modern software ships multi-arch images that support both linux/amd64 and linux/arm64. When you docker pull nginx:latest on an ARM64 machine, Docker automatically pulls the ARM64 variant. The problem is with:

  1. Software that doesn't publish ARM64 images — older projects, proprietary tools, some enterprise software
  2. Native extensions and modules — Node.js addons compiled with node-gyp, Python packages with C extensions, JNI libraries
  3. AVX/SSE2 instruction dependencies — code that uses x86-specific SIMD instructions
  4. Statically compiled binaries included in images — build tools, database binaries, agent software bundled at build time

Let me work through each of these.


Auditing Your Images for ARM64 Compatibility

Before migrating a single node, audit every container image you run.

bash
# Check if an image has an arm64 manifest
docker buildx imagetools inspect nginx:latest | grep -A2 "linux/arm64"

# For a list of images, use this script
for image in \
  nginx:latest \
  postgres:16 \
  redis:7 \
  your-registry/your-app:latest; do
  echo -n "$image: "
  docker buildx imagetools inspect $image 2>/dev/null | \
    grep -q "linux/arm64" && echo "arm64 supported" || echo "ARM64 NOT FOUND"
done

For ECR images:

bash
# Confirm the tag exists in ECR (this query shows push time, not architecture)
aws ecr describe-images \
  --repository-name my-repo \
  --image-ids imageTag=latest \
  --query 'imageDetails[].imagePushedAt' \
  --region us-east-1

# Inspect the manifest list for architecture entries
aws ecr batch-get-image \
  --repository-name my-repo \
  --image-ids imageTag=latest \
  --query 'images[].imageManifest' \
  --output text | python3 -m json.tool | grep architecture

Do this for every image you deploy. Make a spreadsheet. Column 1: image name. Column 2: arm64 available (yes/no). Column 3: alternative or action needed. Don't skip this step.
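
If you'd rather generate that spreadsheet than type it, here's a minimal sketch (the image list is a placeholder) that emits the first two columns as CSV using the same inspect command shown above:

bash
# Emit "image,arm64_available" rows you can paste into a spreadsheet
echo "image,arm64_available"
for image in nginx:latest postgres:16 your-registry/your-app:latest; do
  if docker buildx imagetools inspect "$image" 2>/dev/null | grep -q "linux/arm64"; then
    echo "$image,yes"
  else
    echo "$image,no"
  fi
done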


Building Multi-Arch Images with Docker Buildx

For images you build yourself, you need to produce multi-arch images. Docker Buildx with QEMU emulation lets you build ARM64 images on x86 machines:

bash
# Set up QEMU for cross-compilation
docker run --privileged --rm tonistiigi/binfmt --install all

# Create a new buildx builder that supports multi-platform
docker buildx create --name multiarch --driver docker-container --use
docker buildx inspect --bootstrap

# Build and push a multi-arch image
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag my-registry/my-app:latest \
  --push \
  .

The --push flag is required for multi-arch builds: multi-platform images can't be loaded into the local Docker daemon, so they must be pushed to a registry.
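
If you want to smoke-test the ARM64 variant locally before pushing, one option (assuming QEMU/binfmt is set up as above; the tag name is illustrative) is to build only the linux/arm64 platform with --load, which does work for single-platform builds:

bash
# Build only the arm64 variant and load it into the local daemon
docker buildx build --platform linux/arm64 --load -t my-app:arm64-test .

# Run it under emulation to confirm the entrypoint actually starts
docker run --rm --platform linux/arm64 my-app:arm64-test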

In CI (GitHub Actions):

yaml
name: Build Multi-Arch Image

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Set up QEMU
      uses: docker/setup-qemu-action@v3

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3

    - name: Configure AWS credentials
      # Needed before the ECR login step; the role secret name is illustrative
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
        aws-region: us-east-1

    - name: Login to ECR
      uses: aws-actions/amazon-ecr-login@v2

    - name: Build and push
      uses: docker/build-push-action@v5
      with:
        context: .
        platforms: linux/amd64,linux/arm64
        push: true
        tags: |
          ${{ env.ECR_REGISTRY }}/my-app:${{ github.sha }}
          ${{ env.ECR_REGISTRY }}/my-app:latest
        cache-from: type=gha
        cache-to: type=gha,mode=max

Note: QEMU emulation is slow. Building an ARM64 image via QEMU on an x86 runner can take 3-5x longer than a native build. For large builds, consider native ARM64 runners instead: GitHub-hosted ARM64 runners, self-hosted runners on Graviton instances, or AWS CodeBuild on Graviton for native ARM64 builds.
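
Another way around the QEMU penalty, sketched here under the assumption that you have a Graviton instance reachable over SSH (the hostname is a placeholder), is to register it as a native arm64 node on the same buildx builder so each platform builds on native hardware:

bash
# Add a native arm64 node to the existing "multiarch" builder
docker buildx create --append --name multiarch \
  --platform linux/arm64 \
  ssh://ec2-user@graviton-builder.internal

# Subsequent multi-platform builds dispatch each architecture to a native node
docker buildx build --platform linux/amd64,linux/arm64 \
  -t my-registry/my-app:latest --push .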


What Actually Breaks on ARM64

Node.js Native Addons

Packages that use node-gyp to compile native addons must be recompiled for ARM64. In practice, one of two things happens: the module ships prebuilt ARM64 binaries (or compiles cleanly) and works without changes, or the image build fails during npm install because the module doesn't support ARM64.

Common native packages that now ship ARM64 prebuilds: bcrypt, sharp (full ARM64 support since v0.29), canvas, sqlite3. If a dependency lacks ARM64 prebuilds, look for a pure-JS alternative (for example bcryptjs in place of bcrypt) or be prepared to compile it during the image build.

Check any native module by looking for ARM64 in its prebuilt binary list:

bash
# Check if a package has ARM64 prebuilds
npm info <package> | grep -i arm

Python Packages with C Extensions

Most major data science and ML packages (numpy, scipy, pandas, Pillow) now ship ARM64 wheels. But some smaller packages still lack ARM64 wheels and require compilation from source, which means you need build tools in your base image:

dockerfile
FROM python:3.12-slim

# Add build tools for packages that need compilation
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    python3-dev \
    libffi-dev \
    && rm -rf /var/lib/apt/lists/*

# The dependency list must be in the build context
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

This adds ~200MB to your image but ensures packages compile correctly. Alternatively, use a multi-stage build and copy only the compiled packages:

dockerfile
FROM python:3.12 AS builder
COPY requirements.txt .
RUN pip install --no-cache-dir --target /install -r requirements.txt

FROM python:3.12-slim
COPY --from=builder /install /usr/local/lib/python3.12/site-packages

AVX/SSE2 Dependencies

Some software uses x86-specific SIMD instructions (SSE2, AVX2, AVX-512) for performance-critical paths. This is common in:

  • Machine learning inference: TensorFlow and PyTorch have Graviton-optimized builds now, but older versions may fall back to slower paths
  • Database binaries: some PostgreSQL extensions use AVX for performance
  • Compression libraries: some configurations of zlib, lz4 have x86-specific optimizations
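
If you suspect a SIMD-related slowdown, a quick check (run on a Graviton node; a minimal sketch) is to confirm which ARM vector extensions the CPU exposes, since well-maintained libraries pick their NEON/SVE code paths based on these flags:

bash
# On a Graviton node: list ARM CPU features (asimd = NEON; Graviton3 also exposes sve)
grep -m1 Features /proc/cpuinfo

# Or, with util-linux installed:
lscpu | grep -i flags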

For ML workloads specifically, AWS provides Graviton-optimized builds:

dockerfile
# Use AWS-provided DLCs for ML workloads
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-graviton:2.1.0-cpu-py310-ubuntu20.04-ec2

Third-Party Agents and Sidecars

Monitoring agents, service mesh sidecars, security scanners bundled as init containers — all need ARM64 support. Common ones:

Agent                      ARM64 Support
Datadog Agent              Yes (since 7.34)
New Relic Infrastructure   Yes
Dynatrace OneAgent         Yes
Falco                      Yes
Aqua Security              Yes
Wiz                        Check current version
Lacework                   Yes
AppDynamics                Partial (check your version)

Verify every agent before migrating. A sidecar that silently fails on ARM64 can cause init containers to fail or monitoring to disappear without clear error messages.
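
A practical way to build that agent inventory (a minimal kubectl sketch) is to list every unique image currently running in the cluster, including init containers and daemonset-managed agents, and feed the result into the audit from earlier:

bash
# List every unique container image in the cluster (containers + init containers)
kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{range .spec.initContainers[*]}{.image}{"\n"}{end}{end}' \
  | sort -u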


EKS Blue-Green Node Group Migration

This is the migration strategy I use. Never migrate existing nodes in place; you want a clear rollback path (see zero-downtime cluster upgrades).

Step 1: Create Graviton Node Group

bash
# Create a new node group with Graviton instances
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name workers-graviton \
  --node-role arn:aws:iam::123456789:role/eks-node-role \
  --subnets subnet-abc123 subnet-def456 \
  --scaling-config minSize=2,maxSize=20,desiredSize=5 \
  --ami-type AL2_ARM_64 \
  --instance-types m7g.large \
  --version 1.30 \
  --labels '{"node-type":"graviton","arch":"arm64"}' \
  --region us-east-1

# Wait for the node group to become active
aws eks wait nodegroup-active \
  --cluster-name my-cluster \
  --nodegroup-name workers-graviton

Note AL2_ARM_64 — this is the Graviton-compatible Amazon Linux 2 AMI. For AL2023 use AL2023_ARM_64_STANDARD.

Step 2: Taint Graviton Nodes Initially

To avoid workloads scheduling on Graviton nodes before you've verified compatibility, taint them:

bash
# Taint all new Graviton nodes
kubectl get nodes -l node-type=graviton \
  -o jsonpath='{.items[*].metadata.name}' | \
  tr ' ' '\n' | \
  xargs -I {} kubectl taint nodes {} arch=arm64:NoSchedule

Now only pods with a toleration for arch=arm64:NoSchedule will schedule on Graviton nodes.

Step 3: Test a Non-Critical Workload First

Pick a stateless, non-critical service with multi-arch images confirmed. Add a toleration and node affinity:

yaml
spec:
  tolerations:
  - key: arch
    operator: Equal
    value: arm64
    effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type
            operator: In
            values:
            - graviton

Deploy it, monitor it, confirm it works. Check your monitoring agent is collecting metrics, your logs are flowing, your application is behaving correctly.
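
To confirm the pilot workload actually landed on Graviton and is running an ARM64 binary (the label and deployment name below are placeholders), check node placement and the in-container architecture:

bash
# Confirm the pods scheduled onto Graviton nodes
kubectl get pods -l app=my-pilot-service -o wide

# Confirm the container is running an arm64 userland
kubectl exec deploy/my-pilot-service -- uname -m   # expect: aarch64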

Step 4: Remove Taints and Migrate Workloads

Once you're confident, remove the taints from Graviton nodes and taint the old x86 nodes:

bash
# Remove taint from Graviton nodes
kubectl get nodes -l node-type=graviton \
  -o jsonpath='{.items[*].metadata.name}' | \
  tr ' ' '\n' | \
  xargs -I {} kubectl taint nodes {} arch=arm64:NoSchedule-

# Taint old x86 nodes to prevent new scheduling
kubectl get nodes -l eks.amazonaws.com/nodegroup=workers-x86 \
  -o jsonpath='{.items[*].metadata.name}' | \
  tr ' ' '\n' | \
  xargs -I {} kubectl taint nodes {} arch=x86:NoSchedule

# Cordon and drain old nodes
OLD_NODES=$(kubectl get nodes -l eks.amazonaws.com/nodegroup=workers-x86 \
  -o jsonpath='{.items[*].metadata.name}')

for node in $OLD_NODES; do
  kubectl cordon $node
  kubectl drain $node \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --grace-period=60 \
    --timeout=300s
  echo "Drained $node, waiting 30s..."
  sleep 30
done
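
Before moving on to Step 5, it's worth a final check (a small sketch reusing $OLD_NODES from above) that nothing except daemonset pods is still scheduled on the x86 nodes:

bash
# Anything left on the old nodes besides daemonsets?
for node in $OLD_NODES; do
  echo "== $node =="
  kubectl get pods -A --field-selector spec.nodeName=$node -o wide
done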

Step 5: Delete Old Node Group

bash
aws eks delete-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name workers-x86 \
  --region us-east-1

nodeSelector for Mixed Clusters

If you're running a mixed cluster (some x86, some Graviton) during migration or permanently (for x86-only workloads), use nodeSelector to be explicit:

yaml
# Force x86 scheduling for a workload with x86-only images
spec:
  nodeSelector:
    kubernetes.io/arch: amd64

# Force ARM64 scheduling
spec:
  nodeSelector:
    kubernetes.io/arch: arm64

Kubernetes automatically labels nodes with kubernetes.io/arch. No additional configuration needed.
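
You can see those labels directly, which is also a handy way to watch the x86/ARM64 split during the migration:

bash
# Show every node with its architecture and instance type labels
kubectl get nodes -L kubernetes.io/arch -L node.kubernetes.io/instance-type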


Cost Comparison: Real Numbers

Instance Type   vCPU   Memory   On-Demand (us-east-1)   Architecture
m7i.large       2      8 GB     $0.1008/hr              x86_64
m7g.large       2      8 GB     $0.0798/hr              ARM64 (Graviton3)
m7i.xlarge      4      16 GB    $0.2016/hr              x86_64
m7g.xlarge      4      16 GB    $0.1596/hr              ARM64 (Graviton3)
c7i.large       2      4 GB     $0.0893/hr              x86_64
c7g.large       2      4 GB     $0.0725/hr              ARM64 (Graviton3)
r7i.large       2      16 GB    $0.1323/hr              x86_64
r7g.large       2      16 GB    $0.1058/hr              ARM64 (Graviton3)

Savings are consistent at 20-21% for on-demand, and a similar percentage gap carries over to Reserved Instance and Savings Plan rates. If you're running 100 m7i.large equivalents at on-demand rates, switching to m7g.large saves ~$18K/year.
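
The arithmetic behind that estimate, as a quick sketch you can adapt to your own instance counts:

bash
# 100 instances x ($0.1008 - $0.0798)/hr x 8760 hr/yr
echo "100 * (0.1008 - 0.0798) * 8760" | bc
# 18396.0000  -> roughly $18.4K/year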


Common Issues After Migration

exec format error: this error in container logs or pod events means you're running an x86 image on an ARM64 node. The image pull itself succeeds (a manifest exists), but the binary won't execute. Fix: rebuild with multi-arch support.

Broken arm64 variants on Docker Hub: some images publish a manifest list that claims arm64 support, but the actual arm64 image is empty or broken. Test by running docker run --platform linux/arm64 <image> <command> before migrating.

Performance regression: If Graviton is slower than expected, check if your software is hitting a software emulation path for missing ARM instructions. This is rare with modern software but can happen with older or less-maintained packages.

Monitoring gaps: Your APM agent may not start correctly on ARM64. Check agent logs during the testing phase — silent failures are the worst case.


Running EKS and want to estimate your actual Graviton savings before committing to a migration? Talk to us at Coding Protocols. We audit your workloads for ARM64 compatibility and handle the migration with a zero-downtime blue-green strategy.

Related Topics

AWS
Graviton
ARM64
EKS
cost optimization
Docker
multi-arch
