Rightsizing AWS Costs: Finding and Fixing Overprovisioned Resources
Most AWS bills have 20–40% waste from overprovisioned EC2 instances, underused RDS, and forgotten resources. This tutorial shows you how to find it systematically and reduce it without impacting reliability.
Before you begin
- AWS account with billing access
- AWS CLI configured
- Basic understanding of EC2 and RDS
AWS bills grow because it's easier to overprovision than to tune. A t3.xlarge "just to be safe" instead of a t3.medium quadruples the instance cost. An RDS db.r5.large averaging 10% CPU is wasting 90% of its compute.
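That gap compounds across a fleet. A quick back-of-envelope sketch — the hourly rates below are assumed us-east-1 On-Demand list prices, so verify against current pricing:

```shell
# Rough monthly cost at 730 hours/month
# Assumed rates: t3.medium $0.0416/hr, t3.xlarge $0.1664/hr (check current pricing)
awk 'BEGIN {
  h = 730
  medium = 0.0416 * h
  xlarge = 0.1664 * h
  printf "t3.medium: $%.2f/mo\n", medium
  printf "t3.xlarge: $%.2f/mo\n", xlarge
  printf "monthly waste per oversized instance: $%.2f\n", xlarge - medium
}'
```

Multiply that last line by the number of oversized instances in your account and the 20–40% figure stops looking abstract.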
This tutorial gives you a systematic process to find the waste and act on it.
The Four Categories of AWS Waste
- Overprovisioned resources — right type, wrong size (most common)
- Unused resources — running but doing nothing (snapshots, idle EBS, stopped instances)
- Wrong purchase type — On-Demand when Reserved or Spot would be cheaper
- Wrong storage class — S3 Standard for cold data, gp2 instead of gp3
Step 1: Get the High-Level Picture with Cost Explorer
In the AWS Console → Cost Management → Cost Explorer:
- Set date range to last 3 months
- Group by Service — see what's driving the bill
- Group by Usage Type — see EC2 instances, data transfer, storage separately
- Filter to your top spending service, group by Instance Type
Look for:
- Instance types with low utilisation (Cost Explorer shows rightsizing recommendations)
- Data transfer costs (often hidden, can be 20% of bill)
- NAT Gateway data processed (usually avoidable)
Enable Cost Explorer Rightsizing Recommendations: Cost Management → Rightsizing Recommendations. This uses CloudWatch CPU metrics to suggest instance downsizes.
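The same per-service view is available from the CLI through the Cost Explorer API. A sketch — the date range is a placeholder, so substitute your own billing period:

```shell
# Monthly cost grouped by service (mirrors Cost Explorer's "Group by Service")
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE
```

This is handy for scripting a weekly snapshot of the bill rather than clicking through the console each time.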
Step 2: Find Underutilised EC2 Instances
```shell
# Find instances with avg CPU < 10% over 14 days
# (BSD/macOS date syntax; on GNU/Linux use --date="14 days ago")
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time $(date -u -v-14d +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average

# Better: use AWS Compute Optimizer
aws compute-optimizer get-ec2-instance-recommendations \
  --query "instanceRecommendations[?finding=='OVER_PROVISIONED'].[instanceArn,recommendationOptions[0].instanceType,utilizationMetrics[0].value]" \
  --output table
```
AWS Compute Optimizer uses 14 days of CloudWatch data to recommend downsizes. Enable it first:
```shell
aws compute-optimizer update-enrollment-status --status Active
```
Wait 24 hours for it to process your account, then check recommendations:
```shell
aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=finding,values=OVER_PROVISIONED
```
Step 3: Rightsize EKS Pod Resource Requests
In Kubernetes, requests determine which node a pod lands on. Overprovisioned requests leave nodes underutilised — you pay for capacity that sits idle.
Find pods with low CPU utilisation using Prometheus:
```
# Pods using less than 20% of their CPU request (over 24h)
(
  sum by (pod, namespace) (
    rate(container_cpu_usage_seconds_total{container!=""}[24h])
  )
  /
  sum by (pod, namespace) (
    kube_pod_container_resource_requests{resource="cpu", container!=""}
  )
) < 0.2
```

```
# Memory: pods using less than 30% of their memory request
(
  sum by (pod, namespace) (
    container_memory_working_set_bytes{container!=""}
  )
  /
  sum by (pod, namespace) (
    kube_pod_container_resource_requests{resource="memory", container!=""}
  )
) < 0.3
```
Install the Vertical Pod Autoscaler (VPA) in recommendation mode to get automated suggestions:
```shell
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA object in recommendation mode (won't change pods automatically)
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommend only, don't change pods
EOF

# After 24h, check recommendations
kubectl describe vpa my-app-vpa -n production
```
VPA recommendations:
```
Container Recommendations:
  Container Name:  app
  Lower Bound:
    Cpu:     25m
    Memory:  64Mi
  Target:                 ← use this
    Cpu:     100m
    Memory:  256Mi
  Upper Bound:
    Cpu:     500m
    Memory:  512Mi
  Uncapped Target:
    Cpu:     87m
    Memory:  230Mi
```
Update your deployment with the Target values. Then re-check in a week.
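Applied to the example above, the deployment fragment would look like this. A sketch only — the requests come from the VPA Target, but the memory limit is a judgment call, not a VPA output:

```yaml
# Requests taken from the VPA Target values; memory limit set to the
# Upper Bound as a safety ceiling (an assumption — tune per workload)
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    memory: 512Mi
```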
Step 4: Find Unused Resources
Idle Elastic IPs (charged when unattached):
```shell
aws ec2 describe-addresses \
  --query "Addresses[?AssociationId==null].[AllocationId,PublicIp]" \
  --output table
```
Unattached EBS volumes:
```shell
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query "Volumes[*].[VolumeId,Size,CreateTime]" \
  --output table
```
EBS snapshots older than 30 days:
```shell
aws ec2 describe-snapshots \
  --owner-ids self \
  --query "Snapshots[?StartTime<='$(date -u -v-30d +%Y-%m-%d)'].[SnapshotId,StartTime,VolumeSize]" \
  --output table
```
Load balancer target groups with no registered targets:
```shell
aws elbv2 describe-target-groups \
  --query "TargetGroups[*].[TargetGroupArn,TargetType]" \
  --output text | while read arn type; do
  count=$(aws elbv2 describe-target-health \
    --target-group-arn "$arn" \
    --query "length(TargetHealthDescriptions)" \
    --output text)
  if [ "$count" = "0" ]; then
    echo "Empty target group: $arn"
  fi
done
```
Old manual RDS snapshots:
```shell
aws rds describe-db-snapshots \
  --query "DBSnapshots[?SnapshotCreateTime<='$(date -u -v-30d +%Y-%m-%d)' && SnapshotType=='manual'].[DBSnapshotIdentifier,AllocatedStorage,SnapshotCreateTime]" \
  --output table
```
Step 5: Switch gp2 EBS to gp3
gp3 is cheaper than gp2 at the same baseline performance and gives you 3000 IOPS free (vs gp2's baseline that scales with volume size). For most volumes, gp3 is a drop-in replacement that costs 20% less.
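The saving is easy to estimate per volume. Assuming us-east-1 rates of $0.10/GB-month for gp2 and $0.08/GB-month for gp3 (assumptions — check current pricing):

```shell
# Monthly storage cost for a 500 GB volume, gp2 vs gp3
awk 'BEGIN {
  size = 500               # GB
  gp2 = 0.10; gp3 = 0.08   # assumed $/GB-month, verify current rates
  printf "gp2: $%.2f/mo  gp3: $%.2f/mo  saved: $%.2f/mo\n", size*gp2, size*gp3, size*(gp2-gp3)
}'
```

Small per volume, but accounts commonly carry hundreds of gp2 volumes, and the migration below is one command each.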
```shell
# Find all gp2 volumes and convert each one to gp3
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query "Volumes[*].[VolumeId,Size]" \
  --output text | while read vol_id size; do
  echo "Modifying $vol_id ($size GB) to gp3..."
  aws ec2 modify-volume --volume-id "$vol_id" --volume-type gp3
done
```
The modification is live — no downtime, no detaching. It takes a few minutes per volume.
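You can watch the conversions progress — each modification moves through modifying, optimizing, then completed:

```shell
# List in-flight and recent EBS volume modifications
aws ec2 describe-volumes-modifications \
  --query "VolumesModifications[*].[VolumeId,ModificationState,Progress]" \
  --output table
```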
For EKS, update the StorageClass:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
```
Step 6: Reserved Instances for Predictable Workloads
For EC2 instances that run 24/7, a 1-year Reserved Instance typically saves around 40% over On-Demand; a 3-year commitment saves around 60%.
Only reserve instance types and sizes you're confident won't change. Use Compute Savings Plans instead of RIs for more flexibility — they apply to any instance type and family automatically.
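Before committing, sanity-check what a commitment is worth per instance. A sketch — the On-Demand rate and the 40% discount are assumptions, so plug in the actual quote from the console:

```shell
# Hypothetical: t3.medium On-Demand vs a 1-year RI at an assumed 40% discount
awk 'BEGIN {
  od = 0.0416          # assumed On-Demand $/hr
  ri = od * 0.60       # assumed 40% RI discount
  printf "On-Demand: $%.2f/yr\n", od * 8760
  printf "Reserved:  $%.2f/yr\n", ri * 8760
  printf "Saved:     $%.2f/yr\n", (od - ri) * 8760
}'
```

The flip side: a reserved instance that sits idle saves nothing, which is why the On-Demand spend check below comes first.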
Check your On-Demand spend:
```shell
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-04-01 \
  --granularity MONTHLY \
  --filter '{"Dimensions":{"Key":"PURCHASE_TYPE","Values":["On Demand"]}}' \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE
```
For EKS node groups, use Spot instances for stateless workloads with a fallback to On-Demand:
```shell
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name spot-workers \
  --capacity-type SPOT \
  --instance-types t3.medium t3.large t3a.medium \
  --scaling-config minSize=2,maxSize=20,desiredSize=5
```
A mix of 3+ instance types reduces the chance of a Spot interruption hitting all capacity in a single AZ at once.
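Spot discounts vary by pool and region, but 60–70% off On-Demand is common. A rough sketch of what that means for this node group at its desired size — both rates are assumptions:

```shell
# 5 nodes, assumed t3.medium On-Demand rate, assumed ~65% Spot discount
awk 'BEGIN {
  nodes = 5
  od = 0.0416          # assumed On-Demand $/hr
  spot = od * 0.35     # assumed ~65% Spot discount
  printf "On-Demand: $%.2f/mo\n", nodes * od * 730
  printf "Spot:      $%.2f/mo\n", nodes * spot * 730
}'
```

Check `aws ec2 describe-spot-price-history` for the actual rates in your AZs before committing to a type mix.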
Tracking Progress
Set a monthly budget alert:
```shell
aws budgets create-budget \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget '{
    "BudgetName": "Monthly AWS Budget",
    "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}]
  }]'
```
Review Cost Explorer weekly for the first month after changes. Cost reductions take a full billing cycle to show up clearly.
Official References
- AWS Compute Optimizer — AWS's free rightsizing recommendations for EC2, ECS, Lambda, and EBS
- AWS Cost Explorer — Cost Explorer docs: filtering, grouping, Reserved Instance coverage, and Savings Plans recommendations
- Kubecost Documentation — In-cluster Kubernetes cost visibility: per-namespace, per-deployment, and per-pod cost allocation
- AWS Graviton Processor — ARM-based instances with up to 40% better price-performance vs equivalent x86 — a primary rightsizing lever
- EC2 Instance Selector — CLI tool for filtering EC2 instance types by vCPU, memory, and network requirements
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.