Rightsizing AWS Costs: Finding and Fixing Overprovisioned Resources
Most AWS bills have 20–40% waste from overprovisioned EC2 instances, underused RDS, and forgotten resources. This tutorial shows you how to find it systematically and reduce it without impacting reliability.
Before you begin
- AWS account with billing access
- AWS CLI configured
- Basic understanding of EC2 and RDS
AWS bills grow because it's easier to overprovision than to tune. A t3.xlarge "just to be safe" instead of a t3.medium quadruples the instance cost. An RDS db.r5.large averaging 10% CPU is wasting 90% of its compute.
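That gap compounds across a fleet. A quick back-of-envelope sketch — the hourly rates below are assumed us-east-1 On-Demand list prices, so verify against current pricing:

```shell
# Rough monthly cost at 730 hours/month
# Assumed rates: t3.medium $0.0416/hr, t3.xlarge $0.1664/hr (check current pricing)
awk 'BEGIN {
  h = 730
  medium = 0.0416 * h
  xlarge = 0.1664 * h
  printf "t3.medium: $%.2f/mo\n", medium
  printf "t3.xlarge: $%.2f/mo\n", xlarge
  printf "monthly waste per oversized instance: $%.2f\n", xlarge - medium
}'
```

Multiply that last line by the number of oversized instances in your account and the 20–40% figure stops looking abstract.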
This tutorial gives you a systematic process to find the waste and act on it.
The Four Categories of AWS Waste
- Overprovisioned resources — right type, wrong size (most common)
- Unused resources — running but doing nothing (snapshots, idle EBS, stopped instances)
- Wrong purchase type — On-Demand when Reserved or Spot would be cheaper
- Wrong storage class — S3 Standard for cold data, gp2 instead of gp3
Step 1: Get the High-Level Picture with Cost Explorer
In the AWS Console → Cost Management → Cost Explorer:
- Set date range to last 3 months
- Group by Service — see what's driving the bill
- Group by Usage Type — see EC2 instances, data transfer, storage separately
- Filter to your top spending service, group by Instance Type
Look for:
- Instance types with low utilisation (Cost Explorer shows rightsizing recommendations)
- Data transfer costs (often hidden, can be 20% of bill)
- NAT Gateway data processed (usually avoidable)
Enable Cost Explorer Rightsizing Recommendations: Cost Management → Rightsizing Recommendations. This uses CloudWatch CPU metrics to suggest instance downsizes.
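The same per-service view is available from the CLI through the Cost Explorer API. A sketch — the date range is a placeholder, so substitute your own billing period:

```shell
# Monthly cost grouped by service (mirrors Cost Explorer's "Group by Service")
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE
```

This is handy for scripting a weekly snapshot of the bill rather than clicking through the console each time.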
Step 2: Find Underutilised EC2 Instances
```shell
# Find instances with avg CPU < 10% over 14 days
# (BSD/macOS date syntax; on GNU/Linux use --date="14 days ago")
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time $(date -u -v-14d +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average

# Better: use AWS Compute Optimizer
aws compute-optimizer get-ec2-instance-recommendations \
  --query "instanceRecommendations[?finding=='OVER_PROVISIONED'].[instanceArn,recommendationOptions[0].instanceType,utilizationMetrics[0].value]" \
  --output table
```
AWS Compute Optimizer uses 14 days of CloudWatch data to recommend downsizes. Enable it first:
```shell
aws compute-optimizer update-enrollment-status --status Active
```
Wait 24 hours for it to process your account, then check recommendations:
```shell
aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=finding,values=OVER_PROVISIONED
```
Step 3: Rightsize EKS Pod Resource Requests
In Kubernetes, requests determine which node a pod lands on. Overprovisioned requests leave nodes underutilised — you pay for capacity that sits idle.
Find pods with low CPU utilisation using Prometheus:
```
# Pods using less than 20% of their CPU request (over 24h)
(
  sum by (pod, namespace) (
    rate(container_cpu_usage_seconds_total{container!=""}[24h])
  )
  /
  sum by (pod, namespace) (
    kube_pod_container_resource_requests{resource="cpu", container!=""}
  )
) < 0.2
```

```
# Memory: pods using less than 30% of their memory request
(
  sum by (pod, namespace) (
    container_memory_working_set_bytes{container!=""}
  )
  /
  sum by (pod, namespace) (
    kube_pod_container_resource_requests{resource="memory", container!=""}
  )
) < 0.3
```
Install the Vertical Pod Autoscaler (VPA) in recommendation mode to get automated suggestions:
```shell
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA object in recommendation mode (won't change pods automatically)
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommend only, don't change pods
EOF

# After 24h, check recommendations
kubectl describe vpa my-app-vpa -n production
```
VPA recommendations:
```
Container Recommendations:
  Container Name:  app
  Lower Bound:
    Cpu:     25m
    Memory:  64Mi
  Target:                 ← use this
    Cpu:     100m
    Memory:  256Mi
  Upper Bound:
    Cpu:     500m
    Memory:  512Mi
  Uncapped Target:
    Cpu:     87m
    Memory:  230Mi
```
Update your deployment with the Target values. Then re-check in a week.
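Applied to the example above, the deployment fragment would look like this. A sketch only — the requests come from the VPA Target, but the memory limit is a judgment call, not a VPA output:

```yaml
# Requests taken from the VPA Target values; memory limit set to the
# Upper Bound as a safety ceiling (an assumption — tune per workload)
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    memory: 512Mi
```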
Step 4: Find Unused Resources
Idle Elastic IPs (charged when unattached):
```shell
aws ec2 describe-addresses \
  --query "Addresses[?AssociationId==null].[AllocationId,PublicIp]" \
  --output table
```
Unattached EBS volumes:
```shell
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query "Volumes[*].[VolumeId,Size,CreateTime]" \
  --output table
```
EBS snapshots older than 30 days:
```shell
aws ec2 describe-snapshots \
  --owner-ids self \
  --query "Snapshots[?StartTime<='$(date -u -v-30d +%Y-%m-%d)'].[SnapshotId,StartTime,VolumeSize]" \
  --output table
```
Load balancer target groups with no registered targets:
```shell
aws elbv2 describe-target-groups \
  --query "TargetGroups[*].[TargetGroupArn,TargetType]" \
  --output text | while read arn type; do
  count=$(aws elbv2 describe-target-health \
    --target-group-arn "$arn" \
    --query "length(TargetHealthDescriptions)" \
    --output text)
  if [ "$count" = "0" ]; then
    echo "Empty target group: $arn"
  fi
done
```
Old manual RDS snapshots:
```shell
aws rds describe-db-snapshots \
  --query "DBSnapshots[?SnapshotCreateTime<='$(date -u -v-30d +%Y-%m-%d)' && SnapshotType=='manual'].[DBSnapshotIdentifier,AllocatedStorage,SnapshotCreateTime]" \
  --output table
```
Step 5: Switch gp2 EBS to gp3
gp3 is cheaper than gp2 at the same baseline performance and gives you 3000 IOPS free (vs gp2's baseline that scales with volume size). For most volumes, gp3 is a drop-in replacement that costs 20% less.
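The saving is easy to estimate per volume. Assuming us-east-1 rates of $0.10/GB-month for gp2 and $0.08/GB-month for gp3 (assumptions — check current pricing):

```shell
# Monthly storage cost for a 500 GB volume, gp2 vs gp3
awk 'BEGIN {
  size = 500               # GB
  gp2 = 0.10; gp3 = 0.08   # assumed $/GB-month, verify current rates
  printf "gp2: $%.2f/mo  gp3: $%.2f/mo  saved: $%.2f/mo\n", size*gp2, size*gp3, size*(gp2-gp3)
}'
```

Small per volume, but accounts commonly carry hundreds of gp2 volumes, and the migration below is one command each.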
```shell
# Find all gp2 volumes and convert each one to gp3
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query "Volumes[*].[VolumeId,Size]" \
  --output text | while read vol_id size; do
  echo "Modifying $vol_id ($size GB) to gp3..."
  aws ec2 modify-volume --volume-id "$vol_id" --volume-type gp3
done
```
The modification is live — no downtime, no detaching. It takes a few minutes per volume.
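You can watch the conversions progress — each modification moves through modifying, optimizing, then completed:

```shell
# List in-flight and recent EBS volume modifications
aws ec2 describe-volumes-modifications \
  --query "VolumesModifications[*].[VolumeId,ModificationState,Progress]" \
  --output table
```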
For EKS, update the StorageClass:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
```
Step 6: Reserved Instances for Predictable Workloads
For EC2 instances that run 24/7, a 1-year Reserved Instance typically saves around 40% over On-Demand; a 3-year commitment saves around 60%.
Only reserve instance types and sizes you're confident won't change. Use Compute Savings Plans instead of RIs for more flexibility — they apply to any instance type and family automatically.
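Before committing, sanity-check what a commitment is worth per instance. A sketch — the On-Demand rate and the 40% discount are assumptions, so plug in the actual quote from the console:

```shell
# Hypothetical: t3.medium On-Demand vs a 1-year RI at an assumed 40% discount
awk 'BEGIN {
  od = 0.0416          # assumed On-Demand $/hr
  ri = od * 0.60       # assumed 40% RI discount
  printf "On-Demand: $%.2f/yr\n", od * 8760
  printf "Reserved:  $%.2f/yr\n", ri * 8760
  printf "Saved:     $%.2f/yr\n", (od - ri) * 8760
}'
```

The flip side: a reserved instance that sits idle saves nothing, which is why the On-Demand spend check below comes first.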
Check your On-Demand spend:
```shell
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-04-01 \
  --granularity MONTHLY \
  --filter '{"Dimensions":{"Key":"PURCHASE_TYPE","Values":["On Demand"]}}' \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE
```
For EKS node groups, use Spot instances for stateless workloads with a fallback to On-Demand:
```shell
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name spot-workers \
  --capacity-type SPOT \
  --instance-types t3.medium t3.large t3a.medium \
  --scaling-config minSize=2,maxSize=20,desiredSize=5
```
A mix of 3+ instance types reduces the chance of a Spot interruption hitting all capacity in a single AZ at once.
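Spot discounts vary by pool and region, but 60–70% off On-Demand is common. A rough sketch of what that means for this node group at its desired size — both rates are assumptions:

```shell
# 5 nodes, assumed t3.medium On-Demand rate, assumed ~65% Spot discount
awk 'BEGIN {
  nodes = 5
  od = 0.0416          # assumed On-Demand $/hr
  spot = od * 0.35     # assumed ~65% Spot discount
  printf "On-Demand: $%.2f/mo\n", nodes * od * 730
  printf "Spot:      $%.2f/mo\n", nodes * spot * 730
}'
```

Check `aws ec2 describe-spot-price-history` for the actual rates in your AZs before committing to a type mix.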
Tracking Progress
Set a monthly budget alert:
```shell
aws budgets create-budget \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget '{
    "BudgetName": "Monthly AWS Budget",
    "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}]
  }]'
```
Review Cost Explorer weekly for the first month after changes. Cost reductions take a full billing cycle to show up clearly.
Official References
- AWS Compute Optimizer — AWS's free rightsizing recommendations for EC2, ECS, Lambda, and EBS
- AWS Cost Explorer — Cost Explorer docs: filtering, grouping, Reserved Instance coverage, and Savings Plans recommendations
- Kubecost Documentation — In-cluster Kubernetes cost visibility: per-namespace, per-deployment, and per-pod cost allocation
- AWS Graviton Processor — ARM-based instances with up to 40% better price-performance vs equivalent x86 — a primary rightsizing lever
- EC2 Instance Selector — CLI tool for filtering EC2 instance types by vCPU, memory, and network requirements
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.