7 AWS Mistakes That Show Up in Almost Every Environment
Cloud cost waste is rarely exotic. It is not the result of an obscure misconfiguration or a niche service nobody understands. It is, almost always, a handful of the same boring mistakes — repeated across teams, across companies, across years.
This post collects the seven we think are worth checking first. If your team runs anything on AWS, there’s a strong chance you have at least one of them right now. They’re easy to find. They’re often easy to fix. And collectively, they account for a startling share of the average AWS bill.
1. Idle NAT Gateways quietly burning a few hundred dollars a month
NAT Gateways are one of the most common sources of “wait, what’s that line item?” surprises. At roughly $0.045/hour plus per-GB data processing, a single forgotten NAT Gateway costs around $32/month before any traffic even flows through it. Multiple AZs, multiple VPCs, multiple environments, and the bill grows fast.
The cheapest NAT Gateway is the one you never created. Route S3 and DynamoDB traffic through VPC gateway endpoints, which carry no hourly or data processing charge, and run a NAT Gateway in every AZ only where high availability is actually a requirement. Everywhere else, consolidate.
How to find them
# List all NAT Gateways across all regions
for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
  aws ec2 describe-nat-gateways --region "$region" \
    --query 'NatGateways[?State==`available`].[NatGatewayId,VpcId,SubnetId]' \
    --output table
done
Cross-reference the results against your route tables. If nothing points to it, you’re paying for nothing.
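The cross-reference itself can be scripted. What follows is a minimal sketch, assuming a configured AWS CLI and read access to EC2; the function name `unreferenced_nat_gateways` is ours, not an AWS command. It prints every available NAT Gateway in one region that no route table targets.

```shell
# Sketch: NAT Gateways in one region that no route table points at.
# unreferenced_nat_gateways is a hypothetical helper name.
unreferenced_nat_gateways() {
  local region="$1" nat_id routes
  for nat_id in $(aws ec2 describe-nat-gateways --region "$region" \
      --filter Name=state,Values=available \
      --query 'NatGateways[].NatGatewayId' --output text); do
    # Count the route tables holding a route that targets this gateway.
    routes=$(aws ec2 describe-route-tables --region "$region" \
      --filters "Name=route.nat-gateway-id,Values=$nat_id" \
      --query 'length(RouteTables)' --output text)
    if [ "$routes" = "0" ]; then
      echo "$nat_id"
    fi
  done
}

# Usage: unreferenced_nat_gateways eu-west-1
```

Treat the output as a list of candidates, not a delete list: check with the owning team before removing anything.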
2. A large share of spend that isn’t tagged
Tagging is the most boring problem in cloud cost. It is also the most expensive one to ignore.
When monthly spend cannot be attributed to a team, an environment, or a service, FinOps becomes guesswork. It is not uncommon for organisations to have 30-60% of their AWS bill effectively “unallocated” — and you cannot have a serious conversation about cost when nearly half the bill is marked Unknown.
The pattern is almost always the same:
- A tagging policy exists somewhere in Confluence
- It was written 18 months ago
- Nobody enforced it at provisioning time
- By the time anyone opens Cost Explorer, the untagged spend is already history: it can be viewed, but not attributed
We’ve got a step-by-step tagging strategy that addresses each of these.
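To put a number on the unallocated share, Cost Explorer can group a month of spend by a single tag key. The sketch below assumes the tag has already been activated as a cost allocation tag; the key `team`, the dates, and both function names are illustrative. Spend with no value for the tag comes back under a key that ends in `$`, for example `team$`.

```shell
# Sketch: spend for one period grouped by a cost allocation tag.
# spend_by_tag and untagged_share are hypothetical helper names.
spend_by_tag() {
  local tag_key="$1" start="$2" end="$3"   # end date is exclusive
  aws ce get-cost-and-usage \
    --time-period "Start=$start,End=$end" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by "Type=TAG,Key=$tag_key" \
    --query 'ResultsByTime[0].Groups[].[Keys[0],Metrics.UnblendedCost.Amount]' \
    --output text
}

# Reads "key<TAB>amount" lines and reports the share whose tag value is empty.
untagged_share() {
  awk -F'\t' '{ total += $2; if ($1 ~ /\$$/) untagged += $2 }
    END { if (total > 0) printf "%.1f%% untagged\n", 100 * untagged / total }'
}

# Usage: spend_by_tag team 2024-04-01 2024-05-01 | untagged_share
```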
3. Forgotten dev and sandbox accounts
In any organisation that’s been on AWS for more than a year or two, there is almost certainly at least one AWS account that:
- Was created for a specific project
- Belongs to someone who has since left the company
- Still has production-grade resources running
- Has not been logged into in 90+ days
These accounts are where the biggest single waste findings tend to hide. A SageMaker endpoint, a Redshift cluster, an EMR job — anything left running in a “test” account that nobody remembers exists — can easily account for thousands of dollars a month, indefinitely, with nobody noticing.
If you have AWS Organizations, run an account inventory. Last-login dates, last-resource-modification dates, owner contacts. If you can’t answer “who owns this account?” within 30 seconds, that’s a finding.
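A starting point for that inventory, sketched under the assumption that you run it from the Organizations management account (or a delegated administrator) with Cost Explorer enabled. The function names are ours. Note that neither call returns last-login or last-modification dates; those would have to come from elsewhere, such as CloudTrail or IAM credential reports.

```shell
# Sketch: account list plus spend per account. Helper names are hypothetical.

# One line per account: id, name, root email, status, date joined.
account_inventory() {
  aws organizations list-accounts \
    --query 'Accounts[].[Id,Name,Email,Status,JoinedTimestamp]' \
    --output text
}

# Unblended spend per linked account between two dates (end date exclusive).
spend_by_account() {
  local start="$1" end="$2"
  aws ce get-cost-and-usage \
    --time-period "Start=$start,End=$end" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=DIMENSION,Key=LINKED_ACCOUNT \
    --query 'ResultsByTime[0].Groups[].[Keys[0],Metrics.UnblendedCost.Amount]' \
    --output text
}

# Usage:
#   account_inventory
#   spend_by_account 2024-04-01 2024-05-01
```

An account with meaningful spend and no obvious owner in the first list is the one to chase first.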
4. Oversized RDS instances with single-digit CPU utilisation
The default answer to “is this database big enough?” is almost always “make it bigger”. The question “is this database too big?” is almost never asked.
Average CPU utilisation across managed databases tends to sit in the single digits. Memory utilisation is rarely much better. The vast majority of databases in the cloud are paying for headroom they will never use.
| Instance class | Approx. monthly cost | Right-sized target | Approx. monthly savings |
|---|---|---|---|
| db.r6g.4xlarge | $1,460 | db.r6g.xlarge | ~$1,095 |
| db.m6g.2xlarge | $560 | db.m6g.large | ~$420 |
| db.r6g.8xlarge | $2,920 | db.r6g.2xlarge | ~$2,190 |
Right-sizing requires actually looking at CloudWatch metrics, namely CPUUtilization, FreeableMemory and DatabaseConnections, over a representative period (at least 2 weeks, ideally 4). Then drop one size at a time, watching each step for a fortnight before dropping again, until you reach the target.
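Pulling those metrics does not need a dashboard. The sketch below fetches one daily average and maximum per day for a single instance; the function name, the instance identifier `my-db` and the dates are placeholders. FreeableMemory is reported in bytes.

```shell
# Sketch: daily average and maximum of one RDS metric.
# rds_metric_summary is a hypothetical helper name.
rds_metric_summary() {
  local db_id="$1" metric="$2" start="$3" end="$4"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name "$metric" \
    --dimensions "Name=DBInstanceIdentifier,Value=$db_id" \
    --start-time "$start" --end-time "$end" \
    --period 86400 \
    --statistics Average Maximum \
    --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Average,Maximum]' \
    --output text
}

# Usage (four weeks of data for the three metrics named above):
#   for m in CPUUtilization FreeableMemory DatabaseConnections; do
#     rds_metric_summary my-db "$m" 2024-04-01T00:00:00Z 2024-04-29T00:00:00Z
#   done
```

Look at the maximum as well as the average: a database that idles at 5% but peaks at 90% during a nightly batch is not a right-sizing candidate.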
5. CloudWatch Logs retention set to “Never Expire”
The default log retention in CloudWatch is forever. This is almost certainly not what you want.
It’s not unusual to find log groups that have been accumulating data for years: terabytes of logs nobody has queried since they were written. CloudWatch charges per GB ingested and per GB stored, and the storage charge keeps accruing indefinitely if retention is never set.
Set a sensible default — 30 days for application logs, 90 days for audit logs, longer only where compliance requires it — and apply it across every log group. AWS Config rules or a small Lambda can enforce this organisation-wide.
# Show every log group with no retention policy set
aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null`].[logGroupName,storedBytes]' \
  --output table
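Once you have the list, applying a default is one more call per log group. This is a sketch, not a policy engine: the function name and the `DRY_RUN` variable are ours, and it only prints what it would change unless `DRY_RUN=0` is set. It covers one region per run.

```shell
# Sketch: give every log group without a retention policy a default one.
# set_default_retention and DRY_RUN are hypothetical names.
set_default_retention() {
  local days="$1" group
  aws logs describe-log-groups \
    --query 'logGroups[?retentionInDays==`null`].logGroupName' \
    --output text | tr '\t' '\n' | while read -r group; do
      # Skip blank lines and the "None" the CLI can print for empty results.
      [ -n "$group" ] && [ "$group" != "None" ] || continue
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would set $group to $days days"
      else
        aws logs put-retention-policy \
          --log-group-name "$group" --retention-in-days "$days"
      fi
    done
}

# Usage:
#   set_default_retention 30             # dry run
#   DRY_RUN=0 set_default_retention 30   # apply
```

Shortening retention deletes older log events, so run the dry run first and exclude audit log groups that need a longer period.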
6. EBS snapshots multiplying without a lifecycle policy
Snapshots feel free. They are not.
The default cycle of “create a snapshot before that risky change” + “we’ll clean it up later” produces vast graveyards of incremental EBS snapshots. Many will be snapshots of volumes that no longer exist, of instances that were terminated years ago. The storage charges quietly compound.
Amazon Data Lifecycle Manager (DLM) is free to configure. Set a policy: keep the last N snapshots, expire anything older than X days, and apply it across the organisation. DLM only manages the snapshots it creates, so older manual snapshots need a one-off review and bulk delete.
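For that one-off review, a useful first cut is the set of snapshots whose source volume no longer exists. The sketch below assumes a configured AWS CLI and works on one region at a time; the function name is ours.

```shell
# Sketch: snapshots you own whose source volume is gone.
# orphaned_snapshots is a hypothetical helper name.
orphaned_snapshots() {
  local region="$1" live
  live=$(mktemp)
  # IDs of the volumes that still exist, one per line.
  aws ec2 describe-volumes --region "$region" \
    --query 'Volumes[].VolumeId' --output text | tr '\t' '\n' > "$live"
  # Print snapshot rows whose VolumeId (column 1) is not in that list.
  aws ec2 describe-snapshots --region "$region" --owner-ids self \
    --query 'Snapshots[].[VolumeId,SnapshotId,StartTime,VolumeSize]' \
    --output text |
    awk -v livefile="$live" '
      BEGIN { while ((getline v < livefile) > 0) live[v] = 1 }
      !($1 in live)'
  rm -f "$live"
}

# Usage: orphaned_snapshots eu-west-1
```

Not everything on the list is safe to delete. Snapshots that back a registered AMI cannot be removed until the AMI is deregistered, and copied snapshots do not carry a meaningful source volume ID, so they will always show up here.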
7. Reserved Instances and Savings Plans expiring without anyone noticing
This one hurts. A team commits to a 1- or 3-year Savings Plan, gets a meaningful discount, and then, months or years later, the plan expires silently and the next month’s bill jumps 30% with no warning.
The fix is unglamorous but effective:
- Put commitment expiry dates in a shared calendar
- Add them to your team’s runbook
- Turn on expiry alerts at 60 and 30 days out; AWS Cost Management offers them for both reservations and Savings Plans
- Review your commitment portfolio quarterly, not at renewal time
The waste from an un-renewed Savings Plan is pure, avoidable overspend. There is no upside to letting it lapse without a decision.
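The expiry dates themselves are one CLI call away, which makes the calendar entries easy to populate. A minimal sketch follows, with function names of our own choosing. It lists active Savings Plans and EC2 Reserved Instances; Reserved Instances are listed per region, and other services such as RDS have their own reservation calls that are not shown here.

```shell
# Sketch: list active commitments with their end dates.
# list_commitment_expiries and expiring_before are hypothetical helper names.
list_commitment_expiries() {
  aws savingsplans describe-savings-plans --states active \
    --query 'savingsPlans[].[savingsPlanId,savingsPlanType,end]' \
    --output text
  aws ec2 describe-reserved-instances \
    --filters Name=state,Values=active \
    --query 'ReservedInstances[].[ReservedInstancesId,InstanceType,End]' \
    --output text
}

# Keeps only the lines whose last column (an ISO 8601 end date) sorts
# before the cutoff (YYYY-MM-DD). ISO timestamps compare correctly as text.
expiring_before() {
  awk -v cutoff="$1" '($NF "") < (cutoff "")'
}

# Usage: list_commitment_expiries | expiring_before 2025-01-01
```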
What every one of these has in common
None of these are sophisticated. None require a FinOps platform, an AI agent, or a consultant. They require someone to look.
That’s really what an audit is. Someone — preferably someone outside the team that built the environment — taking the time to look at every line item and ask “should this still exist?”
If you’d like a second pair of eyes on your AWS environment, get in touch. The first conversation is free, and there’s usually something worth talking about within the first hour.