Over-Provisioned by Default
It's one of the most common patterns in AWS: an ad platform spins up infrastructure to handle a big campaign, the campaign ends, and the infrastructure stays. Multiply that by two years of growth and you get what we found at a national advertising agency - a compute fleet where the average instance was using less than a third of its provisioned resources.
The team wasn't being careless. They were being cautious. In adtech, latency kills revenue. A slow ad-serve means lost impressions and unhappy clients. So when in doubt, the instinct was always to go bigger - larger instance types, more memory, faster storage. The problem is that "bigger" has a monthly cost, and over time those costs compound.
When we audited their environment, the oversizing was consistent across the board. But so was the opportunity.
The Rightsizing Process
Rightsizing sounds simple - use smaller instances. In practice, it requires careful analysis to avoid performance problems. Here's how we approached it:
Step 1: Baseline Everything
We pulled CloudWatch metrics for every EC2 instance over a 30-day window. For each instance, we tracked:
- Average and peak CPU utilization
- Memory usage (via CloudWatch agent - not available by default)
- Network throughput (in/out)
- Disk I/O (IOPS and throughput)
This gave us a realistic picture of what each instance actually needed versus what it was provisioned for. The CloudWatch agent piece is critical - without memory metrics, you're guessing.
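The baseline step reduces to simple arithmetic once the datapoints are in hand. A minimal sketch, assuming the 30-day metric series have already been fetched (for example via the CloudWatch API); the sample values are illustrative, not real data:

```python
# Sketch: summarize 30 days of utilization datapoints for one instance.
# Assumes the metric series was already pulled (e.g. via CloudWatch);
# the sample values below are hypothetical.

def summarize(datapoints):
    """Return average and peak from a list of utilization percentages."""
    return {
        "avg": sum(datapoints) / len(datapoints),
        "peak": max(datapoints),
    }

cpu_samples = [12.0, 18.5, 22.0, 15.5, 45.0, 17.0]  # hypothetical CPU %
baseline = summarize(cpu_samples)
print(f"avg={baseline['avg']:.1f}% peak={baseline['peak']:.1f}%")
```

Tracking the peak alongside the average matters: the peak is what determines whether a smaller instance can survive a traffic spike.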
Step 2: Categorize and Prioritize
We grouped instances by workload type and sorted by waste:
| Workload | Instance Type | Avg CPU | Avg Memory | Recommendation |
|----------|---------------|---------|------------|----------------|
| Ad-serving API | r5.2xlarge | 18% | 22% | r5.large |
| Campaign analytics | m5.4xlarge | 25% | 31% | m5.xlarge |
| Asset processing | c5.4xlarge | 35% | 15% | c5.xlarge |
| Internal tools | t3.xlarge | 8% | 12% | t3.medium |
The ad-serving API was the biggest opportunity: it was running on r5.2xlarge instances (8 vCPUs, 64 GB RAM) when the actual workload needed an r5.large (2 vCPUs, 16 GB RAM). That's 4x overprovisioning across a fleet of dozens of instances.
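The sorting rule behind that table can be sketched in a few lines. The 40% threshold here is our judgment call, not an AWS-prescribed value:

```python
# Sketch of the categorization rule: flag an instance as a rightsizing
# candidate when both average CPU and average memory sit well below
# capacity. The 40% threshold is a judgment call, not an AWS default.

THRESHOLD = 40.0  # percent utilization below which we flag waste

def rightsizing_candidate(avg_cpu, avg_mem, threshold=THRESHOLD):
    return avg_cpu < threshold and avg_mem < threshold

fleet = {
    "ad-serving-api":     (18.0, 22.0),  # r5.2xlarge
    "campaign-analytics": (25.0, 31.0),  # m5.4xlarge
    "asset-processing":   (35.0, 15.0),  # c5.4xlarge
    "internal-tools":     (8.0, 12.0),   # t3.xlarge
}

candidates = [name for name, (cpu, mem) in fleet.items()
              if rightsizing_candidate(cpu, mem)]
print(candidates)  # all four workloads qualify at a 40% threshold
```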
Step 3: Test Before Switching
We didn't just swap instance types and hope for the best. For each workload category, we:
- Launched the recommended instance type alongside the existing one
- Shifted a portion of traffic to the new instance
- Monitored CPU, memory, latency, and error rates for 48-72 hours
- Confirmed performance parity before proceeding
This validation step is where most DIY rightsizing efforts fall short. The data might say an instance is underutilized, but you need to verify that the new size can handle traffic spikes and peak processing loads - not just the average.
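One way to make "performance parity" concrete is to compare tail latency, not averages. A minimal sketch of that check, with illustrative thresholds and sample values:

```python
# Sketch of the parity check from the canary phase: compare latency
# samples from the old vs new instance size at the tail (p99), not the
# average. The 10% tolerance and sample values are illustrative.

def percentile(samples, p):
    s = sorted(samples)
    idx = min(len(s) - 1, round(p / 100 * (len(s) - 1)))
    return s[idx]

def parity_ok(old_latencies, new_latencies, tolerance=1.10):
    """New size passes if its p99 latency is within 10% of the old one."""
    return percentile(new_latencies, 99) <= percentile(old_latencies, 99) * tolerance

old = [20, 22, 21, 25, 90]   # ms, hypothetical samples from r5.2xlarge
new = [21, 23, 22, 26, 93]   # ms, hypothetical samples from r5.large
print(parity_ok(old, new))
```

The same comparison applies to error rates; an instance that matches on averages but blows up at p99 fails the canary.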
Step 4: Execute the Migration
Once validated, we migrated workloads in rolling batches. Zero downtime, zero performance impact. The compute savings alone were significant - but we weren't done.
Storage Optimization: The Hidden Cost Center
Storage costs tend to be invisible. They're not as dramatic as compute costs, and they accumulate gradually. But in an adtech environment generating terabytes of campaign data, creative assets, and analytics output, storage adds up fast.
Here's what we found and fixed:
EBS Volume Rightsizing
- gp2 volumes sized for burst performance they never used - Many volumes were provisioned at 500GB+ specifically because gp2 ties IOPS to volume size. We migrated these to gp3, which provides 3,000 baseline IOPS regardless of size, then replaced oversized volumes with smaller ones matched to actual usage (EBS volumes can't be shrunk in place, so this meant migrating data to new, smaller volumes)
- Provisioned IOPS (io1) volumes on non-critical workloads - Some internal analytics databases were running on io1 volumes provisioned for 10,000 IOPS when actual usage averaged 800 IOPS. Migrated to gp3 with a cost reduction of over 70% per volume
- Orphaned volumes - 23 EBS volumes attached to nothing. Leftover from terminated instances that someone forgot to clean up. Easy savings - just delete them after confirming no data recovery need
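The io1-to-gp3 math is worth seeing. A back-of-the-envelope sketch; the per-GB and per-IOPS rates are approximate us-east-1 list prices at the time of writing and will drift, so treat them as assumptions, not a quote:

```python
# Back-of-the-envelope monthly cost comparison for the volume
# migrations. Rates are approximate us-east-1 list prices (assumptions,
# not a quote - check current AWS pricing).

GP3_GB = 0.08          # $/GB-month, includes 3,000 baseline IOPS
GP3_EXTRA_IOPS = 0.005 # $/provisioned IOPS-month above 3,000
IO1_GB = 0.125         # $/GB-month
IO1_IOPS = 0.065       # $/provisioned IOPS-month

def gp3_cost(size_gb, iops=3000):
    return size_gb * GP3_GB + max(0, iops - 3000) * GP3_EXTRA_IOPS

def io1_cost(size_gb, iops):
    return size_gb * IO1_GB + iops * IO1_IOPS

# A hypothetical 500 GB io1 volume provisioned at 10,000 IOPS,
# replaced by a 200 GB gp3 volume at baseline IOPS:
before = io1_cost(500, 10_000)   # 62.50 storage + 650.00 IOPS
after = gp3_cost(200)
print(f"${before:.2f} -> ${after:.2f}/month")
```

Notice that on io1 the provisioned IOPS dominate the bill, which is why the savings on those volumes exceeded 70% even before shrinking them.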
S3 Lifecycle Policies
The agency's S3 buckets were a time capsule. Campaign assets, raw analytics data, and processed reports going back years - all sitting in S3 Standard storage class at full price.
We implemented tiered lifecycle policies:
- Campaign assets older than 90 days → S3 Infrequent Access (40% cheaper)
- Raw analytics data older than 30 days → S3 Infrequent Access
- Raw analytics data older than 180 days → S3 Glacier Instant Retrieval (68% cheaper)
- Processed reports older than 1 year → S3 Glacier Deep Archive (95% cheaper)
- Incomplete multipart uploads → auto-abort after 7 days
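The tiers above map directly to an S3 lifecycle configuration. A sketch in the shape boto3's `put_bucket_lifecycle_configuration` expects; the prefixes are hypothetical and would need to match your actual bucket layout:

```python
# Sketch of the lifecycle tiers above, in the shape that boto3's
# put_bucket_lifecycle_configuration expects. Prefixes are hypothetical.

lifecycle = {
    "Rules": [
        {
            "ID": "campaign-assets-to-ia",
            "Status": "Enabled",
            "Filter": {"Prefix": "campaign-assets/"},
            "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}],
        },
        {
            "ID": "raw-analytics-tiering",
            "Status": "Enabled",
            "Filter": {"Prefix": "analytics/raw/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER_IR"},
            ],
        },
        {
            "ID": "reports-to-deep-archive",
            "Status": "Enabled",
            "Filter": {"Prefix": "reports/"},
            "Transitions": [{"Days": 365, "StorageClass": "DEEP_ARCHIVE"}],
        },
        {
            "ID": "abort-stale-multipart-uploads",
            "Status": "Enabled",
            "Filter": {},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
    ]
}

# Apply with (assumes credentials are configured):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="your-bucket", LifecycleConfiguration=lifecycle)
```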
The lifecycle policies alone reduced S3 costs meaningfully without deleting a single file. The data is still accessible - it just costs less to store.
Snapshot Cleanup
EBS snapshots are another quiet cost accumulator. The team had daily snapshots going back 14 months with no retention policy. We implemented automated snapshot management:
- Keep daily snapshots for 7 days
- Keep weekly snapshots for 4 weeks
- Keep monthly snapshots for 12 months
- Delete everything else
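The retention schedule above is classic grandfather-father-son logic. A sketch of the decision rule; in practice you'd use Amazon Data Lifecycle Manager rather than hand-rolled code, and the "Sunday" and "first of month" anchors here are illustrative choices:

```python
# Sketch of the grandfather-father-son retention rule behind the
# snapshot policy. Real deployments typically use Amazon Data Lifecycle
# Manager; the weekly/monthly anchor days below are illustrative.

from datetime import date

def keep_snapshot(snap_date, today):
    age = (today - snap_date).days
    if age <= 7:
        return True                                # daily tier
    if age <= 28 and snap_date.weekday() == 6:     # weekly tier (Sundays)
        return True
    if age <= 365 and snap_date.day == 1:          # monthly tier
        return True
    return False

today = date(2024, 6, 15)
print(keep_snapshot(date(2024, 6, 12), today))  # 3 days old -> kept
print(keep_snapshot(date(2024, 1, 1), today))   # first-of-month -> kept
print(keep_snapshot(date(2024, 3, 14), today))  # mid-month, 3 months old -> deleted
```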
This reduced snapshot storage costs substantially while maintaining a reasonable backup history.
Combined Results
The full results are detailed in our advertising firm case study, but the rightsizing and storage optimization specifically delivered:
- Instance costs reduced by approximately 40% through rightsizing alone - no application changes required
- EBS costs reduced by over 50% through gp3 migration, volume rightsizing, and orphan cleanup
- S3 costs reduced by approximately 35% through lifecycle policies and storage class optimization
- Snapshot costs cut significantly through automated retention policies
- Zero performance degradation - validated through staged testing before every change
The total AWS cost reduction across all optimization categories came to 30% annually. Rightsizing and storage optimization were the largest contributors, but they were part of a broader effort that included auto scaling optimization and commitment strategy (Savings Plans for the newly right-sized baseline).
Why AdTech Companies Are Especially Prone to This
Advertising technology companies share a few characteristics that make them particularly susceptible to over-provisioning and storage sprawl:
- Revenue-sensitive latency - When a slow response means lost ad impressions, teams default to bigger infrastructure. Rightsizing feels risky when your revenue depends on milliseconds
- Campaign-driven data accumulation - Every campaign generates assets, analytics, and reports. Without lifecycle policies, storage grows indefinitely
- Rapid growth - AdTech companies often scale infrastructure quickly to support new clients. The infrastructure decisions made during rapid growth rarely get revisited
- Peak provisioning - Campaign launches create genuine spikes. But provisioning for the spike and leaving it running is dramatically more expensive than scaling dynamically
Where to Start
If you suspect your AWS environment is over-provisioned, here's a practical starting point:
- Install the CloudWatch agent on every EC2 instance if you haven't already - you need memory metrics, not just CPU
- Pull 30 days of utilization data and look for instances consistently below 40% CPU and memory
- Check your EBS volumes - are you still on gp2? Do you have orphaned volumes? Is anything on io1 that doesn't need it?
- Audit your S3 buckets - sort by size and check the last access dates. If data hasn't been accessed in 90+ days, it belongs in a cheaper storage class
- Review your snapshot retention - if there's no policy, you're probably keeping far more than you need
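The orphaned-volume check from the list above is one of the quickest wins to script. A sketch that filters an EC2 `DescribeVolumes` response for volumes in the `available` state (attached to nothing); the sample response is hypothetical:

```python
# Sketch of the orphaned-volume check: a volume in the "available"
# state is attached to nothing. The sample response is hypothetical.

def orphaned_volumes(describe_volumes_response):
    return [v["VolumeId"] for v in describe_volumes_response["Volumes"]
            if v["State"] == "available"]

sample = {"Volumes": [
    {"VolumeId": "vol-0aaa", "State": "in-use"},
    {"VolumeId": "vol-0bbb", "State": "available"},  # orphan
    {"VolumeId": "vol-0ccc", "State": "available"},  # orphan
]}
print(orphaned_volumes(sample))  # ['vol-0bbb', 'vol-0ccc']

# Live version (assumes credentials are configured):
# import boto3
# resp = boto3.client("ec2").describe_volumes(
#     Filters=[{"Name": "status", "Values": ["available"]}])
```

As in the case study, confirm there's no data-recovery need before deleting anything the filter turns up.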
If you want a thorough analysis of your specific environment, schedule a free consultation. We'll review your utilization data, identify the biggest savings opportunities, and give you a prioritized action plan - whether you implement it yourself or bring us in to help.