The Problem: Paying for Peak Capacity 24/7
When a national advertising agency came to us, their AWS bill was growing faster than their revenue. The root cause was straightforward once we looked at the data: they were provisioned for worst-case load around the clock.
Their ad-serving infrastructure, campaign analytics pipelines, and creative asset processing all ran on fixed-size EC2 fleets. During campaign launches, traffic would spike 4-5x - so the team had sized everything to handle that peak. The other 90% of the time, those instances were sitting at 15-25% CPU utilization.
It's a pattern we see constantly in adtech and media companies. Traffic is inherently spiky - tied to campaign launches, dayparting, and seasonal cycles - but infrastructure stays flat.
What We Found in the Utilization Audit
Before recommending anything, we pulled four weeks of CloudWatch metrics across their entire compute fleet. The numbers told a clear story:
- 72% of instances were running below 30% average CPU utilization
- Campaign analytics clusters peaked for 3-4 hours per day, then sat nearly idle
- Creative asset processing spiked during business hours but dropped to near-zero overnight and on weekends
- Ad-serving instances had legitimate traffic variation but were sized for the absolute worst case with no scaling policy
The waste wasn't coming from one place - it was spread across the entire infrastructure. But the fix didn't require rearchitecting anything. It required matching capacity to demand in real time.
The Auto Scaling Strategy
We designed three different scaling approaches matched to the actual traffic patterns of each workload:
1. Target Tracking for Ad-Serving
Ad-serving traffic is unpredictable - it depends on which campaigns are active, bid volumes, and real-time auction activity. For these workloads, we implemented target tracking auto scaling policies tied to CPU utilization and request count.
The key was setting the right target. Too aggressive and you get scaling thrash. Too conservative and you're still overpaying. We tested with historical traffic data and landed on a 60% CPU target with a 3-minute cooldown - responsive enough to handle a campaign launch spike within minutes, stable enough to avoid constant scaling noise.
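The proportional math behind target tracking is easy to sketch. This is a simplified stand-in for what the policy computes, not AWS's exact implementation, and the fleet sizes and bounds are illustrative:

```python
import math

def desired_capacity(current_instances: int, avg_cpu: float,
                     target_cpu: float = 60.0,
                     min_size: int = 2, max_size: int = 40) -> int:
    """Approximate a target-tracking decision: resize the fleet
    proportionally so average CPU lands near the target, clamped
    to the group's min/max bounds."""
    raw = current_instances * (avg_cpu / target_cpu)
    return max(min_size, min(max_size, math.ceil(raw)))

# A campaign launch pushes a 10-instance fleet to 95% average CPU:
print(desired_capacity(10, 95.0))   # 16 - scale out
# Overnight, the same fleet idles at 18% CPU:
print(desired_capacity(10, 18.0))   # 3 - scale in
```

The cooldown matters because this calculation runs against a noisy metric: without a settling period, a fleet hovering near the target flips between sizes on every evaluation.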
2. Scheduled Scaling for Predictable Patterns
Campaign analytics and reporting followed a clear daily pattern: heavy usage during business hours (8am-8pm EST), minimal overnight. Weekend traffic dropped 70% compared to weekdays.
For these workloads, scheduled scaling was the obvious choice. We set up cron-based policies that scaled the fleet down by 60% overnight and 70% on weekends, then back up before the next business day. Simple, predictable, and immediately effective.
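The schedule logic itself is trivial, which is part of the appeal. A sketch of the capacity function the cron policies encode (the peak size and cut percentages here are illustrative, not the client's actual fleet):

```python
def scheduled_size(peak_size: int, weekday: int, hour_est: int) -> int:
    """Fleet size under a cron-style schedule: full size 8am-8pm EST
    on weekdays, 60% smaller on weeknights, 70% smaller on weekends.
    weekday: 0=Monday .. 6=Sunday."""
    if weekday >= 5:                         # Saturday / Sunday
        return max(1, round(peak_size * 0.30))
    if 8 <= hour_est < 20:                   # business hours
        return peak_size
    return max(1, round(peak_size * 0.40))   # weeknight

print(scheduled_size(20, weekday=2, hour_est=14))  # Wednesday 2pm: 20
print(scheduled_size(20, weekday=2, hour_est=2))   # Wednesday 2am: 8
print(scheduled_size(20, weekday=6, hour_est=14))  # Sunday: 6
```

In practice each branch maps to a scheduled scaling action with a cron recurrence on the auto scaling group, so the platform applies the resize - no code runs in production.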
3. Queue-Based Scaling for Processing
Creative asset processing - video transcoding, image resizing, banner generation - was driven by internal submissions, not external traffic. We replaced the fixed fleet with an auto scaling group tied to SQS queue depth.
When designers uploaded a batch of assets, the queue filled up, instances spun up to process them, and then scaled back to a minimal baseline once the queue drained. Processing time stayed the same. Costs dropped dramatically because we stopped paying for idle capacity between batches.
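The sizing rule for queue-driven work is usually backlog-based: run enough instances that the current queue drains within an acceptable time. A minimal sketch, where the drain target, per-job time, and fleet bounds are assumptions for illustration:

```python
import math

def workers_for_queue(queue_depth: int, avg_seconds_per_job: float,
                      target_drain_s: float = 600.0,
                      min_size: int = 1, max_size: int = 30) -> int:
    """Size the fleet from SQS backlog: enough instances to drain
    the visible queue within the target time, assuming each
    instance works one job at a time."""
    jobs_per_instance = target_drain_s / avg_seconds_per_job
    raw = math.ceil(queue_depth / jobs_per_instance)
    return max(min_size, min(max_size, raw))

# A designer uploads 400 assets averaging 45s each; drain in ~10 min:
print(workers_for_queue(400, 45.0))  # 30
# Queue drained: fall back to the minimal baseline:
print(workers_for_queue(0, 45.0))    # 1
```

The queue-depth metric (SQS `ApproximateNumberOfMessagesVisible`) feeds this as a custom scaling metric, so capacity follows the backlog rather than any clock or CPU signal.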
The Results
You can read the full case study for the complete breakdown, but here are the headline numbers:

- 30% reduction in annual AWS compute spend
- Zero degradation in ad-serving latency or analytics throughput
- Compute costs now track actual demand instead of theoretical peak capacity
- Campaign launch spikes handled automatically - no more manual instance provisioning
- Overnight and weekend waste eliminated through scheduled scaling
The 30% number is the annual average. During off-peak hours, the savings are significantly higher - closer to 60-70% compared to what they were paying before.
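A quick sanity check on how deep off-peak savings blend down to a smaller annual number. The split below is an assumed illustration, not the client's actual spend breakdown:

```python
# Assumed split for illustration: half the old spend fell in busy
# hours (where auto scaling saves little), half in off-peak hours.
peak_share      = 0.5    # share of old spend during busy hours
peak_savings    = 0.0    # little headroom while traffic is high
offpeak_savings = 0.60   # deep cuts overnight and on weekends

blended = peak_share * peak_savings + (1 - peak_share) * offpeak_savings
print(f"{blended:.0%}")  # 30%
```

The point: even when peak hours save nothing, eliminating idle capacity the rest of the time moves the annual bill substantially.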
Why This Matters for AdTech Companies
Advertising and media companies have some of the spikiest traffic patterns in any industry. Campaign launches, real-time bidding, seasonal events, and dayparting all create load that varies dramatically hour to hour.
If you're running a fixed compute fleet to handle those spikes, you're almost certainly overpaying. The gap between peak capacity and average utilization is where the waste lives.
Auto scaling isn't a new concept - but the implementation details matter. The wrong scaling policy can cause latency spikes during scale-up events or leave you exposed during a traffic surge. Getting it right requires understanding both the traffic patterns and the tolerance for scaling lag in each workload.
Getting Started
If your AWS bill feels too high relative to your actual traffic, start here:
- Pull utilization data - look at 4 weeks of CloudWatch metrics for CPU, memory, and network across your compute fleet
- Identify the pattern - is the workload spiky, predictable, or queue-driven? The pattern determines the scaling approach
- Size your baseline - what's the minimum fleet size you need at the quietest point? That's your starting point
- Test with historical data - use past traffic patterns to validate your scaling policy before applying it to production
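Once you've exported the metrics from CloudWatch, the first pass over the data can be very simple. A minimal sketch of the utilization summary behind step one - the threshold and the sample day are illustrative:

```python
from statistics import mean

def audit(hourly_cpu: list[float], low_threshold: float = 30.0) -> dict:
    """Summarize a fleet's hourly CPU series: average utilization
    and the share of hours spent below the threshold."""
    below = sum(1 for v in hourly_cpu if v < low_threshold)
    return {
        "avg_cpu": round(mean(hourly_cpu), 1),
        "share_below_30pct": round(below / len(hourly_cpu), 2),
    }

# One illustrative day: busy 8am-8pm, near-idle otherwise.
day = [12] * 8 + [58] * 12 + [12] * 4
print(audit(day))  # {'avg_cpu': 35.0, 'share_below_30pct': 0.5}
```

A fleet that spends half its hours under 30% CPU, like this one, is exactly the profile where scheduled or target-tracking scaling pays off fastest.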
If you want help analyzing your AWS environment for auto scaling opportunities, schedule a free consultation. We'll look at your actual utilization data and tell you where the savings are.