Welcome to AWS Wednesday — our weekly look at what's happening in the AWS ecosystem and what it means for real-world architecture decisions.
This week has been one of the most eventful in AWS history. We had the first-ever physical attack on major cloud infrastructure, a 20-year milestone that puts the cloud's maturity in perspective, and a partnership that's redefining what AI inference looks like. Let's dig into all of it.
## The Wake-Up Call: AWS Data Centers Hit by Drone Strikes
Earlier this month, Iranian drone strikes damaged multiple AWS data centers in the UAE and Bahrain. Services across the me-south-1 (Bahrain) region experienced significant disruptions, with 84+ services affected. Amazon began migrating customer workloads while the situation was assessed.
This is unprecedented. Not "cloud outage" unprecedented — we've had plenty of those. This is "a geopolitical conflict physically damaged cloud infrastructure" unprecedented. The multi-AZ redundancy that AWS designs into every region assumes hardware failures, network partitions, even natural disasters. It doesn't assume coordinated attacks on multiple facilities within the same geographic area.
What actually happened:
- Multiple Availability Zones in a single region were impacted simultaneously
- AWS activated disaster recovery procedures and began workload migration
- Customers without cross-region failover experienced extended downtime
- The event sparked a global conversation about cloud infrastructure as critical infrastructure
What it means for your architecture: If your disaster recovery plan starts and ends with "we're in multiple AZs," this month proved that's not enough. Multi-AZ protects against a single data center going down. It doesn't protect against regional disruption — whether from conflict, natural disaster, or a really bad day at the power grid.
This isn't about fear-mongering. Most businesses will never need to worry about drone strikes on their cloud provider. But the same principle applies to hurricanes, earthquakes, regulatory changes, and prolonged regional outages. The question isn't "will something happen?" — it's "what's our tolerance for downtime, and does our architecture match that tolerance?"
## The Good News: AWS Is More Resilient Than Ever
Here's where we balance the story, because the doom-and-gloom narrative misses important context.
### S3 Turns 20 — And It's Remarkable
Amazon S3 launched on March 14, 2006. Twenty years later, it stores hundreds of trillions of objects — more than there are stars in the Milky Way. That's not marketing copy; it's a real statistic.
S3's track record is genuinely impressive:
- 99.999999999% durability (eleven nines) — this means if you store 10 million objects, you can on average expect to lose a single object once every 10,000 years
- The service has grown from simple object storage to an entire ecosystem: intelligent tiering, event notifications, access points, Object Lambda, S3 Express One Zone for single-digit millisecond latency
- It remains foundational infrastructure for a huge share of the internet
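The eleven-nines figure is easy to sanity-check. A back-of-envelope sketch, assuming the durability number is an annual per-object probability:

```python
# Sanity-checking S3's eleven-nines durability claim.
# Assumption: 99.999999999% is an annual per-object durability figure,
# so the annual probability of losing any given object is about 1e-11.
annual_loss_probability = 1 - 0.99999999999   # ~1e-11
objects_stored = 10_000_000

# Expected number of objects lost per year across all 10 million objects:
expected_losses_per_year = objects_stored * annual_loss_probability  # ~1e-4

# Invert it: years between expected losses — roughly 10,000 years,
# which matches the "lose one every 10,000 years" framing above.
years_per_expected_loss = 1 / expected_losses_per_year
print(round(years_per_expected_loss))
```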
What S3's longevity tells us: cloud infrastructure works. The Bahrain incident is an outlier, not the norm. Twenty years of S3 means twenty years of continuous improvement to the underlying infrastructure, network, and operational practices.
### AWS + Cerebras: Redefining AI Inference Speed
AWS announced a collaboration with Cerebras to bring the CS-3 wafer-scale system to AWS infrastructure. The integration is built on the AWS Nitro System, combining Cerebras' raw inference speed with AWS's security, isolation, and operational consistency.
The architecture is clever: AWS Trainium handles the prefill phase (processing your prompt), and Cerebras CS-3 handles the decode phase (generating the response). It's a hybrid approach that plays to each chip's strengths.
Why this matters beyond the benchmarks: It signals that AWS is serious about offering heterogeneous compute — not just their own silicon, but best-in-class hardware from partners. For customers, that means more options, better price/performance ratios, and the ability to choose the right tool for the right workload without leaving the AWS ecosystem.
### Multi-Region Is Becoming Easier
AWS now operates 34 regions with 108 Availability Zones. They're building in Malaysia, Thailand, Mexico, New Zealand, and more. Local Zones bring AWS services within single-digit millisecond latency to population centers. Wavelength Zones embed AWS at 5G network edges.
What this means practically: building multi-region architectures is getting cheaper and more accessible every year. Five years ago, running active-active across two regions was an enterprise luxury. Today, the tooling exists to make it feasible for mid-market businesses.
## Building Regional Resilience: A Practical Framework
So how do you actually build resilience without over-engineering? Here's how we think about it:
### Tier 1: The Basics (Everyone Should Do This)
- Multi-AZ deployments — this is table stakes, not a resilience strategy. RDS Multi-AZ, ECS/EKS across AZs, load balancers spanning zones.
- Automated backups to another region — S3 Cross-Region Replication, RDS cross-region read replicas, DynamoDB Global Tables.
- Infrastructure as Code — if your environment is defined in CloudFormation or Terraform, you can rebuild it anywhere. If it was hand-configured through the console, you can't.
- Document your recovery procedures — the middle of an outage is not the time to figure out how failover works.
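To make the cross-region backup item concrete, here is a minimal sketch of an S3 Cross-Region Replication configuration. The bucket names and IAM role ARN are placeholders, and versioning must already be enabled on both buckets before replication can be applied:

```python
# Minimal sketch of an S3 Cross-Region Replication rule.
# Bucket names and the IAM role ARN are placeholders for illustration.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
    "Rules": [
        {
            "ID": "replicate-to-dr-region",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = replicate the whole bucket
            "Destination": {
                "Bucket": "arn:aws:s3:::my-app-backups-dr",  # placeholder
                "StorageClass": "STANDARD_IA",  # cheaper storage for the DR copy
            },
            "DeleteMarkerReplication": {"Status": "Disabled"},
        }
    ],
}

# With boto3, this dict would be applied to the primary bucket like so:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="my-app-data-primary",
#       ReplicationConfiguration=replication_config,
#   )
print(replication_config["Rules"][0]["Status"])
```

The `Filter`/`Priority` pair lets you replicate only a prefix or tag subset if the full bucket is too expensive to copy.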
### Tier 2: Active-Passive (High-Value Workloads)
- Warm standby in a second region — reduced-capacity environment running continuously, ready to scale up.
- Route 53 health checks + failover routing — automatic DNS failover when your primary region goes down.
- Database replication — Aurora Global Database gives you sub-second replication across regions with automated failover.
- Regular failover testing — Game Days aren't optional. If you haven't tested failover, you don't have failover.
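The health-check-plus-failover item above boils down to a pair of DNS records. A sketch of what that pair looks like — the domain, IPs, hosted zone, and health-check ID are all placeholders:

```python
# Sketch of the Route 53 record pair behind automatic DNS failover.
# Domain, IPs, zone ID, and health-check ID below are placeholders.
primary = {
    "Name": "app.example.com.",
    "Type": "A",
    "SetIdentifier": "primary-us-east-1",
    "Failover": "PRIMARY",
    "TTL": 60,  # keep the TTL low so failover propagates quickly
    "ResourceRecords": [{"Value": "203.0.113.10"}],
    "HealthCheckId": "11111111-2222-3333-4444-555555555555",  # placeholder
}
secondary = {
    "Name": "app.example.com.",
    "Type": "A",
    "SetIdentifier": "secondary-eu-west-1",
    "Failover": "SECONDARY",
    "TTL": 60,
    "ResourceRecords": [{"Value": "198.51.100.20"}],
}

# Both records go in one ChangeBatch; with boto3 it would be submitted as:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z0000000000000000",  # placeholder
#       ChangeBatch=change_batch,
#   )
change_batch = {
    "Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": rrs}
        for rrs in (primary, secondary)
    ]
}
print(len(change_batch["Changes"]))
```

When the health check on the PRIMARY record fails, Route 53 starts answering queries with the SECONDARY record automatically.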
### Tier 3: Active-Active (Business-Critical)
- Traffic distributed across regions — Route 53 latency-based or geolocation routing.
- DynamoDB Global Tables — multi-region, multi-active database with single-digit millisecond replication.
- Stateless application design — if your app can run anywhere, it can fail over anywhere.
- Conflict resolution strategy — active-active means concurrent writes, which means you need a plan for conflicts.
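To illustrate why that conflict plan matters: DynamoDB Global Tables resolves concurrent writes with last-writer-wins, which silently discards the losing write. A toy sketch with invented item data:

```python
# Toy illustration of last-writer-wins conflict resolution — the strategy
# DynamoDB Global Tables uses when two regions write the same item
# concurrently. Item shapes and timestamps are invented for the demo.
from datetime import datetime, timezone

def resolve(local: dict, remote: dict) -> dict:
    """Keep whichever version carries the later update timestamp."""
    return local if local["updated_at"] >= remote["updated_at"] else remote

us_east = {"cart_id": "c-42", "items": 3,
           "updated_at": datetime(2026, 3, 14, 12, 0, 5, tzinfo=timezone.utc)}
eu_west = {"cart_id": "c-42", "items": 4,
           "updated_at": datetime(2026, 3, 14, 12, 0, 7, tzinfo=timezone.utc)}

winner = resolve(us_east, eu_west)
print(winner["items"])  # the eu-west write is two seconds later, so it wins
```

Note that the us-east write is simply gone. If a dropped write is unacceptable for your data (counters, carts, ledgers), you need application-level merging rather than timestamp-based resolution.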
### Tier 4: Multi-Cloud (Rare, But Real)
The Bahrain incident has renewed the multi-cloud conversation. Forbes and Gartner are both writing about cross-cloud continuity as the new standard for uptime.
Our take: multi-cloud is a valid strategy for a very specific set of organizations. If regulatory requirements demand provider independence, or if your risk tolerance is measured in zero-downtime SLAs, it's worth the complexity. For everyone else, multi-region within AWS gives you 95% of the resilience at 30% of the complexity.
## What To Do This Week
If the Bahrain news made you uncomfortable, good — that discomfort is useful. Channel it into action:
- Audit your single-region dependencies. List every service that would go down if your primary region became unavailable. Be honest about the list.
- Set your RTO and RPO. Recovery Time Objective: how long can you be down? Recovery Point Objective: how much data can you lose? These numbers drive every architecture decision.
- Cost out a Tier 2 setup. A warm standby in a second region might cost less than you think — especially with reserved capacity and intelligent tiering.
- Run a tabletop exercise. Gather your team, declare "us-east-1 is gone," and walk through what happens. The gaps will reveal themselves quickly.
- Check your backups. Not "do we have backups" — "can we restore from them, and how long does it take?" If you've never tested a restore, it's not a backup. It's a hope.
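The RPO item in the list above drives a check that is simple enough to automate. A sketch with hypothetical figures — the point is that your backup interval bounds your real RPO:

```python
# Translating an RPO target into a backup-cadence check. The figures are
# hypothetical: a 15-minute RPO paired with hourly snapshots is a silent gap.
rpo_minutes = 15              # max data loss the business says it can tolerate
backup_interval_minutes = 60  # how often snapshots actually run

# Worst case, everything written since the last snapshot is lost,
# so the backup interval is the real RPO your architecture delivers.
worst_case_data_loss_minutes = backup_interval_minutes
compliant = worst_case_data_loss_minutes <= rpo_minutes
print(compliant)  # False: hourly backups cannot meet a 15-minute RPO
```

Running this kind of check against every datastore in your audit list is a fast way to find where stated RPO and actual architecture disagree.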
## The Bottom Line
The AWS Bahrain incident is a genuine inflection point for cloud architecture planning. But it's one data point in a twenty-year story of remarkable reliability and continuous improvement. S3's eleven nines of durability didn't happen by accident — it's the result of two decades of engineering discipline.
The cloud isn't fragile. But your architecture might be, if you assumed "deploy to us-east-1" was a complete strategy. Regional resilience isn't about paranoia — it's about building systems that match your actual risk tolerance.
The good news? The tools to build resilient architectures have never been better, cheaper, or more accessible. The question is whether you'll invest before the next wake-up call, or after.
---
Need help assessing your regional resilience posture? We help organizations audit their AWS architecture and build practical disaster recovery strategies that match their risk tolerance and budget. [Let's talk](/contact).