Every week we see another headline about an AWS data breach. And the root cause is usually the same: not a sophisticated zero-day exploit, but a misconfigured S3 bucket, an overly permissive IAM policy, or credentials that should have been rotated six months ago. AWS provides an incredibly powerful set of security tools. The problem is that most organizations use only a fraction of them, and often misconfigure the ones they do use.
This guide is the checklist we wish every AWS customer had on day one. It covers the controls that matter most, the configurations that prevent real breaches, and the mistakes we see teams make over and over. Whether you are running a startup on a single account or managing a multi-account enterprise environment, these practices apply.
The Shared Responsibility Model
Before you configure a single security group, you need to understand what AWS secures and what you secure. This is not optional context. It is the foundation of every security decision you will make on the platform.
AWS is responsible for security of the cloud: the physical data centers, the hypervisor, the managed service infrastructure, the global network backbone. They handle hardware decommissioning, facility access controls, and network-level DDoS protection. This is the part you do not need to worry about, and it is covered by AWS's extensive compliance certifications (SOC 2, ISO 27001, FedRAMP, and dozens more).
You are responsible for security in the cloud: everything you put on top of AWS infrastructure. That means your IAM policies, your network configurations, your encryption settings, your application code, your operating system patches, and your data classification. If you misconfigure a security group and expose a database to the internet, that is your problem. AWS will not stop you and will not alert you unless you have specifically configured a service like GuardDuty or Config to do so.
Common Misconceptions
The biggest misconception we encounter is the belief that "AWS is secure, so my workloads are secure." AWS provides the building blocks for a secure environment, but assembly is entirely your responsibility. A second misconception is that managed services eliminate your security obligations. They reduce them, yes. With Lambda, you do not patch operating systems. With RDS, you do not manage database engine updates. But you still own IAM permissions, network access, encryption configuration, and data handling. The shared responsibility line shifts depending on the service type, but it never disappears entirely.
IAM Best Practices
Identity and Access Management is the single most important security control on AWS. If your IAM is weak, nothing else matters. A perfectly configured VPC is useless if an attacker has AdministratorAccess credentials. We have audited hundreds of AWS environments, and IAM misconfigurations are present in nearly every single one.
Enforce Least Privilege
Every IAM principal, whether a user, role, or service account, should have the minimum permissions required to perform its function. Start with zero permissions and add only what is needed. Use IAM Access Analyzer to identify unused permissions and generate least-privilege policies based on actual access patterns from CloudTrail logs. Review policies quarterly at minimum. Wildcard actions ("Action": "*") and wildcard resources ("Resource": "*") should be treated as defects that require immediate remediation.
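To make the "wildcards are defects" rule checkable, here is a minimal sketch of a policy linter. It is an illustration, not a replacement for IAM Access Analyzer: the function name and both sample policies (including the bucket name) are hypothetical, and it only flags Allow statements where Action or Resource is exactly "*".

```python
def find_wildcard_statements(policy):
    """Return Allow statements whose Action or Resource is exactly "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single bare statement
        statements = [statements]

    def as_list(value):
        if value is None:
            return []
        return value if isinstance(value, list) else [value]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in as_list(stmt.get("Action")) or "*" in as_list(stmt.get("Resource")):
            flagged.append(stmt)
    return flagged


# Hypothetical policies for illustration only.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-reports-bucket/*",
    }],
}
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

print(len(find_wildcard_statements(scoped_policy)))  # 0
print(len(find_wildcard_statements(broad_policy)))   # 1
```

A check like this does not catch service-level wildcards such as "s3:*"; extending it to those is a reasonable next step.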
MFA Enforcement
Multi-factor authentication must be required for every human user with no exceptions. Enable MFA on the root account immediately and store root credentials in a physical safe or hardware security module. For IAM users that still exist (more on eliminating these below), enforce MFA through IAM policies that deny all actions except MFA self-management until MFA is activated. For privileged operations like deleting S3 buckets, modifying IAM policies, or stopping CloudTrail, add MFA conditions directly to the IAM policy so that even authenticated users must re-verify.
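The deny-until-MFA pattern described above could look roughly like the following sketch. The function name and Sid are ours, and the list of exempted actions is an assumption you should adjust to your environment. This statement only denies; users still need separate Allow statements for the MFA self-management actions.

```python
import json

# Actions a user may still call before MFA is active (an assumed, minimal set).
MFA_SELF_MANAGEMENT_ACTIONS = [
    "iam:CreateVirtualMFADevice",
    "iam:EnableMFADevice",
    "iam:GetUser",
    "iam:ListMFADevices",
    "iam:ListVirtualMFADevices",
    "iam:ResyncMFADevice",
    "sts:GetSessionToken",
]


def build_require_mfa_policy():
    """Deny everything except MFA self-management when the request has no MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAllExceptMfaSetupWithoutMfa",
            "Effect": "Deny",
            "NotAction": MFA_SELF_MANAGEMENT_ACTIONS,
            "Resource": "*",
            # BoolIfExists also matches requests where the key is missing,
            # such as calls made with long-lived access keys.
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        }],
    }


print(json.dumps(build_require_mfa_policy(), indent=2))
```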
IAM Identity Center (SSO)
Stop creating individual IAM users. Use AWS IAM Identity Center (formerly AWS SSO) to federate access through your organization's identity provider, whether that is Okta, Microsoft Entra ID (formerly Azure AD), Google Workspace, or another SAML 2.0-compatible provider. This gives you centralized user lifecycle management, consistent MFA enforcement, temporary credentials by default, and the ability to revoke all AWS access for an employee from a single location when they leave. For multi-account environments, Identity Center lets you manage permission sets that apply across accounts, eliminating the sprawl of per-account IAM users and policies.
Service Control Policies
If you are using AWS Organizations (and you should be), apply Service Control Policies (SCPs) as guardrails across all accounts. SCPs define the maximum permissions boundary for an entire account. Use them to prevent actions that should never happen in any account: disabling CloudTrail, deleting VPC Flow Logs, creating IAM users with console access in workload accounts, or launching resources in regions you do not use. SCPs are your safety net. Even if an administrator in a child account grants overly broad permissions, the SCP will prevent the prohibited actions from executing.
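As a sketch of what such a guardrail might contain, the snippet below assembles an SCP document that protects CloudTrail and Flow Logs and denies activity outside an approved region list. The function name, Sids, and the list of global services exempted from the region check are assumptions for illustration. The exemption list in particular is incomplete and needs review before real use.

```python
import json


def build_guardrail_scp(allowed_regions):
    """Assemble an illustrative SCP; attach it through AWS Organizations."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProtectCloudTrail",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                ],
                "Resource": "*",
            },
            {
                "Sid": "ProtectFlowLogs",
                "Effect": "Deny",
                "Action": ["ec2:DeleteFlowLogs"],
                "Resource": "*",
            },
            {
                "Sid": "DenyUnusedRegions",
                "Effect": "Deny",
                # Global services are exempted so they keep working (assumed list).
                "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*",
                              "cloudfront:*", "route53:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": sorted(allowed_regions)}
                },
            },
        ],
    }


scp = build_guardrail_scp({"us-east-1", "eu-west-1"})
print(json.dumps(scp, indent=2))
```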
Access Key Management
Long-lived access keys are the leading cause of credential compromise on AWS. Eliminate them wherever possible. For applications running on EC2, use instance profiles with IAM roles. For Lambda, use execution roles. For ECS, use task roles. For CI/CD pipelines, use OIDC federation with your provider (GitHub Actions, GitLab CI, and others all support this natively). If you absolutely must use access keys, rotate them every 90 days, never embed them in source code, and use AWS Secrets Manager to store them. Run the IAM credential report regularly to identify keys that have not been rotated or used in over 90 days, and deactivate them.
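The credential report is a CSV, so finding stale keys is easy to script. The sketch below assumes you have already downloaded the report. The sample data is invented and trimmed to the columns the check needs, and the function name and 90-day default are ours.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def stale_access_keys(report_csv, now, max_age_days=90):
    """Return (user, key_slot) pairs for active keys rotated before the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("1", "2"):
            if row.get(f"access_key_{slot}_active") != "true":
                continue
            rotated = row.get(f"access_key_{slot}_last_rotated") or "N/A"
            if rotated == "N/A":
                continue
            if datetime.fromisoformat(rotated) < cutoff:
                stale.append((row["user"], f"access_key_{slot}"))
    return stale


# Invented sample, trimmed to the relevant columns of the credential report.
SAMPLE_REPORT = """user,access_key_1_active,access_key_1_last_rotated,access_key_2_active,access_key_2_last_rotated
ci-deployer,true,2023-01-15T10:00:00+00:00,false,N/A
reporting-app,true,2024-05-20T08:30:00+00:00,false,N/A
"""

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(stale_access_keys(SAMPLE_REPORT, now))  # [('ci-deployer', 'access_key_1')]
```

The same loop can be extended to the last-used columns to catch keys that are active but idle.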
Permission Boundaries
Permission boundaries let you delegate IAM administration safely. They define the maximum permissions that an IAM entity can have, regardless of what policies are attached to it. This is essential when you want developers to create their own roles and policies without risking privilege escalation. Attach a permission boundary to every role created by non-admin users, ensuring they cannot grant themselves permissions beyond what the boundary allows.
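One way to enforce "every delegated role carries the boundary" is to make the boundary a condition of role creation itself. The following is a sketch under assumptions: the function name, Sids, and boundary ARN are hypothetical, and a production policy would also scope Resource to a role-name prefix.

```python
import json


def build_delegated_role_admin_policy(boundary_arn):
    """Let developers manage roles only when the given boundary is attached."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageRolesOnlyWithBoundary",
                "Effect": "Allow",
                "Action": ["iam:CreateRole", "iam:AttachRolePolicy", "iam:PutRolePolicy"],
                "Resource": "*",
                "Condition": {"StringEquals": {"iam:PermissionsBoundary": boundary_arn}},
            },
            {
                "Sid": "NoRemovingTheBoundary",
                "Effect": "Deny",
                "Action": ["iam:DeleteRolePermissionsBoundary"],
                "Resource": "*",
            },
        ],
    }


# Hypothetical account ID and policy name.
policy = build_delegated_role_admin_policy(
    "arn:aws:iam::111122223333:policy/developer-boundary"
)
print(json.dumps(policy, indent=2))
```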
Network Security
A well-designed network is your second line of defense after IAM. Even if credentials are compromised, proper network segmentation can limit the blast radius of an incident. On AWS, network security starts with your VPC architecture.
VPC Design
Design your VPCs with segmentation in mind from day one. Separate workloads by sensitivity: production, staging, and development should not share a VPC. Use multiple availability zones for resilience, and plan your CIDR blocks to avoid overlaps if you will need VPC peering or Transit Gateway later. Each VPC should have a clear purpose and a documented network architecture diagram.
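CIDR overlaps are cheap to catch before anything is deployed. This sketch uses Python's standard ipaddress module; the VPC names and address ranges are invented for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Invented plan: development accidentally sits inside the staging range.
plan = {
    "production": "10.0.0.0/16",
    "staging": "10.1.0.0/16",
    "development": "10.1.128.0/17",
}
print(find_overlaps(plan))  # [('staging', 'development')]

# Carve one /20 per availability zone out of the production block.
az_subnets = list(ipaddress.ip_network(plan["production"]).subnets(new_prefix=20))[:3]
print([str(s) for s in az_subnets])  # ['10.0.0.0/20', '10.0.16.0/20', '10.0.32.0/20']
```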
Public vs. Private Subnets
The majority of your resources should live in private subnets with no direct internet access. Only load balancers, bastion hosts, and NAT gateways belong in public subnets. Databases, application servers, and internal services should never have public IP addresses. If a resource in a private subnet needs outbound internet access (for software updates or external API calls), route traffic through a NAT gateway. If it only needs to reach AWS services, use VPC endpoints instead and avoid internet exposure entirely.
Security Groups vs. NACLs
Security groups are stateful firewalls attached to individual resources. Network ACLs (NACLs) are stateless firewalls attached to subnets. Use both, but understand their differences. Security groups should be your primary control: restrict inbound traffic to only the ports and source ranges required. Never leave port 22 (SSH) or 3389 (RDP) open to 0.0.0.0/0. Use Systems Manager Session Manager for shell access instead, which eliminates the need for open inbound ports entirely. NACLs serve as a coarse-grained backup: use them to deny known bad IP ranges or to block traffic patterns that should never occur regardless of security group rules.
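The "never open 22 or 3389 to the world" rule can be audited with a few lines of code. The sketch below assumes ingress rules shaped like the IpPermissions entries returned by the EC2 DescribeSecurityGroups API. The function name and sample rules are ours.

```python
ADMIN_PORTS = (22, 3389)
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def exposed_admin_ports(ip_permissions):
    """Return the admin ports reachable from anywhere, given ingress rules."""
    exposed = set()
    for perm in ip_permissions:
        cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
        if not (cidrs & OPEN_CIDRS):
            continue
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "all traffic" rules carry no port range
            exposed.update(ADMIN_PORTS)
        elif protocol in ("tcp", "6"):
            low, high = perm.get("FromPort"), perm.get("ToPort")
            if low is not None and high is not None:
                exposed.update(p for p in ADMIN_PORTS if low <= p <= high)
    return sorted(exposed)


# Invented rules: SSH open to the world, RDP limited to an internal range.
rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 3389, "ToPort": 3389,
     "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
]
print(exposed_admin_ports(rules))  # [22]
```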
VPC Endpoints and PrivateLink
VPC endpoints allow your resources to communicate with AWS services (S3, DynamoDB, KMS, CloudWatch, and dozens more) without traversing the public internet. Gateway endpoints are free for S3 and DynamoDB. Interface endpoints (powered by PrivateLink) cost money but keep all traffic on the AWS network. For security-sensitive workloads, VPC endpoints are non-negotiable. They eliminate an entire class of data exfiltration risk and reduce your exposure to man-in-the-middle attacks. Attach endpoint policies to further restrict which actions and resources can be accessed through the endpoint.
VPN and Direct Connect
For hybrid environments, use AWS Site-to-Site VPN or AWS Direct Connect to establish private connectivity between your on-premises network and AWS. Direct Connect provides dedicated, consistent bandwidth and does not traverse the public internet, making it the preferred option for sensitive workloads. For remote administrator access, use AWS Client VPN rather than exposing management interfaces to the internet. Combine VPN access with IAM Identity Center for authenticated, auditable remote access to your AWS environment.
Data Protection
Encryption is not optional. Every data store, every data transfer, and every backup in your AWS environment should be encrypted. The tools are there. The defaults are increasingly secure. But you still need to verify and configure them correctly.
Encryption at Rest
Use AWS Key Management Service (KMS) to manage encryption keys centrally. For most workloads, customer-managed KMS keys (CMKs) are the right choice because they give you full control over key policies, rotation schedules, and usage auditing through CloudTrail. Enable default encryption on every service that supports it:
- S3: Enable default bucket encryption with SSE-KMS. Apply bucket policies that deny unencrypted uploads (deny PutObject where x-amz-server-side-encryption is absent).
- EBS: Enable default EBS encryption at the account level in every region. This ensures every new volume and snapshot is automatically encrypted with no developer action required.
- RDS and Aurora: Encryption must be enabled at instance creation. It cannot be added retroactively. If you have unencrypted instances, you must take a snapshot, copy it with encryption enabled, and restore a new instance from the encrypted copy.
- DynamoDB: Encryption at rest is on by default, but with AWS-owned keys. Switch to customer-managed KMS keys for full audit visibility.
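The S3 item above mentions a bucket policy that denies uploads without an encryption header. A minimal sketch of that statement follows; the function name, Sid, and bucket name are hypothetical. Note that this policy rejects clients that rely on default bucket encryption without sending the header, so test it against your upload paths first.

```python
import json


def build_deny_unencrypted_uploads_policy(bucket_name):
    """Deny PutObject requests that omit the server-side-encryption header."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUploadsWithoutEncryptionHeader",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
            # Null = "true" matches requests where the header is absent.
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }],
    }


print(json.dumps(build_deny_unencrypted_uploads_policy("example-data-bucket"), indent=2))
```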
Encryption in Transit
Enforce TLS 1.2 or higher on all endpoints. Use AWS Certificate Manager (ACM) to provision and manage TLS certificates at no cost for use with ALBs, CloudFront, and API Gateway. Enforce HTTPS-only on S3 bucket policies by adding a condition that denies requests where "aws:SecureTransport" is false. For RDS, set the "rds.force_ssl" parameter to enforce encrypted connections. For internal service-to-service communication, use mutual TLS (mTLS) or service mesh encryption if your architecture supports it.
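The HTTPS-only condition for S3 is a single Deny statement. Here is a sketch of it; the function name, Sid, and bucket name are placeholders.

```python
import json


def build_https_only_statement(bucket_name):
    """Deny any S3 request to the bucket that did not arrive over TLS."""
    return {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        # Cover both the bucket itself and the objects inside it.
        "Resource": [
            f"arn:aws:s3:::{bucket_name}",
            f"arn:aws:s3:::{bucket_name}/*",
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }


bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [build_https_only_statement("example-data-bucket")],
}
print(json.dumps(bucket_policy, indent=2))
```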
Key Rotation
Enable automatic key rotation for all customer-managed KMS keys. AWS supports automatic annual rotation, which creates new key material while retaining the ability to decrypt data encrypted with previous versions. For asymmetric keys or keys with stricter rotation requirements, implement manual rotation procedures with documented runbooks. Track key age and rotation status through AWS Config rules and set up CloudWatch alarms for keys approaching their rotation deadline.
S3 Bucket Policies and Access Controls
S3 is the most common source of accidental data exposure on AWS. Enable S3 Block Public Access at the account level. This is a single setting that prevents any bucket in the account from being made public, regardless of individual bucket policies or ACLs. Apply it to every account, including new ones as they are created. Beyond block public access, use bucket policies to enforce encryption, restrict access to specific VPC endpoints, and require MFA for delete operations. Use S3 Access Points to simplify access management for shared buckets with multiple consumers.
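Block Public Access is made up of four independent flags, and it is easy to end up with only some of them on. The sketch below checks a configuration dictionary shaped like the PublicAccessBlockConfiguration structure used by the S3 APIs. The function name and sample values are ours.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_block_public_access_flags(config):
    """Return the Block Public Access flags that are not set to True."""
    return [flag for flag in REQUIRED_FLAGS if config.get(flag) is not True]


# Invented example: two of the four flags were never enabled.
partial = {"BlockPublicAcls": True, "IgnorePublicAcls": True}
full = {flag: True for flag in REQUIRED_FLAGS}

print(missing_block_public_access_flags(partial))  # ['BlockPublicPolicy', 'RestrictPublicBuckets']
print(missing_block_public_access_flags(full))     # []
```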
Detection and Monitoring
Prevention will fail eventually. When it does, your ability to detect, investigate, and respond depends entirely on the monitoring infrastructure you have in place. AWS offers a comprehensive detection stack, but you have to enable and configure every piece of it.
Amazon GuardDuty
GuardDuty is a managed threat detection service that analyzes CloudTrail logs, VPC Flow Logs, and DNS query logs to identify suspicious activity. Enable it in every account and every region. It detects compromised instances communicating with known command-and-control servers, unusual API calls from anomalous locations, cryptocurrency mining, S3 bucket enumeration, and credential exfiltration attempts. The cost is modest relative to the value, and there is no infrastructure to manage. If you only enable one detection service, make it GuardDuty.
AWS Security Hub
Security Hub aggregates findings from GuardDuty, Inspector, Macie, Firewall Manager, IAM Access Analyzer, and third-party tools into a single dashboard. It also runs automated compliance checks against frameworks like CIS AWS Foundations Benchmark and AWS Foundational Security Best Practices. Enable it in every account and designate a delegated administrator account to aggregate findings across your organization. Review the Security Hub score weekly and treat critical and high-severity findings as incidents that require immediate response.
AWS CloudTrail
CloudTrail records every API call made in your AWS environment. It is the backbone of your security audit trail. Enable a multi-region organization trail that delivers logs to a centralized, encrypted S3 bucket in a dedicated logging account. Enable data event logging for S3 and Lambda to capture object-level access and function invocations. Turn on CloudTrail Insights to detect unusual API activity patterns automatically. Enable log file validation so you can cryptographically verify that logs have not been tampered with. Retain logs for a minimum of one year in active storage, with lifecycle policies to transition older logs to S3 Glacier for long-term retention.
VPC Flow Logs
Enable VPC Flow Logs on every VPC, subnet, and critical ENI. Flow logs capture metadata about IP traffic: source, destination, ports, protocol, and whether the traffic was accepted or rejected. Deliver them to CloudWatch Logs for real-time analysis or to S3 for cost-effective long-term storage. Use flow logs to identify unexpected traffic patterns, detect port scanning, and investigate security incidents after the fact.
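To show how flow logs surface port scanning, here is a rough sketch that parses records in the default (version 2) flow log format and counts distinct rejected destination ports per source address. The threshold, function names, and sample records are invented. Real detection would also need time windows and allow-lists.

```python
from collections import defaultdict

# Field order of the default flow log record format.
FIELDS = ("version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status")


def parse_flow_record(line):
    """Split one default-format record into a dict, or None if malformed."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def likely_port_scanners(lines, min_ports=3):
    """Map source IP -> number of distinct destination ports that were rejected."""
    rejected_ports = defaultdict(set)
    for line in lines:
        record = parse_flow_record(line)
        if record is None or record["action"] != "REJECT":
            continue
        rejected_ports[record["srcaddr"]].add(record["dstport"])
    return {src: len(ports) for src, ports in rejected_ports.items()
            if len(ports) >= min_ports}


# Invented sample records.
sample = [
    "2 123456789012 eni-0abc 198.51.100.7 10.0.1.5 40000 22 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc 198.51.100.7 10.0.1.5 40001 23 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc 198.51.100.7 10.0.1.5 40002 3389 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc 203.0.113.9 10.0.1.5 51000 443 6 10 8400 1700000000 1700000060 ACCEPT OK",
]
print(likely_port_scanners(sample))  # {'198.51.100.7': 3}
```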
AWS Config Rules
AWS Config continuously monitors resource configurations and evaluates them against rules you define. Enable it in every account and region. Use the managed rules for common checks: S3 buckets must be encrypted, EBS volumes must be encrypted, security groups must not allow unrestricted SSH, RDS instances must be encrypted, CloudTrail must be enabled, and root account must have MFA. For custom requirements, write custom Config rules backed by Lambda functions. When a resource drifts out of compliance, Config flags it immediately, turning compliance from a periodic audit into a continuous process.
CloudWatch Alarms for Security Events
Set up CloudWatch metric filters and alarms for security-critical events: root account usage, console sign-ins without MFA, IAM policy changes, security group modifications, NACL changes, CloudTrail configuration changes, S3 bucket policy modifications, and failed authentication attempts. Route alarms to an SNS topic that notifies your security team via email, Slack, or PagerDuty. Do not just log events. Alert on them. A log that nobody reads provides zero security value.
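The matching logic behind a few of these alarms can be sketched in plain Python against CloudTrail event records. This illustrates the conditions only and is not a substitute for metric filters. The alert labels and function name are ours, and the sample events are invented and heavily trimmed.

```python
TRAIL_CHANGES = {"StopLogging", "DeleteTrail", "UpdateTrail"}
SECURITY_GROUP_CHANGES = {
    "AuthorizeSecurityGroupIngress", "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupIngress", "RevokeSecurityGroupEgress",
    "CreateSecurityGroup", "DeleteSecurityGroup",
}


def classify_security_event(event):
    """Return alert labels for a single CloudTrail event record."""
    alerts = []
    identity = event.get("userIdentity") or {}
    name = event.get("eventName", "")

    # Root activity that was not performed by an AWS service on your behalf.
    if (identity.get("type") == "Root"
            and "invokedBy" not in identity
            and event.get("eventType") != "AwsServiceEvent"):
        alerts.append("root-account-usage")

    if name == "ConsoleLogin":
        extra = event.get("additionalEventData") or {}
        response = event.get("responseElements") or {}
        if response.get("ConsoleLogin") == "Success" and extra.get("MFAUsed") != "Yes":
            alerts.append("console-login-without-mfa")

    if name in TRAIL_CHANGES:
        alerts.append("cloudtrail-configuration-change")
    if name in SECURITY_GROUP_CHANGES:
        alerts.append("security-group-change")
    return alerts


# Invented, trimmed events.
root_login = {
    "eventName": "ConsoleLogin", "eventType": "AwsConsoleSignIn",
    "userIdentity": {"type": "Root"},
    "additionalEventData": {"MFAUsed": "No"},
    "responseElements": {"ConsoleLogin": "Success"},
}
stop_logging = {
    "eventName": "StopLogging", "eventType": "AwsApiCall",
    "userIdentity": {"type": "AssumedRole"}, "responseElements": None,
}
print(classify_security_event(root_login))
print(classify_security_event(stop_logging))
```

Federated sign-ins may report MFA differently from direct IAM user sign-ins, so expect to tune the console-login check for your identity provider.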
Incident Response
You will have a security incident. The question is not whether, but when and how quickly you respond. A documented, tested incident response plan is the difference between a contained event and a catastrophic breach.
Preparation
Build your incident response capability before you need it. This means:
- Pre-provisioning an incident response IAM role with the permissions needed to investigate and contain incidents (read access to CloudTrail, VPC Flow Logs, GuardDuty findings, and the ability to isolate resources).
- Creating a dedicated incident response account in your AWS Organization where forensic analysis can be conducted in isolation.
- Documenting escalation procedures, including who to contact, communication channels, and decision-making authority.
- Running tabletop exercises at least annually to test your team's response to realistic scenarios.
Containment Procedures
When an incident is detected, containment must be immediate and automated where possible. For a compromised EC2 instance: isolate it by swapping its security group to one that allows no inbound or outbound traffic, take an EBS snapshot for forensics, and capture instance metadata. For compromised IAM credentials: deactivate the access keys immediately, revoke active role sessions by attaching an inline policy that denies all actions to sessions issued before the time of revocation, and review CloudTrail for every action taken with those credentials. For a compromised S3 bucket: enable versioning if not already active, review access logs for data exfiltration, and restrict the bucket policy to deny all external access. Automate these runbooks using AWS Systems Manager Automation or Step Functions so containment happens in minutes, not hours.
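For the credential case, session revocation usually comes down to a Deny statement keyed on when the session token was issued. The sketch below builds such a policy; the function name and Sid are ours, and the timestamp is an example. You would attach the result as an inline policy on the affected role.

```python
import json
from datetime import datetime, timezone


def build_revoke_sessions_policy(revoked_at):
    """Deny all actions for sessions whose token was issued before revoked_at.

    revoked_at must be a timezone-aware datetime.
    """
    cutoff = revoked_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "RevokeOlderSessions",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff}},
        }],
    }


policy = build_revoke_sessions_policy(datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc))
print(json.dumps(policy, indent=2))
```

Sessions created after the cutoff are unaffected, so legitimate users can re-authenticate once the underlying compromise is closed.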
AWS Incident Response Services
AWS provides several services that accelerate incident response. Amazon Detective automatically analyzes and visualizes data from CloudTrail, VPC Flow Logs, and GuardDuty findings to help you determine the root cause. AWS Security Hub centralizes findings and enables automated remediation through custom actions. For organizations on Enterprise Support, AWS provides access to the AWS Customer Incident Response Team (CIRT) for hands-on assistance during active security events. Consider pre-engaging a third-party incident response retainer so that expert help is a phone call away when you need it most.
Playbook Outline
Every incident response plan should follow a structured playbook:
1. Detection: Identify the incident through GuardDuty, Security Hub, CloudWatch alarms, or manual report. Classify severity (critical, high, medium, low).
2. Triage: Validate the finding. Determine scope: which accounts, resources, and data are affected.
3. Containment: Isolate affected resources. Revoke compromised credentials. Preserve evidence (snapshots, log exports).
4. Eradication: Remove the threat. Patch vulnerabilities. Rotate all potentially affected credentials.
5. Recovery: Restore services from clean backups. Monitor closely for recurrence.
6. Post-Incident Review: Conduct a blameless retrospective. Document root cause, timeline, and remediation steps. Update playbooks and controls.
Post-Incident Review
The post-incident review is where long-term security improvement happens. Within 72 hours of resolution, bring together everyone involved for a blameless retrospective. Document the full timeline, the root cause (not just the proximate cause, but the systemic factors that allowed it), what worked well in the response, and what needs improvement. Generate concrete action items with owners and deadlines. Track them to completion. The goal is not to assign blame but to ensure the same class of incident cannot happen again.
Common Mistakes
After years of security assessments across organizations of every size, these are the mistakes we encounter most frequently. Each one has contributed to real breaches, and most are straightforward to fix once identified.
1. Overly Permissive IAM Policies
This is the single most common finding in every security audit we perform. Developers grant AdministratorAccess "temporarily" and never revoke it. Lambda execution roles get "s3:*" on "*" because it is faster than figuring out the minimum permissions. Service roles accumulate permissions over years and nobody reviews them. Use IAM Access Analyzer to generate least-privilege policies, and treat overly broad permissions as critical findings.
2. Public S3 Buckets
Despite years of high-profile breaches and AWS adding multiple layers of protection, publicly accessible S3 buckets remain a persistent problem. Enable S3 Block Public Access at the account level and enforce it through SCPs. Use AWS Config rules to detect any bucket that becomes publicly accessible. If you genuinely need a public bucket (for static website hosting, for example), use a CloudFront distribution with an origin access control instead.
3. Unrotated Credentials
We regularly find access keys that have not been rotated in over a year, sometimes three or four years. These are ticking time bombs. If any of those keys were ever committed to a repository, logged in an error message, or stored in a configuration file on a compromised system, they are potentially in the hands of an attacker right now. Automate key rotation, eliminate long-lived keys wherever possible, and run the IAM credential report monthly.
4. Disabled or Incomplete CloudTrail
Some organizations disable CloudTrail to reduce costs. Others enable it in their primary region but not in all regions, leaving blind spots where attackers can operate undetected. CloudTrail is not optional. Enable a multi-region organization trail, turn on data event logging for sensitive resources, and protect the logging bucket with an SCP that prevents anyone from deleting or modifying it.
5. Single-Account Sprawl
Running everything in a single AWS account means a compromise in one workload can cascade to every other workload. There is no blast radius containment, no separation of duties, and no way to apply different security policies to different environments. Use AWS Organizations with a multi-account strategy: separate accounts for production, staging, development, security tooling, logging, and shared services. The overhead is minimal compared to the security benefit.
6. Ignoring Unused Resources with Broad Access
Orphaned EC2 instances, forgotten Lambda functions, unused IAM roles with admin permissions, and legacy security groups with wide-open rules. These resources accumulate over time and become attack vectors. Conduct quarterly reviews to identify and remove unused resources. Use AWS Config, Trusted Advisor, and IAM Access Analyzer to find resources and permissions that are no longer in active use. If it is not serving a purpose, it is serving as an attack surface.
The Bottom Line
AWS security is not a project you finish. It is an ongoing practice that requires continuous attention, regular reviews, and a culture that treats security as a first-class operational concern. The organizations that get breached are not the ones that failed to buy the right tools. They are the ones that failed to configure, monitor, and maintain the tools they already had.
Start with the fundamentals: lock down IAM, segment your networks, encrypt everything, enable detection across every account and region, and build your incident response capability before you need it. Then iterate. Review your security posture monthly. Run penetration tests annually. Update your playbooks after every incident. Security is not a destination. It is the way you operate.
If you are looking at this list and feeling overwhelmed, that is normal. Start with the highest-impact items: enable GuardDuty, enforce MFA, enable S3 Block Public Access, and review your IAM policies. Those four actions alone will dramatically reduce your risk. Then work through the rest methodically.