Simple Storage Service (S3)
Amazon S3 (Simple Storage Service) 101
Amazon S3 – Core Concepts
- AWS’s primary storage offering
- Uses an object-based storage model (not file storage and not block storage)
- Data is stored as objects, which are kept inside buckets (containers)
- Well-suited for storing large volumes of data such as videos, audio files, images, text, and other unstructured content
- Cost-efficient
- Can be accessed through the console, CLI, APIs, and HTTP(S)
- Designed as a publicly accessible service with virtually unlimited storage and support for multiple users
- Many AWS services rely on S3 as their default source or destination for data
- S3 operates as a global service with regional data placement and redundancy
- Bucket names must be unique across all AWS accounts globally
- Actual data is stored within a specific region
- Data is automatically replicated across multiple Availability Zones in that region
- Because S3 uses object storage:
  - It is not a file system, so you cannot navigate it like traditional directories
    - Use Amazon EFS or Amazon FSx for file-based storage
  - It is not block storage, so it cannot be mounted like a drive (e.g., `C:\` or `/mnt`)
    - Use Amazon EBS for block-level storage
S3 Objects

- Objects are similar to files, though technically different in structure
- Key components:
  - Key → the unique identifier for an object within a bucket
    - Functions similarly to a filename
    - Example: `koala.jpg`
  - Value → the actual data stored in the object
    - Size can range from 0 bytes up to 5 TB
    - 5 TB is the maximum allowed size per object (fixed limit)
  - Additional attributes include:
    - Version ID
    - Metadata
    - Access Control List (ACL)
    - Subresources
- Every object must reside within a bucket and cannot exist independently
S3 Buckets

- An S3 bucket acts as a container that holds objects
- Buckets are created in a specific region, which ensures data residency control
  - Data remains in that region unless explicitly configured otherwise
- Each bucket can store an unlimited number of objects, making S3 highly scalable
- Bucket naming:
  - Serves as a globally unique identifier
    - Example: `koaladata`
  - Must be unique across all AWS regions and accounts
    - This is why bucket ARNs do not include a region
  - Naming rules:
    - Length must be between 3 and 63 characters
    - Only lowercase letters, numbers, hyphens, and periods are allowed (no uppercase letters or underscores)
    - Must begin with a letter or number
    - Cannot be formatted like an IP address
- Structure:
  - S3 uses a flat architecture
    - There are no real folders or directories
    - All objects exist at the same level within a bucket
  - The AWS console may display folders, but these are just prefixes
    - Example: `/images/file.jpg` appears as a folder, but `/images/` is only part of the object key

- Prefixes are used for organizing and filtering objects
- Bucket limits:
  - Default (soft limit): 10,000 buckets per account
    - This can be increased by requesting a quota increase from AWS
  - Limits affect system design decisions
    - For example, instead of creating one bucket per user, prefixes can be used within a single bucket
- Security:
- Buckets are private by default
- A built-in setting blocks all public access unless explicitly disabled
- Even if this block is turned off, the bucket is not automatically public—it still requires proper configuration
S3 Security (Bucket Policies & ACLs)
Providing Access to S3
- Even though S3 is accessible over the internet, it is secure by default
- Endpoints are publicly reachable from a networking perspective
- However, buckets and objects are not accessible unless permissions are explicitly granted
- S3 security mechanisms include:
- Bucket policies → resource-based policies attached directly to buckets
- Access Control Lists (ACLs) → older permission model
- Block Public Access settings → built-in safety mechanism
- IAM identity policies → attached to users, groups, or roles
- These can grant or restrict access to S3 resources
- The effective permissions result from the combined evaluation of all policies and settings
S3 Bucket Policies

- A bucket policy is a resource-based policy applied to an S3 bucket
- It defines which principals are allowed or denied access
- Limitation of identity-based policies:
- They can only be assigned to identities within your own AWS account
- Advantages of resource-based (bucket) policies:
- Can grant or deny access to external AWS accounts
- Can allow access to unauthenticated users (public access)
- Can restrict access based on conditions such as:
- IP address ranges
- MFA requirements
- Other criteria
- The `"Principal"` field specifies who the policy applies to:
  - Example: `"Principal": "*"` means anyone (anonymous access)
  - This field is not needed in identity policies since the principal is implied
- Bucket policy examples are available in the AWS documentation: S3 bucket policy examples
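A minimal sketch of a bucket policy applied via the CLI (the bucket name `examplebucket` is hypothetical); it grants anonymous read access to every object in the bucket:

```bash
# Attach a resource-based policy that allows anyone ("Principal": "*")
# to call s3:GetObject on any object in the bucket
aws s3api put-bucket-policy --bucket examplebucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicRead",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::examplebucket/*"
  }]
}'
```

Note that Block Public Access must be disabled before a policy like this takes effect.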
When to Use Identity vs Resource Policies
- Use identity policies when:
- Managing permissions across multiple AWS services
- The resource does not support resource-based policies
- You prefer centralized IAM-based control
- Access is limited within a single account
- Use resource policies when:
- You want to manage permissions directly on the resource (e.g., S3 bucket)
- You need cross-account or public access
- In many cases, both policy types are used together depending on requirements
S3 Access Control Lists (ACLs)
- ACLs are an older method of managing permissions in S3
- They are attached as a subresource to either a bucket or an object
- Apply to:
  - A single object, or
  - All objects within a bucket
- Supported permission types:
  - `READ` → allows read access
  - `WRITE` → allows write access
  - `READ_ACP` → allows viewing the ACL
  - `WRITE_ACP` → allows modifying the ACL
  - `FULL_CONTROL` → grants all permissions
- Limitations:
- Cannot target specific groups of objects (e.g., by prefix)
- Offer only basic permission controls
- Lack the flexibility and granularity of IAM or bucket policies
- ACLs are considered outdated and should generally be avoided
S3 Block Public Access Setting

- These settings act as a protective layer at the bucket level
- When enabled, they prevent any public (anonymous) access
- Public access refers specifically to:
- Requests from unauthenticated or anonymous users
- Authenticated AWS identities are not impacted by this restriction
- When Block Public Access is enabled:
- It overrides any bucket policy that would otherwise allow public access
- This feature is enabled by default:
- Introduced to reduce accidental exposure of data
- Ensures that public access must be intentionally configured, not accidental
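A minimal sketch of managing this setting from the CLI (bucket name is hypothetical):

```bash
# Explicitly (re-)enable all four Block Public Access protections
aws s3api put-public-access-block --bucket examplebucket \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Inspect the current configuration
aws s3api get-public-access-block --bucket examplebucket
```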
S3 Static Website Hosting & Billing
S3 Static HTTP Website Hosting
- Standard interaction with S3 objects is done through APIs, which provide both security and flexibility
- The AWS Console and CLI internally rely on these APIs
- S3 Static Website Hosting enables objects to be accessed using HTTP requests
- This allows an S3 bucket to function as a static website host
- Commonly used for sites like blogs or personal portfolios
- Setup process is straightforward (see the CLI sketch below):
  - Enable static website hosting on the bucket and define index and error documents (HTML files)
    - Index document → the default page returned when no specific object is requested
    - Error document → displayed when errors occur (e.g., 404 Not Found)
  - A website endpoint is automatically generated
    - This endpoint allows HTTP-based access to the bucket contents
    - AWS provides a default endpoint format
    - To use a custom domain, you must register and configure your own DNS
  - The Block Public Access setting must be disabled for browser access
  - Without using Amazon CloudFront, access is limited to HTTP only (no HTTPS support)
- Configure a custom domain using Amazon Route 53
  - Example workflow:
    - Register a domain such as example.org
    - Create an S3 bucket with the same name
    - Configure DNS records to point to the bucket endpoint
  - This setup can later be integrated with CloudFront for HTTPS support
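A minimal CLI sketch of the setup above (bucket name and document names are hypothetical):

```bash
# Enable static website hosting and set the index and error documents
aws s3 website s3://examplebucket/ \
  --index-document index.html \
  --error-document error.html

# The generated website endpoint follows a region-dependent format, e.g.:
# http://examplebucket.s3-website-us-east-1.amazonaws.com
```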
S3 Static Website Hosting – Use Cases

- Content Offloading
- Move large static files (e.g., images, videos) from compute services to S3
- Reduces load on services like EC2 and lowers costs
- Example: store images in S3 and return their URLs from an application
- Out-of-Band Failover Pages
- Host backup or maintenance pages separately from the main application
- Ensures availability even if the primary compute service fails
- Example: redirect traffic to an S3-hosted status page during downtime
- Static Website Hosting
- Ideal for simple, non-dynamic websites such as blogs or portfolio pages
S3 Billing
- S3 is known for being very cost-effective, even beyond the Free Tier
- It is widely used due to its low pricing and scalability
- Pricing components include:
  - Storage usage
    - Cost depends on the storage class and amount of data stored
  - Data transfer out of S3
    - Charges apply when data is sent out of S3 (via API or HTTP access)
    - Uploading data into S3 is free of charge
- Free Tier (first 12 months) typically includes:
- Up to 5 GB of standard storage
- Up to 20,000 GET requests per month
- Up to 2,000 PUT requests per month
- Even after exceeding the Free Tier, S3 remains relatively inexpensive
- However, monitoring usage and costs is still important
S3 Object Versioning & MFA Delete
S3 Object Versioning
- S3 supports keeping multiple versions of an object under the same key
- This feature is configured at the bucket level
- Each version is assigned a unique version ID
  - You can reference a specific version by including its ID in requests
  - If no ID is provided, operations act on the latest (current) version
- All versions consume storage space, which increases cost
- Storing multiple versions of one object is equivalent to storing multiple separate objects of the same size
- Versioning states:
  - Disabled, Enabled, or Suspended
  - When disabled, objects have a version ID of `null`
  - Once versioning is turned on, it cannot be turned off; it can only be suspended
- Suspending versioning:
- Does not delete existing versions
- Older versions remain stored and continue to incur charges
- Methods to remove older versions:
- Manually iterate through objects and delete each version individually
- Download current versions, delete the bucket, recreate it, and then re-upload data
- Both approaches are time-consuming and may result in additional cost
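A minimal CLI sketch of working with versioning (bucket, key, and version ID are hypothetical):

```bash
# Enable versioning on the bucket (cannot be disabled afterwards, only suspended)
aws s3api put-bucket-versioning --bucket examplebucket \
  --versioning-configuration Status=Enabled

# List all versions (and delete markers) stored for a key
aws s3api list-object-versions --bucket examplebucket --prefix koala.jpg

# Retrieve one specific version instead of the current one
aws s3api get-object --bucket examplebucket --key koala.jpg \
  --version-id <version-id> koala-old.jpg
```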
S3 Operations With and Without Versioning
- Behavior of certain operations changes depending on whether versioning is disabled, enabled, or suspended:
| S3 Operation | Disabled | Enabled | Suspended |
|---|---|---|---|
| Standard GET | Returns the object | Returns the most recent version | Returns the most recent version |
| Version GET | Not supported | Retrieves the specified version | Retrieves the specified version |
| Standard PUT (update object) | Replaces the existing object | Creates a new version and marks it as current | Replaces the current version |
| Standard DELETE | Permanently removes the object | Adds a delete marker, older versions remain | Adds a delete marker, older versions remain |
| Version DELETE | Not supported | Deletes a specific version | Deletes a specific version |
- Delete markers:
- Act as a placeholder that makes the object appear deleted
- Standard GET requests will not return the object
- The console hides the object unless version visibility is enabled
- Removing the delete marker restores access to the previous version
MFA Delete
- A bucket-level setting that adds an extra layer of protection
- Requires multi-factor authentication (MFA) to perform sensitive actions:
  - Deleting a specific object version
  - Changing the versioning state of the bucket
- MFA credentials (serial number and code) must be included in API requests for these operations
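A minimal sketch of enabling MFA Delete (the ARN and code are hypothetical; this operation must be performed by the bucket owner's root credentials):

```bash
# The --mfa argument combines the MFA device serial number and the current code
aws s3api put-bucket-versioning --bucket examplebucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::111122223333:mfa/root-mfa-device 123456"
```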
S3 Performance Optimization
Default S3 Single-Stream Uploads

- By default, the `s3:PutObject` operation uploads data as a single continuous stream
- Limitations:
  - If the stream is interrupted, the entire upload must be restarted
  - The larger the object, the higher the risk of failure
  - Performance and reliability are constrained by using only one stream
  - Maximum upload size for this method is 5 GB, even though S3 supports objects up to 5 TB
- Single-stream uploads are particularly inefficient for global applications

- Greater distance between users and the S3 region leads to:
- Slower transfer speeds
- Increased likelihood of failure due to unstable network conditions
- Technologies like BitTorrent were designed to address similar performance challenges in distributed networks
- To improve performance and reliability, two main approaches are used:
- S3 Multipart Upload
- S3 Transfer Acceleration
S3 Multipart Upload

- This method divides a large object into smaller parts and uploads them in parallel using multiple streams
- Typically used for objects 100 MB or larger
- Key characteristics:
- Objects can be split into up to 10,000 parts
- Each part can range from 5 MB to 5 GB (except the final part, which can be smaller)
- Benefits:
- Failed parts can be retried individually without restarting the entire upload
- Faster upload speeds due to parallel transfers
- More efficient use of available network bandwidth
- It is recommended to use multipart upload whenever possible for larger files
- The AWS Management Console enables this automatically for applicable uploads
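A minimal sketch of controlling this behavior from the AWS CLI (the threshold and chunk-size values are illustrative):

```bash
# The CLI switches to multipart upload automatically above this threshold
aws configure set default.s3.multipart_threshold 100MB
# Size of each individually-retryable part
aws configure set default.s3.multipart_chunksize 64MB

# This upload now uses parallel multipart transfers under the hood
aws s3 cp large-video.mp4 s3://examplebucket/
```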
S3 Transfer Acceleration

- Uses AWS edge locations to speed up data transfers to S3
- Instead of sending data directly over the public internet to the S3 region:
  - The client connects to the nearest edge location
  - Data then travels through the AWS global network, which is optimized for speed and reliability
- Advantages:
- Improved upload performance, especially when users are far from the S3 bucket’s region
- Reduced latency and more consistent transfer speeds
- When enabled:
- A special accelerated endpoint is provided
- This endpoint automatically routes traffic to the closest edge location
- Requirements and limitations:
  - Bucket names must be DNS-compliant
  - Bucket names cannot include periods (`.`)
- You can test performance differences using the S3 Transfer Acceleration speed comparison tool
- Conceptual comparison:
- Public internet → flexible but indirect, with variable routing and delays
- AWS global network → optimized, direct, and high-performance infrastructure for faster data transfer
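A minimal sketch of enabling and using Transfer Acceleration (bucket name is hypothetical):

```bash
# Enable acceleration on the bucket
aws s3api put-bucket-accelerate-configuration --bucket examplebucket \
  --accelerate-configuration Status=Enabled

# Upload through the accelerated endpoint (routed via the nearest edge location)
aws s3 cp large-file.zip s3://examplebucket/ \
  --endpoint-url https://s3-accelerate.amazonaws.com
```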
Encryption 101
Encryption – Key Concepts
- Encryption is the process of transforming data into a format that cannot be understood by unauthorized users
- Common approaches: encryption at rest and encryption in transit
- Main types: symmetric and asymmetric encryption
- Plaintext refers to data in its original, unencrypted form
- It can be any type of data (text, images, files, etc.), not just written text
- Ciphertext is the result of encryption
- The data appears scrambled and cannot be interpreted without the proper key
- A key is used to encrypt and decrypt data
- It can be something simple like a password or a complex string of characters
- An algorithm is the mathematical process used to perform encryption and decryption
- Plaintext + key → ciphertext (encryption)
- Ciphertext + key → plaintext (decryption)
- Examples include AES, DES, Blowfish, RC4, RC5, and RC6
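A minimal sketch of these concepts using OpenSSL (file names and the password are hypothetical):

```bash
# Plaintext + key -> ciphertext (AES-256, key derived from a password)
echo "some secret data" > plain.txt
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in plain.txt -out cipher.bin -pass pass:examplePassword

# Ciphertext + key -> plaintext
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in cipher.bin -pass pass:examplePassword
```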

Encryption Approaches

Encryption at Rest
- Protects data that is stored, such as on disks or in cloud storage
- Prevents unauthorized access if the storage medium is stolen or compromised
- Uses a secret (key) to secure stored data
- Typically involves only one party, responsible for protecting its own data
- In AWS, services like Key Management Service (KMS) are used for this purpose
Encryption in Transit
- Secures data while it is moving between systems
- Example: transferring data between a client and a server over HTTPS
- One side encrypts the data before sending, and the other decrypts it upon receipt
- Data travels through an encrypted channel, preventing interception from being useful
- AWS communications commonly use SSL/TLS for this type of protection
Encryption Types
Symmetric Encryption
- Uses a single shared key for both encryption and decryption
- Advantages:
- Fast and efficient in terms of processing
- Disadvantages:
- Sharing the key securely between parties is difficult
- Common use cases:
- Encrypting stored data
- Encrypting transmitted data after a secure channel is established

Asymmetric Encryption and PKI
- Uses a pair of keys:
- Public key → shared openly
- Private key → kept secret
- One key encrypts the data, and the other decrypts it
- Advantages:
- Enables secure communication without prior key exchange
- Public keys can be safely distributed
- Disadvantages:
- Slower and more resource-intensive than symmetric encryption
- Common use cases:
- Establishing secure channels (e.g., SSL/TLS, SSH)
- Exchanging symmetric keys securely
- Identity verification through digital signatures
- In practice:
- Asymmetric encryption is often used to securely share a symmetric key
- After that, symmetric encryption is used for efficient data transfer

Digital Signing
- Encryption alone does not confirm the sender’s identity
- Digital signatures are used to verify authenticity
- Process:
- A sender signs data using their private key
- The receiver verifies the signature using the sender’s public key
- This ensures:
- The message was created by the expected sender
- The content has not been altered

Steganography
- Used when the goal is to hide the existence of communication, not just protect its content
- Steganography involves embedding hidden data inside another file
- Example: hiding information within an image
- Key points:
- Observers may not even realize data is being transmitted
- Extraction usually requires specific knowledge or keys
- Can be combined with:
- Encryption (to protect the hidden data)
- Digital signatures (to verify authenticity)

Envelope Encryption
Envelope Encryption – Key Concepts
- Envelope encryption is the technique of encrypting encryption keys, providing multiple layers of security.
- Think of it like locking a key inside a vault that itself requires another key to open.
- Key Encryption Key (KEK) → encrypts/decrypts other keys
- Data Encryption Key (DEK) → encrypts/decrypts actual data
- Typically:
- Key service stores KEKs
- Customers store encrypted DEKs and encrypted data
- Customers request the key service to encrypt/decrypt DEKs
- Example: AWS Key Management Service (KMS) acts as the managed key service in AWS
- Benefits:
- Permission separation: storage admins can manage data storage without access to the data itself
- Less data sent to key service: DEKs are small compared to bulk data
- Isolated blast radius: each object can use a unique DEK, limiting exposure if a key is compromised
- Combines asymmetric flexibility and symmetric efficiency:
- KEKs can be asymmetric for flexibility
- DEKs are usually symmetric for speed
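A minimal sketch of the KMS primitive behind this pattern (the key alias is hypothetical); a single call returns both forms of the DEK:

```bash
# Returns a plaintext DEK (use immediately, then discard) and a
# CiphertextBlob (the wrapped DEK, stored next to the encrypted data)
aws kms generate-data-key \
  --key-id alias/examplekey \
  --key-spec AES_256
```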
Envelope Encryption – Encrypt/Decrypt Processes
Scenario: Many cat pictures are stored in Amazon S3, and we want to encrypt them securely using AWS KMS.
- KMS keys can encrypt/decrypt up to 4 KB of data → suitable for DEKs, not large objects
- KMS keys are symmetric and stay inside KMS (never leave the service)
- Permissions for KMS are independent from S3 → having access to S3 objects doesn’t guarantee KMS access
Encryption Process
- Customer creates a KMS key for use with S3
- S3 requests KMS to generate a DEK for a cat picture
- KMS generates a DEK and returns:
- Plaintext DEK → used immediately by S3 to encrypt data
- Wrapped DEK (ciphertext version) → stored alongside encrypted object
- S3 encrypts the cat picture using the plaintext DEK, then discards the plaintext DEK
- Encrypted cat picture and wrapped DEK are stored together
- Repeat for all pictures
Why envelope encryption?
- Cat pictures are much larger than 4 KB → KMS cannot encrypt them directly
- Allows use of unique DEKs per object for best security practices
- Symmetric DEKs allow fast encryption; asymmetric would be slower
Decryption Process
- Customer requests access to an encrypted cat picture
- If the customer lacks KMS permissions, S3 cannot decrypt the object
- S3 sends the wrapped DEK to KMS to unwrap (decrypt) it
- KMS verifies permissions and identifies which KMS key (KEK) was used
- KMS decrypts the DEK and returns the plaintext DEK to S3
- S3 decrypts the cat picture using the plaintext DEK, then discards the DEK
- S3 returns the decrypted cat picture to the customer
This approach ensures security, efficiency, and granular access control.
AWS Key Management Service (KMS)
AWS KMS – Overview & Key Concepts
What KMS Does
- AWS Key Management Service (KMS) creates and manages cryptographic keys.
- Supports:
- Symmetric keys (basic use, fast encryption/decryption)
- Asymmetric keys (advanced use, slower but enables secure key exchange)
- Can perform cryptographic operations: `Encrypt`, `Decrypt`, `GenerateDataKey`, etc.
Security & Isolation
- Keys never leave KMS; they’re stored in HSMs (Hardware Security Modules)
- FIPS 140-2 (Level 2) compliant
- Key material stays inside HSM
- AWS KMS ≠ CloudHSM → AWS has access to KMS keys, but no access to CloudHSM keys
- Keys are region-specific, though multi-region keys exist (out of SAA-C03 scope)
- KMS is a public service with public endpoints
Cryptographic Operations

| Operation | Description |
|---|---|
| `CreateKey` | Creates a KMS key and stores it encrypted in HSMs |
| `Encrypt` | Encrypts plaintext using a KMS key → returns ciphertext |
| `Decrypt` | Decrypts ciphertext → returns plaintext (KMS key info embedded in ciphertext) |
| `GenerateDataKey` | Generates a Data Encryption Key (DEK) for bulk encryption (>4 KB) |
KMS permissions are very granular → allows strict role separation
KMS Keys
- KMS key = logical container with metadata, backed by key material
- Old term: Customer Master Key (CMK)
- Can directly operate on data ≤ 4 KB
- Key material = actual bytes/numbers used in cryptographic operations
- Metadata includes:
- Key ID, Creation date, Key policy, Description, State (Enabled/Disabled)
Types of KMS keys
- AWS-owned (default, free)
- Used automatically by AWS services (S3, SQS, DynamoDB)
- Shared across accounts → low admin overhead, but limited security controls
- Customer-owned
- AWS-managed (free) → AWS creates/manages, rotates automatically yearly
- Customer-managed ($1/month) → customer creates and manages manually
- Supports aliases (unique per region)
- Supports automatic rotation (yearly) or on-demand rotation
- Can be used by AWS services or custom apps
Billing
- Customer-managed keys: $1/month + $0.03 per 10k API calls
- AWS-owned or AWS-managed keys: free
KMS Key Policies & Permissions
- Every KMS key has a key policy (resource policy)
- Can only be modified for customer-managed keys
- KMS does not automatically trust IAM users → trust must be explicitly added
- Access controlled by key policy + IAM policies
- Optional: grants (not in scope for SAA-C03)
Data Encryption Keys (DEKs)

- DEKs are generated by KMS for bulk encryption (> 4 KB)
- Key part of envelope encryption
- `kms:GenerateDataKey` returns:
  - Plaintext DEK → used immediately, then discarded
  - Ciphertext DEK → stored with encrypted object
- KMS never stores DEKs; they’re for client or AWS service use
- Can choose:
- Same DEK for multiple objects
- Unique DEK per object (best practice for isolated security)
Demo: Shell Commands
echo "find all the doggos, distract them with the yumz" > battleplans.txt# Encrypt file
aws kms encrypt \
--key-id alias/catrobot \
--plaintext fileb://battleplans.txt \
--output text \
--query CiphertextBlob \
| base64 --decode > not_battleplans.enc # Decrypt file
aws kms decrypt \
--ciphertext-blob fileb://not_battleplans.enc \
--output text \
--query Plaintext | base64 --decode > decryptedplans.txt
- Note: `kms:Decrypt` does not require explicitly specifying the KMS key; the key info is embedded in the ciphertext.
S3 Encryption
S3 Encryption In Transit
- S3 enforces encryption during data transfer
- S3 endpoints require HTTPS connections, ensuring that clients use SSL/TLS when sending or retrieving data.
S3 Object Encryption (Encryption At Rest)

- Data is encrypted on arrival at S3 endpoints before it is written to storage → this is encryption at rest.
- Encryption applies to objects, not buckets
- Each object can have different encryption settings.
- Buckets can have a default encryption configuration to automatically encrypt new objects.
- Two main approaches for encryption at rest:
  - Client-Side Encryption (CSE): client provides ciphertext
    - The client controls the encryption keys and process
    - S3 acts purely as storage; no cryptographic operations are performed
    - AWS never sees the unencrypted data or keys
    - Suitable for strict compliance requirements
    - Tradeoff: client is responsible for all encryption/decryption operations, which can be resource-intensive
  - Server-Side Encryption (SSE): client provides plaintext
    - S3 performs encryption/decryption operations
    - Tradeoff: client has less direct control over encryption, but encryption is offloaded to S3
- AWS requires SSE for all objects
- Storing unencrypted objects is no longer allowed
- If using CSE, SSE must also be applied
S3 Server-Side Encryption (SSE)
- SSE types:
- SSE-S3: S3-managed keys (default)
- SSE-KMS: KMS-managed keys
- SSE-C: customer-provided keys
- Each type involves tradeoffs depending on who manages the keys and level of control required
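A minimal sketch of selecting an SSE type per object at upload time (bucket, files, and alias are hypothetical):

```bash
# SSE-S3 (AES-256, S3-managed keys)
aws s3 cp file.txt s3://examplebucket/ --sse AES256

# SSE-KMS with a customer-managed KMS key
aws s3 cp file.txt s3://examplebucket/ \
  --sse aws:kms --sse-kms-key-id alias/examplekey

# SSE-C: the client supplies the key material with every request
aws s3 cp file.txt s3://examplebucket/ \
  --sse-c AES256 --sse-c-key fileb://sse-key.bin
```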
SSE-S3 (AES-256)

- S3 manages keys and encryption operations
- Uses AES-256 encryption and is suitable for general-purpose workloads
- Envelope encryption: S3 Master Key generates a unique Data Encryption Key (DEK) for each object, which is stored alongside the encrypted object
- Advantages:
- Minimal administrative effort
- Strong AES-256 algorithm
- Limitations:
- Limited control over keys
- No auditing/logging of key usage available to customer
- Role separation not supported (any S3 admin can decrypt all objects)
- Not recommended for high-security environments (e.g., finance, healthcare)
SSE-KMS

- S3 performs encryption/decryption using customer-owned KMS keys
- KMS manages the keys; S3 requests use of keys for encryption operations
- KMS key options:
- AWS-managed: automatic yearly rotation, less configurable
- Customer-managed: flexible management, manual or automatic rotation possible
- Advantages:
- Key usage can be logged and audited (e.g., via CloudTrail)
- Supports role separation: S3 admins cannot decrypt objects they did not create
- Allows use of other encryption algorithms if required
- Limitations:
- Higher administrative overhead compared to SSE-S3
- Keys remain hosted within AWS
SSE-C

- S3 encrypts objects using customer-provided keys
- Clients must supply the key for each operation
- KMS is not involved
- S3 performs the encryption operation and discards the key
- Key hash is stored with the encrypted object to verify future decryption attempts
- Advantages:
- Keys are never stored in AWS
- Supports role separation: only the client can access objects
- Offloads encryption work from client while maintaining key control
- Limitations:
- More administrative effort than SSE-KMS
- AWS performs encryption operations with your key temporarily; if complete separation is required, only CSE ensures no AWS exposure
Summary Table for S3 Encryption At Rest
| S3 Encryption Method | Key Management | Cryptographic Processing | Notes / Extras |
|---|---|---|---|
| SSE-S3 | Managed by S3 | Encryption/decryption handled by S3 | – No control over keys – No role separation |
| SSE-KMS | Managed by KMS (AWS or customer) | S3 performs crypto operations using KMS keys | – Key rotation control – Supports role separation and auditing |
| SSE-C | Provided by customer | S3 uses customer key for crypto ops | – Keys never stored in AWS – Supports role separation |
| CSE (Client-Side Encryption) | Customer | Encryption/decryption handled by client | – S3 only stores ciphertext – AWS never sees plaintext |
S3 Bucket Keys
Scaling Challenges with S3 SSE-KMS
- In standard SSE-KMS, each object upload requires generating a unique DEK via `kms:GenerateDataKey` → one KMS API call per object.
  - This can become costly at high upload rates (e.g., tens of thousands of objects per second).
- The `kms:GenerateDataKey` operation has throttling limits.
  - Max PUTs per second per KMS key is finite, which can constrain large-scale workloads.
S3 Bucket Keys for SSE-KMS
- Bucket keys are temporary keys created by a KMS key to generate DEKs for all objects in a bucket.
- This reduces KMS API calls and lowers costs at scale.
- Benefits:
- Significantly fewer KMS calls: only one call to create the bucket key; subsequent DEKs are generated locally by S3.
- CloudTrail logging is consolidated per bucket: object-level DEK generation is no longer logged, reducing log volume.
- Limitations:
- Not retroactive: existing objects encrypted before enabling bucket keys won’t benefit.
- Compatibility:
- Works with S3 Replication (SRR and CRR).
- Encryption configuration is preserved in the target bucket.
- ETag nuance: replicated objects may have changed ETags if bucket keys are used, but this is mostly irrelevant now that plaintext storage is disallowed.
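A minimal sketch of enabling a bucket key as part of a default SSE-KMS configuration (names are hypothetical):

```bash
aws s3api put-bucket-encryption --bucket examplebucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "alias/examplekey"
      },
      "BucketKeyEnabled": true
    }]
  }'
```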
S3 Object Storage Classes
Overview of S3 Storage and Classes
- S3 provides cost-effective storage and multiple storage classes to optimize costs and performance.
- Each class represents a tradeoff between storage cost, retrieval speed, and data redundancy.
Key S3 characteristics:
- 11 nines of durability (99.999999999%)
- For 10 million objects, expect 1 object loss every 10,000 years.
- PUT confirmation (`HTTP/1.1 200 OK`) ensures durable storage.
- Checksums (Content-MD5 & CRCs) detect and repair data corruption.
- Data replication across at least 3 AZs ensures regional resilience.
- Exception: One-Zone Infrequent Access (1Z-IA).
- Low first-byte latency (milliseconds) for most classes, except archival.
- Objects can be made public via URLs or static website hosting.
Billing overview:
- Storage fee: $/GB per month, depends on storage class.
- Data transfer OUT is billed per GB (the rate varies by region and destination); transfer IN is free.
- Request fee: per 1,000 requests.
- Additional fees may apply for retrieval, minimum duration, or minimum object size depending on class.
S3 storage classes:
- “Warm” storage (frequent or infrequent access):
- Standard, Standard-IA, One-Zone-IA
- “Cold” storage (archival):
- Glacier Instant Retrieval (IR), Glacier Flexible Retrieval (FR), Glacier Deep Archive (DA)
- Intelligent-Tiering:
- Automatically moves objects between tiers based on access patterns.
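A minimal sketch of choosing a class explicitly at upload time (names are hypothetical):

```bash
# Store an object directly in Standard-IA instead of the default Standard class
aws s3 cp backup.tar s3://examplebucket/ --storage-class STANDARD_IA
```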
S3 Standard (Default)

- Default class for frequently-accessed, critical data.
- Balanced in features and cost: no retrieval fees, no minimum storage duration, no minimum object size.
- Suitable for most workloads requiring high durability and low latency.
S3 Standard-Infrequent Access (IA)

- About half the storage cost of Standard, but incurs retrieval fees and minimum charges.
- Designed for data accessed rarely (~1x/month) but still important.
- Minimum duration: 30 days; minimum object size: 128 KB.
- Good for cost-efficient storage of long-lived, infrequently-accessed data.
S3 One Zone-Infrequent Access (1Z-IA)

- Lower storage cost than Standard or Standard-IA.
- Stores data in only one AZ (no multi-AZ replication).
- Suitable for replaceable, infrequently-accessed data.
- Same retrieval fee, minimum duration (30 days), and minimum object size (128 KB) as Standard-IA.
- Use cases: secondary copies, intermediate processing data, or data that can be regenerated.
S3 Glacier Instant Retrieval (IR)

- Cheaper than Standard-IA with higher retrieval cost.
- Minimum storage duration: 90 days.
- Designed for rarely-accessed, irreplaceable data.
- Provides millisecond first-byte latency, making it suitable for data that requires instant access.
S3 Glacier Flexible Retrieval (FR)

- About 1/6th the cost of Standard, intended for archival purposes.
- Cold storage: objects are not immediately accessible; only metadata is stored in S3.
- Retrieval options: Expedited (1–5 mins), Standard (3–5 hrs), Bulk (5–12 hrs).
- Minimum size: 40 KB; minimum duration: 90 days.
- Best for long-term archival data accessed infrequently.
S3 Glacier Deep Archive (DA)

- Lowest cost storage, for data rarely accessed.
- Objects stored in a “frozen” state; retrieval can take hours to days.
- Minimum size: 40 KB; minimum duration: 180 days.
- Suitable for regulatory compliance and secondary backups, not primary backups.
- Retrieval types: Standard (12 hrs), Bulk (up to 48 hrs).
S3 Intelligent-Tiering

- Automatically moves objects between storage tiers based on access frequency.
- Reduces administrative overhead for long-lived data with changing or unknown access patterns.
- Tiers: Frequent Access, Infrequent Access, Archive Instant Access, Archive Access, and Deep Archive Access.
- The bottom two tiers (Archive Access and Deep Archive Access) do not provide instant retrieval; data is retrieved asynchronously.
- Monitoring and migration incur a small management fee per 1,000 objects.
- No retrieval fee for accessing objects.
S3 Lifecycle Configuration
S3 Lifecycle Management, Configuration, and Rules

- S3 Lifecycle Rules allow automatic transition or expiration of objects/versions in a bucket after a defined period, helping to optimize costs for large buckets.
- Rules consist of conditions and actions, and they can apply to a whole bucket or specific groups of objects (defined by prefix or tags).
Action Types
- Transition Actions
- Move objects to a different storage class after a specified time.
- Example: Move objects to Glacier-IR 30 days after creation.
- Expiration Actions
- Automatically delete objects or previous versions.
- Example: Delete previous versions on the first of each month.
Important: Conditions are not based on object access or usage.
- For usage-based management, rely on S3 Intelligent-Tiering instead.
- Lifecycle rules are ideal for objects with predictable, time-based behavior.
Transition Flow
- S3 transitions follow a waterfall model:
Std → Std-IA → Intelligent-Tiering → 1Z-IA → Glacier-IR → Glacier-FR → Glacier-DA
- Objects can only move downward/right in this hierarchy.
- You cannot automatically move objects upward/left (e.g., Std-IA → Std).
- Manual reclassification is always possible via Console, CLI, or API.
Exceptions:
- Cannot transition directly from 1Z-IA to Glacier-IR.
- Usually, objects progress sequentially, but skipping tiers is possible.
Key Considerations
- Cost for small objects: Transitioning small objects from Std → IA or Intelligent-Tiering may increase storage cost.
- Minimum duration rules:
- Objects must remain in Standard at least 30 days before moving to Std-IA, 1Z-IA, or Intelligent-Tiering via lifecycle rules.
- If the same rule transitions from Std → IA and then IA → Glacier, objects must stay in IA for an additional 30 days before moving to Glacier.
- Using multiple rules can bypass this restriction.
- Lifecycle rules are most effective for objects with consistent time-based patterns, not for access-based patterns.
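A minimal sketch of a lifecycle configuration combining a transition and an expiration action (bucket, prefix, and timings are hypothetical):

```bash
aws s3api put-bucket-lifecycle-configuration --bucket examplebucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "ArchiveThenExpireLogs",
      "Filter": {"Prefix": "logs/"},
      "Status": "Enabled",
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }]
  }'
```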
S3 Replication
S3 Replication – Overview
- S3 replication automatically copies and syncs objects from a source (SRC) bucket to a destination (DST) bucket.
Replication Types:
- By region:
- Same-Region Replication (SRR) – SRC and DST in the same region.
- Cross-Region Replication (CRR) – SRC and DST in different regions.
- By account:
- Same-Account Replication – both buckets in the same account.
- Cross-Account Replication – buckets in different accounts.
Use Cases:
- SRR:
- Aggregate logs or audit data into one bucket.
- Sync data between TEST & PROD environments.
- Improve resilience while keeping data within a single region.
- CRR:
- Provide disaster recovery in a different region.
- Reduce latency by placing data closer to end-users.
S3 Replication – Architecture
- S3 needs an IAM role with permission to read from SRC and write to DST.
- Same account: DST bucket automatically trusts the role.
- Cross-account: DST bucket must explicitly trust the role from the external account via bucket policy.
S3 Replication – Features
- Replicate all objects (default) or a subset (filtered by prefix or tags).
- Can select the storage class for replicated objects in DST (default = same as SRC).
- Ownership:
- Default: replicated objects are owned by SRC account.
- For cross-account replication, you may need to change ownership to DST account.
- Replication Time Control (RTC):
- 15-minute SLA, disabled by default.
- Provides metrics and faster replication but incurs extra cost.
S3 Replication – Considerations & Limitations
- One-way replication by default; bi-directional is possible.
- Not retroactive: only replicates new objects by default; use Batch Replication for existing objects.
- Versioning required on both SRC and DST buckets.
- Works with all encryption types (SSE-C, SSE-S3, SSE-KMS), but SSE-KMS requires extra configuration.
- SRC account must have access to all objects for replication.
- Only user-generated events are replicated; lifecycle transitions are not replicated.
- Objects in Glacier-FR or Glacier-DA are not replicated.
- Deletes are not replicated by default; enable `DeleteMarkerReplication` if needed.
- No replication chaining: objects replicated from bucket 1 → bucket 2 do not automatically replicate on to bucket 3.
S3 Presigned URLs
Limitations of Anonymous S3 Access to Private Resources
- S3 objects can typically be accessed using:
- IAM users – require authentication and authorization with long-term credentials.
- IAM roles – require assuming a role and using temporary credentials.
- Public buckets – allow anonymous access without authentication.

- Challenge:
- How to share private or sensitive objects without requiring users to authenticate to AWS?
- IAM-based access requires identity management and may not provide a smooth user experience.
- Making the bucket public is not acceptable for restricted content.
- Solution: Presigned URLs
S3 Presigned URL – Architecture

- Presigned URLs allow temporary access to private S3 objects using the permissions of an IAM identity.
How it works:
- An IAM user or role calls `generatePreSignedURL` and provides:
  - Credentials
  - Expiration time
  - Object key
  - Operation type (GET for download or PUT for upload)
- S3 returns a presigned URL, which can be shared with external users.
- The recipient uses the URL to access S3 without direct authentication.
- The request is executed as if performed by the IAM identity that generated the URL.
- Authorization data is embedded in the URL.
- Actions performed are logged under that IAM identity.
- Security:
- The URL is valid only for a limited time and expires automatically.
Common Use Cases of Presigned URLs
- Secure sharing of private content
- Applications can provide temporary access to uploaded or processed files.
- Offloading uploads/downloads to S3
- Clients interact directly with S3 instead of routing data through application servers.

- Serverless architectures
- Avoid running backend servers just to manage access to S3 objects.
Presigned URL – Important Behaviors
- Avoid using IAM roles to generate presigned URLs when possible:
- The URL becomes invalid when the role’s temporary credentials expire, which may happen before the URL’s configured expiration.
- Permissions are evaluated at request time, not at creation time:
- If the IAM identity loses access after generating the URL, the URL will no longer work.
- A presigned URL can be generated even if:
  - The IAM identity does not currently have access to the object.
  - The object does not exist yet.
  - The IAM identity has no S3 permissions at all (using the URL then simply fails with `AccessDenied`).
- Error behavior:
  - `AccessDenied` → IAM identity lacks required permissions.
  - `NoSuchKey` → Object does not exist.
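A minimal sketch of generating a presigned GET URL from the CLI (bucket and key are hypothetical):

```bash
# The URL inherits the permissions of the identity that runs this command
# and expires after one hour
aws s3 presign s3://examplebucket/koala.jpg --expires-in 3600
```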
S3 Select and Glacier Select
S3/Glacier Select – Overview

- Retrieving large objects from S3 can be inefficient:
- Downloading very large files (e.g., multiple TBs) takes significant time.
- Data transfer costs apply to the entire object size.
- Performing filtering on the client side is not effective:
- The full object must still be downloaded before filtering.
- This results in unnecessary time and cost.
- S3 Select and Glacier Select allow retrieval of only specific portions of an object instead of downloading the entire file.
- Filtering is performed server-side using SQL-like expressions.
Key Features
- Improves performance and cost efficiency:
- Can be up to 4× faster and reduce costs by up to 80% compared to client-side filtering.
- Supported formats:
- CSV
- JSON
- Parquet
- Compression support: BZIP2 (for CSV and JSON)
- Flexible solution for processing structured data stored in S3 or Glacier.
- Not enabled by default and must be explicitly configured before use.
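A minimal sketch of an S3 Select query (bucket, key, and column names are hypothetical); the SQL filter runs server-side, and only matching rows are returned:

```bash
aws s3api select-object-content \
  --bucket examplebucket \
  --key data.csv \
  --expression "SELECT s.name, s.city FROM S3Object s WHERE s.status = 'active'" \
  --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
  --output-serialization '{"CSV": {}}' \
  results.csv
```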
S3 Events
S3 Events – Architecture

- S3 Event Notifications allow you to receive alerts when specific actions occur in a bucket.
- Examples include object creation and deletion.
- Commonly used to build event-driven architectures (EDA).
- When an event occurs, S3 sends a JSON message to the configured destination.
- Configuration is done by updating the bucket’s notification subresource:
Configuration Components
- Event destinations:
- AWS Lambda
- Amazon SQS
- Amazon SNS
- Each destination must have a resource policy that allows S3 to send events.
- Event types:
  - Object creation (`Put`, `Post`, `Copy`, `CompleteMultipartUpload`)
  - Object deletion (`Delete`, `DeleteMarkerCreated`)
  - Object restore (from Glacier storage classes)
  - Replication events (e.g., success, failure, threshold delays)
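A minimal sketch of configuring the notification subresource to send object-created events to an SQS queue (the queue ARN is hypothetical, and its resource policy must already allow S3 to send messages):

```bash
aws s3api put-bucket-notification-configuration --bucket examplebucket \
  --notification-configuration '{
    "QueueConfigurations": [{
      "QueueArn": "arn:aws:sqs:us-east-1:111122223333:example-queue",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'
```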
Key Consideration
- S3 Event Notifications is a basic and older mechanism for event handling.
- Amazon EventBridge is generally preferred for modern event-driven designs because it:
- Supports a wider range of event sources
- Provides more flexible routing and filtering
- Integrates with a larger set of targets
S3 Access Logs
S3 Server Access Logging – Architecture

- S3 Server Access Logging provides detailed records of requests made to a bucket and its objects, helping improve visibility and auditing.
- Logs are delivered to a separate target bucket.
Log Structure
- Log files contain multiple log records, each on a new line.
- Each record includes fields such as date, time, request type, and status code.
- Fields are space-delimited, similar in format to Apache access logs.
Logging Process
- Logging is handled by the S3 Log Delivery Group, an AWS-managed background service.
- Log delivery operates on a best-effort basis and is not real-time.
- Delivery may take several hours.
Configuration
1. Source Bucket
- Enable logging using:
  - AWS Management Console, or
  - `PUT Bucket logging` via CLI/API
- Specify:
  - Target bucket
  - Optional prefix to organize logs
- A single target bucket can store logs from multiple source buckets using different prefixes.
2. Target Bucket
- Must grant write permissions to the S3 Log Delivery Group (typically via bucket ACL).
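A minimal sketch of enabling server access logging via the CLI (bucket names and prefix are hypothetical):

```bash
aws s3api put-bucket-logging --bucket source-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "example-log-bucket",
      "TargetPrefix": "source-bucket/"
    }
  }'
```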
Important Considerations
- Log lifecycle is not managed automatically:
- You should configure lifecycle rules to transition or delete old logs.
- Logs are intended for analysis, not real-time monitoring.
Common Use Cases
- Auditing and security analysis
- Analyzing access patterns and usage trends
- Investigating unexpected S3 cost changes
S3 Object Lock
S3 Object Lock – Key Concepts
- S3 Object Lock is used to protect object versions from being deleted or overwritten, either temporarily or permanently.
- Enables a Write-Once-Read-Many (WORM) model.
- Can be applied to individual objects or set as a default at the bucket level.
- Two main mechanisms (can be used together):
- Legal Hold
- Retention Period (Governance mode or Compliance mode)
Important Considerations
- Versioning must be enabled, since Object Lock applies to object versions.
- Best enabled during bucket creation.
- Enabling it on an existing bucket requires contacting AWS Support.
- Once enabled:
- Object Lock cannot be disabled
- Versioning cannot be turned off
S3 Object Lock – Legal Hold
- Each object version has a boolean flag indicating whether a legal hold is active.
- No expiration is associated with this lock.
- The lock remains until it is manually removed.
- Operation used: `s3:PutObjectLegalHold`
Use Cases
- Prevent accidental deletion or modification of specific object versions.
- Mark certain versions as critical or under legal review.
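A minimal sketch of applying and releasing a legal hold (bucket and key are hypothetical):

```bash
# Place a legal hold on the current version of an object
aws s3api put-object-legal-hold --bucket examplebucket \
  --key report.pdf --legal-hold Status=ON

# Remove it again; the hold has no expiration of its own
aws s3api put-object-legal-hold --bucket examplebucket \
  --key report.pdf --legal-hold Status=OFF
```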
S3 Object Lock – Retention Period
- Locks an object version for a defined time period (in days or years).
- During this period, the object version cannot be modified or deleted.
- After expiration, normal operations are allowed again.
- Configured using `put-object-lock-configuration` with a required mode.
Retention Modes
1. Governance Mode
- Provides protection but allows authorized users to override the lock.
- Requires special permission: `s3:BypassGovernanceRetention`
- Requests must also include the header: `x-amz-bypass-governance-retention: true`
- Note:
- The AWS Management Console automatically includes this header, so users with permission can bypass the lock through the UI.
Use Cases:
- Prevent accidental deletion or modification via CLI/SDK.
- Testing retention policies before enforcing stricter controls.
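A minimal sketch of setting and bypassing a governance-mode retention period (bucket, key, date, and version ID are hypothetical):

```bash
# Lock the object version until the given date
aws s3api put-object-retention --bucket examplebucket --key report.pdf \
  --retention '{"Mode": "GOVERNANCE", "RetainUntilDate": "2026-01-01T00:00:00Z"}'

# Deleting before then only works with s3:BypassGovernanceRetention permission;
# this flag adds the x-amz-bypass-governance-retention header
aws s3api delete-object --bucket examplebucket --key report.pdf \
  --version-id <version-id> --bypass-governance-retention
```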
2. Compliance Mode
- Provides strict, non-bypassable protection for the duration of the retention period.
- Object versions cannot be modified, deleted, or unlocked, even by the root user.
Use Cases:
- Regulatory or legal requirements requiring fixed data retention (e.g., financial or medical records).
S3 Access Points
Scaling Limitations for Granular Configurations in S3 Buckets
- Large S3 buckets may require fine-grained access control across different teams, applications, or use cases.
- Relying on bucket policies alone does not scale well:
- Policies can become overly large and complex.
- Managing multiple overlapping permissions becomes difficult.
- Increased risk of misconfiguration due to policy complexity.
- A more scalable approach is to use S3 Access Points, assigning one per team, application, or use case.
S3 Access Points – Architecture
- S3 Access Points provide logical views into a bucket, allowing access to specific subsets of objects.
- Each access point has its own DNS endpoint and independent access controls.
- Benefits:
- Simplifies management of large buckets.
- Enables separation of access by team or application.
- Creation:
- CLI: `aws s3control create-access-point --name <name> --account-id <id> --bucket <bucket>`
- CLI:
Access Point Policies
- Each access point has its own policy that controls access to its specific subset of objects.
- Similar to a bucket policy, but limited to that access point’s scope.
- Recommended design:
- Use the bucket policy to allow broad access, typically requiring access through access points.
- Use access point policies for fine-grained permission control.
- This approach:
- Reduces overall policy complexity.
- Enables delegation of access management to different teams.
Network Access Control
- Access points can be configured to allow access only from a specific VPC.
- Access is enforced using a VPC Endpoint (VPCE) and its associated policy.
LAB: S3 Multi-Region Access Point (MRAP)
S3 Multi-Region Access Point (MRAP) – Overview
- S3 MRAP provides a single global endpoint that automatically routes requests to the nearest available bucket.
- Key characteristics:
- Combines multiple S3 buckets across different regions.
- Routes requests (GET, PUT, etc.) based on lowest network latency.
- Once created, buckets cannot be added or removed from the MRAP.
- Replication support:
- Buckets can be configured to replicate data one-way or bidirectionally.
- Failover capability:
- Traffic is directed to active buckets.
- If a region becomes unavailable, requests are routed to a failover bucket in another region.
Stage 1 – Create an MRAP
- Create two S3 buckets in different regions (e.g., Canada and Sydney).
- Enable versioning on both buckets.
- In S3:
- Navigate to Multi-Region Access Points → Create
- Add both buckets to the MRAP
- Use default settings for the remaining configuration
- Notes:
- MRAP setup typically takes under 30 minutes, but can take up to 24 hours.
Stage 2 – Configure Replication
- Open the MRAP → Replication and failover → Create replication rules
- Configuration:
- Select replicate across all buckets
- Apply replication to all objects
- Result:
- Objects uploaded to one bucket are automatically replicated to the other bucket(s)
Stage 3 – Test the MRAP
Preparation
- Open each bucket in separate tabs
- Keep the MRAP ARN available
- Use AWS CloudShell in different regions
Test Process
- Create a test file: `dd if=/dev/urandom of=<file_name> bs=1M count=10`
- Upload to the MRAP: `aws s3 cp <file_name> s3://<MRAP_ARN>`
- Observe which bucket receives the file first
Test Scenarios
- From Tokyo:
- File is stored first in the Sydney bucket (closest region)
- Then replicated to Canada
- From Ohio:
- File is stored first in the Canada bucket
- Not immediately available in Sydney
- Eventually replicated
- From Mumbai:
- Initial bucket selection may vary
- Depends on latency and real-time network conditions
Key Insight
- Replication across regions is not instantaneous.
- Applications must be designed to handle replication delay in global architectures.
Stage 4 – Cleanup
- Delete the MRAP
- Empty and delete the associated S3 buckets
Network Storage and Data Lifecycle
Amazon EFS (Elastic File System)
Amazon EFS (Elastic File System) – Architecture

- EFS is AWS’s implementation of the NFSv4 protocol, used for shared file storage.
- Key characteristics:
- Provides a network-based file system that can be mounted by one or more Linux EC2 instances.
- Enables storage to be decoupled from compute, allowing instances to remain stateless.
- Supports concurrent access from multiple instances.
- Access model:
- EFS is a private service, accessed via mount targets within VPC subnets.
- Can be extended to on-premises environments using VPN, Direct Connect, or VPC peering.
- For high availability, mount targets should be deployed across multiple Availability Zones.
- Data persistence:
- Files are stored independently of EC2 instances.
- Data remains intact even if instances are terminated and recreated.
- Compatibility:
- Supported only on Linux-based systems.
- For Windows environments, use Amazon FSx for Windows File Server instead.
- File system structure:
- Uses a hierarchical directory model typical of Linux systems.
- Can be mounted to paths such as `/mnt/data` or `/nfs/media`.
- Permissions:
- Uses POSIX permissions, ensuring compatibility across Linux distributions.
EFS Performance Settings
Throughput Modes
- Bursting (Default)
- Throughput increases as the file system grows in size.
- Uses a burst credit model similar to GP2 EBS volumes.
- Enhanced Throughput Modes
- Elastic Throughput
- Automatically adjusts based on workload demand.
- Suitable for unpredictable or variable access patterns.
- Provisioned Throughput
- Allows specifying throughput independently of storage size.
- Useful for consistent, high-performance requirements.
- Elastic Throughput
Performance Modes
- General Purpose (Default)
- Optimized for low latency.
- Suitable for web applications, CMS platforms, and general file sharing.
- Max I/O
- Designed for high throughput and parallel workloads.
- Accepts higher latency in exchange for increased scalability.
- Common in analytics or large-scale processing workloads.
EFS Storage Settings
Storage Classes
- Standard
- Intended for frequently accessed data.
- Infrequent Access (IA)
- Lower cost for less frequently used files.
- Archive
- Lowest-cost tier for rarely accessed data.
- Lifecycle policies can automatically transition files between storage classes.
- Files can return to the Standard tier when accessed.
Additional EFS Features
- Availability options:
- Data can be stored across multiple AZs or confined to a single AZ.
- Backup integration:
- Supports automated backups through AWS Backup.
- Encryption:
- Supports encryption using AWS KMS.
- Access requires permissions for both EFS and the associated KMS key.
DEMO: Using EFS with EC2 WordPress Instances
Demo: Implementing a Basic EFS
- In this demo, an EFS file system is created within a custom VPC.
- Use the same configuration settings demonstrated in the reference material.
- Best practice:
- Deploy a mount target in each Availability Zone where EFS will be accessed.
- Mount targets and security groups:

- Each mount target must have an associated security group.
- In this setup, all mount targets share the same security group, allowing connectivity with other resources.
Configuring an EC2 Instance to Use EFS
- The following commands are executed on a Linux EC2 instance:
- `df -k`
  - Displays all currently mounted file systems and available disk space.
  - Initially confirms that no EFS file system is mounted.
- `sudo mkdir -p /efs/wp-content`
  - Creates the directory structure for mounting EFS.
  - The `-p` option ensures intermediate directories are created if needed.
- `sudo dnf -y install amazon-efs-utils`
  - Installs required utilities for interacting with EFS.
  - The `-y` flag automatically approves installation prompts.
- `sudo nano /etc/fstab`
  - Opens the file that defines file systems to mount at boot.
  - Add the following entry at the end (replace `<file-system-id>` with the actual EFS ID):
    `<file-system-id>:/ /efs/wp-content efs _netdev,tls,iam 0 0`
- `sudo mount /efs/wp-content`
  - Mounts the file system based on the fstab configuration.
- `df -k`
  - Verifies that the EFS file system is now mounted.
- After mounting:
  - Files created in `/efs/wp-content` are stored in EFS.
  - Other EC2 instances connected to the same EFS can access the same data.
  - This enables a shared network file system across instances.
WordPress Architecture with EFS
- This design shifts from a monolithic setup to a scalable and resilient architecture:
- Key changes:
- WordPress media is no longer stored on the instance’s local file system.
- Data is centralized:
- Database → stored in RDS
- Media/content → stored in EFS
- Benefits:
- Multiple EC2 instances can access the same shared content.
- Supports horizontal scaling, allowing the number of instances to grow or shrink based on demand.
- Mount location:
  - EFS is mounted at `/var/www/html/wp-content`
  - This directory stores WordPress uploads and media.
- CloudFormation note: `!ImportValue` is used to reference values exported from other stacks.
AWS Backup 101
AWS Backup – Basic Concepts

- AWS Backup is a fully managed service for data protection, supporting both backup and restore operations.
- Includes centralized management, monitoring, and auditing features.
- Provides a single interface to manage backups across environments:
- Supports cross-region backups for improved resilience.
- Supports cross-account backups, often integrated with AWS Organizations or Control Tower.
- Supports a wide range of AWS services, allowing both:
- Backup storage management
- Backup policy configuration
AWS Backup – Key Components
Backup Plans
- Define how backups are created and managed:
- Frequency
- Determines how often backups run (e.g., hourly, daily, weekly).
- Can use CRON expressions.
- Some services support continuous backups for Point-In-Time Recovery (PITR).
- Backup Window
- Specifies when backups start and how long they can run.
- Lifecycle Rules
- Control when backups transition to cold storage or expire.
- Backups moved to cold storage must remain there for at least 90 days.
- Backup Vault
- Defines where backups are stored.
- Cross-Region Copy
- Allows automatic replication of backups to another region.
- Frequency
Backup Resources
- Identify the AWS resources included in a backup plan.
- Examples: S3 buckets, RDS databases, EBS volumes.
Backup Vault
- Acts as the storage container for backups.
- Key characteristics:
- At least one vault is required.
- Encrypted using a KMS key.
- By default, backups can be modified or deleted.
- Vault Lock (WORM protection):
- Enables Write-Once-Read-Many behavior for compliance use cases.
- Includes a 72-hour grace period before becoming fully enforced.
- After activation:
- Backups cannot be deleted or altered before retention expires.
- Even AWS cannot override the lock.
- Note:
- Vault Lock is separate from S3 Object Lock and Glacier storage features.
On-Demand Backups
- Allow manual backup creation outside of scheduled plans.
- Useful for ad hoc or pre-change backups.
Point-In-Time Recovery (PITR)
- Available for supported services such as RDS and S3.
- Enables restoring data to a specific moment within the retention period.
Key Takeaways
- Centralizes backup management across accounts and regions.
- Supports both scheduled and manual backups.
- Provides compliance features like Vault Lock.
- Enables fine-grained recovery through PITR.
SQL Databases & RDS
Database Models
Databases – Basic Concepts
- A database (DB) is a system designed to store, organize, and process data.
- It is important to distinguish databases from storage systems:
- Storage holds raw data such as files, images, or videos without built-in processing.
- Databases add structure and enable operations like querying, sorting, and analysis through query languages.
- Think of storage as a collection of raw files, while a database resembles a structured dataset that can be searched and analyzed.
- There are many database types, differing in:
- How data is stored on disk
- How it is managed in memory
- How it is retrieved and presented
- Broad classification:
- Relational (SQL) databases
- Non-relational (NoSQL) databases
Core Concepts
- Schema
- Defines the structure of data, including attributes, data types, and relationships.
- Typically fixed in advance and difficult to modify later.
- Key / Index
- A unique identifier for each record in a dataset.
- Ensures that records can be efficiently located.
- Composite Key
- A combination of multiple attributes used together to uniquely identify a record.
Relational Databases (SQL / RDBMS)
- SQL (Structured Query Language) is used to query relational databases.
- A Relational Database Management System (RDBMS) organizes data into structured tables:
- Tables group related data.
- Rows represent individual records.
- Columns define attributes.
- Key characteristics:
- All rows must follow the same schema.
- Each row is uniquely identified by a primary key.
- Relationships between tables are defined using join tables.
- Relationship types include:
- One-to-one
- One-to-many
- Many-to-many
- Limitation:
- Schema rigidity makes it difficult to adapt to rapidly changing or highly dynamic data structures.

SQL Databases: Row vs Column Design

Row-Based Databases (OLTP)
- Data is stored row by row.
- Optimized for transactional operations (insert, update, delete).
- Strengths:
- Efficient for handling complete records.
- Ideal for applications with frequent transactions.
- Weakness:
- Less efficient for operations focused on specific columns.
- Common use cases:
- Transaction systems (orders, accounts, inventory).
Column-Based Databases (OLAP)
- Data is stored column by column.
- Optimized for analytical queries across large datasets.
- Strengths:
- Efficient for aggregations and reporting.
- Enables advanced analytics and insights.
- Weakness:
- Not suited for frequent transactional updates.
- Common use cases:
- Data warehousing and business intelligence
- Example: Amazon Redshift
Non-Relational (NoSQL) Databases
- Include database models that do not follow the relational structure.
- Typically provide flexible or minimal schemas.
- Designed for scalability and handling diverse data formats.
Key-Value Stores

- Store data as simple key-value pairs.
- Keys are unique; values are not interpreted by the database.
- Strengths:
- High performance and scalability
- Simple data model
- Use cases:
- Caching
- Session storage
Wide Column Stores

- Extend the key-value model with multiple attributes per item.
- Use a consistent key structure but allow flexible attributes.
- Strengths:
- Highly scalable
- Suitable for large-scale applications
- Example: Amazon DynamoDB
Document Databases

- Store data as structured documents (e.g., JSON).
- Documents can have varying structures within the same database.
- Strengths:
- Flexible schema
- Supports complex, nested data
- Powerful querying capabilities
- Use cases:
- Content management
- User profiles and catalogs
- Example: Amazon DocumentDB
Graph Databases

- Represent data as nodes (entities) and edges (relationships).
- Relationships are stored directly and can be efficiently queried.
- Strengths:
- Excellent for relationship-heavy data
- Fast traversal of connected data
- Use cases:
- Social networks
- Recommendation engines
- Organizational structures
- Example: Amazon Neptune
Database Models – Summary
| Model | Structure | Typical Use Cases |
|---|---|---|
| Relational (SQL) | Fixed schema, tables with rows and columns | Structured data with defined relationships |
| Row-based (OLTP) | Rows stored together | Transactional systems |
| Column-based (OLAP) | Columns stored together | Analytics and reporting |
| Key-value | Simple key-value pairs | Caching, simple lookups |
| Wide column | Flexible attributes with structured keys | Large-scale applications |
| Document | JSON/XML documents | CMS, user data, catalogs |
| Graph | Nodes and relationships | Relationship-driven systems |
Databases: ACID vs BASE
CAP Theorem
- CAP = Consistency, Availability, Partition tolerance → properties of distributed DB systems.
- Consistency (C)
- Every read returns the most recent write, or an error if that’s not possible.
- If you write a new value and read immediately, you either get the new value or an error.
- Availability (A)
- Every request receives a response, but it may not be the latest data.
- Partition Tolerance (P)
- DB continues to operate even if network partitions or node failures occur.
- Messages between nodes may fail, but the system keeps running.
- CAP Theorem: A distributed DB cannot guarantee all three properties simultaneously — it must choose two.
Trade-off Example:
- If a network partition occurs:
- Choose Consistency + Partition Tolerance (CP):
- Reject some reads → ensures latest data is returned → availability decreases
- Choose Availability + Partition Tolerance (AP):
- Accept all reads → may return stale data → consistency decreases
- ACID vs BASE → different transaction models that reflect CAP trade-offs:
- ACID → favors consistency
- BASE → favors availability
ACID Transaction Model (SQL / RDBMS)

- ACID = Atomic, Consistent, Isolated, Durable
- Most SQL DBs use ACID transactions for reliable operations (e.g., banking).
- Atomic
- Transaction succeeds entirely or not at all.
- $10 transfer example: money must leave Account A and arrive in Account B.
- Consistent
- DB moves from one valid state to another.
- Invalid states are never allowed.
- Isolated
- Parallel transactions do not interfere; end result is as if transactions were sequential.
- Durable
- Committed transactions persist even after crashes.
- Once the DB reports success, data is stored safely on non-volatile memory.
- Limitation: ACID can limit scalability due to strict rules.
- Example: financial systems, SQL DBs (RDS, Aurora).
BASE Transaction Model (NoSQL / DynamoDB)

- BASE = Basically Available, Soft state, Eventually consistent
- Used by highly scalable NoSQL DBs.
- Basically Available
- R/W operations available as much as possible, without strict consistency guarantees.
- Soft State
- Consistency is not enforced in DB; handled by the application.
- Data returned may not be the latest.
- Eventually Consistent
- Given enough time, the DB will converge to the latest state.
- Immediate consistency is not guaranteed by default.
- Many BASE DBs (like DynamoDB) offer optional ACID-style transactions for applications that require them.
- BASE DBs are highly scalable and performant because the DB does not enforce strict consistency by default.
- Example: Amazon DynamoDB (DDB)
- Supports eventually consistent reads (default)
- Supports strongly consistent reads if requested
- Supports ACID transactions when needed
Exam Power Notes (AWS)
- ACID mentioned → assume SQL / RDBMS (RDS, Aurora)
- BASE mentioned → assume NoSQL / DynamoDB
- If NoSQL + ACID → may indicate DynamoDB transactions
Databases in EC2
Monolith vs DB-Split Architecture
- Monolith Architecture:
- Web server + application + database all hosted on a single EC2 instance
- Simple setup, but tightly coupled components
- DB-Split Architecture:

- Application and database are separated into different components
- Database typically hosted on:
- Another EC2 instance, or
- An AWS-managed database service (preferred)
- Key Considerations:
- DB can be placed in a different Availability Zone (AZ) for resilience
- Introduces network dependency between app and DB
- Cross-AZ communication may incur additional cost
- Why split the architecture?
- Independent scaling (scale app and DB separately)
- Ability to use specialized AWS services (e.g., managed DBs)
- Better performance, resilience, and flexibility
- Monolithic architectures are generally discouraged in modern AWS design
Why Databases on EC2 Are Bad Practice
- In most cases, AWS-managed database services are preferred over self-hosted DBs on EC2
1. Administrative Overhead
- Requires managing:
- OS patching
- DB software updates
- Compatibility with applications
- Backup and disaster recovery must be manually configured
- Replication setup requires manual effort and expertise
2. Single Availability Zone Limitation
- EC2 instances are tied to one AZ
- If the AZ fails → DB becomes unavailable
- Requires:
- Regular backups
- Snapshot management (e.g., EBS + S3)
- Adds operational complexity and risk
3. Fewer Features Compared to Managed Services
- AWS-managed DBs provide:
- Built-in high availability
- Automated backups
- Read replicas
- Performance optimizations
- EC2-hosted DBs miss out on these advanced capabilities
4. Poor Scalability and No Serverless Options
- EC2 scaling is manual or limited
- No native serverless DB option
- Instance runs continuously → higher baseline cost
When a Database on EC2 Might Be Justified
These scenarios should be carefully validated:
1. OS-Level Access Required
- Need full control of the underlying operating system
2. Advanced Database Customization
- Requires root-level DB access for tuning
- Often driven by vendor requirements, not true business needs
3. AWS Does Not Support the Requirement
- Specific DB engine or version not available in AWS-managed services
- Specialized OS + DB combination required
- Custom replication or architecture not supported by AWS
4. Business Decision
- Sometimes chosen due to organizational preference, even if not optimal
Key AWS Exam Takeaways
- Default recommendation:
  → Use AWS-managed database services
- Avoid:
  → Running databases on EC2 unless there is a clear, justified need
- Architecture best practice:
  → Decouple application and database layers
Amazon RDS (Relational Database Service) 101
Amazon RDS – Key Concepts
- What it is: AWS-managed DB server (RDS instance)
- Similar to on-premises DB servers, but AWS handles HW, OS, installation, and maintenance
- Pros:
- No need to manage OS or DB installation
- DB engine maintenance mostly automated
- Easy integration with other AWS services
- Cons:
- No OS-level or SSH access (except in RDS Custom)
- Important distinction:
- RDS is DBServer-as-a-Service (DBServeraaS), not DBaaS
- You pay for the DB server/instance, not just a single database
- Can host multiple databases per instance
- Supported DB Engines:
- Open-source: MySQL, MariaDB, PostgreSQL
- Commercial: Oracle, MS SQL Server (licensing fees may apply)
- Amazon Aurora is separate
- AWS-designed DB engine compatible with some RDS engines
- Offers additional improvements and features
RDS Architecture

- RDS instances run inside a VPC and are deployed in subnets
- Access:
- Private subnets: via VPN, Direct Connect, VPC peering
- Public subnets: can have public IPs (discouraged for security)
- Accessed via DNS CNAMEs
- DB Subnet Group:
- A list of VPC subnets for RDS deployment
- Can include subnets across multiple AZs
- Must be selected when launching any RDS instance
- Best practice: 1 subnet group per RDS deployment
- Multi-AZ Mode:
- Deploys primary + standby in different AZs
- Each has dedicated EBS storage
- Synchronous replication from primary → standby
- Backups occur from standby (no performance impact)
- Backups:
- Stored in AWS-managed S3
- S3 replicates data across multiple AZs
- Snapshots safe from AZ failures
- Read Replicas (0+):
- Asynchronous replication
- Can be same or cross-region
- Use cases: scale read load, resilience, disaster recovery
RDS Billing Overview
- RDS billing is resource-based, similar to EC2
- Instance fee:
- Billed hourly (per-second granularity)
- Cost depends on instance size & type
- Multi-AZ mode: extra cost (extra instance + storage)
- Storage fee: per GB/month
- Storage type affects price (e.g., Provisioned IOPS more expensive)
- Data transfer fee: per GB in/out of instance
- Free within the same region
- Backups & snapshots: per GB/month
- Free backup storage = size of your DB storage
- Licensing fee: if using commercial DB engines
DEMO: Migration of WordPress DB to a Different DB Tier
Migration of MariaDB to a Different EC2 Instance
- Following the instructions, WordPress (WP) is installed on a monolith instance and a blog post with images is created.
- Media in this WP blog post is stored in the local filesystem. When moving to a fully elastic architecture, media should be migrated to a shared filesystem since it should not reside inside EC2 instances.
- Steps to migrate MariaDB from monolith instance to a different EC2 instance (IP: 10.16.59.228):
- Create a backup of the source DB: `mysqldump -u root -p a4lwordpress > a4lwordpress.sql`
- Restore the backup to the destination DB: `mysql -h 10.16.59.228 -u a4lwordpress -p a4lwordpress < a4lwordpress.sql`
- Update the WordPress configuration: `sudo nano /var/www/html/wp-config.php`
  - Replace `define('DB_HOST', 'localhost');` with `define('DB_HOST', '10.16.59.228');`
- Stop the MariaDB service on the original instance: `sudo service mariadb stop`
Migration of MariaDB to RDS (Free Tier)
Part 1: Creation of an RDS Instance
- Create a DB Subnet Group before deploying an RDS instance:
- Can include subnets from different Availability Zones (AZs) in the VPC.
- Select subnets marked for the DB tier.
- CIDRs can be verified in the VPC service.
- RDS creation wizard options:
- DB engine and version selection is critical. Some versions may not be compatible with Aurora.
- Templates: PROD, DEV/TEST, Free tier.
- Free tier: only supports single-instance deployment, no Multi-AZ.
- Configure: DB instance identifier, storage, backups, etc.
- Security Group (SG) must be assigned to control inbound access.
- Optional: create an initial DB (default = none).
- Once provisioned, every RDS instance receives an endpoint CNAME and port, which are used to connect from apps.
Part 2: Migration from EC2 DB Instance to RDS
- Update RDS SG: Allow inbound access from the EC2 WordPress app instance by including its SG.
- Migration steps:
- Create a backup / SQL dump from the EC2 DB.
- Restore the dump to the RDS instance using its endpoint CNAME.
- Update `wp-config.php` with the RDS endpoint.
- Stop the MariaDB service on the EC2 instance.
- RDS deletion: You will be prompted to keep a final snapshot and backups.
RDS Multi-AZ Deployments
RDS Multi-AZ – Instance Deployment
- Historically, the only Multi-AZ deployment mode providing high availability (HA) was the RDS Multi-AZ Instance Deployment.
- A more modern architecture exists: RDS Multi-AZ Cluster Deployment.
- Architecture: Primary instance + one standby instance
- Primary and standby are in different AZs of the same region.
- Primary data is synchronously replicated to the standby.
- Replication is storage-level only, which is less efficient than cluster deployment and depends on the DB engine.
- DB operations are committed only after being registered on both primary and standby.
- Standby carries extra cost; no free tier.
- DB access: All reads and writes are done via the DB CNAME pointing to the primary instance.
- Standby is never accessed directly.
- Backups: Can be performed from the standby.
- Backups are stored in S3 and replicated across AZs.
- No performance impact on the primary instance.
- Failover: Improves availability.
- Possible reasons: AZ outage, primary instance failure, manual failover, instance type change, software patching.
- RDS automatically updates the DB CNAME to point to the standby, which becomes the new primary.
- Failover duration: 60–120 seconds.
- Clearing in-app DNS cache can reduce downtime.
- Standby improves availability and allows backups without affecting primary performance, but does not provide read scaling, since all DB access goes through the primary.
RDS Multi-AZ – Cluster Deployment
- Architecture: One Writer instance + two Reader instances, all in different AZs of the same region.
- Provides higher HA than Multi-AZ Instance deployment.
- Synchronous replication from Writer to Readers.
- DB operations are committed once the Writer and at least one Reader confirm the transaction, ensuring data resilience across AZs.
- Usage:
- Writer handles Reads and Writes.
- Readers handle Reads only, allowing read scaling.
- Applications must be aware of separate roles for Writer and Readers.
- More instances → higher cost than Multi-AZ Instance deployment.
- Important:
- Do not confuse with Aurora clusters. Aurora supports more than two Reader instances and has shared storage across instances.
- Each RDS instance has local EBS storage; cluster replication is at the instance level.
- Endpoints:
- Cluster endpoint: Points to Writer, supports Reads, Writes, and admin operations.
- Reader endpoint: Routes Reads to an available Reader. In some cases, may direct to Writer.
- Instance endpoints: Point to specific instances; not recommended for general use since they do not tolerate failures.
- Transaction logs:
- Record all transactions.
- Allow efficient replication and faster failover (~35 seconds + transaction log application).
- Other advantages over Multi-AZ Instance deployment:
- Faster hardware (Graviton architecture + NVMe SSD).
- Fast writes to local storage, then flushed to EBS.
- Combines high performance with resilience benefits of EBS.
RDS Snapshots, Automatic Backups, and Restore
Backing Up Data in RDS

- RDS data can be backed up to S3 via EBS snapshots
- Options:
- Manual Snapshots
- Automated Backups
- Data is stored in AWS-managed S3 buckets
- Buckets are visible in RDS console UI, but not in the S3 console UI.
- Benefits: data replicated across multiple AZs → improved resiliency.
- Snapshots/backups use EBS snapshots under the hood
- Snapshots cannot be seen from EBS console, only from RDS console.
- I/O pause during backup
- PROD: Usually Multi-AZ enabled; backups from standby → no read performance impact.
- Write performance can pause briefly while replication occurs to standby.
- Reads are generally higher in volume than writes, so overall DB performance is usually unaffected.
- DEV/TEST: Single-AZ → backup causes I/O pause because there is only one instance.
- Incremental snapshot/backup architecture
- First snapshot is full, subsequent snapshots are incremental.
- Incremental snapshots contain only changes since last snapshot → faster than first snapshot.
- Deleting a snapshot in the chain does not break functionality.
RDS Manual Snapshots
- Run manually or via scripts/applications.
- Do not expire automatically; live beyond RDS instance lifecycle.
- Must be manually deleted to avoid storage costs.
- Customer decides frequency (hourly, daily, weekly, etc.) → affects RPO.
- When deleting an RDS instance, prompted to create a final snapshot to preserve data.
RDS Automated Backups
- Automated snapshots triggered daily within a defined backup window.
- For Single-AZ instances, schedule during low-traffic periods due to I/O pause.
- Expire automatically after retention period (0–35 days).
- 0 days → automated backups disabled.
- Backups expire even if the RDS instance is deleted.
- Preserve data by creating a final manual snapshot before deletion.
- Transaction logs uploaded to S3 every 5 minutes → allows point-in-time restore within retention period.
- Typical RPO: 5 minutes.
- Restore can be done to any second if transaction logs exist for that point.
- Restoring requires applying transaction logs → increases restoration time (RTO longer than manual snapshots).
- Cross-Region Replication (CRR) is optional → replicates backups and transaction logs to another region; charges apply.
Restoring Data in RDS
- Restoring creates a new RDS instance → new IP, DB CNAME, and endpoint.
- Applications must update to the new endpoint.
- Manual snapshots: Restore to the snapshot creation time → single point in time.
- Automated backups: Restore to any 5-minute point in time within retention period → great for data corruption recovery.
- Restores can be slow, especially for large databases.
- Read replicas can improve RTO significantly.
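A minimal boto3 sketch of both restore paths, assuming an existing instance named `a4lwordpress` (identifiers are illustrative). Note that either call produces a brand-new instance with a new endpoint:

```python
import boto3

rds = boto3.client("rds")

# Manual snapshot -> restore to that snapshot's single point in time
rds.create_db_snapshot(
    DBInstanceIdentifier="a4lwordpress",
    DBSnapshotIdentifier="a4lwordpress-mariadb10-pre-change",
)
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="a4lwordpress-restored",
    DBSnapshotIdentifier="a4lwordpress-mariadb10-pre-change",
)

# Automated backups -> restore to any point within the retention period
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="a4lwordpress",
    TargetDBInstanceIdentifier="a4lwordpress-pitr",
    UseLatestRestorableTime=True,  # or RestoreTime=<datetime>
)

# The new instance gets a new endpoint (available once creation finishes);
# applications must be repointed to it
desc = rds.describe_db_instances(DBInstanceIdentifier="a4lwordpress-restored")
print(desc["DBInstances"][0]["Endpoint"]["Address"])
```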
RDS Read-Replicas (RRs)
RDS Read-Replicas (RRs) – Architecture

- RDS Read-Replica (RR) = a read-only replica of an RDS instance
- Can be used for reads (unlike standby replica in Multi-AZ instance) → allows read performance scaling
- An instance can have up to 5 direct RRs
- Can be in the same region or a different region (cross-region RRs)
- RRs are separate from the main RDS architecture
- Each RR has its own endpoint, independent from the main RDS instance endpoints
- Requires application support: apps need to be configured to use a RR
- By default, applications do not know about RRs
- No automatic failover
- Asynchronous replication
- Data is committed on the main instance first, then replicated to its RRs
- Lag can occur depending on network conditions and write volume
- RRs can have their own RRs → lag increases further
- Cross-region RRs improve global read performance
- Users can read from the closest region efficiently
- Network handled transparently by AWS, data encrypted in transit
- Multi-AZ cluster deployment combines Multi-AZ instance deployment + RRs
- 2 Reader instances in Multi-AZ cluster deployment are part of the main architecture
- Key exam point:
- Synchronous replication → Multi-AZ
- Asynchronous replication → RRs (excluding Aurora)
Promotion of RDS Read-Replicas & Disaster Recovery
- RRs are read-only until promoted
- Once promoted, they become a normal RDS instance
- Promotion is quick → provides low RTO
- Improves global availability and resilience
- RR in a different region can be promoted if the main region fails
- Offers near-zero RPO
- Data is continuously synced from the main DB, minimal potential for data loss
- Ideal for quick recovery from failure (not data corruption)
- Important caution: Do not use RRs for data corruption recovery
- Since replication is constant, corrupted data is also replicated
- In case of data corruption, rely on manual snapshots and automated backups
- Higher frequency or higher quality snapshots/backups improves RPO
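A sketch of the RR lifecycle in boto3 (identifiers are illustrative). Creating the replica from a second region's client, with the source referenced by ARN, makes it a cross-region RR; promotion then turns it into a standalone instance for disaster recovery:

```python
import boto3

# Client in the DR region; the source is referenced by ARN for cross-region RRs
rds = boto3.client("rds", region_name="eu-west-1")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="a4lwordpress-rr1",
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:111122223333:db:a4lwordpress",
)

# On failure of the main region: promote the RR to a normal R/W instance.
# Remember: replication is asynchronous, so corrupted data replicates too --
# promotion helps with failure recovery, not data corruption recovery.
rds.promote_read_replica(DBInstanceIdentifier="a4lwordpress-rr1")
```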
DEMO: RDS Multi-AZ & Snapshot Restore
When taking a snapshot, it is a good idea to include the DB engine and DB version in the snapshot name. Multi-AZ deployment and creation of a standby replica are recommended for PROD usage:
- The primary instance is left untouched for backups.
- Provides resilience in case of a primary AZ outage.
Backup windows are defined when creating the instance and can be adjusted later in the instance settings. When creating a standby replica (Multi-AZ), it can be created immediately or in the next backup window:
- Creating it immediately may cause temporary outages.
- Standby replica creation and synchronous replication setup will take some time.
An RDS instance can be rebooted with failover if a standby replica is configured:

- This simulates a primary AZ outage.
- Failover will take 60–120 seconds (for Multi-AZ instance deployment).
After restoring a snapshot, the new instance will take some time to become available.
- Once available, the application (e.g., EC2 instance) must update its configuration to point to the new RDS instance.
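The same demo steps can be scripted with boto3 (instance identifier illustrative): enabling Multi-AZ immediately rather than waiting for the next window, then simulating a primary AZ outage with a forced failover reboot:

```python
import boto3

rds = boto3.client("rds")

# Convert a Single-AZ instance to Multi-AZ; ApplyImmediately=True may cause
# a temporary outage while the standby is created and synchronized
rds.modify_db_instance(
    DBInstanceIdentifier="a4lwordpress",
    MultiAZ=True,
    ApplyImmediately=True,
)

# Simulate a primary AZ outage: reboot with failover to the standby
rds.reboot_db_instance(
    DBInstanceIdentifier="a4lwordpress",
    ForceFailover=True,  # only valid when a standby replica exists
)
```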
RDS Data Security
RDS Encryption
RDS Encryption in Transit
- RDS data is encrypted while being transferred between RDS and clients.
- SSL/TLS is available and can be set to mandatory on a per-user basis.
RDS Encryption at Rest

- RDS data is encrypted when written to disk.
- Two options:
- EBS Volume Encryption with KMS
- Default option, handled by the host and underlying EBS storage.
- DEKs generated from AWS-managed or customer-managed KEKs/CMKs.
- DEKs are loaded on hosts as required for encryption/decryption operations.
- Cannot be removed once enabled.
- Transparent to the DB engine; DB sees unencrypted data.
- Transparent Data Encryption (TDE)
- Native DB engine encryption (supported by RDS MSSQL and RDS Oracle).
- Data encrypted by DB engine before being written to disk.
- RDS Oracle can integrate with AWS CloudHSM, giving stronger key controls.
- CloudHSM is secured by the customer, keeping AWS out of the trust chain.
- Encrypted RDS instances propagate encryption to replicas and snapshots using the same configuration and keys.
RDS Authentication and Authorization

- Authentication: how users log in.
- Authorization: how access is controlled inside RDS.
Local DB Users
- RDS logins are normally handled via local DB users with username & password.
- A local DB user is created when provisioning an RDS instance (e.g., admin).
- Local DB users are not IAM users; they are controlled by the DB engine.
IAM Authentication for RDS
- Allows IAM identities to access RDS without a password:
- Local DB user is configured to use AWS authentication tokens.
- IAM policy maps the IAM identity to a local DB user.
- IAM identity can generate a short-lived DB auth token (valid 15 minutes).
- IAM authentication only; authorization is still controlled internally by the DB engine.
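A short sketch of generating an IAM auth token with boto3 (hostname and user are placeholders). The token is then supplied in place of a password when the DB client connects over SSL/TLS; authorization inside the database is still governed by the local DB user's grants:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Short-lived (15-minute) token for a local DB user mapped to IAM auth
token = rds.generate_db_auth_token(
    DBHostname="mydb.abc123.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    Port=3306,
    DBUsername="iam_db_user",  # local DB user configured for AWS auth tokens
)

# 'token' is passed as the password by the DB client over an SSL/TLS connection
```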
RDS Custom
RDS Custom – Key Facts
- Niche topic; surface-level understanding is usually sufficient.
- Middle ground between RDS and self-managed DB on EC2:
- RDS: fully managed DB-as-a-Service
- OS and DB engine access is limited.
- Customers cannot see RDS instances in EC2, EBS volumes, or S3 snapshots.
- DB Engine on EC2: self-managed
- Customer responsible for OS, engine, backups, and all overhead.
- RDS Custom: combines benefits of both approaches
- Automation features of RDS
- Access to OS and DB engine for advanced customization
- Supported only for MSSQL and Oracle.
- Runs fully within your AWS account
- RDS Custom instances are visible in EC2, EBS volumes, and S3 snapshots.
- ENIs are injected into your VPC, allowing network access.
- Can connect using SSH, RDP, or EC2 Session Manager.
- Customizing RDS Custom instances
- Pause DB automation to make changes without disruptions.
- Resume full automation for normal production use.
Amazon Aurora 101
Amazon Aurora – Overview
- DB engine designed by AWS
- Part of Amazon RDS, but distinct product with unique features and architecture.
- Two compute paradigms:
- Aurora Provisioned – customer deploys cluster of instances.
- Aurora Serverless – capacity managed by AWS, customer sets ACUs.
- Storage: shared cluster volume → high performance, improved availability.
Aurora Provisioned Architecture
Cluster (DB Instances)
- Aurora Cluster = 1 primary + 0–15 read replicas
- Instances can be in same or different AZs.
- Replicas used for reads → read scalability.
- Replicas can replace primary → high availability.
- Better than RDS Multi-AZ cluster deployments:
- RDS Multi-AZ only supports 2 read replicas natively.
Storage (Cluster Volume)

- No local storage – all instances share a cluster volume.
- Shared storage → fast provisioning & failover.
- SSD-based → high IOPS, low latency.
- Max size: 128 TiB.
- 6 storage nodes across AZs → improved availability and resilience.
- Writes from primary synchronously replicated to all storage nodes.
- Replication happens at storage level → minimal performance impact.
- Automatic detection and repair of storage failures.
Access (Endpoints)

- Cluster endpoint: always points to primary (read, write, admin).
- Reader endpoint: load balances reads across available instances (primary or replicas).
- Instance endpoints: unique per instance, useful for testing/diagnosis.
- Custom endpoints: user-defined.
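Cluster, reader, and instance endpoints are created automatically; custom endpoints are user-defined. A minimal sketch, assuming a provisioned cluster named `a4l-aurora` with one replica dedicated to reporting (names are illustrative):

```python
import boto3

rds = boto3.client("rds")

# Custom endpoint that routes only to a chosen subset of instances
rds.create_db_cluster_endpoint(
    DBClusterIdentifier="a4l-aurora",
    DBClusterEndpointIdentifier="reporting",
    EndpointType="READER",                   # READER or ANY
    StaticMembers=["a4l-aurora-replica-2"],  # hypothetical instance id
)
```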
Aurora Billing (Provisioned)
- No free-tier (micro instances not supported).
- Compute: hourly rate, billed per second (10-minute minimum).
- Storage: billed based on consumed data.
- Historically used High Water Mark (HWM): billed for historic max consumed storage.
- IO requests may incur small additional cost.
- Backups: up to 100% of DB size included for storage at no extra cost.
Aurora Features
- Backups & restores work like RDS.
- Restores create a new cluster → apps must update endpoints.
- Backtrack: in-place rollback to a previous point in time without restoring from backup.
- Configured per-cluster, reduces downtime during data corruption.
- Fast clones: copy-on-write storage.
- Only differences stored → fast, space-efficient cloning.
Aurora Global Database
Aurora Global DB – Overview

- Purpose: Global-level replication of an Aurora Provisioned cluster.
- Primary region:
- 1 read/write (R/W) instance + 0–15 read-only (R/O) replicas.
- Only region that accepts writes.
- Secondary regions:
- 1–16 read-only replicas.
- All instances are read-only during normal operations.
- Can be promoted to primary for disaster recovery.
Replication Architecture
- Storage-level replication → no extra CPU required → no performance impact on primary.
- Replication latency: ~1 second → very low RPO and RTO.
- Direction: one-way only (primary → secondary).
Benefits
- Cross-region disaster recovery & business continuity
- Failover possible if primary region fails.
- Very low downtime (low RTO) and minimal data loss (low RPO).
- Global read scaling
- Secondary regions serve read traffic → reduces latency internationally.
- Ideal for applications with global users needing fast reads.
Aurora Multi-Master Writes
Aurora Single-Master Mode (Default)

- Architecture: 1 R/W instance + 0–15 R/O replicas.
- Endpoints:
- Cluster endpoint → writes
- Reader endpoint → load-balanced reads
- Failover:
- A replica is promoted to R/W on failure.
- Not instant → cluster endpoint updated → brief downtime.
- Use case: Standard Aurora Provisioned clusters; good for most applications that can tolerate brief failover delays.
Key point: Single-Master clusters have a single write target; reads can be scaled via replicas.
Aurora Multi-Master Mode

- Architecture: 1–16 R/W instances (all instances can write).
- Endpoints:
- No cluster or reader endpoints → app must connect directly to instances.
- Writes:
- Approved by all storage nodes (quorum)
- If a node rejects → write fails
- Once committed, writes are replicated to all nodes and in-memory caches → reads are consistent
- Failover:

- Handled by the app
- Connections maintained to multiple instances → instant failover
- Can support fault-tolerant (FT) apps if the app is properly configured
- Benefits vs Single-Master:
- Better availability → apps can write/read from any instance
- Instant failover → minimal disruption
- Challenges:
- App must handle load balancing for both reads and writes
- App must implement automated failover
Key point: Multi-Master allows multiple writers simultaneously, enabling high availability and potential FT applications, but app complexity increases.
Quick comparison table:
| Feature | Single-Master | Multi-Master |
|---|---|---|
| Write Instances | 1 | 1–16 |
| Read Scaling | Reader replicas | App handles reads |
| Failover | Replica promoted, cluster endpoint updated | Instant via app, no promotion needed |
| App Complexity | Low | High (must manage reads/writes & failover) |
| Use Case | Standard HA | High availability, FT applications |
Aurora Serverless
Aurora Serverless – Key Concepts
- Serverless offering, conceptually comparable to AWS Fargate for ECS
- Aurora Provisioned → users allocate database instances with fixed sizes and are responsible for managing them
- Aurora Serverless → eliminates the need to pre-provision or handle database instances
- Reduces operational management effort
- More aligned with a Database-as-a-Service (DBaaS) model
- Reminder: RDS and Aurora Provisioned operate as DB server–as-a-service
- Pricing is based on consumption, billed per second
- Aurora Serverless clusters operate using Aurora Capacity Units (ACUs)
- Each ACU corresponds to a defined amount of compute power and memory
- Users configure minimum and maximum ACU limits for the cluster
- The system automatically adjusts capacity within this range depending on workload demand
- Aurora Serverless v1 has been retired; Serverless v2 is now the supported version
- Serverless v2 introduces several enhancements over v1:
- Fine-grained and continuous scaling in increments of 0.5 ACUs, with near-instant responsiveness
- Always active; unlike v1, it does not pause during inactivity, removing cold start delays
- Supports Multi-AZ configurations for high availability and disaster recovery
- Enables read replicas and global database capabilities
- Improved handling of large numbers of concurrent connections through connection pooling and bursting
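For reference, a minimal boto3 sketch of creating a Serverless v2 cluster: the ACU scaling limits are set per cluster, and compute is added as an instance using the special `db.serverless` class. The engine version and credentials below are placeholder assumptions:

```python
import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="a4l-serverless",
    Engine="aurora-mysql",
    EngineVersion="8.0.mysql_aurora.3.05.2",  # any Serverless v2-capable version
    MasterUsername="admin",
    MasterUserPassword="<password>",
    ServerlessV2ScalingConfiguration={
        "MinCapacity": 0.5,   # scaling happens in 0.5 ACU increments
        "MaxCapacity": 4.0,
    },
)

# Compute capacity is attached as an instance with the db.serverless class
rds.create_db_instance(
    DBInstanceIdentifier="a4l-serverless-1",
    DBClusterIdentifier="a4l-serverless",
    DBInstanceClass="db.serverless",
    Engine="aurora-mysql",
)
```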
Aurora Serverless v1 – Architecture

- This section needs updating for Serverless v2, as the original material focuses only on v1
- Uses the same storage architecture as provisioned Aurora clusters
- Maintains the same durability (6 storage nodes distributed across Availability Zones)
- However, storage is accessed through ACUs instead of provisioned instances
- Serverless v1 relies on Aurora Cluster Units (ACUs) to interface with storage
- These should not be confused with Aurora Capacity Units, despite the shared abbreviation
- Characteristics of Aurora Cluster Units:
- Drawn from a shared warm pool managed by AWS and used across multiple customers
- Stateless in nature
- Do not include local storage, allowing rapid allocation to clusters when needed
- Once assigned, they connect to the cluster’s storage similarly to provisioned instances
- The number and size of ACUs dynamically adjust to match workload demand
- If demand increases beyond current capacity, additional ACUs are added, up to the configured maximum
- If demand decreases, excess ACUs are removed, but never below the defined minimum
- In v1, setting the minimum to zero allows the cluster to pause, resulting in charges only for storage
- This behavior is not supported in Serverless v2, where a minimum capacity is always maintained
- Client connections are routed through an AWS-managed proxy layer
- This process is transparent to users
- Applications connect to the proxy, which manages communication with the ACUs
- Enables seamless scaling without disrupting application connections
- Users only define minimum and maximum ACU values; connection handling is abstracted away
- Billing model:
- Compute: charged based on the number of ACUs used at any given time
- Storage: billed similarly to Aurora Provisioned
Aurora Serverless – Benefits
- Simpler management model with reduced administrative overhead for database capacity
- Automatic and seamless scaling of compute and memory resources without interrupting connections
- Cost efficiency through usage-based pricing
Aurora Serverless – Use Cases
- Workloads with infrequent usage (e.g., low-traffic websites)
- Applications with variable demand patterns, including occasional spikes
- Removes the need to allocate fixed capacity in advance
- Unpredictable workloads where demand is difficult to estimate
- New applications with unknown usage patterns
- Avoids the need to resize database instances later, which can cause disruptions
- Development and testing environments
- In Serverless v1, databases could pause during inactivity, incurring only storage costs
- This capability is not available in Serverless v2

- Multi-tenant applications with subscription-based billing models
- Increased database usage typically aligns with increased revenue
- Infrastructure scaling naturally matches business growth
- Overall, Aurora Serverless provides a flexible solution suitable for a wide range of scenarios
Demo: Migrating WordPress DB to Aurora Serverless v1
- Creating a Serverless cluster from a provisioned snapshot is considered a restore, not a migration
- Restoration is limited to compatible database engine versions
- Building a cluster from scratch requires more setup compared to restoring from a snapshot
- After restoring, ACU usage is typically high due to increased demand during initialization

- In Serverless v1, compute capacity can pause after a period of inactivity
- This occurs regardless of the configured minimum ACUs
- When a paused cluster is accessed, it must resume before handling requests
- Applications must tolerate longer connection times in this scenario

- This behavior does not apply to Serverless v2
- A baseline level of compute capacity is always maintained
- As a result, there are no delays caused by resuming from a paused state

RDS Proxy
Database Proxies – Overview
- Why use database proxies?
- Establishing and terminating database connections consumes time and system resources
- This impact is especially noticeable for smaller database operations
- For minimal read/write actions, the connection setup often represents most of the total execution time
- Also significant when there is a high volume of connections
- For example, in serverless environments, each Lambda invocation may create and close a connection
- With many concurrent executions, this results in a large number of connections, increasing latency and cost
- Since Lambda billing is based on execution duration, this is inefficient
- Managing database failures within application logic increases complexity and risk
- Applications must handle retry timing, connection timeouts, and failover behavior, which adds overhead
- A database proxy is positioned between applications and the database to address these challenges
- Architecture: Application(s) → DB Proxy (connection pooling) → Database
- The proxy maintains persistent database connections
- Applications connect to the proxy and reuse pooled connections
- Enables multiplexing, where multiple application requests share fewer backend connections
- Drawback: Operating and maintaining a database proxy (including scaling and fault tolerance) can be complex
- AWS RDS Proxy is a managed service that simplifies this responsibility
RDS Proxy – Architecture
- RDS Proxy is a fully managed database proxy service for RDS and Aurora
- Provides built-in high availability and automatic scaling
- Reduces operational effort compared to self-managed proxy solutions
- Accessible only within a VPC
- It cannot be directly reached from the public internet
- Offers connection pooling, which lowers database load by:
- Reducing the overhead of repeatedly opening and closing connections
- Allowing multiplexing, so fewer connections are required between the proxy and the database
- Handles database failures and failover without exposing complexity to applications
- Applications connect using a proxy endpoint
- This interaction is transparent, so applications behave as if they are connecting directly to the database
- During failover, the proxy automatically redirects traffic to the new primary instance
- This process happens in the background without requiring application changes
- Applications can continue attempting connections even if the database is temporarily unavailable
- In Aurora environments, failover duration can be reduced by more than 60%
- Supports enforcing SSL/TLS for secure connections
Good-Fit Scenarios for RDS Proxy
- Situations with excessive database connections
- Common in smaller instance types such as T2 or T3
- Multiplexing reduces the number of active connections to the database
- AWS Lambda workloads
- Reusing persistent connections through the proxy reduces connection setup time
- This lowers Lambda execution duration and associated costs
- Lambda functions can use IAM roles for authentication, which can also be applied when connecting through RDS Proxy
- Applications requiring long-lived connections, such as SaaS platforms, where minimizing latency is important
- Environments where high resilience to database failures is required
- RDS Proxy helps minimize failover time and shields applications from underlying disruptions
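A sketch of provisioning an RDS Proxy with boto3 (ARNs, subnets, and names are placeholders). Database credentials come from Secrets Manager, targets are registered to the proxy's default target group, and applications then connect to the proxy endpoint instead of the database endpoint:

```python
import boto3

rds = boto3.client("rds")

rds.create_db_proxy(
    DBProxyName="a4lwordpress-proxy",
    EngineFamily="MYSQL",
    Auth=[{
        "AuthScheme": "SECRETS",
        "SecretArn": "<secrets-manager-arn-with-db-credentials>",
        "IAMAuth": "DISABLED",          # or REQUIRED for IAM authentication
    }],
    RoleArn="<iam-role-arn-allowing-secret-access>",
    VpcSubnetIds=["subnet-aaa", "subnet-bbb"],
    RequireTLS=True,                    # enforce SSL/TLS for client connections
)

# Point the proxy at the database; apps use the proxy endpoint from now on
rds.register_db_proxy_targets(
    DBProxyName="a4lwordpress-proxy",
    DBInstanceIdentifiers=["a4lwordpress"],
)
```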
RDS & Aurora – Summary Table
| AWS Relational DB model | Architecture & AZ coverage | Storage | Scalability & Failover |
|---|---|---|---|
| RDS Single-AZ | Single database instance deployed in one Availability Zone | EBS | No support for scaling; no automatic failover capability |
| RDS Multi-AZ instance | One primary instance with a standby replica located in another AZ (spans 2 AZs) | Each instance is backed by its own EBS volume | No scaling supported; automatic failover: standby is promoted to primary (typically 60–120 seconds) |
| RDS Multi-AZ cluster | One writer instance and two read replicas, distributed across 3 AZs | Each instance uses its own EBS storage; transaction logs are utilized; local storage is periodically flushed to EBS for better performance | Supports read scaling; automatic failover: a reader can be promoted to writer (around 35 seconds plus time to apply transaction logs) |
| Aurora Provisioned | One primary (read/write) instance with 0–15 read replicas; spans multiple AZs depending on the number of replicas | No local instance storage; shared SSD-backed cluster volume across all instances; storage replicated across 6 nodes in 3 AZs | Enables read scaling (increases with more replicas); automatic failover: a replica takes over as primary very quickly due to shared storage |
| Aurora Serverless | No fixed database instances are provisioned; users define minimum and maximum capacity using ACUs; compute capacity is accessed through a proxy layer | Same storage approach as Aurora Provisioned; cluster volume is accessed via ACUs instead of database instances | Provides automatic and flexible scaling based on demand; no traditional failover model, as failover handling is abstracted away from clients |
AWS Database Migration Service (DMS) & AWS Schema Conversion Tool (SCT)
Database Migrations
- Database migration refers to transferring all data from a source database to a target database
- Data may retain the same structure, configuration, and schema, or it may require changes
- A key challenge is handling migrations between different database engines
- This process is generally complex
- Often requires significant manual effort from start to finish
- Vendor-provided tools may assist in certain cases
- Typically involves replication setup and/or restoring from backups
- A major consideration is how to manage ongoing data changes during the restore process
AWS Database Migration Service (DMS) – Architecture

- AWS Database Migration Service (DMS) is a managed solution designed to simplify database migrations
- Supports most commonly used database engines such as MySQL, Aurora, Microsoft SQL Server, MariaDB, MongoDB, PostgreSQL, Oracle, and Azure SQL
- While it supports many migration scenarios:
- Schema conversion requires the use of AWS Schema Conversion Tool (SCT)
- Some limitations exist (for example, certain databases can only act as targets)
- Runs on an EC2-based Replication Instance, with defined source and destination endpoints
- The Replication Instance runs replication software (one or more tasks) and communicates with DMS
- Replication tasks define how the migration is executed
- Source and destination endpoints store connection details for both databases
- At least one endpoint must reside within AWS
- DMS cannot be used to migrate strictly between two on-premises databases
- By default, data is transferred over the network (e.g., Direct Connect, VPN, VPC peering)
- For very large datasets, network transfer may be inefficient or costly
- In such cases, AWS Snowball devices can be used along with DMS and SCT for faster bulk transfer
- Types of migration jobs (a replication task sketch follows this section):
- Full Load
- Performs a one-time transfer of all existing data
- Requires database downtime during the migration process
- Suitable only when downtime is acceptable
- Full Load + Change Data Capture (CDC)
- First performs a full data load
- Simultaneously captures ongoing changes during migration
- After the initial load, captured changes are applied to the destination
- Eventually, both databases reach synchronization
- Final cutover involves stopping applications briefly and redirecting them to the new database
- Results in minimal downtime, often close to zero
- CDC Only
- Used when bulk data transfer is handled outside of DMS (e.g., native tools like Oracle import/export)
- DMS is then used only to replicate ongoing changes
- DMS does not natively handle schema conversion
- The AWS Schema Conversion Tool (SCT) is used alongside DMS when needed
- Common use cases and benefits:
- Frequently used for large-scale database migrations
- Well-suited for migrating on-premises databases to AWS
- Enables migrations with minimal or no downtime
- In exam scenarios, DMS is typically the default choice when migration involves AWS and no special constraints are mentioned
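The three migration job types map directly to the `MigrationType` parameter when defining a replication task. A minimal boto3 sketch, assuming the replication instance and both endpoints were created beforehand (all ARNs are placeholders):

```python
import json

import boto3

dms = boto3.client("dms")

dms.create_replication_task(
    ReplicationTaskIdentifier="wordpress-to-rds",
    SourceEndpointArn="<source-endpoint-arn>",
    TargetEndpointArn="<target-endpoint-arn>",
    ReplicationInstanceArn="<replication-instance-arn>",
    MigrationType="full-load-and-cdc",   # or "full-load" / "cdc"
    TableMappings=json.dumps({           # selection rules: which schemas/tables move
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```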
AWS Schema Conversion Tool (SCT)
- Used to convert or modify database schemas between different engines or versions
- Should not be used when migrating between compatible databases of the same engine
- Example: on-premises MySQL to RDS MySQL
- Useful for migrations involving different database types
- Example: SQL Server to MySQL, or Oracle to Aurora
- Also helpful in large migrations involving DMS and Snowball
- Supports:
- OLTP databases such as MySQL, SQL Server, and Oracle
- OLAP databases such as Teradata, Oracle, Vertica, and Greenplum
- Operates as a standalone tool independent of DMS
- Can be used outside AWS, including for migrations between on-premises databases
Large Database Migrations with DMS and AWS Snowball
- Some migrations involve very large datasets (multi-terabyte scale)
- Transferring such volumes over the network can be slow and resource-intensive
- AWS Snowball devices can be used for bulk data transfer into and out of AWS
- Request a Snowball device from AWS
- Use SCT to extract and store data locally onto the device
- Ship the device back to AWS, where the data is uploaded to Amazon S3
- DMS then transfers data from S3 to the target database
- Optionally, CDC can capture ongoing changes and apply them to the destination
- Although SCT is used in this workflow, it does not violate its typical usage rule
- In this case, SCT converts the database into a generic file format for storage on the Snowball device rather than performing schema transformation between engines
NoSQL Databases & DynamoDB
Amazon DynamoDB (DDB) 101
DDB Concepts
- DynamoDB is a NoSQL database offering at the table level (DB Table-as-a-Service)
- Uses a wide-column data model, supporting key-value and semi-structured data (similar to document-style databases)
- Fully serverless, with no need to manage servers or underlying infrastructure
- Commonly used in serverless architectures and large-scale web applications within AWS
- Exposes public endpoints for access
- In comparison, RDS and Aurora are DB server–based services, operate privately, and use SQL-based engines
- DynamoDB capacity represents database performance, specifically operation throughput
- Measured using Write Capacity Units (WCUs) and Read Capacity Units (RCUs)
- Increasing capacity directly improves throughput and performance
- Scaling models:
- Provisioned capacity
- Requires defining WCUs and RCUs
- Can be manually configured for predictable workloads
- Can also be automatically adjusted using scaling policies
- On-demand capacity
- Fully managed performance model with automatic scaling
- No need to specify capacity in advance
- Key features:
- Built-in regional resilience with high availability across multiple AZs
- Global replication can be optionally enabled
- Data is automatically replicated across storage nodes without user configuration
- Uses SSD-backed storage, providing very low latency (typically under 10 ms)
- Accessible through the AWS console, CLI, and APIs
- Supports backups and point-in-time recovery
- Encryption at rest is enabled
- Integrates with event-driven architectures by triggering actions on data changes
- Does not support traditional SQL queries
- Supports PartiQL, which offers SQL-like query capabilities
- Pricing is usage-based:
- Charges for read/write operations (based on RCUs and WCUs)
- Storage usage
- Optional features such as point-in-time recovery
- Reserved capacity options are available for long-term cost savings
- DynamoDB considerations (exam and real-world):
- Prefer DynamoDB for NoSQL use cases
- Not suitable for relational data models
- Lacks features typically found in relational databases
- Ideal for key-value access patterns
DDB Tables

- A table represents a collection of items sharing a common primary key structure
- It is the fundamental unit of configuration in DynamoDB
- Tables can store an unlimited number of items
- An item is equivalent to a row
- A primary key must be defined when creating a table, with two options:
- Simple primary key (Partition Key)
- Each item must have a unique partition key value
- Composite primary key (Partition Key + Sort Key)
- Each item must have a unique combination of partition and sort key values
- Attributes have no fixed schema
- Items can contain different attributes or none at all
- Maximum item size is 400 KB
- Includes keys, attribute names, and attribute values
- Table capacity determines performance and is measured in RCUs and WCUs
- 1 WCU supports 1 KB of write throughput per second
- 1 RCU supports 4 KB of read throughput per second
- Most operations consume a minimum amount of capacity
- For example, reading 100 bytes still consumes 1 RCU
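A minimal boto3 sketch of a table with a composite primary key and explicitly provisioned capacity, using the weather-station example that appears later in these notes (table name and capacity values are illustrative):

```python
import boto3

ddb = boto3.client("dynamodb")

ddb.create_table(
    TableName="WeatherData",
    AttributeDefinitions=[  # only key attributes are declared up front;
        {"AttributeName": "SensorID", "AttributeType": "S"},  # all other
        {"AttributeName": "Day", "AttributeType": "S"},       # attributes are schemaless
    ],
    KeySchema=[
        {"AttributeName": "SensorID", "KeyType": "HASH"},   # partition key
        {"AttributeName": "Day", "KeyType": "RANGE"},       # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    # Alternatively: BillingMode="PAY_PER_REQUEST" for on-demand capacity
    # (in that case, omit ProvisionedThroughput)
)
```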
DDB Backups

On-Demand Backups
- Manual full-table backups that persist until explicitly deleted
- Comparable to manual snapshots in RDS
- Restore options:
- Can restore within the same region or to a different region
- Can include or exclude indexes
- Encryption settings can be modified during restore
Point-in-Time Recovery (PITR)

- Provides continuous backups by recording all changes made to a table for up to 35 days
- Disabled by default
- Enables restoration of a table to any specific second within the retention window
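Both backup styles are exposed through the API. A short sketch (table names illustrative):

```python
import boto3

ddb = boto3.client("dynamodb")

# On-demand backup: persists until explicitly deleted
ddb.create_backup(TableName="WeatherData", BackupName="weather-pre-change")

# Enable PITR (disabled by default), then restore to a new table
ddb.update_continuous_backups(
    TableName="WeatherData",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)
ddb.restore_table_to_point_in_time(
    SourceTableName="WeatherData",
    TargetTableName="WeatherData-restored",
    UseLatestRestorableTime=True,   # or RestoreDateTime=<datetime>
)
```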
DynamoDB – Operations, Consistency & Performance
DDB Capacity and R/W Operations
- DynamoDB offers two capacity modes:
- Provisioned capacity → RCUs and WCUs are explicitly configured per table
- On-demand capacity → suited for unpredictable workloads or when minimizing administrative effort
- No need to define capacity in advance; DynamoDB manages it automatically
- Pricing is based on the number of read/write units consumed (per million requests)
- Can be significantly more expensive than provisioned capacity in some cases
- It is possible to switch between provisioned and on-demand modes, although certain limitations apply
- Key trade-off: operational control (provisioned) versus simplicity and potentially higher cost (on-demand)
- Every table operation consumes at least 1 RCU or 1 WCU
- Exception: eventually consistent reads may consume less than 1 RCU
- 1 RCU supports one read request of up to 4 KB per second
- Even if the data size is smaller, it still consumes 1 RCU
- Larger items consume multiple RCUs (e.g., 400 KB = 100 RCUs)
- Capacity is replenished every second
- 1 WCU supports one write request of up to 1 KB per second
- In provisioned mode, each table includes burst capacity pools for reads and writes
- Burst capacity is calculated as provisioned capacity multiplied by 300 seconds
- Provisioned values represent sustained throughput, while burst capacity allows temporary spikes
- If burst capacity is exhausted, a `ProvisionedThroughputExceeded` exception occurs
  - Requests are throttled and must be retried, or capacity must be increased
- Other internal operations may also consume burst capacity, so it should not be heavily relied upon
- Capacity efficiency considerations:
- Smaller items are generally more efficient than larger ones for read/write operations
- Retrieving data in fewer requests is more efficient than making multiple smaller requests
DDB Common Read Operations
DDB Query Operation
- The Query operation requires a single partition key value and can optionally include a sort key or a range of sort key values
- Returns all matching items
- Capacity consumption is based on the total size of the items returned
- Filtering returned attributes does not reduce consumed capacity, as the full item size is still read
- A Query can only retrieve items associated with a single partition key
- It cannot scan across the entire table
- Example:
- A table stores weather data where SensorID is the partition key and day of the week is the sort key
- Querying with `PK=1` returns all items for that sensor
  - If the total size is 4 KB, this consumes 1 RCU
- Querying with `PK=1` and `SK=MON` returns one item
  - If the size is 2.5 KB, it still consumes 1 RCU
- Performing one query for all items under a partition key is more efficient than multiple queries for individual sort key values
DDB Scan Operation
- The Scan operation evaluates every item in a table and returns those matching specified criteria
- It is the most flexible but also the least efficient operation
- Consumes capacity for all items scanned, regardless of whether they are returned
- Example:
- Scanning a table to retrieve all items with a specific attribute or within a value range
- Even if only a subset of items is returned, the operation consumes capacity based on the total size of all scanned items
- For instance, scanning 14 KB of data consumes 4 RCUs, even if only part of that data is returned
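To make the capacity difference concrete, a boto3 sketch of both operations against the weather table sketched earlier (the `Alarm` attribute is illustrative). The Query touches only one partition key's items; the Scan reads every item even though the filter trims what is returned:

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("WeatherData")

# Query: single PK value plus optional SK condition;
# capacity is consumed based on the items actually read
items = table.query(
    KeyConditionExpression=Key("SensorID").eq("1") & Key("Day").eq("MON")
)["Items"]

# Scan: evaluates every item in the table; FilterExpression trims the
# response but capacity is still consumed for everything scanned
alarms = table.scan(FilterExpression=Attr("Alarm").eq(True))["Items"]
```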
DDB Consistency Model for Read Operations
- A database is considered consistent when read operations always return the most recent data
- Eventually consistent systems allow slight delays in propagation but are easier to scale
- Strong consistency is required for use cases where stale data is unacceptable
- DynamoDB replicates data across multiple storage nodes in different Availability Zones
- A leader node handles write operations and propagates updates to other nodes
- If the leader fails, a new leader is elected
- Two read consistency options are available:
- Strongly consistent reads
- Always read from the leader node
- Guarantees the latest data is returned
- Eventually consistent reads
- Can read from any replica node
- May return slightly outdated data
- Costs 50% less than strongly consistent reads
- Allows reading twice as much data for the same RCU allocation
- Example:
- If replication is not yet complete across all nodes, an eventually consistent read may return outdated data depending on which node is accessed
- Strongly consistent reads always return the most recent data
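The choice is made per read request. A minimal sketch using the same table; `ConsistentRead=True` forces the read through the leader node at double the RCU cost:

```python
import boto3

table = boto3.resource("dynamodb").Table("WeatherData")

# Eventually consistent (default): may read from any replica node,
# 0.5 RCU per 4 KB
item = table.get_item(Key={"SensorID": "1", "Day": "MON"})

# Strongly consistent: reads from the leader node only, 1 RCU per 4 KB
item = table.get_item(
    Key={"SensorID": "1", "Day": "MON"},
    ConsistentRead=True,
)
```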
DDB Performance Calculations (Provisioned Capacity)
- Steps to calculate required capacity:
- Identify whether operations are reads or writes
- For reads, determine if strong consistency is required
- Determine the average item size
- Calculate request frequency (operations per second)
- Compute RCUs or WCUs required per item
- Multiply by the number of operations per second to get total capacity
- Example 1:
- 10 devices write one item per second, each item is 2.5 KB
- Write operation → 2.5 KB rounds up to 3 WCUs per item
- Total required capacity: 3 × 10 = 30 WCUs
- Example 2:
- 10 items read per second, each item is 2.5 KB, eventual consistency is acceptable
- Read operation → 2.5 KB rounds up to 1 RCU per item
- Total: 1 × 10 = 10 RCUs
- With eventual consistency, required capacity is reduced by half → 5 RCUs
- Auto scaling can be used to dynamically adjust RCUs and WCUs based on traffic patterns
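The calculation steps can be expressed as a small helper; this sketch reproduces both worked examples:

```python
# A small helper reproducing the capacity calculations above.
import math

def wcus(item_kb: float, writes_per_sec: int) -> int:
    """1 WCU = one write of up to 1 KB per second."""
    return math.ceil(item_kb) * writes_per_sec

def rcus(item_kb: float, reads_per_sec: int, strong: bool = True) -> float:
    """1 RCU = one strongly consistent read of up to 4 KB per second;
    eventually consistent reads cost half."""
    total = math.ceil(item_kb / 4) * reads_per_sec
    return total if strong else total / 2

print(wcus(2.5, 10))                # Example 1 -> 30 WCUs
print(rcus(2.5, 10, strong=False))  # Example 2 -> 5.0 RCUs
```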
DynamoDB Indexes (LSIs and GSIs)
DynamoDB Indexes
- Reminder: Query and Scan have limitations
- Query is the most efficient DynamoDB operation, but it can only target a single partition key (PK) value at a time.
- Optionally, you can also filter by a single sort key (SK) value or a range of SK values.
- Scan is more flexible but much less efficient because it reads the entire table.
- DynamoDB indexes provide alternative ways to access table data
- Help improve the speed of data retrieval.
- Types of indexes:
- Local Secondary Index (LSI) → different SK
- Choose when strong, immediate consistency is needed.
- Global Secondary Index (GSI) → different PK and SK
- Typically used as the default option.
- Index design tip: When designing a base table, choose PK and SK based on the primary access pattern. Indexes are meant to provide alternative ways to query data, not replace the main access path.
- Indexes are sparse
- Indexes only include items that match their criteria.
- Example: if a table has 10 items but only 5 have attribute X, an index on X will only track those 5 items.
- Scans using indexes only process items included in the index, which makes them more efficient than scanning the whole table.
- Index projections:
- `ALL` → include all attributes from the base table in the index
- `INCLUDE` → choose specific attributes to include
- `KEYS_ONLY` → include only the key attributes (not the full item values)
- Projection choices affect query performance.
- Projecting attributes increases capacity usage.
- Queries that request attributes not projected require extra backend retrieval, which is slower and less efficient.
Local Secondary Index (LSI)
- Provides an alternative sort key (SK) for a table
- Partition key remains the same
- Maximum of 5 LSIs per base table
- Must be created at the same time as the table; cannot be added later
- Shares the base table’s capacity (RCUs and WCUs) in provisioned mode
- Supports strong, immediate consistency
- Example: For a weather station table, create an LSI for sunny days
- Querying the base table by PK alone cannot filter by the sunny day attribute.
- Using the LSI, you can efficiently query items with PK=1 and SK=Sunny Day, returning all sunny days for station 1.
- Scanning this LSI only reads items with the sunny day attribute, reducing overall capacity usage.
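A sketch of defining an LSI at table-creation time, since it cannot be added later; the names and on-demand billing mode are illustrative assumptions:

```python
# Sketch: an LSI must be defined when the table is created.
import boto3

client = boto3.client("dynamodb")
client.create_table(
    TableName="WeatherData",
    AttributeDefinitions=[
        {"AttributeName": "SensorID", "AttributeType": "S"},
        {"AttributeName": "Day", "AttributeType": "S"},
        {"AttributeName": "SunnyDay", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "SensorID", "KeyType": "HASH"},
        {"AttributeName": "Day", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "SunnyDayIndex",
            "KeySchema": [
                {"AttributeName": "SensorID", "KeyType": "HASH"},   # same PK
                {"AttributeName": "SunnyDay", "KeyType": "RANGE"},  # alternative SK
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)
```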
Global Secondary Index (GSI)
- Provides alternative partition key (PK) and sort key (SK)
- Default limit: 20 GSIs per table (can request more from AWS support)
- Can be created at any time, offering more flexibility than LSIs
- Has its own capacity allocation in provisioned mode (RCUs and WCUs)
- Only supports eventual consistency, as data is replicated asynchronously from the base table
- If immediate consistency is required, use an LSI instead
- Example: For the weather station table, create a GSI with PK=Alarm and SK=Station ID
- Allows efficient queries for items in an alarm state
- Optionally filter by a specific station or range of stations
- Scanning this GSI only processes items in the alarm state, improving efficiency
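A sketch of querying such a GSI by name; the index and attribute names are illustrative assumptions:

```python
# Sketch: querying a (sparse) GSI instead of the base table.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("WeatherData")

# Only items present in the sparse index are considered.
response = table.query(
    IndexName="AlarmIndex",
    KeyConditionExpression=Key("Alarm").eq("ACTIVE"),
)
print(response["Items"])
```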
DynamoDB Streams & Triggers
DynamoDB Streams
- DynamoDB Stream = chronological sequence of changes to items in a table
- Maintains a 24-hour rolling history
- Internally uses a Kinesis Data Stream
- Enables event-driven architectures (EDA) through DynamoDB triggers
- Enabled per table
- Tracks all types of item modifications: inserts, updates, and deletes
- Example: removing an `orange` attribute from an item in a table is captured as a change record in the stream

- Stream view types: determine what data is included in the stream
- `KEYS_ONLY` → captures just the PK (and optionally SK) of the changed item
- Does not show the specific changes, but sufficient to query the table for the item
- `NEW_IMAGE` → captures the complete item after changes
- Can use the item’s new state directly without querying the table
- `OLD_IMAGE` → captures the complete item before changes
- Allows inspection of the previous state, and comparison with the new state to detect changes
- `NEW_AND_OLD_IMAGES` → captures both pre-change and post-change versions
- Provides full visibility of exactly what changed without extra queries
- Note: if an item is created, the pre-change record is empty; if an item is deleted, the post-change record is empty
DynamoDB Triggers

- Database triggers = automated events triggered by table changes
- Each event contains data about the change, which can be used to perform actions automatically
- Traditional DBs, like Oracle, have long supported triggers
- AWS implementation: DynamoDB Streams + Lambda functions
- Fully serverless, no infrastructure management required
- Efficient architecture:
- Unlike polling, which consumes resources continuously, triggers only use compute when relevant changes occur
- Common use cases:
- Reporting & analytics – e.g., generate a report when inventory levels change
- Data aggregation – e.g., count votes in a voting application
- Messaging & notifications – e.g., send alerts when a user posts a new chat message
DynamoDB Global Tables
DynamoDB Global Tables

- Global Tables = multi-master, cross-region replicated DynamoDB tables
- There is no single master table; all tables act as replicas of the global table
- Supports write replication across all tables
- Implementation steps:
- Create DynamoDB tables in multiple regions
- Select any table and configure it to link with others (see the sketch after this list)
- Linked tables become replicas in the same global table configuration
- Conflict resolution:
- Uses last-writer-wins
- DynamoDB selects the most recent write and replicates it to all regions
- Provides predictable results
- Multi-master capability:
- Reads and writes can occur in any region
- Typically achieves sub-second replication across regions
- Consistency considerations:
- Strongly consistent reads are only guaranteed in the same region as the write
- Cross-region replication is asynchronous, so global applications must be able to handle eventual consistency
- Common use cases:
- Enhance global application performance
- Provide global high availability
- Enable disaster recovery and business continuity across regions
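A sketch of the linking step referenced above, using the UpdateTable API to add a replica region (table name and regions are illustrative; this assumes the current global tables version, which requires streams enabled with `NEW_AND_OLD_IMAGES`):

```python
# Sketch: adding a cross-region replica to an existing table.
import boto3

client = boto3.client("dynamodb", region_name="us-east-1")
client.update_table(
    TableName="WeatherData",
    ReplicaUpdates=[{"Create": {"RegionName": "eu-west-1"}}],
)
```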
DynamoDB Accelerator (DAX)
Using a Traditional Cache with DynamoDB

DynamoDB Accelerator (DAX) – Overview

- DAX = fully managed, DynamoDB-integrated in-memory cache
- Dramatically improves read performance without requiring the application to manage the cache
- Reduces overall database operations and associated costs
- DAX SDK:
- Installed in the application, removes cache management overhead
- Application interacts with DAX as if it were DynamoDB; queries are automatically routed to DAX
- Cache hits respond in microseconds (µs)
- Cache misses are handled internally by DAX, which retrieves the data from DynamoDB and updates the cache
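A sketch of that flow using the Python DAX SDK (the `amazondax` package); the cluster endpoint below is an illustrative placeholder:

```python
# Sketch of the DAX read path: the application talks to the DAX
# cluster exactly as it would to DynamoDB.
from amazondax import AmazonDaxClient

dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("WeatherData")

# Cache hit: answered from the item cache in microseconds.
# Cache miss: DAX fetches from DynamoDB, caches the item, then returns it.
item = table.get_item(Key={"SensorID": "1", "Day": "MON"})
```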
DAX Architecture

- Private service deployed in your VPC → DAX nodes form a DAX cluster
- Primary node: read/write, replicates updates to other nodes
- If the primary fails, a new primary is elected automatically
- Replica nodes: read-only
- Deploy nodes across multiple AZs for regional high availability
- Applications connect to the DAX cluster through a single endpoint, which load-balances requests across nodes
- Cache hits provide responses significantly faster than querying DynamoDB directly (µs vs. ms)
- On a cache miss, DAX retrieves data from DynamoDB, updates the primary node, and replicates to replicas
DAX Features
- DAX maintains two types of caches:
- Item cache: stores results of `GetItem` and `BatchGetItem` operations
- Must specify the item’s PK (and SK if used)
- Query cache: stores results of Query and Scan operations
- Also caches the query parameters → identical subsequent queries return cached results
- Write-through caching supported: data is written to DAX and DynamoDB simultaneously
- Eventual consistency only: replication across DAX nodes is asynchronous
- Scalability: cluster can scale up (larger nodes) or out (more nodes)
- When to use DAX:
- Read-heavy or bursty workloads where the same data is frequently accessed
- Reduces read capacity units (RCUs) and improves cost efficiency
- Applications needing minimal read latency
- Reduce operational overhead from managing a custom in-memory cache
DynamoDB TTL
DynamoDB Time-to-Live (TTL)

- TTL = per-item timestamp that marks when an item should expire and be deleted automatically
- TTL deletions consume no write capacity and incur no extra cost, so table performance is unaffected
- Expiration is handled by background system processes
- Enabled per table → specify which attribute holds the expiration timestamp
- Timestamp is expressed as the number of seconds since Epoch (1 January 1970, 00:00:00)
- To expire an item, set its timestamp attribute to the desired expiration time
- Once TTL is enabled:
- TTL processors run on each partition (per PK value)
- One process periodically scans items in the partition to check if they are expired
- If the item’s timestamp is in the past, it is marked as expired
- Marked items are still queryable until deletion
- Another process scans for expired items to remove them from the table and indexes
- A corresponding delete event is also recorded in DynamoDB Streams if configured
- Optional 24-hour TTL event stream
- Records all deletions for auditing purposes
- Note: this stream is separate from standard DynamoDB Streams that track item changes
- Use cases:
- Automatically delete user or sensor data after a set period (e.g., one year)
- Retain sensitive data temporarily to meet compliance requirements
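A sketch of enabling TTL and writing an item that expires in one year; the attribute name `expiry` is an illustrative choice:

```python
# Sketch: enable TTL on a table, then write an item with an expiry.
import time
import boto3

client = boto3.client("dynamodb")
client.update_time_to_live(
    TableName="WeatherData",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expiry"},
)

table = boto3.resource("dynamodb").Table("WeatherData")
table.put_item(
    Item={
        "SensorID": "1",
        "Day": "MON",
        "expiry": int(time.time()) + 365 * 24 * 60 * 60,  # seconds since Epoch
    }
)
```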
Amazon Athena 101
Amazon Athena – Core Concepts
- Serverless interactive query service
- Enables ad-hoc SQL-like queries on data stored in Amazon S3
- Athena can also access other sources using federated queries, but S3 is the primary focus
- Pay only for the data scanned during queries, plus the S3 storage costs; no additional fees
- Schema-on-read
- The schema is applied when the query runs, transforming the raw data into a table-like structure
- Original data in S3 is never modified
- Think of it like a lens: the underlying data is unchanged, but the schema presents it in a structured format
- Different from schema-on-write databases, which require data to conform to the schema before storing
Athena Architecture

- Supports multiple source formats: JSON, XML, AVRO, log files, etc.
- Schemas (with tables) define how to interpret raw data as queryable structures
- Tables do not actually store data, they define how to project source data for queries
- Queries stream the data through the schema at runtime
- Query results can be sent to other services, e.g., Amazon QuickSight for visualization
- No upfront costs or infrastructure management; you only pay for queries
- Federated queries allow accessing non-S3 sources via Lambda-based connectors
Athena Use Cases
- Queries where data transformation or loading is unnecessary
- Ad-hoc or occasional queries on S3 data without managing servers
- Serverless querying for cost-sensitive workloads
- Analyzing AWS logs, such as VPC Flow Logs, CloudTrail, ELB logs, and cost reports
- Querying Glue Data Catalog tables or web server logs
- Not suitable if a traditional DB (SQL/NoSQL) is required; Athena is designed for querying raw data without a database
Demo: Querying OpenStreetMap’s Planet OSM with Athena
Objective: Retrieve locations of all veterinary facilities in a specific geographic region.
- Determine coordinates of the area using Google Earth
- Source data in S3 bucket `s3://osm-pds/planet/` contains Planet OSM data:
- `node` – individual points with metadata
- `way` – boundaries or areas
- `relation` – relationships between nodes/ways
- Create an S3 bucket to store query results
Querying in Athena
- Even though databases and tables are created in Athena:
- No actual data is stored inside Athena
- Billing is only for queries executed against the schemas
- Create a database:
```sql
CREATE DATABASE A4L;
```
- Create a `planet` table:
```sql
CREATE EXTERNAL TABLE planet (
  id BIGINT,
  type STRING,
  tags MAP<STRING,STRING>,
  lat DECIMAL(9,7),
  lon DECIMAL(10,7),
  nds ARRAY<STRUCT<ref: BIGINT>>,
  members ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
  changeset BIGINT,
  timestamp TIMESTAMP,
  uid BIGINT,
  user STRING,
  version BIGINT
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/planet/';
```
- Test query – retrieve 100 rows:
```sql
SELECT * FROM planet LIMIT 100;
```
- Query all veterinary amenities in a region (e.g., Brisbane, AUS):
```sql
SELECT * FROM planet
WHERE type = 'node'
  AND tags['amenity'] IN ('veterinary')
  AND lat BETWEEN -27.8 AND -27.3
  AND lon BETWEEN 152.2 AND 153.5;
```
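The same queries can also be run programmatically; a boto3 sketch, where the results bucket name is an illustrative placeholder:

```python
# Sketch: running the demo query via the Athena API.
import boto3

athena = boto3.client("athena")
execution = athena.start_query_execution(
    QueryString="SELECT * FROM planet LIMIT 100;",
    QueryExecutionContext={"Database": "a4l"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
query_id = execution["QueryExecutionId"]

# In practice, poll get_query_execution() until the state is SUCCEEDED
# before fetching results.
results = athena.get_query_results(QueryExecutionId=query_id)
```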
Screenshot of Demo inside Athena:

Amazon ElastiCache
ElastiCache Overview
- In-memory caching service for applications requiring high performance
- Managed caching service supporting Valkey (Redis fork), Redis OSS, or Memcached
- Caches provide much faster access than disk-based databases (e.g., RDS) but store temporary data only
- Common use cases and architectures:
- Cache frequently-read data for read-heavy workloads with low latency requirements
- Reduces load on primary databases and lowers costs
- Relational databases struggle with high loads → performance may degrade or costs increase significantly
- Store session data to enable stateless servers for high availability (HA) and fault-tolerant (FT) systems
- Best practice: define a cache invalidation strategy to ensure cached data remains current
- Requires application-level integration; the application must understand the caching logic
ElastiCache Architectures
Caching Architecture

- Application uses an in-memory cache alongside a database (e.g., Aurora)
- Read operation: app requests data from ElastiCache
- Cache hit: ElastiCache returns data quickly at low cost
- Cache miss or stale data: app queries Aurora for data
- App writes data back to ElastiCache for subsequent requests
- Subsequent reads of the same data are likely cache hits, reducing database load
- Improves scalability: allows many users without proportionally increasing database load
- Cache hits typically respond in <1ms latency
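A sketch of this cache-aside read path with redis-py; the endpoint, key scheme, and the `fetch_from_aurora()` helper are illustrative assumptions:

```python
# Sketch of the cache-aside pattern against an ElastiCache
# (Valkey/Redis) endpoint.
import json
import redis

cache = redis.Redis(host="my-cache.abc123.cache.amazonaws.com", port=6379)

def fetch_from_aurora(product_id: str) -> dict:
    return {"id": product_id}  # placeholder for the real database query

def get_product(product_id: str) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:                      # cache hit: sub-millisecond
        return json.loads(cached)
    product = fetch_from_aurora(product_id)     # cache miss: query the DB
    cache.setex(key, 300, json.dumps(product))  # write back, 5-minute TTL
    return product
```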
Session State Architecture

- ElastiCache stores user session information
- When a user connects to an application instance through an ALB, the instance writes and updates the session in ElastiCache
- If the session is interrupted, a new instance can retrieve the session from ElastiCache → seamless to the user
- Fault-tolerant: if the serving instance fails, the ALB reconnects the user to another instance, which loads the session from cache
ElastiCache Engines
- Supported engines:
- Valkey (Redis fork)
- Memcached
- Latest Redis OSS
- Both provisioned and serverless offerings exist
- Supports multiple programming languages and instance types/sizes
- Larger and faster memory configurations are recommended for high-performance workloads
Redis vs Memcached Comparison
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Advanced (lists, sets, sorted sets, hashes, bit arrays) | Simple (strings only) |
| Multi-AZ / HA | Supports replication across AZs → regionally resilient | No replication; sharding possible but no regional resilience |
| Backups | Supported | Not supported |
| Threading model | Single-threaded | Multi-threaded → can leverage multi-core CPUs for higher performance |
| Transactions | Supported → multiple operations treated atomically | Not supported |
- Pronunciation of Memcached: “mem-cash-dee” or “mem-cashed”
- Background: Redis licensing changes led to the Valkey fork. AWS now supports Valkey, Memcached, and the latest Redis OSS
Amazon Redshift 101
Redshift Overview
- Petabyte-scale data warehouse (DWH) service
- Columnar, OLAP-optimized database → built for analytics workloads
- Not a row-based OLTP database like RDS/Aurora
- Review differences: Database Refresher lecture
- Capable of ingesting large volumes of data from multiple operational sources and preparing it for analysis
- Access via SQL interface: supports JDBC/ODBC (see the connection sketch after this list)
- JDBC → platform-independent, works in Java
- ODBC → driver-dependent, language-independent
- Private service → deployed in a single AZ within a VPC
- Not serverless → you provision and manage a cluster; no public endpoints by default
- Single-AZ design provides high-performance networking
- Only AZ-level resilience; no built-in cross-region HA
- Security controls: VPC, IAM, KMS encryption, CloudWatch monitoring
- Faster to provision than an on-prem DWH, but still requires setup time
- For immediate, ad-hoc queries without provisioning, use Athena
- Primarily ETL-oriented, but supports:
- Redshift Spectrum: query S3 data directly without loading it into Redshift
- Federated queries: query external databases (AWS and non-AWS) directly
- Both features require a Redshift cluster but skip the data-loading step, saving time
- Additional features:
- Billing: pay-per-use
- Integrates with AWS services: QuickSight, Lake Formation
- By default, uses public routing for external service access
- Enhanced VPC Routing can enforce routing via VPC network configuration
- Custom DNS, security groups, NACLs, VPC gateways
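A sketch of connecting to that SQL interface from Python using the `redshift_connector` package; the host and credentials are placeholders:

```python
# Sketch: querying a Redshift cluster over its SQL interface.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="analytics",
    user="awsuser",
    password="...",  # placeholder credential
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM sales;")
print(cursor.fetchone())
```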
Redshift Architecture

- Redshift cluster deployed in a single subnet (AZ)
- Leader Node: coordinates client requests and compute nodes
- Handles query parsing, planning, and aggregation
- Connect via JDBC/ODBC
- Compute Nodes: execute queries and store data
- Divided into slices, each with dedicated memory and disk
- Slices operate in parallel to process workloads efficiently
- Data ingestion sources:
- Load from AWS services: S3, DDB, RDS
- Migrate using DMS
- Stream via Kinesis Data Firehose (S3 as intermediate)
- Data replication:
- Writes replicated to additional nodes → AZ resilience
- Supports S3 backups & restores → regional or cross-region storage
Redshift Resilience and Recovery

- AZ-resilient by design
- Writes replicated to a secondary node
- Entire cluster fails if the AZ goes down
- Recovery options:
- Automatic backups to S3
- Occur ~every 8 hours or every 5 GB written
- Retention configurable: 0–35 days (default 1 day)
- Manual snapshots
- Customers manage retention; snapshots persist indefinitely if desired
- Backup capacity equal to cluster size is included at no cost
- Incremental backups only store changes since the previous backup
- Backups benefit from S3 resilience and security
- Regional durability by default; global durability via S3 replication
- Snapshots can restore to any region → quick DR deployment if primary AZ/region fails