High Availability (HA) and Scaling
AWS Regional and Global Architecture
Global vs Regional Applications
- Global applications (e.g., Netflix) are composed of multiple independent regional applications, each duplicated in different AWS regions.
- This approach simplifies scaling and managing large platforms.
- Three common AWS system architectures:
- Single-region systems – all infrastructure resides in one region
- Single-region systems with disaster recovery – secondary/failover region exists
- Multi-region systems – designed to remain operational even if one or more regions fail
- AWS services are mostly regional, with a few global services. All regions together form the AWS Global Infrastructure.
Architectural Components of Cloud Systems
Global Components
- Global Service Location & Discovery
- Determines how users locate the application (e.g., netflix.com)
- Example: DNS configuration in Amazon Route 53
- Content Delivery & Optimization
- Ensures application content is delivered efficiently to users worldwide
- Can use globally distributed storage or rely on a central origin
- Example: Amazon CloudFront caches content closer to users
- Global Health Checks & Failover
- Monitors infrastructure health across regions
- Automatically redirects users to healthy regions during failures
- Example: Route 53 health checks
- Example: Netflix architecture with primary region in the US and secondary in Australia
- Global Architecture Example

Regional Components
- Regional Entry Point
- Defines how users access regional infrastructure
- Example: VPC endpoints, ALBs, API Gateway
- Scaling & Resilience
- Maintains performance under regional load fluctuations or failures
- Example: Auto Scaling Groups (ASGs), ALBs
- Application Services & Components
- Implements core functionality for the region
- Example: EC2 instances, S3 buckets
Regional Application Tiers
- Tiers = logical groupings of application functionality
- Regional Application Tiers

- Web Tier
- Entry point for regional applications
- Abstracts users from underlying infrastructure
- Example: ALB, API Gateway
- Compute Tier
- Provides application logic and processing
- Example: EC2, Lambda, ECS
- Storage Tier
- Supports compute services with data storage
- Example: S3, EBS, EFS
- CloudFront can use S3 as an origin for media
- Database Tier
- Stores structured data for compute tier
- Example: RDS, Aurora, DynamoDB, Redshift
- Caching Tier
- Reduces read load on databases by caching data in memory
- Example: ElastiCache, DynamoDB Accelerator (DAX)
- Application Services
- Additional AWS services enhancing the application
- Examples:
- Simple services: notifications, email (SNS, SES)
- Architecture-changing services: decoupling components (SQS, Kinesis, Step Functions)
EC2 Launch Configurations (LCs) and Launch Templates (LTs)
EC2 Launch Configuration (LC)
- Definition: A pre-defined configuration template for EC2 instances.
- Specifies AMI, instance type, storage, SSH key pair, networking & security groups, user data, IAM role, etc.
- Important: Once created, a launch configuration cannot be modified. Any changes require creating a new LC.
- Use case:
- Used exclusively by Auto Scaling Groups (ASGs) to launch instances with the specified configuration.
- Cannot launch standalone EC2 instances outside of an ASG.
EC2 Launch Template (LT)

- Definition: An advanced version of launch configurations, offering more flexibility and features.
- AWS recommends using LTs over LCs.
- Key features:
- Versioning:
- Multiple versions of a single LT can exist.
- Each version is immutable.
- Standalone instance launch:
- Can launch EC2 instances directly from the EC2 console or CLI, independent of ASGs.
- Extended EC2 feature support:
- Placement groups, capacity reservations, and unlimited mode for burstable instances (T2/T3).
- Use case:
- ASGs can reference an LT or LC to know the configuration for scaling out.
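A minimal boto3 sketch of creating a launch template and pointing an existing ASG at its latest version; the AMI ID, security group, and resource names are placeholders, not values from the course:

```python
import boto3

ec2 = boto3.client("ec2")
asg = boto3.client("autoscaling")

# Create version 1 of a launch template (AMI, type, and SG are placeholders)
ec2.create_launch_template(
    LaunchTemplateName="app-lt",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",
        "InstanceType": "t3.micro",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)

# Point an existing ASG at the template; "$Latest" always uses the newest version
asg.update_auto_scaling_group(
    AutoScalingGroupName="app-asg",
    LaunchTemplate={"LaunchTemplateName": "app-lt", "Version": "$Latest"},
)
```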
EC2 Auto Scaling Groups (ASG)
EC2 Auto Scaling Group (ASG) – Key Concepts

- Definition: A logical collection of EC2 instances.
- Provides automatic horizontal scaling and self-healing capabilities.
- Commonly used with Launch Configurations (LCs) or Launch Templates (LTs) and Elastic Load Balancers (ELBs) to build elastic and resilient architectures.
- Instance provisioning:
- ASGs rely on an associated LC or LT to define how instances are launched during scaling events.
- At any time, an ASG is linked to only one LC or LT.
- This association can be updated, but only one configuration is active at a time.
- All instances launched follow the currently attached LC/LT.
- Conceptual distinction:
- ASG defines when and where instances run
- LC/LT defines what instances look like
EC2 ASG – X:Y:Z Model
- ASG capacity is defined using three values:
- Minimum (MIN) – the lowest number of instances allowed
- Desired Capacity – the target number of running instances
- Maximum (MAX) – the upper limit of instances
- Example: 1:2:4 → MIN = 1, DESIRED = 2, MAX = 4
- Core behavior:
- The ASG continuously ensures the number of running instances matches the desired capacity.
- If below desired → launches new instances
- If above desired → terminates instances
- Adjusting desired capacity:
- Manually: user updates desired value directly
- Automatically: scaling policies adjust capacity based on metrics (e.g., CPU usage)
- Desired capacity is always kept within the MIN–MAX range
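A minimal boto3 sketch of a manual desired-capacity change; the ASG name is a placeholder:

```python
import boto3

asg = boto3.client("autoscaling")

# Manually move DESIRED to 3; the ASG launches or terminates instances to match.
# HonorCooldown=True waits for any active cooldown period before acting.
asg.set_desired_capacity(
    AutoScalingGroupName="app-asg",
    DesiredCapacity=3,
    HonorCooldown=True,
)
```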
EC2 ASG – Architecture

- VPC integration:
- ASGs operate within a VPC and can span multiple subnets and Availability Zones (AZs)
- Instances are launched across configured subnets
- By default, ASG tries to distribute instances evenly across AZs
- Health checks:
- Default: EC2 status checks
- Optional: integrate with ELB/ALB health checks
- If an instance is unhealthy, ASG replaces it automatically (self-healing)
- Simple high availability pattern:
- Use a Launch Template
- Configure multiple subnets across AZs
- Set ASG to 1:1:1
- Result:
- Instance is automatically replaced if it fails
- Can recover in another AZ if needed
- Provides a low-cost HA setup for a single instance
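A minimal boto3 sketch of this 1:1:1 pattern; the subnet IDs and launch template name are placeholders:

```python
import boto3

asg = boto3.client("autoscaling")

# Self-healing single instance: MIN = DESIRED = MAX = 1 across subnets in three AZs.
asg.create_auto_scaling_group(
    AutoScalingGroupName="single-instance-ha",
    LaunchTemplate={"LaunchTemplateName": "app-lt", "Version": "$Latest"},
    MinSize=1,
    MaxSize=1,
    DesiredCapacity=1,
    VPCZoneIdentifier="subnet-aaa,subnet-bbb,subnet-ccc",  # one subnet per AZ
)
```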
EC2 ASG – Scaling Processes
- ASG includes configurable processes that control scaling behavior
- These processes can be suspended (disabled) or resumed (enabled)
- Key processes:
- Launch: controls instance creation
- Terminate: controls instance removal
- AddToLoadBalancer: registers instances with an ELB
- AlarmNotification: enables response to CloudWatch alarms
- AZRebalance: maintains even distribution across AZs
- HealthCheck: performs health monitoring
- ReplaceUnhealthy: replaces failed instances
- ScheduledActions: enables time-based scaling
- Standby: allows instances to be temporarily removed from service
- Useful for maintenance without termination
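A minimal boto3 sketch of suspending and resuming processes around a maintenance window; the ASG name is a placeholder:

```python
import boto3

asg = boto3.client("autoscaling")

# Suspend AZRebalance and ReplaceUnhealthy while performing maintenance...
asg.suspend_processes(
    AutoScalingGroupName="app-asg",
    ScalingProcesses=["AZRebalance", "ReplaceUnhealthy"],
)

# ...and resume them once the work is finished.
asg.resume_processes(
    AutoScalingGroupName="app-asg",
    ScalingProcesses=["AZRebalance", "ReplaceUnhealthy"],
)
```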
EC2 ASG – Additional Features
- Scaling granularity:
- Using smaller instance types allows more precise scaling adjustments
- Integration with load balancing:
- Automatically registers/deregisters instances with ELB target groups
- Provides abstraction between users and infrastructure
- Enables dynamic and elastic scaling
- Cost model:
- ASGs themselves are free of charge
- Costs come from the underlying resources (EC2, etc.)
- Use cooldown periods to prevent excessive scaling and unnecessary costs
EC2 Auto Scaling and ASG Scaling Policies
EC2 Auto Scaling – Types
- Manual Scaling
- Adjust ASG capacity values directly without a scaling policy.
- MIN, DESIRED, MAX values are static but can be modified manually via AWS Console, CLI, or custom scripts.
- Customer controls scaling logic and execution.
- Use cases: testing ASGs, cost management, or urgent capacity changes.
- Scheduled Scaling
- Automatically updates DESIRED capacity at specified times.
- Does not require a scaling policy.
- Implemented using scheduled actions in the ASG.
- MIN and MAX values remain unchanged and must be modified separately.
- Useful for predictable traffic patterns, e.g., peak business hours or promotional events.
- Dynamic Scaling
- Automatically adjusts DESIRED capacity based on real-time metrics or CloudWatch alarms.
- Requires a scaling policy attached to the ASG.
- Only DESIRED capacity is updated automatically; MIN and MAX still need manual configuration.
- Dynamic scaling types:
- Simple Scaling – single-step scale out/in actions
- Step/Stepped Scaling – multi-step scaling based on metric thresholds
- Target Tracking – ASG maintains a metric near a defined target
ASG Scaling Policies (Dynamic Scaling)
- Purpose: Define rules that automatically adjust ASG capacity in response to metric changes or CloudWatch alarms.
- Metrics:
- Internal (EC2-based): CPU, memory, disk I/O, etc.
- Some require the CloudWatch Agent to be installed.
- External (outside EC2): e.g., SQS queue length.
- Example: scale an EC2 worker pool based on `ApproximateNumberOfMessagesVisible` in SQS to speed up processing when the queue grows.
- Cooldown Period:
- Time interval (seconds) after a scaling action before the next action can occur.
- Prevents rapid, repetitive scaling and helps reduce costs, since EC2 instances have minimum billing increments.
EC2 ASG – Simple Scaling

- Typically uses one rule for scale out and one for scale in, guided by a CloudWatch alarm.
- Example: Average CPU utilization for the ASG:
- Scale Out: CPU > 50% → +2 instances
- Scale In: CPU < 50% → -2 instances
- Limitation: Inflexible. The number of instances added/removed is fixed regardless of how far metrics exceed thresholds.
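A minimal boto3 sketch of the scale-out half of this example (the ASG name is a placeholder); a matching scale-in policy would mirror it with a negative adjustment and a low-CPU alarm:

```python
import boto3

asg = boto3.client("autoscaling")
cw = boto3.client("cloudwatch")

# Simple scaling: add 2 instances whenever the alarm fires, then wait 300 seconds.
policy = asg.put_scaling_policy(
    AutoScalingGroupName="app-asg",
    PolicyName="cpu-scale-out",
    PolicyType="SimpleScaling",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
    Cooldown=300,
)

# CloudWatch alarm on average ASG CPU > 50% that triggers the policy.
cw.put_metric_alarm(
    AlarmName="app-asg-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "app-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=50,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```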
EC2 ASG – Step/Stepped Scaling

- Builds on Simple Scaling with multiple rules and steps for different thresholds.
- Example: CPU-based rules:
- Scale Out:
- 50–59% → 0 instances
- 60–69% → +1 instance
- 70–79% → +2 instances
- 80–100% → +3 instances
- Scale In: parallel rules with proportional decrements.
- Advantages:
- Reacts more accurately to changing workloads.
- Handles variable load efficiently.
- Preferred over Simple Scaling for most real-world scenarios.
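A minimal boto3 sketch of a step scaling policy, assuming it is attached to a CloudWatch alarm with a 60% CPU threshold (the step bounds are offsets from that threshold, reproducing the +1/+2/+3 bands above):

```python
import boto3

asg = boto3.client("autoscaling")

# Bounds are relative to the alarm threshold (assumed 60% here):
#   60-70% -> +1 instance, 70-80% -> +2, 80%+ -> +3
asg.put_scaling_policy(
    AutoScalingGroupName="app-asg",
    PolicyName="cpu-step-scale-out",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    MetricAggregationType="Average",
    StepAdjustments=[
        {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 10, "ScalingAdjustment": 1},
        {"MetricIntervalLowerBound": 10, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 2},
        {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 3},
    ],
)
```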
EC2 ASG – Target Tracking
- ASG attempts to maintain a metric near a defined target value automatically.
- Example: Desired average CPU = 40% → ASG adds or removes instances to stay close to 40%.
- Not all metrics are supported.
- Supported example: `ALBRequestCountPerTarget` – average number of requests per target behind an Application Load Balancer for your ASG.
- Ideal for maintaining steady performance without manual intervention.
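A minimal boto3 sketch of the 40% CPU target-tracking example; the ASG name is a placeholder:

```python
import boto3

asg = boto3.client("autoscaling")

# Target tracking: keep average ASG CPU near 40%; the required CloudWatch
# alarms are created and managed automatically by the policy.
asg.put_scaling_policy(
    AutoScalingGroupName="app-asg",
    PolicyName="cpu-target-40",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 40.0,
    },
)
```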
Auto Scaling Group (ASG) Lifecycle Hooks
EC2 ASG – Lifecycle Hooks

- Default ASG behavior for instance state transitions:
- Launch: EC2 instance moves from `Pending` → `InService` automatically.
- Termination: EC2 instance moves from `Terminating` → `Terminated` automatically.
- Without lifecycle hooks, ASG carries out these transitions immediately, leaving no opportunity to intervene.
- Purpose of lifecycle hooks:
- Allow custom actions during instance launch or termination.
- Pause the instance at a transition state so you can perform tasks before the ASG continues.
- How lifecycle hooks work:
- Instances are temporarily paused during launch/termination.
- The ASG resumes the transition when either:
- Timeout expires – default 3600 seconds (1 hour). ASG can then either continue or abandon the action.
- `CompleteLifecycleAction` is executed – signals that the custom action is complete.
- Can be invoked via AWS CLI (`complete-lifecycle-action`) or programmatically.
- Example use cases for custom actions:
- Initialize or load data before marking a new instance `InService`.
- Backup logs, clean up resources, or perform maintenance before terminating an instance.
- Integration with other AWS services:
- Works with Amazon EventBridge and SNS notifications for event-driven workflows triggered by instance launch or termination.
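A minimal boto3 sketch of a launch lifecycle hook plus the completion call; the hook name, ASG name, and instance ID are placeholders:

```python
import boto3

asg = boto3.client("autoscaling")

# Pause new instances in the launch transition for up to 1 hour;
# abandon the launch if nothing signals completion in time.
asg.put_lifecycle_hook(
    LifecycleHookName="bootstrap-hook",
    AutoScalingGroupName="app-asg",
    LifecycleTransition="autoscaling:EC2_INSTANCE_LAUNCHING",
    HeartbeatTimeout=3600,
    DefaultResult="ABANDON",
)

# After the custom bootstrap work finishes, signal the ASG to continue.
asg.complete_lifecycle_action(
    LifecycleHookName="bootstrap-hook",
    AutoScalingGroupName="app-asg",
    LifecycleActionResult="CONTINUE",
    InstanceId="i-0123456789abcdef0",
)
```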
AWS Elastic Load Balancer (ELB)
Traditional Load Balancer (LB)
- A server that receives client connections and distributes them evenly across multiple backend resources.
- Clients do not interact directly with backend servers.
- Backend resources can fail or scale without affecting the client experience.
AWS Elastic Load Balancer (ELB) – Architecture

- ELB = fully managed load balancer within your VPC.
- Nodes deployed in at least 2 AZs → ensures high availability and scalability.
- Backend resources are registered in a Target Group (TG):
- Can include EC2 instances, EC2 Auto Scaling Groups, Lambda functions, etc.
- Elastic emphasizes AWS cloud’s ability to scale automatically.
- Historical note: ELB started with only EC2 instances as targets; now it supports multiple resource types.
Configuration highlights:
- IP addressing: IPv4-only or dual-stack (IPv4 + IPv6).
- DNS records resolve to ELB nodes; client connections are distributed across nodes.
- AZ selection: at least one subnet per AZ, minimum 2 AZs.
- Nodes are automatically replaced if they fail and can scale with load.
- Internet-facing vs internal ELB:
- Internet-facing → nodes have public + private IPs.
- Internal → nodes have private IPs only, used for internal tier separation.
- Listeners: define what traffic ELB accepts and how it forwards it to targets.
- ❗ Backend resources can reside in different subnets or be private, regardless of whether the ELB is public or internal.
Subnet requirements for ELB scaling:
- Needs 8+ free IPs per subnet; AWS recommends /27 or larger subnets.
- Technically, a /28 subnet can work (16 IPs minus 5 reserved = 11 usable), but /27 is safer for exam purposes.
ELB – Abstraction of Infrastructure

- Without ELB: clients connect directly to servers → tight coupling between tiers.
- Failures or scaling events disrupt client experience.
- With ELB: clients are abstracted from the backend → loose coupling.
- Servers can fail or scale, clients remain unaffected.
ELB – Cross-Zone Load Balancing

- Traditionally, each LB node only distributed traffic to targets in its own AZ.
- Could cause uneven load if AZs had different numbers of instances.
- Cross-zone balancing: each node can distribute traffic evenly across all registered targets, regardless of AZ.
- Enabled by default in Application Load Balancers (ALBs).
- Important for exams: know that cross-zone load balancing solves uneven distribution issues in multi-AZ setups.
ELB Types – Classic, Application, and Network Load Balancers
AWS Elastic Load Balancer (ELB) – Overview and Types
- v1 (Legacy) – Classic Load Balancer (CLB)
- Considered outdated; migrate to ELB v2 if still in use.
- v2 (Current) – Modern Load Balancers
- Application Load Balancer (ALB) – operates at Layer 7, supports HTTP(S) and WebSockets.
- Network Load Balancer (NLB) – operates at Layer 4, supports TCP, TLS, and UDP.
Note: A fourth type, Gateway Load Balancer (GWLB), exists for specific network appliance use cases and is handled separately.
ELBv1 – Classic Load Balancer (CLB)

- Introduced in 2009; intended for basic load balancing of HTTP(S) and other low-level protocols.
- Connects directly to a single pool of backend instances or ASGs; distributes traffic evenly.
Drawbacks:
- No SNI support → limited to one SSL certificate per CLB → scaling for multiple domains becomes expensive.
- Does not support target groups; can only manage a single backend pool.
- Single-protocol listeners only; cannot combine multiple protocols.
- Cannot perform HTTP-level routing decisions (no path-based or host-based routing, no per-rule health checks).
ELBv2 – Modern Load Balancer

- Released in 2016 with two main types:
- Application Load Balancer (ALB) – Layer 7: HTTP(S), WebSockets
- Network Load Balancer (NLB) – Layer 4: TCP, TLS, UDP
Advantages over CLB:
- Faster performance, lower cost, and supports target groups.
- ALBs allow multiple rules → 1 SSL certificate per rule → host multiple domains and apps on a single ALB.
Application Load Balancer (ALB)

- True Layer 7 load balancer; interprets HTTP(S) protocol and can make routing decisions based on content.
- Supports WebSockets over existing HTTP(S) listeners.
Listener Rules:
- Determine how requests are handled; rules processed in priority order.
- Default rule acts as a fallback.
- Can match conditions such as `host-header`, `http-header`, `path-pattern`, `query-string`, `source-ip`, and `http-request-method`.
- Actions can include: `forward`, `redirect`, `fixed-response`, `authenticate-oidc`, `authenticate-cognito` (see the sketch below).
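A minimal boto3 sketch of a path-based listener rule; the listener and target group ARNs are placeholders:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Path-based routing rule: send /api/* requests to a dedicated target group.
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",
    Priority=10,  # lower numbers are evaluated first; the default rule is the fallback
    Conditions=[{"Field": "path-pattern", "Values": ["/api/*"]}],
    Actions=[{
        "Type": "forward",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/api-tg/...",
    }],
)
```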
Benefits:
- Routes traffic based on HTTP content, cookies, headers, user IP, and application behavior.
- Supports Layer 7 health checks to evaluate application-level availability.
- Supports multiple SSL certificates on a listener via SNI → consolidate multiple apps/domains on one ALB.
Limitations:
- Cannot handle non-HTTP(S) protocols (SMTP, SSH, gaming).
- Cannot use Layer 4 listeners; no support for static EIPs.
- Higher processing overhead → slightly slower than NLBs.
- HTTPS connections are terminated at the ALB → backend receives a new connection; SSL termination must be configured.
Network Load Balancer (NLB)
- Layer 4 load balancer → supports TCP, TLS, UDP, and TCP_UDP listeners.
- Extremely high throughput → handles millions of requests per second with very low latency (~25% of ALB latency).
- Can use static public IPs (EIPs) → useful for firewall whitelisting.
- Supports TCP pass-through → end-to-end SSL encryption preserved.
Limitations:
- Cannot inspect HTTP(S) content → no content-based routing, cookies, or session stickiness.
- Health checks only at network level (ICMP/TCP), not application-aware.
- No native support for multiple SSL certificates (SNI).
Use Cases:
- Applications that do not use HTTP(S) → SMTP, SSH, gaming, or custom TCP/TLS protocols.
- Services exposed privately via AWS PrivateLink.
- High-performance, low-latency applications requiring static IPs.
Choosing Between ALB and NLB
- Default choice: ALB for typical web applications requiring Layer 7 routing.
- Choose NLB when you need:
- End-to-end SSL encryption between client and backend
- Static IP (EIP) for whitelisting
- Ultra-high performance (millions of requests/sec, low latency)
- Protocols other than HTTP(S) (e.g., SMTP, SSH, gaming)
- PrivateLink services for secure VPC access
In all other cases, ALB provides greater flexibility with Layer 7 features.
ELB – SSL Termination
AWS ELB – Approaches for Handling SSL

- ELBs can manage HTTPS traffic using three main paradigms, each with advantages and trade-offs:
- SSL Bridging – default behavior for ALBs
- SSL Pass-through – typical for NLBs
- SSL Offload – optional configuration for ALBs
ALB – SSL Bridging
- Uses two separate SSL connections: client ↔ ALB and ALB ↔ backend.
- SSL is terminated at the ALB, then re-encrypted toward the backend.
- Default for HTTPS listeners on ALBs.
Key Points:
- ALB requires an SSL certificate and handles encryption/decryption.
- Allows Layer 7 awareness, so routing and actions based on HTTP(S) traffic are possible.
- AWS has access to the SSL certificate, which may not meet strict security policies.
- Backend instances also need SSL certificates and perform their own crypto operations.
- Both legs are encrypted in transit, though traffic is decrypted and re-encrypted at the ALB (not true end-to-end encryption).
- Compute overhead on instances → higher latency and cost under heavy traffic.
- Managing SSL certificates on multiple instances adds administrative overhead.
NLB – SSL Pass-through
- Establishes a single uninterrupted SSL connection from client to backend.
- NLB forwards traffic directly to targets without decrypting it.
- Uses TCP listeners; no SSL termination at the NLB.
Key Points:
- NLB does not require an SSL certificate.
- No Layer 7 awareness → cannot inspect or route based on HTTP(S) content.
- AWS does not see your SSL certificate → better for environments with strict security requirements.
- Certificates can be managed externally (e.g., CloudHSM) for additional security.
- Backend instances must handle SSL themselves.
- Ensures fully encrypted communication.
- Compute overhead and administrative effort are similar to SSL Bridging.
ALB – SSL Offload
- Uses encrypted connection between client and ALB, but plaintext from ALB to backend.
- Data is secure over the internet but unencrypted within AWS network.
- Requires an HTTPS listener, but this is not the default ALB behavior.
Key Points:
- ALB requires an SSL certificate for the client connection and handles encryption/decryption.
- Layer 7 awareness maintained → can take actions based on HTTP(S) traffic.
- AWS has access to the certificate, so may not fit high-security environments.
- Backend instances do not need SSL certificates; traffic is unencrypted.
- Reduces instance CPU usage → lower latency and cost at scale.
- Simplifies administration → no certificate management per instance.
- Least secure option if AWS network security is a concern.
SSL Handling in ELB – Overview Table
| SSL Approach | Load Balancer Type | Listener Protocol | Certificate on ELB | Certificate on EC2 Instances |
|---|---|---|---|---|
| SSL Bridging | Application Load Balancer (ALB) | HTTPS | Present | Present |
| SSL Pass-through | Network Load Balancer (NLB) | TCP | Not used | Required |
| SSL Termination (Offload) | Application Load Balancer (ALB) | HTTPS | Present | Not required |
ELB – Session Affinity (Stickiness)
ELB Session/Connection Stickiness – Overview
- By default, ELB connections are not sticky. Traffic is distributed evenly across all backend instances.
- The ELB does not remember which instance a client previously connected to.
- Works fine for stateless applications or when user sessions are stored externally (e.g., ElastiCache, DynamoDB).
- Can cause issues for server-side session storage → client may lose session if routed to a different instance.
- Session stickiness ensures a client connects to the same backend instance for a defined period or until that instance fails.
- Enabled at the ELB Target Group (TG) level.
- Useful when applications rely on stateful server-side sessions.
- Trade-offs:
- Uneven backend load is possible → high-traffic clients may overload a single instance while others remain underutilized.
- Best practice: design applications to be stateless whenever possible.
ALB Session Stickiness

- The `AWSALB` cookie binds a client to a specific instance for a configurable duration (1 second to 7 days).
- How it works:
- Cookie is generated on the first client connection to the ALB.
- Subsequent requests include the `AWSALB` cookie, instructing the ALB which backend instance to use.
- Client remains connected to the same instance until:
- The cookie expires, or
- The backend instance fails.
NLB Session Stickiness
- Stickiness is based on client source IP (SRC IP affinity).
- Requests from the same client IP are routed consistently to the same backend instance.
- Can set a stickiness duration; after that, the client may be routed to a different instance.
- No cookies required → NLB does not generate or use cookies.
- NLB does not inspect HTTP headers, so there is no cookie equivalent of `AWSALB`.
- NLB does not inspect HTTP headers, so no
Demo – ALB Session Stickiness
- Default behavior: connecting to the ALB DNS endpoint routes you to a backend instance using round-robin.
- Enabling stickiness:
- Go to Target Groups → select your target group → Attributes → Edit.
- Enable session stickiness and configure duration.

- Behavior after enabling:
- The client connection is locked to a specific instance via the `AWSALB` cookie.
- If the locked instance fails or is stopped, the client is routed to a new instance and the cookie updates.
- Restarting the original instance does not revert the client back; the stickiness follows the new instance.
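A minimal boto3 sketch of the same target group attribute change made in the console above; the target group ARN is a placeholder:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Enable ALB stickiness on a target group: generates the AWSALB cookie
# with a 1-day duration (86400 seconds).
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/web-tg/...",
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},
    ],
)
```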
ASG – ELB Integration and Health Checks
ASG-ELB Integration

- EC2 Auto Scaling Groups (ASGs) can be registered with an ELB Target Group (TG).
- Instances launched or terminated by the ASG are automatically added or removed from the TG.
- Benefits of combining ASGs with ELBs:
- ELB abstracts infrastructure → client traffic continues uninterrupted even if instances scale or fail.
- ASG provides dynamic scaling and elasticity → automatically adjusts compute capacity.
- App-aware scaling → ASG can use ELB health checks alongside EC2 instance status checks to determine instance health.
ASG Health Checks
- Purpose: ASGs use health checks to monitor instance status and automatically replace unhealthy instances.
- Types of health checks:
- EC2 instance status checks (default)
- An instance is considered healthy only if it is in the `Running` state AND passes 2/2 status checks.
- Instances in states like `Stopping`, `Stopped`, `Shutting Down`, or `Terminated`, or instances failing checks, are marked unhealthy.
- ELB health checks (optional, when ASG is attached to ELB)
- An instance is healthy only if both ELB checks and EC2 status checks pass.
- Provides network-level (L4) or application-level (L7) health monitoring (L7 only with ALB).
- Misconfiguration can cause problems:
- Example: ELB checks a simple HTML page, but the app’s backend (e.g., database) is failing → ASG may continuously replace instances unnecessarily.
- Custom health checks
- Any external system can mark instances as healthy/unhealthy.
- Allows ASGs to meet specific business requirements or integrate with monitoring tools.
- Health Check Grace Period
- Delay before ASG begins monitoring instance health after launch.
- Default = 300 seconds (5 minutes).
- Ensures instances have time to start up and initialize applications before health checks begin.
- A too-short grace period can trigger continuous provisioning and termination loops, as instances fail checks before they are fully ready.
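A minimal boto3 sketch of switching an ASG to ELB health checks with a 5-minute grace period; the ASG name is a placeholder:

```python
import boto3

asg = boto3.client("autoscaling")

# Use ELB (target group) health checks in addition to EC2 status checks,
# and give new instances 300 seconds to boot before checks count against them.
asg.update_auto_scaling_group(
    AutoScalingGroupName="app-asg",
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,
)
```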
Gateway Load Balancer (GWLB) – Traffic Management for Network Appliances
Scaling Challenges with Network Security Appliances

- Some applications require inspection-based network security to prevent sensitive data leaks or detect malicious activity.
- This often involves deploying a virtual security appliance in a dedicated subnet that inspects all traffic entering or leaving a VPC.
- Malicious or unwanted traffic can be blocked, improving overall security posture.
- Problem: traditional inline security appliances do not scale efficiently.
- App instances that scale dynamically require security appliances to scale proportionally, creating tight coupling between app and security layers.
- This approach becomes complex and inefficient, especially for multi-application environments.
AWS Gateway Load Balancer (GWLB) – Core Concepts

- GWLB enables load balancing of traffic to virtual appliances in separate VPCs.
- Provides transparent, inline security while allowing appliances to scale independently.
- Supports deployment and management of third-party virtual appliances such as firewalls, intrusion detection systems, or data inspection tools.
- Functionality:
- Works like a Layer 3/4 load balancer (similar to NLB) but encapsulates traffic to backend appliances using the GENEVE protocol.
- Main advantage: horizontal scaling of security appliances without changing app or client traffic flow.
- Two key components:
- GWLB Endpoint (GWLBE)
- Resides in a VPC and acts as the ingress/egress point for traffic.
- Can be used as the next hop in route tables, integrating with VPC traffic flows.
- Conceptually similar to standard VPC endpoints but with enhanced capabilities.
- GWLB (Load Balancer itself)
- Distributes traffic across multiple backend virtual appliances (EC2 instances running security software).
- Configuration: GWLBs are managed at the VPC level, not within the EC2 console, even though they are technically a type of ELB.
GENEVE Protocol in GWLB
- GENEVE encapsulates traffic for inspection within the security VPC.
- Ensures original source and destination IPs remain intact, so traffic can return correctly to the originating VPC.
- Traffic is tunneled to security appliances with temporary addresses, and encapsulation is removed when returning to the customer VPC.
- Flow stickiness:
- Each flow is consistently sent to the same appliance, allowing stateful inspection.
- Traffic inspection is transparent to applications and clients, maintaining seamless connectivity.
Example GWLB Architecture and Traffic Flow

- Traffic enters the IGW, destined for an ALB with a public IP in subnet 10.16.9.0/20.
- IGW updates the destination IP to the ALB’s private IP and forwards traffic to GWLBE2.
- GWLBE2 sends packets to the GWLB in the security VPC.
- GWLB encapsulates traffic using GENEVE, preserving original source/destination IPs, and forwards to a selected security appliance.
- Security appliance inspects traffic and either blocks or returns it.
- Packets are returned to GWLB with GENEVE encapsulation removed.
- Traffic passes back through GWLBE2 to the application VPC (catagram VPC).
- Local routing directs traffic from GWLBE2 to the ALB.
- ALB distributes traffic to the appropriate application instance.
- The return path follows the same logic, maintaining traffic integrity and inspection throughout.
Serverless and Application Services
Architecture Deep Dive Concepts
CatTube (Example App)
- Reference application: CatTube, a video-sharing platform focused on cat content, where users upload and stream videos
- Core components:
- Users upload videos
- Uploaded content is processed into multiple formats and resolutions
- This is the most compute-intensive part of the system
- The platform serves videos and handles playlists, channels, and user interactions
- Requires reading from and writing to databases
- This same application is used to illustrate different architectural patterns
- Notes that reference CatTube relate specifically to this example application
Monolithic Architecture

- A single application unit (one server or system) that contains all components
- Even if logically separated in code, everything runs together on the same machine
- Represents a traditional design approach
- Advantages:
- Simple to build and understand
- Disadvantages:
- Tight coupling between components
- Failures in one part can impact the entire system
- If the upload feature breaks, processing and storage may also be affected
- Scaling is uniform
- Typically requires vertical scaling
- Cost inefficiency
- All parts run continuously, even when not needed
- Processing resources remain active even when no uploads occur
- Monolithic systems are generally inefficient at scale and can lead to higher operational costs
Tiered Architecture

- Application is divided into separate layers (tiers)
- Each component runs in its own tier (same or different servers)
- Example: UPLOAD, PROCESSING, and STORAGE are isolated tiers
- Advantages:
- Independent scaling per tier (vertical or horizontal)
- Adding internal load balancers can:
- Improve availability
- Enable horizontal scaling
- Reduce direct dependency on infrastructure
- Processing tier can scale without affecting upload or storage tiers
- Disadvantages:
- Still tightly coupled due to synchronous communication
- Each tier depends on immediate responses from others
- Always-on requirement
- Each tier must remain active for the system to function
- Upload depends on processing being available
- No scale-to-zero capability
- Processing must run even when idle
- Failure propagation
- If processing fails or slows, uploads are impacted
Asynchronous Queues Architecture
- Introduces queues to enable asynchronous communication
- Messages are sent to and retrieved from a queue
- Ordering may follow FIFO or other models
- Key idea:
- Decouple tiers using queues instead of direct communication
- Advantages:
- Loose coupling between components
- Services interact with queues instead of each other
- Independent scaling
- Components can scale from zero to very high capacity
- CatTube workflow:
- UPLOAD
- Stores video in S3
- Sends a message to a queue
- Upload process completes immediately without waiting
- PROCESSING
- Worker instances pull tasks from the queue
- ASG can scale dynamically (including MIN=0)
- Instances launch only when jobs exist and terminate when done
- Reduces idle cost
- Same pattern applies between processing and storage layers
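A minimal sketch of this queue-decoupled pattern, assuming an SQS queue for transcode jobs; the queue URL and message fields are placeholders:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/transcode-jobs"  # placeholder


def submit_job(bucket, key):
    # UPLOAD tier: record the job on the queue and return immediately.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps({"bucket": bucket, "key": key}))


def worker_loop():
    # PROCESSING tier: worker instances in the ASG poll the queue for jobs.
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            # ... transcode job["bucket"]/job["key"] here ...
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```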


Microservices Architecture

- System is split into many small, independent services
- Each service focuses on a specific function
- A microservice is:
- A self-contained application that performs a single responsibility well
- Types:
- Producer → generates data/events (e.g. upload service)
- Consumer → processes data/events (e.g. processing service)
- Hybrid → does both (e.g. storage/management service)
- Communication:
- Often uses queues or events
- Large systems may require many queues, increasing complexity
- Event-driven approaches help simplify this
Event-Driven Architecture (EDA)

- Built around event producers and consumers
- Producer: generates events when something happens (e.g. upload, click)
- Consumer: reacts to those events (e.g. process video, log activity)
- Some components act as both
- Key characteristics:
- No constant running or polling
- Resources are only used when handling events
- System automatically scales up and down based on activity
- Benefits:
- Efficient resource usage
- High scalability and responsiveness
- Event Router:
- Central system that distributes events to consumers
- Contains an event bus for continuous event flow
- Ensures proper routing of events
- Example: Amazon EventBridge
- Many serverless systems follow this model, where compute runs only when triggered by events
AWS Lambda Basics
AWS Lambda – Key Concepts
- Function-as-a-Service (FaaS) model designed for short-lived, single-purpose code execution
- A Lambda function is the code unit executed by the service
- It represents the main configuration entity in AWS Lambda
- Informally, “a Lambda” usually refers to a Lambda function
- Before running, a function must define a Runtime Environment (RTE) (e.g. Python 3.8)
- Memory is explicitly configured, while vCPU allocation scales indirectly based on that memory
- When invoked, the function is loaded into the selected runtime and executed
- Billing is based only on execution time and resources used
- Core component of serverless and event-driven architectures
- Typically low cost, with a free tier covering initial usage and low per-invocation pricing beyond that
AWS Lambda – Architecture

- A Lambda function consists of code, configuration, and supporting components
- Requires:
- A defined programming language/runtime
- A deployment package (downloaded and executed at runtime)
- Configured resource settings
- Although often referred to as just code, it includes more—similar to how an AMI includes more than just a VM image
- Supports multiple runtimes such as Python, Node.js, Java, and others
- Lambda Layers allow extending functionality or even enabling custom runtimes
- Selecting a runtime determines the available libraries and environment setup
- Each invocation typically results in a new runtime instance being created
- Code is loaded, executed, and then the environment is terminated
- Future executions usually start fresh, though reuse can occur in some cases
- Stateless execution model
- No guaranteed persistence between runs
- Code must function correctly without relying on previous executions
- Docker note:
- Traditional container usage is not the same as Lambda execution
- While Lambda supports container images, they are specifically built for Lambda’s environment
- Standard container-based compute (e.g. ECS) should not be confused with Lambda
Resource Configuration
- Memory: 128 MB to 10,240 MB (configurable in 1 MB increments)
- vCPU: Scales proportionally with memory (approximately 1 vCPU per 1769 MB)
- Temporary storage:
- Default 512 MB (expandable to 10,240 MB)
- Mounted at `/tmp` and should be treated as ephemeral
- Maximum execution time: 900 seconds (15 minutes)
- Longer workflows require orchestration tools such as Step Functions
- Execution role:
- IAM role assumed by the function
- Controls permissions and access to AWS services
AWS Lambda – Common Use Cases
- Serverless application backends (e.g. API Gateway + Lambda)
- File processing pipelines
- Example: processing or transforming files uploaded to S3
- Database-triggered processing
- Reacting to changes in DynamoDB via streams
- Scheduled automation (serverless cron jobs)
- Triggered by EventBridge or CloudWatch Events
- Real-time stream processing
- Handling incoming data from services like Kinesis
Demo: Creating and Running a Lambda Function
- Deploy a CloudFormation stack to provision required resources (e.g. EC2 instances)
- Create an execution role with permissions (e.g. logging and EC2 control actions)
- Navigate to Lambda and create a new function
- Assign a name and choose a runtime (e.g. Python 3.9)
- Attach the execution role
- Add function code
- Example: script to stop EC2 instances using environment variables


- Configure environment variables
- Example: `EC2_INSTANCES` containing instance IDs
- Run a test invocation
- Verify output in logs and confirm changes in the EC2 console
- Optionally create another function (e.g. to start instances) and test similarly
- Clean up by deleting the Lambda functions that were created, then delete the CloudFormation stack to terminate all associated resources.
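A minimal sketch of the stop-instances function from this demo, assuming the `EC2_INSTANCES` environment variable holds a comma-separated list of instance IDs:

```python
import os
import boto3

ec2 = boto3.client("ec2")


def lambda_handler(event, context):
    # EC2_INSTANCES comes from the function's environment variables (see the demo).
    instance_ids = [i.strip() for i in os.environ["EC2_INSTANCES"].split(",")]
    ec2.stop_instances(InstanceIds=instance_ids)
    return {"stopped": instance_ids}
```

The execution role attached to the function must allow `ec2:StopInstances` and CloudWatch Logs writes for this to succeed.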
AWS Lambda Networking
Public Lambda (Default)

- Lambda executes within the AWS public network environment
- Can communicate with publicly accessible AWS services (e.g. SQS, DynamoDB) and the internet
- This is the default configuration and is suitable for most use cases
- A private setup is only needed when specific VPC access is required
- No need to configure a customer VPC
- Advantage: delivers optimal performance since it runs on shared AWS-managed infrastructure
- Limitation: cannot reach private VPC resources unless those resources are exposed externally
- External access requires public IPs and appropriate security configurations
Private Lambda

- Lambda can be configured to run inside a VPC by attaching network interfaces (ENIs)
- Typically placed in private subnets
- Functions operating in a VPC must follow standard VPC networking rules
- Can access internal resources if security groups and NACLs allow it
- Cannot access external services unless additional configuration is in place
- Internet access requires a NAT Gateway and Internet Gateway
- Private access to AWS services can be enabled using VPC Endpoints
- The execution role must include permissions for EC2 networking operations
Private Lambda ENI Injection – Old vs New Approach
Old approach:

- Each function invocation created and attached an ENI inside the VPC
- Drawbacks:
- Increased latency due to ENI creation during execution
- Poor scalability, especially with high concurrency
- Large number of ENIs could impact VPC performance
New approach:

- AWS precomputes combinations of subnets and security groups used by functions
- Creates shared ENIs per unique subnet + security group combination
- Multiple function executions can reuse the same ENIs
- Benefits:
- Much better scalability for concurrent executions
- Eliminates per-invocation ENI creation delays
- There is an initial setup delay (around 90 seconds) when configuring networking
- This happens only once during setup or configuration changes, not during execution
AWS Lambda Security, Monitoring, and Versioning
AWS Lambda – Security

- Lambda execution role
- An IAM role assumed by the Lambda function during execution
- The trust policy allows Lambda to assume the role
- The permissions policy defines what the function is allowed or denied to do
- Example: read data from DynamoDB and write it to S3
- Lambda resource policy
- A resource-based policy attached to the Lambda function
- Determines which principals are allowed to invoke the function
- Can grant access to AWS services (e.g. S3, SNS) or external AWS accounts
AWS Lambda – Monitoring
- Execution logs are stored in CloudWatch Logs
- The Lambda execution role must include permissions to write logs
- If logs are missing, it usually means permissions were not configured correctly
- Metrics are automatically collected in CloudWatch
- Includes invocation count, errors, retries, and execution duration
- No additional setup required for basic metrics
- Supports AWS X-Ray integration
- Enables distributed tracing across components
- Useful for tracking request flows in serverless applications
AWS Lambda – Versioning and Aliases
- Lambda supports function versioning (e.g. v1, v2, v3)
- Each version includes both code and configuration
- Versions are immutable
- Once published, they cannot be modified
- Each version has a unique ARN
- Aliases act as pointers to specific versions
- Common examples: `DEV`, `STAGE`, `PROD`
- Can be updated to reference different versions over time
- `$LATEST` is a built-in pointer (not an alias you create yourself)
- Always points to the most recently updated version of the function
AWS Lambda Invocation Methods
Lambda – Synchronous Invocation

- Function called directly via CLI or API, sending input and waiting for the response
- Function executes and returns a result or an error
- Client waits for the outcome
- Success or failure is reported in the same request
- Error handling and retries are managed by the caller
- Commonly used when humans invoke functions via API Gateway endpoints in serverless applications
Lambda – Asynchronous Invocation

- Function triggered by an event, no immediate response tracked
- Typical for AWS services invoking Lambda (e.g., S3 event notifications)
- Lambda manages its own errors
- Failed executions can be retried automatically (0–2 times, configurable)
- Functions must be idempotent to handle retries safely
- Idempotent means running multiple times produces the same end result
- Example: updating a bank account balance by explicitly setting it (idempotent) vs. incrementing it blindly (not idempotent)
- Event destinations can be configured for success or failure
- Failed events may go to Dead-Letter Queues (DLQs) such as SQS or SNS for later investigation
Lambda – Event Source Mapping

- Lambda polls a source (queue or stream) for data and processes it in batches
- Used for sources that don’t automatically generate events (e.g., SQS, Kinesis, DynamoDB streams, MSK)
- Batches of events are sent to Lambda
- Batch size should fit within Lambda’s 15-minute timeout to ensure complete processing
- Execution role permissions required
- Lambda must have read access to the source for polling-based invocations
- Asynchronous event-based invocations typically include all needed information in the event
- Error handling options
- Failed batches can be sent to SQS or SNS for monitoring, retries, or diagnostics
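A minimal boto3 sketch of creating an event source mapping for an SQS queue; the queue ARN and function name are placeholders:

```python
import boto3

lam = boto3.client("lambda")

# Lambda polls the queue on the function's behalf and invokes it with
# batches of up to 10 messages. The execution role must allow reading
# from (and deleting messages in) the queue.
lam.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:111122223333:transcode-jobs",
    FunctionName="process-jobs",
    BatchSize=10,
    Enabled=True,
)
```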
AWS Lambda Execution Environment
AWS Lambda – Cold vs Warm Starts

- Lambda runs inside an execution context (RTE)
- Think of it as a lightweight container with allocated resources for the Lambda code
- Functions must generally be stateless, even if context reuse is possible
- Always assume a new execution context may be created on each invocation
- Cold start = creating a new execution context
- Provision hardware and environment
- Download and initialize runtimes (libraries, interpreters, packages)
- Load deployment package (your code and dependencies)
- Impact: cold starts take extra time (hundreds of milliseconds or more), especially noticeable in synchronous requests
- Warm start = reusing an existing execution context
- Context from a previous invocation may be reused if function is called again shortly after
- No need to reinitialize environment or load packages → faster execution
- Important: never rely on context reuse
- Idle contexts are removed automatically
- Concurrent executions create multiple contexts → parallel executions often result in multiple cold starts
AWS Lambda – Reducing Cold Start Latency
- Provisioned Concurrency
- Pre-creates a fixed number of execution contexts ready to serve requests → reduces cold start delays
- Ideal for:
- Predictable high traffic periods
- Prewarming environments before a production release
- Use
/tmpfor caching- Temporarily store frequently used files or data (e.g., media assets) between invocations
- Functions must still handle clean environments in case `/tmp` is empty
- Define reusable components outside the handler
- Code outside the function handler persists in the execution context between invocations
- Example: database connections, heavy library objects
- Ensure fallback logic exists if the context is cold
- Summary: Optimize performance by reusing components when possible, but design Lambda to work correctly on a completely fresh execution context every time.
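A minimal sketch of defining reusable components outside the handler; the `CONFIG_BUCKET` variable and `settings.json` key are hypothetical:

```python
import os
import boto3

# Created once per execution context (cold start) and reused on warm starts.
s3 = boto3.client("s3")
CONFIG_CACHE = {}


def lambda_handler(event, context):
    # Fall back to loading config if this is a fresh (cold) context.
    if "settings" not in CONFIG_CACHE:
        obj = s3.get_object(Bucket=os.environ["CONFIG_BUCKET"], Key="settings.json")
        CONFIG_CACHE["settings"] = obj["Body"].read()
    return {"config_bytes": len(CONFIG_CACHE["settings"])}
```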
Amazon EventBridge – Serverless Event Bus Service
Amazon EventBridge – Overview and Architecture

- Purpose: Centralized event management for your AWS account
- Tracks changes in AWS services (e.g., EC2 instance terminated)
- Provides:
- Event visibility via a near-real-time stream (Event Bus)
- Delivery of events to configured targets
- Enables event-driven architectures (EDAs)
- Event Bus: a stream that collects events
- Default Event Bus: automatically available in every AWS account; contains all system events
- Custom event buses can be created to handle events from external sources or other accounts
- Event Routing Pattern: “If X happens, or at Y time(s), send info to Z”
- `X` = event generated by an AWS service (event producer)
- `Y` = scheduled times (via cron expressions or EventBridge Scheduler)
- `Z` = event target/consumer (e.g., Lambda, SQS, SNS)
- Rules: match events or schedules
- Event Pattern Rule → triggered by specific events
- Schedule Rule → triggered at defined times
- Matched events are sent to one or more targets
- Event Format: JSON
- Contains relevant details for targets to consume (e.g., instance ID, new state, timestamp)
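A minimal boto3 sketch of this routing pattern on the default event bus; the rule name, pattern, and Lambda ARN are illustrative:

```python
import json
import boto3

events = boto3.client("events")

# "If an EC2 instance enters the stopped state, send the event to a Lambda target."
events.put_rule(
    Name="ec2-stopped",
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    }),
)

events.put_targets(
    Rule="ec2-stopped",
    Targets=[{
        "Id": "protect-lambda",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:protect-ec2",
    }],
)
```

Note that the target function's resource policy must also allow EventBridge to invoke it (the console adds this permission automatically when you pick a Lambda target).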
CloudWatch Events (CWEvents)
- EventBridge supersedes CWEvents
- CWEvents was the original service for event handling but now redirects to EventBridge
- Underlying architecture/APIs are the same, but EventBridge adds more capabilities
- CWEvents could only monitor the default event bus
- EventBridge supports custom event buses for third-party or application-generated events
DEMO: Building a Simple Event-Driven Architecture (EDA)
Protect an EC2 Instance if It Gets Stopped
- Create a Lambda function that automatically restarts any EC2 instance that enters the `stopped` state (a handler sketch follows the steps below):

- Create an EventBridge rule to monitor EC2 state changes.
- Event to track: `EC2 Instance State-change Notification`.
- Generate a JSON sample of the event to see what information is delivered.

- Fill out the event pattern to match instances entering the `stopped` state.
- Optionally, filter by specific instance IDs if you only want certain instances protected.

- Assign the Lambda function as the target for this EventBridge rule.

- Test the setup by stopping an instance. The Lambda should automatically restart it after a short period.
- Check logs in CloudWatch Logs for function execution details:
- Each Lambda function creates a log group
- Each execution creates a log stream
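A minimal sketch of the protection function referenced above, assuming the rule delivers the standard EC2 state-change event (instance ID and state live under `detail`):

```python
import boto3

ec2 = boto3.client("ec2")


def lambda_handler(event, context):
    # EventBridge delivers the matched event as JSON in `event`.
    detail = event["detail"]
    if detail["state"] == "stopped":
        ec2.start_instances(InstanceIds=[detail["instance-id"]])
        print(f"Restarted {detail['instance-id']}")
```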

Stop All EC2 Instances at a Specific Time Every Day
- Create a schedule rule in EventBridge
- Use the EventBridge Scheduler for a modern UI and flexible scheduling options.
- Unlike the old method, you can define schedules outside of event buses.
- Traditional “Create Rule” only allowed Unix CRON format and required UTC time.
- Specify the schedule using a Unix CRON expression
- Check the time zone carefully:
- CRON expressions default to UTC
- EventBridge Scheduler UI may display times in your local time zone
- Verify next trigger times to ensure the schedule is correct

- Assign the Lambda function that stops EC2 instances as the target for this scheduled rule.

- Wait for the scheduled time
- The Lambda function will automatically stop the instances at the specified time.
- If the EC2 protection Lambda is still active, any protected instances will automatically restart after being stopped.
Serverless Architecture Overview
Serverless – Core Idea
- Serverless = minimal/no server management
- Concept more than technology; mainly a software architecture.
- You don’t manage the underlying servers, but they exist behind the scenes.
- Benefits: lower cost, reduced administrative overhead, less operational risk.
Key Characteristics
- Small, specialized functions
- Each function does a single task well.
- Functions start, execute, and stop quickly.
- Billing is per execution.
- Stateless & ephemeral
- Functions can run anywhere, independently.
- Event-driven
- Functions execute only when triggered.
- Consumption-based model: low idle costs.
- Managed services first
- Use services like S3, DynamoDB, Cognito instead of self-hosting.
- Code only what’s necessary.
- FaaS (Function-as-a-Service)
- Cheap, scalable compute for general tasks.
- AWS Lambda = main compute engine; avoids self-managed EC2 whenever possible.
Example: CatTube Serverless Architecture

- Frontend: Static website on S3 + client-side JS.
- Authentication:
- Third-party IDP (e.g., Google) → returns ID token.
- Cognito swaps ID token for temporary AWS credentials.
Video Upload Workflow
- Upload video → S3 `Originals` bucket.
- New object triggers Lambda function via S3 event.
- Lambda creates Elastic Transcoder jobs → outputs in the `Transcode` bucket.
- Video metadata added to DynamoDB.
Media Access Workflow
- Client requests media → triggers Lambda function.
- Lambda loads metadata from DynamoDB + media from the `Transcode` bucket.
- Lambda returns URLs for client to access media.
Key Points:
- Entire workflow is serverless.
- No EC2 or managed database servers.
- Event-driven + managed services = scalable, cost-efficient, maintainable.
Amazon SNS (Simple Notification Service) – Pub/Sub Messaging Service
Amazon SNS – Core Idea
- Serverless PUB-SUB messaging service
- Coordinates sending and delivery of messages to multiple destinations.
- Payloads ≤ 256 KB (good to remember: SNS is not for large files).
- Publicly accessible → connects over the internet.
- Heavily used for notifications across AWS (e.g., CloudWatch, CloudFormation).
SNS Architecture

- Regionally resilient → highly available, durable, and scalable.
- Secure → supports server-side encryption (SSE).
Key Entities
- SNS Topic – main entity
- Holds configuration and permissions.
- Enables 1-to-many communication:
- Publisher → sends messages to topic.
- Subscribers → receive messages.
- Supported subscriber types: HTTP(S), email, SQS, Lambda, SMS, mobile push.
- Entities can act as both publisher and subscriber.
- Topic Policy – resource policy defining who can read/write and supports cross-account access.
- Message Filters – allow subscribers to receive only relevant messages.
- Delivery Status & Retries – confirm message delivery and retry until success.
- Single SNS topic → multiple SQS queues (or other subscribers).
- Enables parallel processing of different workloads, e.g., handling different video sizes/bitrates in CatTube.
- Highly exam-relevant concept: fanout = topic → many subscribers.
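A minimal boto3 sketch of the fanout pattern; the topic name, queue ARNs, and message body are placeholders:

```python
import json
import boto3

sns = boto3.client("sns")
topic_arn = sns.create_topic(Name="new-video-uploaded")["TopicArn"]

# Fanout: one topic, several SQS queue subscribers (one per output workload).
for queue_arn in [
    "arn:aws:sqs:us-east-1:111122223333:transcode-1080p",
    "arn:aws:sqs:us-east-1:111122223333:transcode-720p",
]:
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# A single publish is delivered to every subscriber.
sns.publish(TopicArn=topic_arn, Message=json.dumps({"bucket": "originals", "key": "cat.mp4"}))
```

Each subscribed queue also needs an access policy that allows the topic to send messages to it.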
Amazon API Gateway (APIGW) Basics
Application Programming Interface (API)
- Mechanism that enables communication between applications
- Example: issuing an HTTP GET request to `https://<URL>/cats/images/<your-cat-img-id>` to retrieve an image stored on a remote system
- OpenAPI Specification
- A widely adopted standard for defining APIs
- Simplifies API import and export processes
- Swagger UI: commonly used web interface for API visualization and testing
- Other API tools: Postman, Insomnia, Bruno
- Example: Swagger UI displaying an API interface

Amazon API Gateway (APIGW) – Key Concepts
- AWS-managed service used to build, publish, and manage APIs
- Serves as the entry point for client applications
- Controls API endpoints, resources, and HTTP methods
- Operates between client applications and backend integrations
- Integrations refer to backend services providing functionality
- Supports multiple API types:
- HTTP APIs
- REST APIs
- WebSocket APIs
- Service Characteristics
- Fully serverless
- Public-facing service that can expose AWS or on-premises systems
- Designed for high availability and scalability at the regional level
Features and Capabilities
- Security and Traffic Management
- Handles authorization
- Implements throttling to control request rates
- Provides caching mechanisms
- API Management
- Supports OpenAPI definitions
- Enables request and response transformations
- Service Integrations
- Direct integration with AWS services (e.g., DynamoDB, SNS)
- In many cases, removes the need for dedicated backend compute
- Cross-Origin Resource Sharing (CORS)
- Manages browser-based cross-domain access
- Allows web applications from one domain to call APIs hosted in another domain
- Example: JavaScript hosted in an S3 bucket invoking an API Gateway endpoint
- Migration Support
- Can act as a frontend layer while backend systems are being migrated or redesigned
Amazon API Gateway – Architecture

- Request Flow
- Authorization → validation → request transformation
- Backend integrations process the request
- Response transformation → preparation → return to client
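Where the backend integration is a Lambda proxy integration (one common option, not the only one), the function itself builds the HTTP response that API Gateway returns; a minimal sketch:

```python
import json


def lambda_handler(event, context):
    # `event` carries the HTTP method, path, headers, and query string from API Gateway.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello {name}"}),
    }
```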
Authentication Methods

- No Authentication
- Publicly accessible APIs
- Amazon Cognito User Pools
- Users authenticate via Cognito; API Gateway validates the token
- Lambda Authorizer
- Custom authorization using a Lambda function to validate tokens
- IAM-Based Authentication
- Uses AWS credentials provided in request headers
- Considered an advanced approach
Endpoint Types
- Edge-Optimized
- Requests are routed through the nearest CloudFront edge location
- Regional
- Designed for clients within the same AWS region
- Does not utilize CloudFront by default
- Private
- Accessible only within a VPC using interface endpoints
Stages

- An API configuration is deployed to a stage
- Each stage has:
- A unique endpoint URL
- Independent configuration settings
- Enables versioning (e.g., development, testing, production)
- Supports:
- Rollbacks for safe deployment management
- Isolation between environments
- Canary Deployments
- Routes a portion of traffic to a new version
- Allows gradual rollout and validation
- Can be promoted to the primary version after testing
Caching

- Improves performance by reducing calls to backend services
- Backend is only invoked on cache misses
- Key points:
- Configured at the stage level
- TTL set to 0 disables caching
- Cache data can be encrypted
Common HTTP Errors
- Important for troubleshooting and exam preparation
4XX – Client Errors (Invalid Requests)
- 400 – Bad Request
- Generic client-side error with multiple possible causes
- 403 – Forbidden
- Access denied by authorizer or blocked by a Web Application Firewall (WAF)
- 429 – Too Many Requests
- Throttling limit exceeded; client must retry later
5XX – Server Errors (Backend Issues)
- 502 – Bad Gateway
- Invalid response returned from backend service (e.g., Lambda)
- 503 – Service Unavailable
- Backend service is unavailable or down
- 504 – Gateway Timeout
- API Gateway timeout limit is 29 seconds
- Requests must complete within this limit, regardless of backend timeout settings
AWS Step Functions Basics
AWS Lambda – Limitations
- Lambda is a Function-as-a-Service (FaaS) offering with a maximum execution time of 15 minutes
- Not suitable for running full, long-lived applications within a single function
- Chaining multiple functions can simulate stateful behavior
- However, this approach is generally discouraged
- Does not scale efficiently
- Increases architectural complexity
- Lambda execution environments are stateless
- Execution context and data are not preserved between runs
- These constraints are intentionally designed
- Reinforces the idea that Lambda is optimized for short-lived, event-driven workloads
- Understanding service limitations is critical for proper architectural decisions
AWS Step Functions – Key Concepts (State Machines)
- Service used to orchestrate long-running, serverless workflows
- Workflows are defined using State Machines
- Helps overcome Lambda limitations such as execution time constraints
- Example use case:
- Retail order processing systems (e.g., large-scale e-commerce platforms)
- Processes may span hours or days
- Require coordination across multiple steps and services
- Workflow Types:
- Standard Workflows
- Maximum duration of up to 1 year
- Suitable for long-running and durable processes
- Express Workflows
- Maximum duration of up to 5 minutes
- Optimized for high-volume, short-duration workloads
- Easier retry handling for failed executions
- Common use cases include data processing and event-driven pipelines
AWS Step Functions – State Machines (SM)
- A State Machine is the core construct of Step Functions
- Represents a serverless workflow: START → states → END
- Each state processes input, applies logic, and produces output
- Key Characteristics:
- Coordinates multiple steps and services within a workflow
- Maintains data flow between states
- Designed for complex, multi-service architectures
- Uses IAM roles to interact securely with other AWS services
- Invocation:
- Can be triggered by services such as API Gateway, IoT, EventBridge, Lambda, or manually
- Typically used for backend orchestration
- Amazon States Language (ASL)
- JSON-based language used to define state machines
- Enables creation, modification, and export of workflows
Common State Types in Step Functions
- Flow Control States
- `SUCCEED` and `FAIL` → define workflow termination outcomes
- `WAIT` → pauses execution until a time or duration is reached
- `CHOICE` → enables conditional branching based on input
- `PARALLEL` → executes multiple branches simultaneously
- `MAP` → processes a list of items by iterating over each element
- Task State
- Represents a unit of work within the workflow
- Delegates execution to external services such as:
- Lambda, ECS, DynamoDB, SNS, SQS, AWS Batch, Glue, SageMaker, EMR, or other Step Functions workflows
- The state machine itself does not execute tasks directly; it coordinates execution across services
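- A minimal sketch of how these pieces fit together: a hypothetical ASL definition built as a Python dictionary and registered with boto3 (the Lambda and role ARNs are placeholders):
import json
import boto3

# Hypothetical workflow: wait, run one Task (a Lambda function), then succeed
definition = {
    "StartAt": "WaitABit",
    "States": {
        "WaitABit": {"Type": "Wait", "Seconds": 60, "Next": "DoWork"},
        "DoWork": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:example_task",
            "Next": "Done"
        },
        "Done": {"Type": "Succeed"}
    }
}

sfn = boto3.client('stepfunctions')
sfn.create_state_machine(
    name='ExampleStateMachine',
    definition=json.dumps(definition),
    roleArn='arn:aws:iam::111122223333:role/example-sfn-role',  # role needs lambda:InvokeFunction
    type='STANDARD'
)
- `start_execution` on the same client then runs the workflow, taking a JSON document as the initial input.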
AWS Step Functions – Example Architecture

- Example: Pet Cuddle-o-Tron application
- Demonstrates a workflow where timed actions and notifications are triggered at different intervals
- Highlights orchestration across multiple services and time-based events
LAB: Building the Serverless Pet Cuddle-O-Tron
Pet Cuddle-O-Tron – Overview
End-State Architecture (Simplified – After Stage 5)
- Demonstrates a complete serverless workflow integrating frontend, API layer, orchestration, and messaging services

End-State Architecture (Extended – After Stage 7)
- Expands functionality to support multiple notification channels (email and SMS)

Stage 1: Configure Amazon Simple Email Service (SES)

- Amazon SES is used to send emails within the application
- Initially operates in sandbox mode
- Emails can only be sent to verified identities to prevent misuse
- Configuration steps:
- Create and verify two SES identities:
- Sender email address
- Receiver email address
- These identities must be explicitly authorized before use
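- Verification can also be scripted; a small sketch with boto3 (addresses are placeholders) that triggers the same verification email the console sends:
import boto3

ses = boto3.client('ses')

# SES will only send on behalf of these addresses after the verification link is clicked
for address in ['sender@example.com', 'receiver@example.com']:
    ses.verify_email_identity(EmailAddress=address)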
Stage 2: Configure Email Lambda Function

- A Lambda function is responsible for sending emails via SES
- Key setup steps:
- Create a Lambda execution role
- Permissions required:
- SES (send emails)
- SNS and Step Functions (if extended)
- CloudWatch Logs (for logging)
- Configure the Lambda function:
- Runtime: Python 3.9
- Assign the execution role
- Logic:
- Accept input parameters (email, message)
- Call SES to send email notifications
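- A hedged sketch of what this email Lambda might look like (the lab's actual code differs in detail; the sender address is a placeholder and must be a verified SES identity):
import boto3

ses = boto3.client('ses')

def lambda_handler(event, context):
    # The state machine configured in Stage 3 supplies the input document
    ses.send_email(
        Source='sender@example.com',                         # verified SES identity
        Destination={'ToAddresses': [event['Input']['email']]},
        Message={
            'Subject': {'Data': 'Pet Cuddle-o-Tron reminder'},
            'Body': {'Text': {'Data': event['Input']['message']}}
        }
    )
    return 'Email sent!'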

Stage 3: Configure Step Functions State Machine

- The State Machine acts as the orchestration layer of the application
- Controls workflow execution and service interactions
- Setup process:
- Create a State Machine IAM role
- Permissions:
- Invoke Lambda functions
- Publish to SNS
- Write logs to CloudWatch
- Define the State Machine:
- Type: Standard
- Logging: Enabled (ALL)
- Workflow logic:
- `WAIT` state delays execution based on input
- `TASK` state invokes the email Lambda function
- `PASS` state completes execution
- The State Machine manages the sequence and data flow between components
Stage 4: Configure Backend API (API Gateway + Lambda)

- The backend exposes functionality through an API
- Architecture:
- API Gateway acts as the entry point
- A Lambda function processes requests and triggers the State Machine
- Implementation steps:
- Create API Lambda function
- Accepts input from API Gateway
- Validates required parameters
- Starts State Machine execution (see the sketch after this list)
- Create API in API Gateway:
- Type: REST API
- Endpoint type: Regional
- Configure API components:
- Resource: `/petcuddleotron`
- Method: `POST`
- Integration: Lambda function
- Enable Lambda Proxy Integration
- Enable CORS
- Allows browser-based clients hosted on different domains to call the API
- Deploy API:
- Stage name: `Prod`
- Capture the invoke URL for frontend integration
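- A hedged sketch of the API Lambda (the state machine ARN is a placeholder and the validation is simplified): it parses the proxy-integration body, checks the input, starts the state machine, and returns a CORS-friendly response:
import json
import boto3

sfn = boto3.client('stepfunctions')
STATE_MACHINE_ARN = 'arn:aws:states:us-east-1:111122223333:stateMachine:PetCuddleOTron'  # placeholder

def respond(status, body):
    # The Access-Control-Allow-Origin header lets the S3-hosted frontend (a different origin) read the response
    return {
        'statusCode': status,
        'headers': {'Access-Control-Allow-Origin': '*'},
        'body': json.dumps(body)
    }

def lambda_handler(event, context):
    data = json.loads(event['body'])                 # proxy integration passes the raw request body
    if not data.get('email') or not data.get('waitSeconds'):
        return respond(400, {'error': 'email and waitSeconds are required'})
    sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps(data)                       # becomes the state machine's input document
    )
    return respond(200, {'status': 'execution started'})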
Stage 5: Configure Frontend (S3 Static Website)

- The frontend is hosted using Amazon S3 static website hosting
- Setup steps:
- Create S3 bucket:
- Must have a globally unique name
- Public access enabled for static content

- Configure bucket:
- Apply bucket policy to allow public read access
- Enable static website hosting
- Define `index.html` as the entry point

- Upload frontend assets:
- HTML, CSS, JavaScript, and image files
body {
padding-top: 40px;
padding-bottom: 40px;
background-color: #eee;
}
hr {
border-top: solid black;
}
div #error-message {
color: red;
font-size: 15px;
font-weight: bold;
}
div #success-message, #results-message {
color: green;
font-size: 15px;
font-weight: bold;
}
.form-signin {
max-width:480px;
padding: 15px;
margin: 0 auto;
}
.form-signin .form-signin-heading,
.form-signin .checkbox {
margin-bottom: 10px;
}
.form-signin .checkbox {
font-weight: normal;
}
.form-signin .form-control {
position: relative;
height: auto;
-webkit-box-sizing: border-box;
box-sizing: border-box;
padding: 10px;
font-size: 16px;
}
.form-signin .form-control:focus {
z-index: 2;
}
.form-signin input[type="Artist"] {
margin-bottom: -1px;
border-bottom-right-radius: 0;
border-bottom-left-radius: 0;
}
.form-signin input[type="bottom"] {
margin-bottom: 10px;
border-top-left-radius: 0;
border-top-right-radius: 0;
}
- Frontend behavior:
- Collects user input (wait time, message, email)
- Sends request to API Gateway using JavaScript
var API_ENDPOINT = 'REPLACEME_API_GATEWAY_INVOKE_URL';
// if correct it should be similar to https://somethingsomething.execute-api.us-east-1.amazonaws.com/prod/petcuddleotron
var errorDiv = document.getElementById('error-message')
var successDiv = document.getElementById('success-message')
var resultsDiv = document.getElementById('results-message')
// function output returns input button contents
function waitSecondsValue() { return document.getElementById('waitSeconds').value }
function messageValue() { return document.getElementById('message').value }
function emailValue() { return document.getElementById('email').value }
function clearNotifications() {
    errorDiv.textContent = '';
    resultsDiv.textContent = '';
    successDiv.textContent = '';
}
// When buttons are clicked, this is run passing values to API Gateway call
document.getElementById('emailButton').addEventListener('click', function(e) { sendData(e, 'email'); });
function sendData (e, pref) {
    e.preventDefault()
    clearNotifications()
    fetch(API_ENDPOINT, {
        headers: {
            "Content-type": "application/json"
        },
        method: 'POST',
        body: JSON.stringify({
            waitSeconds: waitSecondsValue(),
            message: messageValue(),
            email: emailValue()
        }),
        mode: 'cors'
    })
    .then((resp) => resp.json())
    .then(function(data) {
        console.log(data)
        successDiv.textContent = 'Submitted. But check the result below!';
        resultsDiv.textContent = JSON.stringify(data);
    })
    .catch(function(err) {
        errorDiv.textContent = 'Oops! Error Error:\n' + err.toString();
        console.log(err)
    });
};
Stage 6: Test the Application
- Access the S3 website endpoint to load the frontend interface
- Test workflow:
- Input parameters:
- Wait time
- Message
- Email address
- Submit request:
- API Gateway receives request
- Lambda triggers State Machine
- State Machine invokes email Lambda
- Monitoring:
- Execution flow can be tracked via Step Functions logging
- Result:
- Email notification is delivered to the specified recipient
- Outcome:
- Fully functional serverless architecture with no infrastructure provisioning required
- Preview of the Stage 7 enhancements (detailed below):
- Update frontend:
- Add phone number input
- Modify JavaScript to include phone parameter
- Allow optional email and phone inputs
- Update API Lambda:
- Ensure at least one contact method (email or phone) is provided
- Update State Machine:
- Introduce `CHOICE` state:
- Email only
- SMS only
- Both email and SMS
- Use `PARALLEL` state when both are required
- Result:
- Application supports multi-channel notifications
Stage 7 (Extended): Enable SMS-Based Notifications

- The initial Pet-Cuddle-o-Tron design supported notifying users via both email and SMS, but it was simplified to email-only. This stage restores the original dual-channel notification capability by adding SMS support through additional SNS setup and minor system enhancements.
- Important: SMS delivery via SNS is not covered by the AWS Free Tier, though costs remain low for minimal usage.
1. Set Up Simple Notification Service (SNS) for SMS
- Navigate to SNS → Mobile → Text messaging (SMS) → Sandbox destination phone numbers
- SNS begins in sandbox mode, similar to SES
- Add and verify a recipient phone number (AWS sends a verification code via SMS)
- For simplicity:
- Do not attach the number to SNS topics
- Do not configure a dedicated origination number
2. Create a Lambda Function for SMS Notifications
- Function name: `sms_reminder_lambda`
- Runtime: Python 3.9
- Execution role: reuse the same role as `email_reminder_lambda` and `api_lambda`
import boto3, os, json

sns = boto3.client('sns')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event))
    # Publish the SMS directly to the phone number supplied by the state machine input
    sns.publish(
        PhoneNumber=event['Input']['phone'],
        Message=event['Input']['message']
    )
    return 'Success!'
3. Update Frontend to Include Phone Input
- Modify the static website to capture a phone number along with existing inputs
- Add a phone input field in `index.html`
- Update `serverless.js` to include:
function emailValue() { return document.getElementById('email').value || 'NO_EMAIL' }
function phoneValue() { return document.getElementById('phone').value || 'NO_PHONE' }
- Include both values in the API request payload:
body: JSON.stringify({
waitSeconds: waitSecondsValue(),
message: messageValue(),
email: emailValue(),
phone: phoneValue()
}),
- If no value is provided, `"NO_EMAIL"` or `"NO_PHONE"` is sent
- This allows the backend to determine the correct execution path
4. Update API Lambda Validation
- Add a validation check to ensure at least one contact method is provided:
checks.append(not (data['email'] == "NO_EMAIL" and data['phone'] == "NO_PHONE"))
- This prevents requests without any notification target
- Additional validation for email/phone formatting can be added if needed
5. Enhance Step Functions Workflow Logic
- Introduce a Choice state to determine the notification method: `EmailOnly`, `SMSOnly`, or `EmailAndSms`
- Behavior:
- If only email is present → trigger email Lambda
- If only phone is present → trigger SMS Lambda
- If both are present → execute both in parallel
- The `EmailAndSms` state uses parallel branches to invoke both Lambda functions simultaneously
6. Validate the Updated Workflow

- Test all possible input scenarios:
- No email → SMS only
- No phone → Email only
- Both provided → Email and SMS sent
- Confirm:
- Correct branching in Step Functions
- Successful message delivery via SNS
Stage 8: Resource Cleanup
- Remove all created AWS resources after completing the lab
- Prevents unnecessary charges
- Includes:
- Lambda functions
- API Gateway
- Step Functions
- SNS/SES configurations
- S3 bucket
Amazon SQS (Simple Queue Service) Basics
Amazon SQS (Simple Queue Service) – Overview
- Fully managed queuing service
- Enables asynchronous communication where producers push messages to a queue and consumers retrieve them through polling
- Supports two queue types based on ordering behavior:
- Standard → best-effort ordering, messages may be delivered out of sequence
- FIFO (First-In-First-Out) → guarantees strict ordering
- Designed for small payloads (up to 256 KB, similar to SNS)
- Smaller messages are easier to handle and scale
- For larger data, messages typically contain references (e.g., links to stored data)
- Managed entirely by AWS
- Serverless (no infrastructure management required)
- Accessible via public endpoints
- Built for regional high availability and scalability
- High throughput and performance
- One of the earliest AWS services, launched in 2004
Message Retrieval and Visibility
- Polling is the process of checking a queue for available messages
- Retrieved messages are not automatically removed
- Instead, they become temporarily hidden using the Visibility Timeout
- Messages are permanently deleted only when:
- The consumer explicitly deletes the message after successful processing
- The Visibility Timeout expires, making the message visible again if processing was not completed in time
- This approach ensures reliability:
- Messages can be retried by the same or different consumers if processing fails
Handling Failed Messages
- Messages that repeatedly fail processing can be moved to Dead-Letter Queues (DLQs)
- Allows separate handling and analysis of problematic messages
- Improves fault tolerance and troubleshooting
- Common Use Cases
- Decoupling system components
- Enabling scalable architectures
- Auto Scaling Groups and Lambda functions can scale based on queue depth
- Supporting distributed worker or background processing systems
Amazon SQS – Example Worker Pool Architecture

- The architecture revisits a media processing example using asynchronous queues
- ASG Web Tier
- Users upload source content
- The application stores the original file in an S3 bucket (`Master`)
- A message containing a reference to the file is sent to an SQS queue
- Processed outputs are later retrieved from another bucket (`Transcode`)
- Scaling is driven by application demand (e.g., CPU usage)
- ASG Worker Tier
- Continuously polls the SQS queue for tasks
- Scales based on queue depth:
- High queue length → scale out
- Low queue length → scale in (can scale to zero)
- Retrieves the original file, processes it into multiple formats, and stores results
- If processing fails:
- The message becomes visible again after the Visibility Timeout
- Another worker can retry the task
- The queue acts as a decoupling layer between web and worker tiers, allowing them to operate independently
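- A rough sketch of a worker-tier consumer loop (the queue URL and the process function are placeholders), showing long polling, processing, and the explicit delete that prevents redelivery:
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/111122223333/transcode-jobs'  # placeholder

def process(body):
    # Placeholder for the real work, e.g. fetch the object referenced in the message and transcode it
    print('processing', body)

while True:
    # Long poll: wait up to 20 seconds for messages instead of returning immediately
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        try:
            process(msg['Body'])
            # Deleting marks the message as successfully handled
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])
        except Exception:
            # No delete: once the Visibility Timeout expires the message reappears and another worker can retry it
            pass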
Fanout Pattern with SNS and SQS

- Instead of sending messages directly to SQS:
- The producer publishes messages to an SNS topic
- The SNS topic distributes those messages to multiple SQS queues (subscribers)
- Each queue:
- Represents a different processing path (e.g., video quality variants)
- Scales independently with its own worker group
- This pattern is common with S3 event-driven designs:
- S3 generates a single event per object upload
- SNS distributes that event to multiple processing pipelines
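- A hedged sketch of wiring the fanout with boto3 (names are placeholders); note that each queue also needs a queue policy allowing the topic to send to it, which is omitted here:
import boto3

sns = boto3.client('sns')
sqs = boto3.client('sqs')

topic_arn = sns.create_topic(Name='new-uploads')['TopicArn']

# One queue per processing path, each subscribed to the same topic
for variant in ['480p', '720p', '1080p']:
    queue_url = sqs.create_queue(QueueName=f'transcode-{variant}')['QueueUrl']
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=['QueueArn']
    )['Attributes']['QueueArn']
    sns.subscribe(TopicArn=topic_arn, Protocol='sqs', Endpoint=queue_arn)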
Amazon SQS – Features
Standard vs FIFO Queue Tradeoffs
- Standard Queues
- At-least-once delivery
- No ordering guarantee
- Possible duplicate messages
- Highly scalable with high throughput
- FIFO Queues
- Exactly-once processing
- Guaranteed message order
- Lower throughput compared to Standard queues
Billing Model
- Charges are based on requests (API calls), not on the number of messages delivered
- A single request can return:
- Up to 10 messages
- Up to 64 KB of data
- A single request can return:
- Large messages increase request usage:
- Example: a 256 KB message results in 4 billable requests
- Cost considerations:
- Frequent polling may return no messages, increasing cost unnecessarily
- Efficient system design balances responsiveness and cost, often using long polling
Polling Methods
- Short Polling
- Immediate response
- May return zero messages
- Long Polling
- Waits up to a specified duration (`ReceiveMessageWaitTimeSeconds`, max 20 seconds)
- Returns messages as soon as they are available
- Reduces empty responses and improves cost efficiency
- Long polling is generally the recommended default
Security and Data Protection
- Messages can be retained for up to 14 days
- Encryption at rest using KMS (SSE)
- Encryption in transit via SSL/TLS
- Access control via:
- IAM identity policies
- Queue resource (policy-based) permissions
SQS – Standard vs FIFO Queue Types
SQS Queue ↔ Highway Analogy

- SQS queues are categorized based on how they handle message ordering:
- Standard (at-least-once delivery)
- FIFO (First-In-First-Out, exactly-once delivery)
- Understanding the tradeoffs is key, and a highway comparison helps visualize this:
- Messages are like cars
- Standard queues resemble multi-lane highways where vehicles can move freely and overtake
- FIFO queues resemble single-lane roads where movement is controlled and order is maintained
SQS Standard Queues
- At-least-once delivery model
- Limitations
- Ordering is not guaranteed; messages may arrive out of sequence
- Duplicate messages can occur if polled multiple times
- Applications must be designed to handle duplicates and unordered data
- Advantages
- Extremely high scalability and throughput
- Scaling is smooth and flexible as demand increases
- Comparable to adding more lanes to a highway to increase capacity
- Common Use Cases
- Decoupling distributed system components
- Worker-based processing systems
- Batch processing pipelines
SQS FIFO (First-In-First-Out) Queues
- Exactly-once processing model
- Requires the queue name to include the `.fifo` suffix
- Advantages
- Strict preservation of message order
- No duplicate message delivery
- Limitations
- Lower throughput and scaling capacity compared to Standard queues
- Similar to how a single-lane road limits traffic flow
- Typical throughput:
- ~300 transactions per second without batching
- ~3000 messages per second with batching (up to 10 messages per request)
- Some regions support significantly higher throughput, especially with high-throughput FIFO mode
- Even with enhancements, performance is still below Standard queues
- Common Use Cases
- Ordered workflows where sequence matters
- Command processing in strict order
- Step-by-step or iterative calculations (e.g., financial or order processing systems)
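- A small sketch contrasting the two queue types (names are placeholders); FIFO queues require the `.fifo` suffix and a MessageGroupId on every send:
import boto3

sqs = boto3.client('sqs')

# Standard queue: best-effort ordering, at-least-once delivery
standard_url = sqs.create_queue(QueueName='jobs-standard')['QueueUrl']
sqs.send_message(QueueUrl=standard_url, MessageBody='task-1')

# FIFO queue: strict ordering per message group, exactly-once processing
fifo_url = sqs.create_queue(
    QueueName='orders.fifo',
    Attributes={'FifoQueue': 'true', 'ContentBasedDeduplication': 'true'}
)['QueueUrl']
sqs.send_message(
    QueueUrl=fifo_url,
    MessageBody='order-42-step-1',
    MessageGroupId='order-42'        # ordering is guaranteed within this group
)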
SQS – Delay Queues
SQS – Visibility Timeout

- Time window after a message is received during which it remains hidden from other consumers
- Configurable from 0 seconds to 12 hours (default is 30 seconds)
- Can be adjusted using the `ChangeMessageVisibility` API
- Useful for enabling retry mechanisms and fault recovery
- Message lifecycle with Visibility Timeout:
- A message is added to the queue using `SendMessage`
- A consumer retrieves it via `ReceiveMessage`
- Once retrieved, the Visibility Timeout begins, and the message becomes temporarily invisible
- If processing completes successfully → the message is deleted
- If processing fails or exceeds the timeout → the message becomes visible again and can be retried
- Important behavior:
- Visibility Timeout only applies after a message has been received
- It controls reprocessing behavior, not initial delivery
- This is different from `DelaySeconds`, even though both involve temporary invisibility
SQS – Delay Seconds

- Defines a delay before a message becomes visible for the first time
- Messages enter the queue in a hidden state and remain invisible until the delay expires
- During this period, `ReceiveMessage` will not return the message
- Configuration:
- Range: 0 seconds to 15 minutes (default is 0 seconds)
- A queue with a non-zero delay is considered a delay queue
- Message-level control:
- Initial delay can also be defined per message (message timer), overriding queue settings
- This allows fine-grained control over when specific messages become available
- Limitations:
- Per-message delay (message timers) is not supported for FIFO queues
- This restriction preserves strict ordering guarantees
- Queue-level delay (`DelaySeconds`) is still supported for FIFO queues
- Key distinction:
- `DelaySeconds` controls when a message first becomes available
- `VisibilityTimeout` controls what happens after a message has been retrieved
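- A short sketch showing the two settings side by side (the queue URL is a placeholder): `DelaySeconds` is applied when the message is sent, while `ChangeMessageVisibility` is called after it has been received:
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/111122223333/example'  # placeholder

# Delay: the message only becomes receivable 120 seconds after it is sent
sqs.send_message(QueueUrl=QUEUE_URL, MessageBody='delayed task', DelaySeconds=120)

# Visibility: after receiving, extend the hidden window for a message that needs more processing time
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
for msg in resp.get('Messages', []):
    sqs.change_message_visibility(
        QueueUrl=QUEUE_URL,
        ReceiptHandle=msg['ReceiptHandle'],
        VisibilityTimeout=300        # keep it hidden for another 5 minutes
    )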
SQS – Dead-Letter Queues (DLQs)
SQS Dead-Letter Queue (DLQ)

- Dedicated queue for handling failed or unprocessable messages
- Used to isolate messages that repeatedly fail during processing
- Prevents continuous retry cycles in the main queue
- Without a DLQ, failed messages may be retried many times within the retention window (up to 14 days), which is typically undesirable
- Redrive policy controls how messages are moved to a DLQ:
- Defines the source (SRC) queue to monitor
- Specifies the target DLQ where failed messages are sent
- Includes conditions for redriving messages, requiring a configured `maxReceiveCount`
- A single DLQ can be associated with multiple source queues
- Each time a message is received from the source queue:
- Its `ReceiveCount` increases
- Once `ReceiveCount` reaches `maxReceiveCount`, the message is transferred to the DLQ
- Its
- Common DLQ use cases:
- Trigger alerts when messages are moved to the DLQ
- Perform isolated troubleshooting and analysis (e.g., reviewing logs or payloads)
- Apply alternative or specialized processing for failed messages
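- A hedged sketch of attaching a redrive policy when creating a source queue (names are placeholders); after the fifth failed receive, a message moves to the DLQ:
import json
import boto3

sqs = boto3.client('sqs')

dlq_url = sqs.create_queue(QueueName='orders-dlq')['QueueUrl']
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=['QueueArn']
)['Attributes']['QueueArn']

sqs.create_queue(
    QueueName='orders',
    Attributes={
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': dlq_arn,
            'maxReceiveCount': '5'     # transfer to the DLQ once ReceiveCount reaches 5
        })
    }
)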
SQS – Message Retention Period
- Defines how long messages are stored in a queue before being automatically removed
- Messages are deleted once the retention period expires
- Each message is assigned an enqueue timestamp upon arrival
- This timestamp is used to determine when the message should expire
- Important behavior with DLQs:
- The original enqueue timestamp is not reset when a message is moved to a DLQ
- Example:
- If a message spends 1 day in the source queue and the DLQ retention is 2 days
- It will only remain in the DLQ for the remaining 1 day
- Example:
- The original enqueue timestamp is not reset when a message is moved to a DLQ
- Best practice:
- Configure DLQs with a longer retention period than the source queue
- Ensures sufficient time for analysis and reprocessing of failed messages
Amazon Kinesis Data Streams Basics
Amazon Kinesis Data Streams (KDS) – Core Concepts
- Real-time data streaming platform
- Built to collect and process high volumes of data continuously from multiple sources
- Serverless, public, and regionally resilient, but shard scaling must be managed by the customer
- Data Stream = fundamental unit of Kinesis
- Producers write data into streams, consumers read data from streams
- Streams can scale from minimal throughput to very high volumes
- Data retention is time-bound (default 24 hours)
- Older data is automatically discarded
- Retention period can be extended up to 365 days at additional cost
- Amazon Data Firehose can export stream data to other services (e.g., S3) for long-term storage
- Supports multiple producers and multiple consumers, allowing fine-grained access
- Ideal for real-time analytics, dashboards, and monitoring
Kinesis Data Streams – Architecture

- Data flows from producers into streams, then consumers read from the streams
- Shards enable scaling
- Each shard provides:
- 1 MB/s data ingestion
- 2 MB/s data consumption
- More shards → higher throughput and higher costs
- Data is stored in Kinesis Data Records (max 1 MB per record)
- Scaling is linear: add shards to handle more data
- Each shard provides:
- Billing is based on:
- Number of shards
- Data retention window size
- Kinesis is relatively costly, intended for use cases that require real-time data streaming
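- A minimal producer/consumer sketch with boto3 (the stream name is a placeholder); the partition key decides which shard a record lands on:
import boto3

kinesis = boto3.client('kinesis')
STREAM = 'clickstream'   # placeholder stream name

# Producer: write one record (max 1 MB), routed to a shard by the partition key
kinesis.put_record(StreamName=STREAM, Data=b'{"page": "/home"}', PartitionKey='user-123')

# Consumer: read from the start of one shard using a shard iterator
shard_id = kinesis.describe_stream(StreamName=STREAM)['StreamDescription']['Shards'][0]['ShardId']
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType='TRIM_HORIZON'
)['ShardIterator']
for record in kinesis.get_records(ShardIterator=iterator, Limit=100)['Records']:
    print(record['Data'])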
SQS vs Kinesis
- Key distinction: Kinesis handles continuous, high-volume data streams, while SQS is for asynchronous message delivery
- SQS:
- Typically one producer group (e.g., WEB tier) and one consumer group (e.g., WORKER tier)
- Not intended for hundreds or thousands of sources sending data simultaneously
- Best for decoupling components and asynchronous task queues
- Messages are temporary; no rolling window for data retention
- Kinesis:
- Designed for massive, high-frequency data ingestion
- Examples: analytics, real-time monitoring, app clickstreams
- Supports multiple independent consumers
- Maintains a rolling window of data for temporary persistence
- Enables real-time streaming and processing
- Designed for massive, high-frequency data ingestion
Amazon Kinesis Video Streams – Real-Time Video Data Streaming
Amazon Kinesis Video Streams (KVS) – Core Concepts
- Real-time video streaming platform
- Captures live video or time-sequenced sensor data from producers, including:
- Video sources: security cameras, smartphones, drones, vehicles
- Sensor streams: audio, thermal imaging, depth sensors, RADAR
- Consumers can retrieve data frame-by-frame or in segments as needed
- Captures live video or time-sequenced sensor data from producers, including:
- Fully-managed AWS service
- Serverless, public, and regionally resilient
- Automatically scales with demand
- Data is persisted and encrypted both in-transit and at rest
- Access via API only
- Raw source data is not directly accessible
- Consumers interact with indexed and structured streams stored in KVS
- Integrates with other AWS services:
- Amazon Rekognition (for video and image analysis)
- Amazon Connect (e.g., voicemail or multimedia processing)
- Use cases:
- Event-driven video analytics pipelines
- Streaming from cameras or IoT devices
- For exam scenarios mentioning GStreamer or RTSP, KVS is the default choice
- GStreamer: multimedia pipeline framework for connecting multiple processing systems
- RTSP: protocol for transporting real-time multimedia streams over a network
Kinesis Video Streams – Example Video Surveillance Architecture (with Rekognition)

- Security cameras in a smart home stream video into a Kinesis Video Stream (KVS) in AWS, offloading local video processing
- Video streams feed into Amazon Rekognition Video for analysis (e.g., facial recognition, object detection)
- Rekognition outputs processed data to a Kinesis Data Stream (KDS) containing structured insights, such as identified faces or events
- Further automation: AWS Lambda can process each record and trigger notifications via SNS for events like unknown faces detected
Amazon Kinesis Data Firehose Basics
Amazon Data Firehose – Key Concepts
- Fully-managed data delivery service
- Moves and stores data into data lakes, data stores, and analytics platforms.
- By default, Kinesis Data Streams (KDS) does not retain data long-term
- Data is available only within its retention window; Firehose can persist it beyond that by delivering it elsewhere.
- Features:
- Managed by AWS
- Serverless, regionally redundant, and publicly accessible.
- Scales automatically, unlike KDS which requires shard management.
- Near real-time data delivery (~60s by default)
- Unlike KDS (~200ms), Firehose is not strictly real-time by default.
- Buffering can now be disabled for true real-time delivery if needed.
- On-the-fly data transformation using Lambda
- Processing may introduce some latency depending on complexity.
- Managed by AWS
- Billing model: Pay-as-you-go, based on the volume of data processed.
- Common use cases:
- Loading data into supported destinations.
- Persisting KDS data after its retention window.
- Transforming data format during delivery using Lambda.
Amazon Data Firehose – Architecture

- Supported data sources
- AWS services (CloudWatch Logs, CloudWatch Events)
- IoT devices
- Kinesis Data Streams
- Kinesis producers (KPL, Kinesis Agent)
- KPL = Kinesis Producer Library; the Agent is built on top of it.
- If streaming features of Kinesis aren’t required, data can be sent directly to Firehose.
- Supported destinations
- HTTP endpoints (for 3rd-party delivery)
- Splunk
- Amazon S3
- Amazon Redshift
- Amazon OpenSearch Service (formerly Elasticsearch Service)
- Data buffering for delivery
- By default, Firehose waits for 1 MB of data or 60 seconds before sending.
- AWS now allows disabling the buffer for true real-time delivery, though the default buffer is still active.
- Lambda transformation support (see the sketch after this list)
- Can use built-in blueprints for common transformations.
- Optional: retain raw data in an S3 backup bucket.
- Transformation may add latency.
- Redshift delivery specifics
- Firehose writes data to S3 first; Redshift then loads data via COPY.
- The process is managed automatically, but the S3 step is required.
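- A hedged sketch of the Lambda transformation mentioned above, following the common blueprint shape: Firehose passes base64-encoded records and expects each one back with a record id, a result, and re-encoded data:
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        payload = json.loads(base64.b64decode(record['data']))
        payload['processed'] = True                        # example transformation
        output.append({
            'recordId': record['recordId'],                # must echo the incoming id
            'result': 'Ok',                                # 'Ok', 'Dropped', or 'ProcessingFailed'
            'data': base64.b64encode(json.dumps(payload).encode()).decode()
        })
    return {'records': output}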
Amazon Managed Apache Flink – Stream Processing Service (formerly Kinesis Data Analytics)
DISCLAIMER: Name Change from Amazon Kinesis Data Analytics
- This service was previously called Amazon Kinesis Data Analytics, where SQL was used for data transformations.
- It is no longer part of the Kinesis product family. The core engine is now Apache Flink, though SQL transformations are still supported.
- Reference: AWS Announcement
- Older lecture notes reference the previous naming but the service retains most functionality.
- Old summary (for context):
- Analyzes streaming data in real time, enabling actionable insights and immediate responses.
- Operates on high-throughput streaming data and transforms input using SQL (optionally with S3 reference data).
- Streams output to destinations such as dashboards or analytics systems.
- Typical use cases: time-series analytics (e.g., elections, esports), real-time dashboards, real-time security metrics.
Amazon Managed Service for Apache Flink – Overview
- Purpose: Real-time stream processing using Apache Flink.
- Acts between an input stream and an output stream, transforming data in transit.

- Supported sources:
- Kinesis Data Streams
- Data Firehose
- Amazon Managed Streaming for Apache Kafka (MSK)
- Optional static reference data from S3
- Supported destinations:
- Kinesis Data Streams
- Data Firehose (and its downstream destinations: HTTP, Splunk, OpenSearch Service, S3, Redshift)
- MSK
- Lambda
- S3
- Analytics tools
Stream Processing Architecture

- Input sources remain unchanged; only output streams are modified.
- In-application input streams function like tables, updated continuously to match the live input stream.
- Reference tables (from S3) contain static data that can enrich the input stream.
- Example: An esports stream sends live player data; static player metadata from S3 is joined to live data in real time for enhanced dashboards.
- Application code (SQL or Flink API) processes input and generates in-application output streams.
- Errors can be routed to an in-application error stream.
- Billing: Charged based on processed data volume; cost can be significant. Use only for workloads that need real-time stream processing.
- Typical use cases:
- Time-series analytics (elections, esports, etc.)
- Real-time dashboards (leaderboards, sports, games)
- Real-time metrics for security and operations teams
Amazon Cognito – User Authentication and Identity Management Service
Amazon Cognito – Overview
- Core AWS identity service: provides authentication, authorization, and serverless user management for web and mobile apps.
- Two main components:
- User pools – handle sign-in and issue JSON Web Tokens (JWTs).
- Identity pools – provide temporary AWS credentials to access AWS resources.
- Note: User pools and identity pools serve different purposes despite similar naming.
- Scalability: Supports unlimited users, far exceeding the 5,000 IAM user limit, making it suitable for large-scale applications.
Amazon Cognito – User Pools

- User directory: stores users like a database.
- Provides standardized sign-up/sign-in experiences. Authenticated users receive a JWT.
- Additional features: user management, customizable web UI, multi-factor authentication (MFA), and other security settings.
- Users can be internal or external (e.g., via social IDPs like Google or Facebook).
- JWTs:
- Prove the user has authenticated with the user pool.
- Can be used for authentication to self-managed servers or databases.
- Services like API Gateway and ALBs can accept JWTs for authentication.
- Cannot directly access most AWS resources; that requires temporary AWS credentials.
Amazon Cognito – Identity Pools

- Purpose: provide access to AWS resources by exchanging an identity token for temporary AWS credentials.
- Identity types:
- Unauthenticated identities – guest users with limited access to AWS resources.
- Federated identities – external identity (Google, Facebook, SAML 2.0, or Cognito user pool JWT) swapped for temporary AWS credentials.
- External IDPs handle authentication; your app never sees third-party credentials.
- Identity pools support:
- Social IDPs (Google, Facebook, etc.)
- Cognito user pool JWTs
- Requires configuration for each IDP in the identity pool.
- Credentials are temporary but can be refreshed by Cognito.
- Role assumption: Cognito maps identities to IAM roles and returns temporary credentials.
- Must define roles for both authenticated/federated users and unauthenticated/guest users.
Amazon Cognito – Web Identity Federation (User Pools + Identity Pools)

- Web Identity Federation: process of exchanging a third-party IDP token for AWS credentials.
- User pools consolidate internal and external users.
- Identity pools only need to integrate with the user pool JWT for temporary AWS credentials.
- Reduces the need to configure multiple external IDPs directly in identity pools.
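- A hedged sketch of the exchange with boto3 (pool IDs, region, and token are placeholders): the user pool JWT is presented to the identity pool and swapped for temporary AWS credentials:
import boto3

identity = boto3.client('cognito-identity', region_name='us-east-1')

USER_POOL_PROVIDER = 'cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE'   # placeholder user pool
IDENTITY_POOL_ID = 'us-east-1:11111111-2222-3333-4444-555555555555'            # placeholder identity pool
id_token = '...'   # JWT returned by the user pool after sign-in

# 1. Resolve (or create) an identity-pool identity for this authenticated user
identity_id = identity.get_id(
    IdentityPoolId=IDENTITY_POOL_ID,
    Logins={USER_POOL_PROVIDER: id_token}
)['IdentityId']

# 2. Exchange the identity + JWT for temporary credentials of the mapped IAM role
creds = identity.get_credentials_for_identity(
    IdentityId=identity_id,
    Logins={USER_POOL_PROVIDER: id_token}
)['Credentials']
# creds holds AccessKeyId, SecretKey, SessionToken, and Expiration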
AWS Glue Basics
AWS Glue – Overview
- Serverless ETL (Extract, Transform, Load) and Data Catalog service
- Compared to AWS Data Pipeline:
- Data Pipeline can perform ETL but uses compute servers (creates Amazon EMR clusters).
- Glue is serverless, ad-hoc, and cost-efficient—preferred in exams when a serverless ETL solution is required.
- Compared to AWS Data Pipeline:
- Two primary functions:
- Move and transform data between sources and destinations.
- Crawl data sources and generate data catalogs.
- Fully managed by AWS:
- Serverless, public, and regionally resilient.
- Automatically scales based on workload.
AWS Glue – Data Catalogs
- Data Catalog: centralized repository of metadata with data management and search tools.
- Stores persistent metadata about data sources within a region.
- Regional & account scope:
- One catalog per AWS region per account.
- Avoids data silos and improves visibility of metadata across an account.
- Metadata can be browsed or used in ETL workflows for other services.
- Integration with other AWS services:
- Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, AWS Lake Formation, and more.
- Data crawlers:
- Configured with credentials to access and scan data sources.
- Automatically detect schemas and tables, storing metadata in the catalog.
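- A hedged sketch of defining and running a crawler with boto3 (name, role, database, and path are placeholders); the crawler infers schemas and writes table metadata into the regional Data Catalog:
import boto3

glue = boto3.client('glue')

glue.create_crawler(
    Name='sales-data-crawler',
    Role='arn:aws:iam::111122223333:role/example-glue-role',   # needs read access to the data source
    DatabaseName='sales_catalog',                              # catalog database that receives the tables
    Targets={'S3Targets': [{'Path': 's3://example-bucket/sales/'}]}
)

glue.start_crawler(Name='sales-data-crawler')   # discovered tables become queryable via Athena, Redshift Spectrum, etc.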
AWS Glue – Architecture

- Supported data sources:
- Data stores: Amazon S3
- Databases: RDS, Redshift, DynamoDB, or any JDBC-compatible DB
- Streams: Kinesis Data Streams, Apache Kafka
- Supported data targets:
- Data stores: Amazon S3
- Databases: RDS, Redshift, or JDBC-compatible DBs
- Data Catalog access:
- Accessible to Glue jobs and other systems, e.g., via AWS Management Console.
- Makes metadata visible and reusable across the organization.
- Glue Jobs (ETL Jobs):
- Serverless execution—no need to manage compute resources.
- AWS allocates resources from a warm pool; billing is based only on used resources.
- Jobs can be triggered manually or automatically by events.
Amazon MQ Basics
Apache ActiveMQ
- Open-source message broker
- Written in Java and widely used in enterprise systems.
- Enables communication between distributed applications, regardless of language or hosting environment.
- Supports standard APIs and protocols such as JMS, AMQP, MQTT, OpenWire, and STOMP.
- Provides both queues (point-to-point) and topics (publish-subscribe) messaging models.
- Comparison with Amazon SQS and SNS:
- SQS and SNS provide similar messaging capabilities but are AWS-native services.
- They use AWS-specific APIs instead of industry-standard protocols.
- They are fully managed, highly available, and deeply integrated with AWS.
- Migration challenge:
- Applications built on industry-standard messaging systems may not work with SQS/SNS without modification.
- A standards-compliant solution is often required → this is where Amazon MQ is used.
Amazon MQ – Overview
- AWS-managed service for Apache ActiveMQ
- Provides managed message brokers that support industry-standard APIs and protocols.
- Deployment options:
- Single instance
- Lower cost, single Availability Zone resilience.
- Suitable for development and testing.
- High availability (HA) pair (active/standby)
- Multi-AZ deployment for production workloads.
- Single instance
- Networking model:
- Not a public service.
- Runs inside a VPC and requires private networking.
- Integration considerations:
- Does not offer native integrations with most AWS services.
- Requires managing compatibility with Apache ActiveMQ standards.
- AWS services are generally designed to integrate with SQS and SNS instead.
Amazon MQ – HA Architecture

- Active/standby brokers deployed across two Availability Zones
- Shared storage is provided using Amazon EFS.
- Hybrid connectivity:
- Supports private connections to on-premises ActiveMQ brokers.
- Can use Site-to-Site VPN or Direct Connect.
- Application integration:
- Applications running on EC2 within the VPC can communicate using standard protocols.
- No immediate application changes are required during migration.
- Supports hybrid and phased migration architectures
Amazon MQ vs SQS and SNS – Exam Considerations
- Default choice:
- Use SQS and SNS for most new AWS-based messaging solutions.
- Better integration with AWS services (IAM, monitoring, encryption, etc.).
- Use Amazon MQ when:
- Migrating existing systems that rely on industry-standard messaging with minimal code changes.
- Applications require JMS or protocols such as AMQP, MQTT, OpenWire, or STOMP.
- Important requirement:
- Amazon MQ requires proper VPC and private network configuration.
Amazon AppFlow Basics
Amazon AppFlow – Overview
- App integration service
- Functions like middleware for connecting applications.
- Enables data exchange between connectors using configurable flows.
- Flow: primary unit of configuration
- Combines source connector + destination connector + optional components such as transformations or filters.
- Fully managed by AWS:
- Serverless, auto-scaling, and regionally resilient.
- Public service with public endpoints, enabling integration with SaaS apps like Slack, Zendesk, and Salesforce.
- Can also work with AWS PrivateLink for VPC-private integration.
- Connectors:
- Supports many popular SaaS apps.
- Custom connectors can be developed using the AppFlow Custom Connector SDK.
- Common use cases:
- Data synchronization across apps
- Example: sync support tickets from Slack or Zendesk into Amazon Redshift for analysis.
- Data aggregation across sources to reduce silos
- Example: copy Salesforce contact records into S3 for centralized storage.
- Data synchronization across apps
Amazon AppFlow – Architecture

- Connections: store configuration and credentials to access applications.
- Defined separately from flows, allowing reuse across multiple flows.
- Flows: define main processing logic
- Source and destination mappings (which connections to use).
- Optional data transformations.
- Optional filtering and validation of data.