High Availability (HA) and Scaling

AWS Regional and Global Architecture

Global vs Regional Applications
Architectural Components of Cloud Systems

Global Components

  1. Global Service Location & Discovery
    • Determines how users locate the application (e.g., netflix.com)
    • Example: DNS configuration in Amazon Route 53
  2. Content Delivery & Optimization
    • Ensures application content is delivered efficiently to users worldwide
    • Can use globally distributed storage or rely on a central origin
    • Example: Amazon CloudFront caches content closer to users
  3. Global Health Checks & Failover
    • Monitors infrastructure health across regions
    • Automatically redirects users to healthy regions during failures
    • Example: Route 53 health checks

Regional Components

  1. Regional Entry Point
    • Defines how users access regional infrastructure
    • Example: VPC endpoints, ALBs, API Gateway
  2. Scaling & Resilience
    • Maintains performance under regional load fluctuations or failures
    • Example: Auto Scaling Groups (ASGs), ALBs
  3. Application Services & Components
    • Implements core functionality for the region
    • Example: EC2 instances, S3 buckets

Regional Application Tiers

  1. Web Tier
    • Entry point for regional applications
    • Abstracts users from underlying infrastructure
    • Example: ALB, API Gateway
  2. Compute Tier
    • Provides application logic and processing
    • Example: EC2, Lambda, ECS
  3. Storage Tier
    • Supports compute services with data storage
    • Example: S3, EBS, EFS
    • CloudFront can use S3 as an origin for media
  4. Database Tier
    • Stores structured data for compute tier
    • Example: RDS, Aurora, DynamoDB, Redshift
  5. Caching Tier
    • Reduces read load on databases by caching data in memory
    • Example: ElastiCache, DynamoDB Accelerator (DAX)
  6. Application Services
    • Additional AWS services enhancing the application
    • Examples:
      • Simple services: notifications, email (SNS, SES)
      • Architecture-changing services: decoupling components (SQS, Kinesis, Step Functions)
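A caching tier is often used in a cache-aside pattern: serve repeated reads from memory so they never reach the database. The sketch below is a minimal, in-process illustration of that idea and not ElastiCache or DAX code. A plain Python dict stands in for the in-memory cache, `load_row` is a hypothetical stand-in for a database query, and the 30-second TTL is an arbitrary assumption.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside lookup: check the cache first, fall back to the database on a miss.

    A plain dict stands in for an in-memory cache such as ElastiCache.
    """

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float = 30.0):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1          # served from memory, no database read
            return entry[1]
        self.misses += 1            # missing or expired: read from the database
        value = self._load_from_db(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical database query that records every database read.
db_reads = []

def load_row(key: str) -> str:
    db_reads.append(key)
    return f"row-for-{key}"

cache = CacheAside(load_row, ttl_seconds=30)
first = cache.get("cat-42")   # miss: goes to the database
second = cache.get("cat-42")  # hit: served from the cache
```

Writes and explicit invalidation are left out; with a TTL, cached data can be up to one TTL out of date.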

EC2 Launch Configurations (LCs) and Launch Templates (LTs)

EC2 Launch Configuration (LC)
EC2 Launch Template (LT)

EC2 Auto Scaling Groups (ASG)

EC2 Auto Scaling Group (ASG) – Key Concepts

EC2 ASG – X:Y:Z Model

EC2 ASG – Architecture
EC2 ASG – Scaling Processes
EC2 ASG – Additional Features

EC2 Auto Scaling and ASG Scaling Policies

EC2 Auto Scaling – Types
  1. Manual Scaling
    • Adjust ASG capacity values directly without a scaling policy.
    • MIN, DESIRED, MAX values are static but can be modified manually via AWS Console, CLI, or custom scripts.
    • Customer controls scaling logic and execution.
    • Use cases: testing ASGs, cost management, or urgent capacity changes.
  2. Scheduled Scaling
    • Automatically changes capacity values at specified times (one-off or recurring).
    • Does not require a scaling policy.
    • Implemented using scheduled actions in the ASG.
    • A scheduled action can set MIN, MAX, and/or DESIRED; DESIRED must always stay between MIN and MAX.
    • Useful for predictable traffic patterns, e.g., peak business hours or promotional events.
  3. Dynamic Scaling
    • Automatically adjusts DESIRED capacity based on real-time metrics or CloudWatch alarms.
    • Requires a scaling policy attached to the ASG.
    • Only DESIRED capacity is updated automatically; MIN and MAX still need manual configuration.
    • Dynamic scaling types:
      1. Simple Scaling – single-step scale out/in actions
      2. Step/Stepped Scaling – multi-step scaling based on metric thresholds
      3. Target Tracking – ASG maintains a metric near a defined target
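Whatever changes it (a person, a scheduled action, or a scaling policy), DESIRED is always kept between MIN and MAX. The sketch below models that rule together with a simplified step scaling policy. The `Step` layout, the 50% CPU threshold, and the step sizes are illustrative assumptions loosely modelled on step adjustments, not the Auto Scaling API, and AWS's exact boundary handling may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CapacityLimits:
    min_size: int
    max_size: int


def clamp_desired(desired: int, limits: CapacityLimits) -> int:
    """Keep DESIRED inside MIN..MAX, as the ASG does for any requested change."""
    return max(limits.min_size, min(desired, limits.max_size))


@dataclass
class Step:
    lower: float            # breach size (metric - threshold) where this step starts, inclusive
    upper: Optional[float]  # where it ends, exclusive; None means no upper bound
    adjustment: int         # instances to add (positive) or remove (negative)


def step_scaling(current: int, metric: float, threshold: float,
                 steps: List[Step], limits: CapacityLimits) -> int:
    """Pick the step whose range contains the breach and apply its adjustment."""
    breach = metric - threshold
    for step in steps:
        if breach >= step.lower and (step.upper is None or breach < step.upper):
            return clamp_desired(current + step.adjustment, limits)
    return current  # alarm not breached, or no matching step: no change


# Illustrative scale-out policy on average CPU with a 50% alarm threshold.
limits = CapacityLimits(min_size=1, max_size=6)
steps = [Step(0, 10, +1), Step(10, 20, +2), Step(20, None, +3)]

print(step_scaling(2, 55, 50, steps, limits))  # small breach: 2 -> 3
print(step_scaling(2, 75, 50, steps, limits))  # large breach: 2 -> 5
print(step_scaling(5, 90, 50, steps, limits))  # 5 + 3 = 8, clamped to MAX = 6
```

Simple scaling behaves like a policy with a single step. Target tracking differs: you supply only the target value and AWS works out the adjustments needed to hold the metric near it.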
ASG Scaling Policies (Dynamic Scaling)
EC2 ASG – Simple Scaling
EC2 ASG – Step/Stepped Scaling

EC2 ASG – Target Tracking

Auto Scaling Group (ASG) Lifecycle Hooks

EC2 ASG – Lifecycle Hooks

AWS Elastic Load Balancer (ELB)

Traditional Load Balancer (LB)
AWS Elastic Load Balancer (ELB) – Architecture

Configuration highlights:

Subnet requirements for ELB scaling:

ELB – Abstraction of Infrastructure
ELB – Cross-Zone Load Balancing

ELB Types – Classic, Application, and Network Load Balancers

AWS Elastic Load Balancer (ELB) – Overview and Types
  1. v1 (Legacy) – Classic Load Balancer (CLB)
    • Considered outdated; migrate to ELB v2 if still in use.
  2. v2 (Current) – Modern Load Balancers
    • Application Load Balancer (ALB) – operates at Layer 7, supports HTTP(S) and WebSockets.
    • Network Load Balancer (NLB) – operates at Layer 4, supports TCP, TLS, and UDP.

Note: A fourth type, Gateway Load Balancer (GWLB), exists for specific network appliance use cases and is handled separately.

ELBv1 – Classic Load Balancer (CLB)

Drawbacks:

ELBv2 – Modern Load Balancer

Advantages over CLB:

Application Load Balancer (ALB)

Listener Rules:

Benefits:

Limitations:

Network Load Balancer (NLB)

Limitations:

Use Cases:

Choosing Between ALB and NLB

In all other cases, ALB provides greater flexibility with Layer 7 features.

ELB – SSL Termination

AWS ELB – Approaches for Handling SSL

ALB – SSL Bridging

Key Points:

NLB – SSL Pass-through

Key Points:

ALB – SSL Offload

Key Points:

SSL Handling in ELB – Overview Table
SSL Approach | Load Balancer Type | Listener Protocol | Certificate on ELB | Certificate on EC2 Instances
--- | --- | --- | --- | ---
SSL Bridging | Application Load Balancer (ALB) | HTTPS | Present | Present
SSL Pass-through | Network Load Balancer (NLB) | TCP | Not used | Required
SSL Termination (Offload) | Application Load Balancer (ALB) | HTTPS | Present | Not required

ELB – Session Affinity (Stickiness)

ELB Session/Connection Stickiness – Overview
ALB Session Stickiness

NLB Session Stickiness

Demo – ALB Session Stickiness

ASG – ELB Integration and Health Checks

ASG-ELB Integration
ASG Health Checks
  1. EC2 instance status checks (default)
    • An instance is considered healthy only if it is in the Running state AND passes both status checks (2/2).
    • Instances that are Stopping, Stopped, Shutting Down, or Terminated, or that fail a status check, are marked unhealthy.
  2. ELB health checks (optional, when ASG is attached to ELB)
    • An instance is healthy only if both ELB checks and EC2 status checks pass.
    • Checks can test connectivity (TCP, L4) or request an application path (HTTP/HTTPS, L7), depending on the target group configuration.
    • Misconfiguration can cause problems:
      • Example: if the check targets a page that depends on the database, a database outage marks every instance unhealthy, and the ASG keeps replacing instances that are not themselves broken.
      • Conversely, a check against a static HTML page can report healthy while the application's backend is failing.
  3. Custom health checks
    • Any external system can mark instances as healthy/unhealthy.
    • Allows ASGs to meet specific business requirements or integrate with monitoring tools.
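The way these three sources combine can be sketched as a single decision function. This is a simplified model, not AWS code: the `InstanceView` fields are hypothetical, and real ASGs also apply a health check grace period after launch, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class InstanceView:
    state: str                      # EC2 state, e.g. "running", "stopping", "stopped"
    system_check_ok: bool           # EC2 system status check
    instance_check_ok: bool         # EC2 instance status check
    elb_check_ok: bool = True       # result of the ELB target health check
    custom_healthy: bool = True     # flag set by an external system, if any


def asg_considers_healthy(inst: InstanceView, use_elb_checks: bool) -> bool:
    """Simplified model of how the ASG combines its health check sources."""
    # 1. EC2 status checks (always evaluated): running AND 2/2 checks passing.
    if not (inst.state == "running" and inst.system_check_ok and inst.instance_check_ok):
        return False
    # 2. ELB checks only count when ELB health checks are enabled on the ASG.
    if use_elb_checks and not inst.elb_check_ok:
        return False
    # 3. An external system may mark the instance unhealthy.
    return inst.custom_healthy


# The app returns errors to the ELB, but the instance itself is fine.
broken_app = InstanceView("running", True, True, elb_check_ok=False)
print(asg_considers_healthy(broken_app, use_elb_checks=False))  # True: EC2 checks only
print(asg_considers_healthy(broken_app, use_elb_checks=True))   # False: ASG replaces it
```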

Gateway Load Balancer (GWLB) – Traffic Management for Network Appliances

Scaling Challenges with Network Security Appliances
AWS Gateway Load Balancer (GWLB) – Core Concepts

GENEVE Protocol in GWLB

Example GWLB Architecture and Traffic Flow
  1. Traffic enters the IGW, destined for an ALB with a public IP in subnet 10.16.9.0/20.
  2. IGW updates the destination IP to the ALB’s private IP and forwards traffic to GWLBE2.
  3. GWLBE2 sends packets to the GWLB in the securityVPC.
  4. GWLB encapsulates traffic using GENEVE, preserving original source/destination IPs, and forwards to a selected security appliance.
  5. Security appliance inspects traffic and either blocks or returns it.
  6. The appliance returns the packets to the GWLB (still GENEVE-encapsulated), and the GWLB removes the encapsulation.
  7. Traffic passes back through GWLBE2 to the application VPC (catagramVPC).
  8. Local routing directs traffic from GWLBE2 to the ALB.
  9. ALB distributes traffic to the appropriate application instance.
  10. The return path follows the same logic, maintaining traffic integrity and inspection throughout.
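The key property in steps 4 to 6 is that the original packet travels unchanged inside the tunnel, so the appliance sees the real source and destination IPs. The sketch below models only that idea with plain data classes; it does not build real GENEVE headers. The appliance choice hashes the flow's 5-tuple so that every packet of one flow reaches the same appliance; the SHA-256 hash is an arbitrary choice for illustration, not GWLB's actual algorithm, and all IP addresses are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List

GENEVE_UDP_PORT = 6081  # IANA-assigned UDP port for GENEVE


@dataclass(frozen=True)
class IPPacket:
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int
    payload: bytes


@dataclass(frozen=True)
class GenevePacket:
    outer_src_ip: str   # GWLB node
    outer_dst_ip: str   # selected security appliance
    udp_dst_port: int
    inner: IPPacket     # original packet, carried unchanged


def pick_appliance(pkt: IPPacket, appliances: List[str]) -> str:
    """Hash the flow's 5-tuple so every packet of a flow reaches the same appliance."""
    key = f"{pkt.src_ip}|{pkt.dst_ip}|{pkt.protocol}|{pkt.src_port}|{pkt.dst_port}"
    digest = hashlib.sha256(key.encode()).digest()
    return appliances[int.from_bytes(digest[:4], "big") % len(appliances)]


def encapsulate(pkt: IPPacket, gwlb_ip: str, appliances: List[str]) -> GenevePacket:
    return GenevePacket(gwlb_ip, pick_appliance(pkt, appliances), GENEVE_UDP_PORT, pkt)


def decapsulate(tunnelled: GenevePacket) -> IPPacket:
    return tunnelled.inner


# Hypothetical addresses.
original = IPPacket("198.51.100.7", "10.16.1.20", "tcp", 50512, 443, b"GET / HTTP/1.1")
appliances = ["10.99.1.10", "10.99.2.10"]
tunnelled = encapsulate(original, gwlb_ip="10.99.0.5", appliances=appliances)
print(tunnelled.outer_dst_ip, tunnelled.inner.src_ip)  # appliance IP, original client IP
```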

Serverless and Application Services

Architecture Deep Dive Concepts

CatTube (Example App)
Monolithic Architecture
Tiered Architecture
Asynchronous Queues Architecture
Microservices Architecture
Event-Driven Architecture (EDA)

AWS Lambda Basics

AWS Lambda – Key Concepts
AWS Lambda – Architecture

Resource Configuration

AWS Lambda – Common Use Cases
Demo: Creating and Running a Lambda Function
  1. Deploy a CloudFormation stack to provision required resources (e.g. EC2 instances)
  2. Create an execution role with permissions (e.g. logging and EC2 control actions)
  3. Navigate to Lambda and create a new function
    • Assign a name and choose a runtime (e.g. Python 3.9)
    • Attach the execution role
  4. Add function code
    • Example: script to stop EC2 instances using environment variables

  5. Configure environment variables
  6. Run a test invocation
  7. Optionally create another function (e.g. to start instances) and test it the same way
  8. Clean up: delete the Lambda functions, then delete the CloudFormation stack to terminate all associated resources
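A function like the one in step 4 might look roughly like the sketch below. It is not the demo's actual code: `EC2_INSTANCES` is a hypothetical environment variable holding comma-separated instance IDs, and the optional `ec2` argument exists only so the handler can be exercised locally with a stand-in client (Lambda itself calls `lambda_handler(event, context)`). The execution role would need `ec2:StopInstances` plus CloudWatch Logs permissions.

```python
import os


def lambda_handler(event, context, ec2=None):
    """Stop every EC2 instance listed in the (hypothetical) EC2_INSTANCES variable."""
    if ec2 is None:
        import boto3  # included in the Lambda Python runtime
        ec2 = boto3.client("ec2", region_name=os.environ.get("AWS_REGION"))
    raw_ids = os.environ.get("EC2_INSTANCES", "")
    instance_ids = [i.strip() for i in raw_ids.split(",") if i.strip()]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return {"stopped": instance_ids}


# Local check with a stand-in client (no AWS call is made).
class FakeEC2:
    def __init__(self):
        self.calls = []

    def stop_instances(self, InstanceIds):
        self.calls.append(InstanceIds)
        return {"StoppingInstances": []}


os.environ["EC2_INSTANCES"] = "i-0aaa1111, i-0bbb2222"
fake = FakeEC2()
result = lambda_handler({}, None, ec2=fake)
```

A "start instances" variant would be the same handler calling `ec2.start_instances(InstanceIds=...)` instead.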

AWS Lambda Networking

Public Lambda (Default)
Private Lambda

Private Lambda ENI Injection – Old vs New Approach

Old approach:

New approach:

AWS Lambda Security, Monitoring, and Versioning

AWS Lambda – Security
  1. Lambda execution role
    • An IAM role assumed by the Lambda function during execution
    • The trust policy allows Lambda to assume the role
    • The permissions policy defines what the function is allowed or denied to do
      • Example: read data from DynamoDB and write it to S3
  2. Lambda resource policy
    • A resource-based policy attached to the Lambda function
    • Determines which principals are allowed to invoke the function
      • Can grant access to AWS services (e.g. S3, SNS) or external AWS accounts
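To make the two policy types concrete, here they are as Python dicts holding the JSON documents IAM and Lambda expect. All ARNs, the account ID, and the table and bucket names are placeholders. The trust and permissions policies belong to the execution role; the last statement belongs to the function's resource policy, which is usually managed through the console or `aws lambda add-permission` rather than written by hand.

```python
import json

# Trust policy on the execution role: allows the Lambda service to assume it.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy on the same role: what the function may do while it runs
# (the "read from DynamoDB, write to S3" example). ARNs are placeholders.
PERMISSIONS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/ExampleTable",
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-output-bucket/*",
        },
    ],
}

# One statement of the function's resource policy: lets S3 invoke the function,
# but only for events coming from a specific bucket.
RESOURCE_POLICY_STATEMENT = {
    "Sid": "AllowInvokeFromS3",
    "Effect": "Allow",
    "Principal": {"Service": "s3.amazonaws.com"},
    "Action": "lambda:InvokeFunction",
    "Resource": "arn:aws:lambda:us-east-1:111122223333:function:example-function",
    "Condition": {"ArnLike": {"AWS:SourceArn": "arn:aws:s3:::example-upload-bucket"}},
}

print(json.dumps(TRUST_POLICY, indent=2))
```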
AWS Lambda – Monitoring
AWS Lambda – Versioning and Aliases

AWS Lambda Invocation Methods

Lambda – Synchronous Invocation
Lambda – Asynchronous Invocation
Lambda – Event Source Mapping

AWS Lambda Execution Environment

AWS Lambda – Cold vs Warm Starts
AWS Lambda – Reducing Cold Start Latency

Amazon EventBridge – Serverless Event Bus Service

Amazon EventBridge – Overview and Architecture
CloudWatch Events (CWEvents)

DEMO: Building a Simple Event-Driven Architecture (EDA)

Protect an EC2 Instance if It Gets Stopped

  1. Create a Lambda function that automatically restarts any EC2 instance that enters the stopped state.
  2. Create an EventBridge rule to monitor EC2 state changes.
  3. Event to track: EC2 Instance State-change Notification.
  4. Generate a JSON sample of the event to see what information is delivered.
  5. Fill out the event pattern to match instances entering the stopped state.
  6. Assign the Lambda function as the target for this EventBridge rule.
  7. Test the setup by stopping an instance. The Lambda function should restart it shortly afterwards.
  8. Check CloudWatch Logs for the function's execution details.
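The rule and its target can be sketched as follows. `EVENT_PATTERN` has the shape of an EventBridge event pattern for this case, and `sample_event` follows the structure of an EC2 state-change event (IDs and account are placeholders). The `matches` helper is a deliberately simplified matcher covering only exact-value lists, not EventBridge's full pattern syntax; the `ec2` argument and `FakeEC2` class exist only so the handler can run locally.

```python
import os

# Event pattern for the rule: EC2 state-change events whose new state is "stopped".
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: each pattern key must be present in the event, and the
    event value must be one of the listed values (nested objects recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def lambda_handler(event, context, ec2=None):
    """Start the instance named in the event (the rule only forwards 'stopped' events)."""
    if ec2 is None:
        import boto3
        ec2 = boto3.client("ec2", region_name=os.environ.get("AWS_REGION"))
    instance_id = event["detail"]["instance-id"]
    ec2.start_instances(InstanceIds=[instance_id])
    return {"started": instance_id}


# Sample event in the shape EventBridge delivers (IDs are placeholders).
sample_event = {
    "version": "0",
    "id": "example-event-id",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "111122223333",
    "time": "2024-01-01T12:00:00Z",
    "region": "us-east-1",
    "resources": ["arn:aws:ec2:us-east-1:111122223333:instance/i-0abc1234def567890"],
    "detail": {"instance-id": "i-0abc1234def567890", "state": "stopped"},
}


class FakeEC2:
    """Stand-in client so the handler can run without AWS."""
    def __init__(self):
        self.started = []

    def start_instances(self, InstanceIds):
        self.started.extend(InstanceIds)
        return {"StartingInstances": []}


fake = FakeEC2()
print(matches(EVENT_PATTERN, sample_event))  # True
print(lambda_handler(sample_event, None, ec2=fake))
```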

Stop All EC2 Instances at a Specific Time Every Day
  1. Create a schedule in EventBridge
    • Use EventBridge Scheduler for a modern UI and flexible scheduling options.
    • Unlike the old method, schedules are defined outside of event buses.
    • The traditional “Create Rule” flow only accepted cron (or rate) expressions evaluated in UTC.
  2. Specify the schedule using a cron expression
    • EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), unlike five-field Unix cron.
    • Check the time zone carefully:
      • Cron expressions default to UTC
      • EventBridge Scheduler lets you choose a time zone, and its UI may display times in your local time zone
      • Verify the next trigger times to ensure the schedule is correct
  3. Assign the Lambda function that stops EC2 instances as the target for this schedule.
  4. Wait for the scheduled time
    • The Lambda function will automatically stop the instances at the specified time.
    • If the EC2 protection Lambda is still active, any protected instances will automatically restart after being stopped.
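The UTC pitfall is easy to get wrong by hand. The helper below is a small illustrative sketch (the function name is hypothetical) that converts a daily local wall-clock time into a six-field EventBridge cron expression for a fixed UTC offset. A fixed offset ignores daylight saving changes, which is one reason to prefer setting a time zone directly in EventBridge Scheduler.

```python
def daily_cron_utc(local_hour: int, local_minute: int, utc_offset_minutes: int) -> str:
    """EventBridge cron expression firing daily at a local time, for a fixed UTC offset.

    utc_offset_minutes is the zone's offset from UTC, e.g. 60 for UTC+1, -300 for UTC-5.
    A fixed offset ignores daylight saving changes.
    """
    total = (local_hour * 60 + local_minute - utc_offset_minutes) % (24 * 60)
    hour, minute = divmod(total, 60)
    # Six fields: minutes hours day-of-month month day-of-week year.
    # EventBridge needs "?" in one of the two day fields.
    return f"cron({minute} {hour} * * ? *)"


print(daily_cron_utc(18, 0, 0))     # cron(0 18 * * ? *)
print(daily_cron_utc(18, 0, 60))    # cron(0 17 * * ? *)
print(daily_cron_utc(20, 0, -300))  # cron(0 1 * * ? *)
```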

Serverless Architecture Overview

Serverless – Core Idea

Key Characteristics

  1. Small, specialized functions
    • Each function does a single task well.
    • Functions start, execute, and stop quickly.
    • Billing is per invocation and execution duration.
  2. Stateless & ephemeral
    • Functions can run anywhere, independently.
  3. Event-driven
    • Functions execute only when triggered.
    • Consumption-based model: low idle costs.
  4. Managed services first
    • Use services like S3, DynamoDB, Cognito instead of self-hosting.
    • Code only what’s necessary.
  5. FaaS (Function-as-a-Service)
    • Cheap, scalable compute for general tasks.
    • AWS Lambda = main compute engine; avoids self-managed EC2 whenever possible.
Example: CatTube Serverless Architecture

Video Upload Workflow

  1. Upload video → S3 Originals bucket.
  2. New object triggers Lambda function via S3 event.
  3. Lambda creates Elastic Transcoder jobs → outputs in Transcode bucket.
  4. Video metadata added to DynamoDB.
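Step 2 of the upload workflow hinges on reading the S3 event the function receives. The sketch below only extracts the bucket and key from each record; the transcoding and DynamoDB steps are left as comments. The sample event is trimmed to the fields used, and the bucket and key names are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of each object reported in an S3 event notification."""
    uploads = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 events (e.g. spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append({"bucket": bucket, "key": key})
        # Remaining workflow steps are not shown: start a transcoding job for
        # this object and write its metadata item to DynamoDB.
    print(json.dumps(uploads))
    return uploads


# Trimmed-down sample event; bucket and key names are hypothetical.
sample_event = {
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "cattube-originals"},
            "object": {"key": "uploads/my+cat+video.mp4", "size": 1048576},
        },
    }]
}
result = lambda_handler(sample_event, None)
```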

Media Access Workflow

  1. Client requests media → triggers Lambda function.
  2. Lambda loads metadata from DynamoDB + media from Transcode bucket.
  3. Lambda returns URLs for client to access media.

Key Points:

Amazon SNS (Simple Notification Service) – Pub/Sub Messaging Service

Amazon SNS – Core Idea
SNS Architecture

Key Entities

  1. SNS Topic – main entity
    • Holds configuration and permissions.
    • Enables 1-to-many communication:
      • Publisher → sends messages to topic.
      • Subscribers → receive messages.
    • Supported subscriber types: HTTP(S), email, SQS, Lambda, SMS, mobile push.
    • Entities can act as both publisher and subscriber.
  2. Topic Policy – resource policy defining who can publish to or subscribe to the topic; supports cross-account access.
  3. Message Filters – allow subscribers to receive only relevant messages.
  4. Delivery Status & Retries – report delivery success or failure and retry failed deliveries according to a delivery policy.
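Message filtering can be pictured as a per-subscription test applied at publish time. The sketch below covers only the simplest case, exact string matches on message attributes (AND across attribute names, OR across the listed values); real SNS filter policies also support operators such as prefix, numeric ranges, anything-but, and exists, and can filter on the message body. Subscriber names are hypothetical.

```python
from typing import Dict, List, Optional


def passes_filter(policy: Dict[str, List[str]], attributes: Dict[str, Dict[str, str]]) -> bool:
    """Every attribute named in the policy must be present and equal one of its allowed values."""
    for name, allowed_values in policy.items():
        attribute = attributes.get(name)
        if attribute is None or attribute.get("Value") not in allowed_values:
            return False
    return True


def deliver(attributes: Dict[str, Dict[str, str]],
            subscriptions: Dict[str, Optional[Dict[str, List[str]]]]) -> List[str]:
    """Return the subscribers that receive a published message (None = no filter)."""
    return [name for name, policy in subscriptions.items()
            if policy is None or passes_filter(policy, attributes)]


# Hypothetical subscribers on one topic.
subscriptions = {
    "orders-queue": {"event_type": ["order_placed"]},
    "refunds-lambda": {"event_type": ["refund_issued"]},
    "audit-queue": None,  # no filter policy: receives everything
}
message_attributes = {"event_type": {"Type": "String", "Value": "order_placed"}}
print(deliver(message_attributes, subscriptions))  # ['orders-queue', 'audit-queue']
```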

Amazon API Gateway (APIGW) Basics

Application Programming Interface (API)
Amazon API Gateway (APIGW) – Key Concepts

Features and Capabilities

Amazon API Gateway – Architecture

Authentication Methods

  1. No Authentication
    • Publicly accessible APIs
  2. Amazon Cognito User Pools
    • Users authenticate via Cognito; API Gateway validates the token
  3. Lambda Authorizer
    • Custom authorization using a Lambda function to validate tokens
  4. IAM-Based Authentication
    • Requests are signed with AWS credentials (Signature Version 4) and authorized by IAM policies
    • Considered an advanced approach
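A Lambda authorizer receives the caller's token and returns an IAM-style policy telling API Gateway whether to allow the call. The sketch below shows the response shape for a TOKEN-type authorizer. The fixed `EXPECTED_TOKEN` comparison is a hypothetical placeholder for illustration only; a real authorizer would typically verify a JWT signature or consult an identity provider. The method ARN and principal ID are also placeholders.

```python
import hmac

# Hypothetical shared secret for illustration only; a real authorizer would
# typically verify a JWT signature or call an identity provider instead.
EXPECTED_TOKEN = "example-token"


def build_policy(principal_id: str, effect: str, method_arn: str) -> dict:
    """Shape of the response API Gateway expects from a Lambda authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": method_arn,
            }],
        },
    }


def lambda_handler(event, context):
    """TOKEN-type authorizer: API Gateway passes the caller's token and the method ARN."""
    token = event.get("authorizationToken", "")
    allowed = hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode())  # constant-time compare
    return build_policy("example-user", "Allow" if allowed else "Deny", event["methodArn"])


method_arn = "arn:aws:execute-api:us-east-1:111122223333:abcdef1234/prod/GET/pets"
allow = lambda_handler({"type": "TOKEN", "authorizationToken": "example-token", "methodArn": method_arn}, None)
deny = lambda_handler({"type": "TOKEN", "authorizationToken": "wrong", "methodArn": method_arn}, None)
```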

Endpoint Types

  1. Edge-Optimized
    • Requests are routed through the nearest CloudFront edge location
  2. Regional
    • Designed for clients within the same AWS region
    • Does not utilize CloudFront by default
  3. Private
    • Accessible only within a VPC using interface endpoints

Stages

Caching

Common HTTP Errors

4XX – Client Errors (Invalid Requests)

5XX – Server Errors (Backend Issues)

AWS Step Functions Basics

AWS Lambda – Limitations
AWS Step Functions – Key Concepts (State Machines)
AWS Step Functions – State Machines (SM)

Common State Types in Step Functions

AWS Step Functions – Example Architecture

LAB: Building the Serverless Pet Cuddle-O-Tron

Pet Cuddle-O-Tron – Overview

End-State Architecture (Simplified – After Stage 5)

End-State Architecture (Extended – After Stage 7)

Stage 1: Configure Amazon Simple Email Service (SES)
Stage 2: Configure Email Lambda Function
Stage 3: Configure Step Functions State Machine
Stage 4: Configure Backend API (API Gateway + Lambda)
Stage 5: Configure Frontend (S3 Static Website)

body {
padding-top: 40px;
padding-bottom: 40px;
background-color: #eee;
}

hr {
border-top: solid black;
}

div #error-message {
color: red;
font-size: 15px;
font-weight: bold;
}

div #success-message, #results-message {
color: green;
font-size: 15px;
font-weight: bold;
}

.form-signin {
max-width:480px;
padding: 15px;
margin: 0 auto;
}
.form-signin .form-signin-heading,
.form-signin .checkbox {
margin-bottom: 10px;
}
.form-signin .checkbox {
font-weight: normal;
}
.form-signin .form-control {
position: relative;
height: auto;
-webkit-box-sizing: border-box;
box-sizing: border-box;
padding: 10px;
font-size: 16px;
}
.form-signin .form-control:focus {
z-index: 2;
}
.form-signin input[type="Artist"] {
margin-bottom: -1px;
border-bottom-right-radius: 0;
border-bottom-left-radius: 0;
}
.form-signin input[type="bottom"] {
margin-bottom: 10px;
border-top-left-radius: 0;
border-top-right-radius: 0;
}

var API_ENDPOINT = 'REPLACEME_API_GATEWAY_INVOKE_URL';
// if correct it should be similar to https://somethingsomething.execute-api.us-east-1.amazonaws.com/prod/petcuddleotron

var errorDiv = document.getElementById('error-message');
var successDiv = document.getElementById('success-message');
var resultsDiv = document.getElementById('results-message');

// Helpers that return the current value of each form input
function waitSecondsValue() { return document.getElementById('waitSeconds').value }
function messageValue() { return document.getElementById('message').value }
function emailValue() { return document.getElementById('email').value }

function clearNotifications() {
  errorDiv.textContent = '';
  resultsDiv.textContent = '';
  successDiv.textContent = '';
}

// When the button is clicked, send the form values to the API Gateway endpoint
document.getElementById('emailButton').addEventListener('click', function(e) { sendData(e, 'email'); });

function sendData(e, pref) {
  e.preventDefault();
  clearNotifications();
  fetch(API_ENDPOINT, {
    headers: {
      "Content-type": "application/json"
    },
    method: 'POST',
    body: JSON.stringify({
      waitSeconds: waitSecondsValue(),
      message: messageValue(),
      email: emailValue()
    }),
    mode: 'cors'
  })
  .then((resp) => resp.json())
  .then(function(data) {
    console.log(data);
    successDiv.textContent = 'Submitted. But check the result below!';
    resultsDiv.textContent = JSON.stringify(data);
  })
  .catch(function(err) {
    errorDiv.textContent = 'Oops! Error Error:\n' + err.toString();
    console.log(err);
  });
}

Stage 6: Test the Application
Stage 7 (Extended): Enable SMS-Based Notifications

1. Set Up Simple Notification Service (SNS) for SMS

2. Create a Lambda Function for SMS Notifications

import json

import boto3

sns = boto3.client('sns')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event))
    sns.publish(
        PhoneNumber=event['Input']['phone'],
        Message=event['Input']['message']
    )
    return 'Success!'

3. Update Frontend to Include Phone Input

function emailValue() { return document.getElementById('email').value || 'NO_EMAIL' }
function phoneValue() { return document.getElementById('phone').value || 'NO_PHONE' }
body: JSON.stringify({
waitSeconds: waitSecondsValue(),
message: messageValue(),
email: emailValue(),
phone: phoneValue()
}),

4. Update API Lambda Validation

checks.append(not (data['email'] == "NO_EMAIL" and data['phone'] == "NO_PHONE"))
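For context, the check above would sit inside a list of validation checks on the request body. The function below is a hypothetical reconstruction for illustration: only the last check comes from the lab; the other checks and the `validate` name are assumptions.

```python
def validate(data: dict) -> bool:
    """Hypothetical validation for the API Lambda's request body."""
    checks = []
    # waitSeconds arrives as a string from the HTML form; require a whole number.
    checks.append(str(data.get("waitSeconds", "")).isdigit())
    checks.append(bool(data.get("message")))
    checks.append("email" in data and "phone" in data)
    # The lab's rule: at least one of email or phone must be supplied.
    checks.append(not (data.get("email") == "NO_EMAIL" and data.get("phone") == "NO_PHONE"))
    return all(checks)


print(validate({"waitSeconds": "10", "message": "hi", "email": "a@example.com", "phone": "NO_PHONE"}))  # True
print(validate({"waitSeconds": "10", "message": "hi", "email": "NO_EMAIL", "phone": "NO_PHONE"}))       # False
```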

5. Enhance Step Functions Workflow Logic

6. Validate the Updated Workflow

Stage 8: Resource Cleanup

Amazon SQS (Simple Queue Service) Basics

Amazon SQS (Simple Queue Service) – Overview

Message Retrieval and Visibility

Handling Failed Messages

Amazon SQS – Example Worker Pool Architecture
  1. ASG Web Tier
    • Users upload source content
    • The application stores the original file in an S3 bucket (Master)
    • A message containing a reference to the file is sent to an SQS queue
    • Processed outputs are later retrieved from another bucket (Transcode)
    • Scaling is driven by application demand (e.g., CPU usage)
  2. ASG Worker Tier
    • Continuously polls the SQS queue for tasks
    • Scales based on queue depth:
      • High queue length → scale out
      • Low queue length → scale in (can scale to zero)
    • Retrieves the original file, processes it into multiple formats, and stores results
    • If processing fails:
      • The message becomes visible again after the Visibility Timeout
      • Another worker can retry the task
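The retry behaviour in the worker tier comes from the visibility timeout, and a dead-letter queue catches messages that keep failing. The toy in-memory queue below models both ideas with an explicit clock; it is not the SQS API. The 30-second timeout and the receive limit of 2 are arbitrary assumptions, and, as in SQS, a message is moved to the DLQ once its receive count has reached the limit without it being deleted.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0   # time when the message becomes receivable again


class ToyQueue:
    """In-memory model of SQS visibility timeout and DLQ redrive (not the SQS API)."""

    def __init__(self, visibility_timeout: float, max_receive_count: int,
                 dlq: Optional["ToyQueue"] = None):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.dlq = dlq
        self._messages: Dict[int, _Message] = {}
        self._ids = itertools.count()

    def send(self, body: str) -> None:
        self._messages[next(self._ids)] = _Message(body)

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        for msg_id, msg in list(self._messages.items()):
            if msg.visible_at > now:
                continue  # in flight: hidden from other workers
            if self.dlq is not None and msg.receive_count >= self.max_receive_count:
                del self._messages[msg_id]  # too many failed attempts: redrive to DLQ
                self.dlq.send(msg.body)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg_id, msg.body
        return None

    def delete(self, msg_id: int) -> None:
        """Called by a worker after successful processing."""
        self._messages.pop(msg_id, None)

    def __len__(self) -> int:
        return len(self._messages)


dlq = ToyQueue(visibility_timeout=30, max_receive_count=100)
queue = ToyQueue(visibility_timeout=30, max_receive_count=2, dlq=dlq)
queue.send("videos/cat.mp4")

first = queue.receive(now=0)     # worker A takes it, then crashes (no delete)
hidden = queue.receive(now=10)   # None: still inside the visibility timeout
retry = queue.receive(now=31)    # visible again: worker B retries (2nd receive)
gone = queue.receive(now=62)     # None: receive limit reached, moved to the DLQ
```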
Fanout Pattern with SNS and SQS
Amazon SQS – Features

Standard vs FIFO Queue Tradeoffs

  1. Standard Queues
    • At-least-once delivery
    • No ordering guarantee
    • Possible duplicate messages
    • Highly scalable with high throughput
  2. FIFO Queues
    • Exactly-once processing
    • Guaranteed message order
    • Lower throughput compared to Standard queues

Billing Model

Polling Methods

  1. Short Polling
    • Immediate response
    • May return zero messages
  2. Long Polling
    • Waits up to a specified duration (ReceiveMessageWaitTimeSeconds, max 20 seconds)
    • Returns messages as soon as they are available
    • Reduces empty responses and improves cost efficiency
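A worker's long-polling loop might look roughly like the sketch below, written against the boto3 SQS client methods `receive_message` and `delete_message`. Passing `WaitTimeSeconds` on the request overrides the queue's `ReceiveMessageWaitTimeSeconds` attribute. The queue URL is a placeholder, `handle` is a hypothetical processing function, and `FakeSQS` is a stand-in so the function can be exercised without AWS; with AWS you would pass `boto3.client("sqs")` instead.

```python
def poll_once(sqs, queue_url: str, handle) -> int:
    """Run one long-poll cycle and return how many messages were processed."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling (maximum 20 s); 0 means short polling
    )
    processed = 0
    for message in response.get("Messages", []):  # key is absent when nothing arrived
        try:
            handle(message["Body"])
        except Exception:
            # Not deleted: the message reappears after the visibility timeout.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed


# Local check with a stand-in client; no AWS call is made.
class FakeSQS:
    def __init__(self, messages):
        self._messages = messages
        self.deleted = []
        self.last_request = {}

    def receive_message(self, **kwargs):
        self.last_request = kwargs
        return {"Messages": self._messages}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.deleted.append(ReceiptHandle)


def handle(body: str) -> None:
    if body == "corrupt":
        raise ValueError("cannot process")


fake = FakeSQS([{"Body": "job-1", "ReceiptHandle": "rh-1"},
                {"Body": "corrupt", "ReceiptHandle": "rh-2"}])
count = poll_once(fake, "https://sqs.us-east-1.amazonaws.com/111122223333/example-queue", handle)
```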

Security and Data Protection

SQS – Standard vs FIFO Queue Types

SQS Queue ↔ Highway Analogy
SQS Standard Queues
SQS FIFO (First-In-First-Out) Queues

SQS – Delay Queues

SQS – Visibility Timeout
SQS – Delay Seconds

SQS – Dead-Letter Queues (DLQs)

SQS Dead-Letter Queue (DLQ)
SQS – Message Retention Period

Amazon Kinesis Data Streams Basics

Amazon Kinesis Data Streams (KDS) – Core Concepts
Kinesis Data Streams – Architecture
SQS vs Kinesis

Amazon Kinesis Video Streams – Real-Time Video Data Streaming

Amazon Kinesis Video Streams (KVS) – Core Concepts
Kinesis Video Streams – Example Video Surveillance Architecture (with Rekognition)
  1. Security cameras in a smart home stream video into a Kinesis Video Stream (KVS) in AWS, offloading local video processing
  2. Video streams feed into Amazon Rekognition Video for analysis (e.g., facial recognition, object detection)
  3. Rekognition outputs processed data to a Kinesis Data Stream (KDS) containing structured insights, such as identified faces or events
  4. Further automation: AWS Lambda can process each record and trigger notifications via SNS for events like unknown faces detected

Amazon Kinesis Data Firehose Basics

Amazon Data Firehose – Key Concepts
Amazon Data Firehose – Architecture

Amazon Managed Apache Flink – Stream Processing Service (formerly Kinesis Data Analytics)

DISCLAIMER: Name Change from Amazon Kinesis Data Analytics
Amazon Managed Service for Apache Flink – Overview
Stream Processing Architecture

Amazon Cognito – User Authentication and Identity Management Service

Amazon Cognito – Overview
Amazon Cognito – User Pools
Amazon Cognito – Identity Pools
Amazon Cognito – Web Identity Federation (User Pools + Identity Pools)
  1. User pools consolidate internal and external users.
  2. Identity pools only need to integrate with the user pool JWT for temporary AWS credentials.
    • Reduces the need to configure multiple external IDPs directly in identity pools.

AWS Glue Basics

AWS Glue – Overview
AWS Glue – Data Catalogs
AWS Glue – Architecture

Amazon MQ Basics

Apache ActiveMQ
Amazon MQ – Overview
Amazon MQ – HA Architecture
Amazon MQ vs SQS and SNS – Exam Considerations

Amazon AppFlow Basics

Amazon AppFlow – Overview
Amazon AppFlow – Architecture