Some notes as I review AWS. Much of the material here is from the AWS Certification Study Guide. Screenshots and images are property of Amazon.

http://www.oracleappshub.com/wp-content/uploads/2010/06/aws_ec2.jpg

 

http://blog.clearpathsg.com/Portals/154661/images/amazon-web-services-global-infrastructure-resized-600.png

 

 


 

Contents

Study Blueprint
Document Listing
Background Readings
Scalability and Elasticity
CDN
Route tables (NAT, HTTP, DNS, IP and OSI Network)
TCP / IP
RESTful Web Services (XML, JSON)
Public key encryption, SSH, access credentials, and X.509 certificates
IOPS
AWS General
Compute and Networking
Amazon EC2 (Elastic Compute Cloud)
Amazon Machine Image (AMI)
EC2 Instance store volumes (ephemeral drives)
Elastic Block Store (EBS) and Databases on EC2
Auto Scaling
Elastic Load Balancing (ELB)
Lab Example
Amazon VPC (Virtual Private Cloud)
Security - Security Groups and Network Access Control List (ACL)
Elastic Network Interfaces (ENI)
Lab Demo
AWS Direct Connect
Amazon Route 53
Storage and Content Delivery
S3 (Simple Storage Service)
Developer Guide
Lab Example
Glacier
Lab Example
Amazon EBS (Elastic Block Store)
AWS Import/Export
Storage Gateway
Amazon CloudFront
Reduce DNS time - Route 53
Keep-Alive Connections
CloudFront Slow-Start Optimization
Cf_dg_CloudFront_Developer.pdf
Serving Private Content and Accessibility
Running the Lab (starts on page 11)
Troubleshooting
Database
Amazon RDS (Relational Database Service)
Amazon DynamoDB
Amazon ElastiCache
Amazon Redshift
Amazon SimpleDB
Deployment, Administration, Management
AWS IAM (Identity and Access Management)
Groups
Users
Roles
Identity Providers
Password Policy
AWS CloudTrail
Amazon CloudWatch
AWS Elastic Beanstalk
AWS CloudFormation
AWS OpsWorks
AWS CloudHSM (Hardware Security Modules)
Analytics
Amazon EMR (Elastic MapReduce)
Amazon Kinesis
AWS Data Pipeline
Application Services
Amazon AppStream
Amazon CloudSearch
Amazon SWF (Simple Workflow Service)
Amazon SQS (Simple Queue Service)
Amazon SES
Amazon SNS
Amazon Elastic Transcoder
Additional Software and Services
Alexa Top Sites
Alexa Web Information Service
Amazon DevPay
Amazon FPS
Amazon Mechanical Turk
Amazon Silk
AWS GovCloud (US)

 


 

Study Blueprint

Architect on AWS course/labs

EC2, VPC, EBS, S3, Route 53, IAM, RDS, SQS, DynamoDB, ...

Excluded: Redshift, SES, OpsWorks

Carefully study the FAQ section for these services

http://aws.amazon.com/training/self-paced-labs/

 

From AWS_certified_solutions_architect_associate_blueprint.pdf file:

 

AWS Knowledge

 Hands-on experience using compute, networking, storage, and database AWS services

 Professional experience architecting large scale distributed systems

 Understanding of Elasticity and Scalability concepts

 Understanding of network technologies as they relate to AWS

 A good understanding of all security features and tools that AWS provides and how they relate to traditional services

 A strong understanding on how to interact with AWS (AWS SDK, AWS API, Command Line Interface, AWS CloudFormation)

 Hands-on experience with AWS deployment and management services

 

General IT Knowledge

 Excellent understanding of typical multi-tier architectures: web servers (Apache, nginx, IIS), caching, application servers, and load balancers

 RDBMS (MySQL, Oracle, SQL Server), NoSQL

 Knowledge of message queuing and Enterprise Service Bus (ESB)

 Familiarity with loose coupling and stateless systems

 Understanding of different consistency models in distributed systems

 Experience with CDN, and performance concepts

 Network experience with route tables, access control lists, firewalls, NAT, HTTP, DNS, IP and OSI Network

 Knowledge of RESTful Web Services, XML, JSON

 Familiarity with the software development lifecycle

 Work experience with information and application security including public key encryption, SSH, access credentials, and X.509 certificates

 

Other Study Tips:

-          VPC config and troubleshooting, IP subnetting. VPC VPC VPC - must excel!

-          Use Cases for SWF, SQS and SNS

-          ELB interactions with auto-scaling

-          S3 security use cases

-          EBS vs ephemeral storage for EC2 instances

-          CloudFormation basics

-          EBS config and snapshots for I/O performance and durability



 

Document Listing

 

Filename - Topics (flag in brackets: Y / F / -)

01_Hands_On_IAM_wb.pdf - IAM [Y]
02_Hands_On_EC2_wb.pdf [Y]
03_Hands_On_EBS_wb.pdf [Y]
04_Hands_On_S3_wb.pdf - S3 [Y]
05_Hands_On_VPC_wb.pdf [Y]
07_Elastic_Load_Balancing.pdf - EC2, ELB [Y]
08_Auto_Scaling.pdf - EC2, ELB [Y]
AWS IAM Lab.pdf - IAM [Y]
AWS Certification - Web Video Training.docx - Web Videos [Y]
AWS_Amazon_SES_Best_Practices.pdf - SES [Y]
AWS_certified_solutions_architect_associate_blueprint.pdf
AWS_certified_solutions_architect_associate_examsample.pdf [Y]
AWS_Cloud_Best_Practices.pdf - Background [Y]
AWS_Overview.pdf - Background [Y]
AWS_Risk_and_Compliance_Whitepaper.pdf [Y]
AWS_Security_Best_Practices.pdf [Y]
AWS_Security_Whitepaper.pdf
AWS_Storage_Options.pdf - S3, Glacier, EBS, EC2 Instance Storage, AWS Import/Export, Storage Gateway, CloudFront, SQS, RDS, DynamoDB, ElastiCache, Redshift, Databases [Y]
AWS_Storage_Use_Cases.pdf - S3, EC2 Instance, EBS, CloudFront, SimpleDB, etc. [Y]
AWS_Web_Hosting_Best_Practices.pdf
AWSImportExport-dg.pdf - Import Export
aws-cli.pdf - CLI [Y]
awseb-dg.pdf - Elastic Beanstalk
awssg-intro.pdf [Y]
cf_dg.pdf - CloudFront [Y]
cfn-ug.pdf - CloudFormation [Y]
dc-ug.pdf - Direct Connect [Y]
dynamodb-dg.pdf - DynamoDB [F]
ec2-ug.pdf - EC2 [Y]
elasticache-ug.pdf - ElastiCache [Y]
govcloud-us-ug.pdf - GovCloud [Y]
rds-ug.pdf - RDS [Y]
Redshift-gsg.pdf - Redshift [-]
Redshift-mgmt.pdf - Redshift [-]
Route53-dg.pdf - Route 53 [Y]
s3-dg.pdf - S3 [Y]
sns-dg.pdf - SNS [Y]
sqs-gsg.pdf - SQS
swf-dg.pdf - SWF
SampleQuestions.docx [Y]
Sample Questions for Amazon Web Services Certified Solution Architect Certification.docx [Y]
storagegateway-ug.pdf - Storage Gateway
Studynotes.docx [Y]
vpc-ug.pdf - VPC [Y]
http://aws.amazon.com/faqs/

 

 

http://www.jamiebegin.com/tips-for-passing-amazon-aws-certified-solutions-architect-exam/

http://nitheeshp.tumblr.com/post/61394863836/aws-certified-solution-architect-exam-tips#!

http://www.cloudtrail.org/blog/339amazon-route-53-easy-example/

Background Readings

 

 

Benefits of Cloud:

-          Almost zero up front infrastructure investment

-          Just-in-time Infrastructure

-          Efficient resource utilization

-          Usage-based costing

-          Reduced time to market

-          Auto-scaling / Proactive Scaling (possibly infinite scalability)

-          Elasticity

-          Automation (scriptable infrastructure)

-          Efficient Development lifecycle

-          Improved Testability

-          Disaster Recovery and Business Continuity

-          Elastic IP addresses - allocate a static IP address and programmatically assign it to instances

 

Security Notes:

-          AWS is certified and accredited = ISO 27001 certification

-          Physical security = knowledge of the locations of data centers is limited and the locations are physically guarded in a variety of ways

-          Secure services = SSL / encryption

-          Data privacy = encryption

 

Best Practices:

-          Design for failure - assume the worst: servers will fail and datacenters will be lost

-          A design-for-failure mindset leads to a focus on redundancy, data recovery, backup, quick reboot, etc.

-          Failover gracefully using Elastic IPs - dynamically re-mappable so you can quickly remap to another server

-          Utilize multiple Availability Zones - spreading out the datacenters for redundancy

-          Maintain an Amazon Machine Image (AMI) to restore to - all subsequent instances are clones of this (virtualization)

-          Utilize Amazon CloudWatch for visibility on hardware failures or performance degradation

-          Utilize Amazon EBS and set up cron jobs for incremental snapshots to S3 - data persistence

-          Utilize Amazon RDS for data retention and backups

-          Decouple components

-          Implement Elasticity

o    Proactive cyclic Scaling (daily, weekly, monthly)

o    Proactive event-based scaling

o    Auto-scaling based on demand

-          Think Parallel

 

 

Scalability and Elasticity

Elastic = the ability to scale computing resources up and down easily, with minimal friction. Helps avoid provisioning resources up front for projects with variable consumption rates or short lifespans. Elastic Load Balancing and Auto Scaling automatically scale your AWS cloud-based resources up to meet unexpected demand and then scale them back down when demand decreases.

 

CDN

Content Delivery Network - a large distributed system of servers deployed in multiple data centers across the internet, with the goal of serving content with high availability and performance (e-commerce, live streaming, social).

Traditional vs CDN:

A CDN operates as an Application Service Provider (ASP) on the internet - top ones are Microsoft Azure and Amazon CloudFront.

http://en.wikipedia.org/wiki/Content_delivery_network

 

Amazon CloudFront is a CDN web service that integrates with AWS. Requests for content are automatically routed to nearest edge location (high performance).

http://aws.amazon.com/cloudfront/

 

Route tables (NAT, HTTP, DNS, IP and OSI Network)

 

 

TCP / IP

Transmission Control Protocol (OSI Layer 4) = created by DARPA as part of ARPANET for reliable communication

TCP works by sending a single packet and waiting for an ACK, at which point it sends twice as many packets and waits for the next ACK. It keeps doubling with each successful ACK until there is disruption or data loss, at which point it starts back at one packet.

IP = 123.123.123.123-style address for any device on the network

Subnet Mask = a way to organize a network into access/viewable groups (a device can only directly reach what is in its own group). Devices on the same network cannot work out this grouping without a subnet mask. Class C = 255.255.255.0 and Class B = 255.255.0.0 (more addresses).

Default Gateway = when a destination cannot be found on the local network, the device sends the traffic to the default gateway, which connects the sub-network to the internet or another WAN/LAN.

Domain Name Server (DNS) = resolves domain names into IP addresses

Dynamic Host Configuration Protocol (DHCP) = every device on the network must have an IP address, assigned either statically or dynamically. If more than one device has the same IP address, packets can get lost as delivery may be attempted to both locations. DHCP lets one host control/manage IP distribution.

Network Address Translation (NAT) = addresses are translated by the router such that the internal address may be different from the external address.

CNAME = Canonical Name record = a type of resource record in the DNS used to specify that a domain name is an alias for another, "canonical" domain name. For example, www.example.com can be a CNAME pointing to example.com.
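
A quick way to inspect a CNAME chain from any machine with dig installed (a hedged sketch; example.com is just a placeholder domain):

# show the CNAME target (if any), then the final A record
dig www.example.com CNAME +short
dig www.example.com A +short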

 

RESTful Web Services (XML, JSON)

 

 

Public key encryption, SSH, access credentials, and X.509 certificates

 

 

IOPS

Input/Output Operations Per Second - commonly used for benchmarking computer storage devices (HDD, SSD, SAN). A 7200 RPM HDD does roughly 75-100 IOPS, whereas an SSD on SATA 3 Gbit/s can do 400 to 20,000 IOPS. Higher is faster.

 

AWS General

 

Region = a separate geographical area/location. Regions are completely isolated from other EC2 regions. Not every region has every AWS resource; for example, only the following have EC2:

-          US East (N. Virginia)

-          US West (Oregon)

-          US West (N. California)

-          EU (Ireland)

-          Asia Pacific (Singapore)

-          Asia Pacific (Tokyo)

-          Asia Pacific (Sydney)

-          South America

Availability Zone = each region has several isolated locations

 

AWS Management Console

Web interface at https://console.aws.amazon.com

 

Command Line Interface (CLI)

Follow the aws-cli.pdf document for setup and examples - you need to follow the instructions and download and install the AWS CLI tool (MSI file). Also need to download and install Python. An access key is needed for the CLI tool to connect to AWS; this was created in IAM and downloaded locally (rootkey.csv).
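
Once the CLI is installed, a minimal sanity check looks roughly like this (a sketch - the key and region values are whatever you configured, not values from this guide):

aws configure                             # prompts for Access Key ID, Secret Access Key, default region, output format
aws ec2 describe-regions --output table   # quick check that credentials and connectivity work
aws ec2 describe-instances                # list instances in the default region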

 

Software Development Kits (SDK)

Class libraries for various platforms including iOS and Android

 

Query API

Low level API accessed online via RESTful or SOAP


 

Compute and Networking

 

 

Amazon EC2 (Elastic Compute Cloud)

Virtual Servers in the Cloud

 

Features

-          Virtual machines (instances) running on a hypervisor, created from an AMI

-          Pre-configured templates for instances - Amazon Machine Images (AMIs)

-          Various configurations of CPU, memory, storage and networking - instance types

-          Secure login using key pairs

-          Instance store volumes - temporary data that's deleted when you stop or terminate your instance

-          EBS (Elastic Block Store) - persistent storage volumes for your data

-          Multiple physical locations of resources - a single region can have multiple Availability Zones with EC2 instances copied (at least 2 copies = 2 Availability Zones in a region)

-          Firewall config - protocols, ports, source IP ranges and security groups

-          Static IP addresses for dynamic cloud computing - Elastic IP addresses. By default, new instances get two IP addresses: a private IP and a public IP that is mapped to the private one via NAT. If DNS had to be repointed to a new public IP it could take a day or more, whereas an Elastic IP is an instant switchover to another instance.

-          Tags - metadata for EC2 resources

-          Virtual networks to isolate from the rest of the AWS cloud - VPC (Virtual Private Cloud)

-          Limit of 20 On-Demand or Reserved Instances and 100 Spot Instances per region

-          *For important data - replicate to S3 or EBS volumes

 

 

There are 3 parts to an EC2:

-          Unit of control

Your stack (the software contained within the instance), which includes the OS, web server, etc. It's a bundle of your solution or application.

-          Unit of scale

Scale out functions by having different instances for different functions (put the web server on one instance and the business-logic part of the application on another instance)

-          Unit of resilience

As images are scaled out and replicated, recovery is obviously easier

 

When creating an EC2 instance, start small because it is easier to scale up. Commonly people go too large and end up downsizing to reduce costs. There are many instance types - from a small micro (about as powerful as an iPhone) to large instances with 244GB RAM, etc.

 

EC2 instance pricing options:

-          On Demand - pay by the hour (for spikes)

-          Reserved - 1- to 3-year terms (reserved capacity and steady state). The hourly rate for these instances is discounted (less than the On-Demand rate).

-          Light / Medium / Heavy Utilization Reserved Instances

-          Spot Instances - bid on unused capacity (for off-hours batch jobs, Hadoop for data analysts)

Stopped instances don't lose their EBS volumes and can be started up again. You can also detach EBS volumes and perform other config changes during this time. However, terminated instances lose their EBS volumes, which get deleted. Termination can be disabled for an instance as a whole.
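
Roughly what that looks like with the AWS CLI (a hedged sketch; the instance ID is a placeholder):

aws ec2 stop-instances --instance-ids i-0123456789abcdef0        # EBS volumes are kept, instance can be started later
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --disable-api-termination   # termination protection
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0   # refused while termination protection is enabled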

 

Instances are deployed in the region (geographical area) appropriate to the user base or local laws. This includes Asia, Europe, the Americas, etc. Instances also come in a variety of sizes with different use cases:

-          Micro = for lower-throughput applications - when additional compute cycles are needed periodically. Available as an EBS-backed instance only.

-          HI1 = for random IOPS such as NoSQL databases, clustered databases or OLTP (online transaction processing) systems. The primary data storage is SSD volumes (ephemeral instance store) with an EBS-backed root device.

-          HS1 = high storage density and high sequential read/write, ideal for data warehousing, Hadoop/MapReduce, parallel file systems

-          GPU instances = for highly parallel processing, such as scientific computing, engineering or rendering applications that leverage CUDA (Compute Unified Device Architecture) or OpenCL

-          C1 instances = EBS-optimized - maximizes EBS storage performance

 

Key Pair

Used to authenticate instance access. The EC2 instance stores the public key - you keep the private key. Communications with the instance are secured with the key. Not all instances need a key pair.
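
Creating and using a key pair from the CLI, as a hedged sketch (the key name and host are placeholders):

aws ec2 create-key-pair --key-name MyLabKey --query 'KeyMaterial' --output text > MyLabKey.pem
chmod 400 MyLabKey.pem                      # the private key stays with you
ssh -i MyLabKey.pem ec2-user@<public-dns>   # EC2 keeps only the public half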

 

Security Group

Used to control access, like a firewall. Contains a name, a description, and allowed ports/protocols (e.g. SSH, FTP, HTTP). You can also set a source IP range defining where the administrator can connect from. Setting it to 0.0.0.0/0 means it can be reached from anywhere; best practice is to allow only a specific source IP range.

Security Groups differ between EC2-Classic and VPC.
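
A hedged CLI sketch of the same idea (the group name, description and admin CIDR are placeholders):

aws ec2 create-security-group --group-name web-sg --description "web tier"
aws ec2 authorize-security-group-ingress --group-name web-sg --protocol tcp --port 80 --cidr 0.0.0.0/0        # HTTP open to the world
aws ec2 authorize-security-group-ingress --group-name web-sg --protocol tcp --port 22 --cidr 203.0.113.0/24   # SSH only from a specific admin range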

 

IAM Roles

Set up roles as a pre-configuration of what can be accessed across all of AWS, not just EC2. Best practice is to use roles rather than access keys - that way the keys never need to be referenced anywhere (like in code). Roles grant temporary access to AWS resources; the credentials are used during the EC2 instance's lifespan and do not exist thereafter.

 

 

User Data (Linux)

Can be a file or text sent in through the CLI or API; it gets processed by the metadata service. It goes into the instance and can kick off scripts (batch). For example, you can send a command to install various software and run it.
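
A minimal user-data sketch for an Amazon Linux-style instance (the packages installed here are an assumption, not from the lab):

#!/bin/bash
# processed by the metadata service and run at first boot
yum install -y httpd php
service httpd start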

 

EC2 Windows EC2Config Service (Windows)

Similar to user data but for Windows. If IAM Roles are set up, it follows those policies (for example, if the role is connected to S3 or other services, it sets up the config for all of that).

 

Placement Group

Logical grouping of instances within single Availability Zone to enable full bisection bandwidth and low-latency network performance for tightly coupled, node-to-node communication typical of HPC applications (High Performance Computing)

 

Resizing

EBS backed instances must be stopped before resizing (config changes). Resizing is done manually by the user (or via API).

 

Instance Life Cycle

 

Starting / Stopping / Restarting Instances

EBS volumes are retained but RAM is lost. In Classic, the public and private IPs are released and new ones assigned when the instance is started again; in a VPC, only the public IP is released and renewed (the private IP is retained). As for the Elastic IP, it is disassociated in Classic whereas in a VPC it is retained.



Termination

When an instance is being terminated: for EBS-backed instances, everything except the OS (root) volume is preserved, and all EBS snapshots are preserved (in snapshots, the OS is also preserved). For S3-backed instances, all ephemeral volumes are lost.



Importing / Exporting

AMIs/VMs can be imported from and exported to Citrix Xen, Microsoft Hyper-V or VMware vSphere.



Monitoring and CloudWatch

The following metrics can be monitored via CloudWatch (graphs are also available for each):

-          CPU utilization

-          Disk I/O

-          Network

-          Status

Various alarms and alerts can be setup

 

Troubleshooting

If an instance immediately terminates:

-          Check volume limit

-          The AMI could be missing a part

-          The snapshot is corrupt

-          Check the Console for the description logs

Connection timed out

-          Check security group rules

-          Check CPU load on the instance

-          Verify the private key file and/or user name for the AMI

If Instance is stuck in Stopping phase

-          Create a replacement instance, kick that off and terminate the stuck one

Lost Key Pair

-          First create a new Key Pair

-          Stop all instances using the old one and point them to the new one

 

 

 

 

Amazon Machine Image (AMI)

A master image of an instance/virtual machine, from which EC2 instances stem. These instances can be launched in EC2-Classic or a VPC.

 

AMI Types:

-          Amazon-maintained (Ubuntu, RedHat, Windows - listed with price lowest to highest)

-          Community maintained

-          Your own machine image (can be private or shared with other accounts)

 

AMI Characteristics

-          Region (Region and Availability Zones)

-          OS

-          Architecture (64 / 32)

-          Launch Permissions = public (anyone can launch), explicit (to specific users only), implicit (for the owner only)

-          Storage for root device

 

Bootstrapping / Bake an AMI

-          Start an instance and install all the software needed. Save this off and then create new instances from this image, which is now pre-installed and pre-configured.

-          This should be balanced, though, as some configuration may still be needed after instance creation (e.g. deploying the latest code). So bake in as much config as makes sense and leave the rest for dynamic configuration after instance startup (see the CLI sketch below).
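
Baking and reusing an image from the CLI, as a hedged sketch (IDs and names are placeholders):

aws ec2 create-image --instance-id i-0123456789abcdef0 --name "baked-web-ami" --description "Apache + app pre-installed"
# later, launch clones of the baked image and finish config dynamically (user data, latest code deploy)
aws ec2 run-instances --image-id ami-0abc1234 --count 2 --instance-type t1.micro --key-name MyLabKey --security-groups web-sg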

 

EC2 Instance store volumes (ephemeral drives)

Two Storage Types for EC2

-          EBS

Storage on network

-          Instance Store (ephemeral drives)

Disk storage on the instance itself - different from EBS, which is storage on the network. Instance stores have a shorter lifecycle than EBS: if the instance is lost, stopped or terminated, that storage is removed with it. A database should not live here (that usually goes on EBS or RDS).

 

Instance stores provide temporary block-level storage for EC2 instances. This is pre-attached/pre-configured on the same disk storage (same physical server) as the host EC2 instance. Some smaller micro instances (t1) use EBS storage only and have no instance storage. HI1 instances may use one or more SSD-backed volumes capable of 120,000 IOPS or 2.6 GB/sec of sequential read and write when using a block size of 2MB.

Well suited for local temporary storage that is continually changing - such as buffers, caches, scratch data, and other temp content. Unlike EBS, instance store volumes cannot be detached or attached to another instance. Ideally suited for high-performance (high I/O) workloads. The size of an instance store ranges from 150 GiB to 48 TiB.

EC2 local instance stores are not intended to be durable disk storage. They only persist for the duration of the EC2 instance, so necessary data should be persisted to EBS or S3. Instances backed by EBS have no ephemeral volumes set for them by default. Also, instance store volumes can only be assigned to one instance and are never transferred.

The local instance storage capacity is fixed and defined by the EC2 instance type. It cannot be decreased or increased unless the EC2 instance itself is replaced.

Also, instance store-backed instances cannot be stopped, only running or terminated. The root volume has a capacity of 10 GiB, often not enough for Windows-type instances.

 

 

Elastic Block Store (EBS) and Databases on EC2

EC2, together with EBS volumes, provides an ideal platform for a self-managed RDBMS, with prebuilt, ready-to-use solutions such as IBM DB2, Informix, Oracle, MySQL, MS SQL, PostgreSQL, Sybase, EnterpriseDB and Vertica. Performance is based on the EC2 instance (memory, size, etc). RAID striping is also available to increase speed or redundancy (RAID 0 or 1). But this is like a traditional datacenter, so scaling is based on the EC2 instance setup; Amazon RDS and DynamoDB provide automatic scaling.

 

Provisioned IOPS Volumes

These are designed to meet the needs of I/O-intensive workloads such as databases. They deliver up to 4,000 IOPS per volume, but a volume must be at least 100 GB to provision 3,000 IOPS or more. Sizes range from 10GB to 1TB. Ideally used for database workloads. Still slower than instance store.
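
Creating and attaching a Provisioned IOPS volume, as a hedged sketch (sizes, IDs and the AZ are placeholders):

aws ec2 create-volume --size 100 --volume-type io1 --iops 3000 --availability-zone us-west-2a
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf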

 

Block Device Mapping

A block device is a storage device that moves data in sequences of bytes or bits (blocks), supporting random access and buffered I/O - hard disks, CD-ROMs, flash drives, etc. A block device mapping defines which block devices (instance store volumes and EBS volumes) are attached to an instance.

 

Workload Demand

Average Queue Length = the number of pending I/O requests for a device. Optimal is to have average queue length of 1 for every 200 Provisioned IOPS.

 

Pre-Warming EBS volumes

When a volume is first created or restored from a snapshot, the initial I/O against that volume is slow because the data is not yet loaded; subsequent I/O runs at optimal performance. To avoid this initial penalty, you can pre-warm the volume, which performs that initial I/O up front.
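
One common way to pre-warm is simply to read every block once (a sketch; the device name is an assumption and varies by instance):

# touch every block so later reads hit at full performance
sudo dd if=/dev/xvdf of=/dev/null bs=1M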

 

 

 

Auto Scaling

Auto Scaling automatically adds/removes EC2 instances based on triggers/alerts. There are 3 parts to auto-scaling EC2:

-          Launch Configurations

Set of parameters such as instance size, security groups, etc

-          Auto Scaling Group

Tells what to do once launched, such as which AZs and which load balancer to use, and most importantly the min and max number of servers to run at any given time (plus the cool-down period)

-          Auto Scaling Policy

 

Set the cooldown period - the amount of time to wait after adding a new instance; you don't want to add/remove instances constantly. Sets the min/max capacities and can be triggered by CloudWatch events.

 

Instances launched by Auto Scaling are charged by the hour, with one hour billed up front when started. Scaling takes time - it is a polling system so it works off time intervals; there is also boot time to account for, and ELB needs a few cycles before it starts sending traffic to the new instance.

 

CloudFormation is used to setup the configuration for the auto scaling.

 

CloudWatch can be used to trigger/alarm the auto scaling. For example, when CPU utilization is greater than 50% for more than 5 minutes, scale up; when CPU is less than 30% for more than 5 minutes, scale back down. CloudWatch could also just send a message (SNS) when these conditions are met instead of actually kicking off the scaling - that way scaling is done manually by the admin after reading these messages.
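
A hedged CLI sketch of the whole chain - launch configuration, group, policy, and a CloudWatch alarm that fires the policy (all names, IDs and the ARN are placeholders):

aws autoscaling create-launch-configuration --launch-configuration-name web-lc --image-id ami-0abc1234 --instance-type t1.micro --key-name MyLabKey --security-groups web-sg
aws autoscaling create-auto-scaling-group --auto-scaling-group-name web-asg --launch-configuration-name web-lc --min-size 2 --max-size 6 --availability-zones us-west-2a us-west-2b
aws autoscaling put-scaling-policy --auto-scaling-group-name web-asg --policy-name scale-up --adjustment-type ChangeInCapacity --scaling-adjustment 1   # returns a policy ARN
aws cloudwatch put-metric-alarm --alarm-name web-high-cpu --namespace AWS/EC2 --metric-name CPUUtilization --statistic Average --period 300 --evaluation-periods 1 --threshold 50 --comparison-operator GreaterThanThreshold --dimensions Name=AutoScalingGroupName,Value=web-asg --alarm-actions <policy-arn>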

 

Limits for auto-scaling:

-          20 EC2 instances

-          100 spot instances

-          5000 EBS volumes or an aggregate size of 20TB (Total)

-          These maximum instance counts apply only to instances in the pending, running, shutting-down or stopping states. There is a separate limit of 4x the max instances for the total number of instances in any state.

 

Elastic Load Balancing (ELB)

ELB does 3 things:

-          Spread

Balance resources across availability zone

-          Offload

Remove load from EC2 instance

-          Health check

Monitor and set up alerts on whole layers - can take out whole sections of the architecture (wherever the egress points are)

 

Best Practices:

-          Persistent HTTP connections

-          Don't use the underlying IP addresses, just use DNS names

 

Setup is done through the EC2 console. Select the ping protocol/port/path (where to monitor the traffic - which can be the index.html page or just the root directory). The options for the load balancer are (see the CLI sketch after this list):

-          Response Timeout = time to wait when receiving a response from health check

-          Health Check Interval = amount of time between health checks

-          Unhealthy Threshold = number of consecutive health check failures before declaring an instance unhealthy

-          Healthy Threshold = number of consecutive health check successes before declaring an instance healthy
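
The same settings expressed with the CLI, as a hedged sketch (names, AZs and instance IDs are placeholders):

aws elb create-load-balancer --load-balancer-name labelb --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" --availability-zones us-west-2a us-west-2b
aws elb configure-health-check --load-balancer-name labelb --health-check Target=HTTP:80/index.php,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=10
aws elb register-instances-with-load-balancer --load-balancer-name labelb --instances i-0aaaaaaaa i-0bbbbbbbb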

 

 

Example here (this is for the two Linux instances in Lab examples)

http://LinuxLoadBalancer-1936685151.us-west-2.elb.amazonaws.com

 

Lab Example

Following is from LAB document example

Create 2 Linux instances using a bootstrap script given by Amazon for this demo:

#!/bin/sh

curl -L http://bootstrapping-assets.s3.amazonaws.com/bootstrap-elb.sh | sh

 

Two Linux instances were created with Apache and PHP pre-installed and a default index.php page. To connect to these instances, use the original key pair (downloaded locally).

http://ec2-54-200-250-195.us-west-2.compute.amazonaws.com/

http://ec2-54-201-108-203.us-west-2.compute.amazonaws.com/

 

Setup the ELB instance and point it to the two instances. ELB site is here:

http://labelb-1827406090.us-west-2.elb.amazonaws.com/

 

Setup the Auto Scaling Group

http://ec2-54-201-210-96.us-west-2.compute.amazonaws.com/

Terminated this instance and the auto scale automatically created a new one:

http://ec2-54-200-154-184.us-west-2.compute.amazonaws.com/

 

Create an SNS Topic - AutoScaling - and set the recipient email address. Then on the Auto Scaling Group, set notifications to this topic (this can be done through the AWS CLI or Console).

 

Create Scaling Policies - one for scaling up (adds an instance) and another for scaling down (removes an instance). This also automatically sets alarms in CloudWatch. Note that for this LAB, we are not using ELB with Auto Scaling. Ideally, you would use both.

 

Creating a Windows instance - no user data needed; it instead uses the EC2Config Service (read above)

When first logging in, you have to use the key file to decrypt the Administrator's password.

Administrator / LQUu*%KuQHT

*The private key file is in the same AWS wiki folder as this study guide

http://ec2-54-201-201-92.us-west-2.compute.amazonaws.com

 

Elastic IP created and set to the windows instance here:

http://54.201.201.92/

 

To connect into EC2 instance using Putty, reference the end of this Lab guide or this:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html

 

Amazon VPC (Virtual Private Cloud)

VPC is a network layer specifically for EC2. Instances get deployed into the VPC and the user may select IP ranges and subnets, and configure route tables, network gateways and security settings. There is a default VPC that every instance gets launched into automatically. Instances must use an Internet Gateway to access the internet, which is provided automatically in the default VPC. Instances deployed in non-default subnets do not get a public IP, so they must have an Internet Gateway and an Elastic IP to have internet access.

 

Subnets are used to group instances. Instances can only view other instances in their own subnet. Private subnets can use a NAT instance in the public subnet (with an EIP) that the private instances connect through for internet access (for example, to get updates). Subnets with Internet Gateways are considered public subnets, whereas subnets without are considered private subnets.
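
A hedged CLI sketch of building that out - a VPC, a public and a private subnet, and an Internet Gateway with a default route for the public one (all CIDRs and IDs are placeholders):

aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-11111111 --cidr-block 10.0.0.0/24      # public subnet
aws ec2 create-subnet --vpc-id vpc-11111111 --cidr-block 10.0.1.0/24      # private subnet
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-22222222 --vpc-id vpc-11111111
aws ec2 create-route-table --vpc-id vpc-11111111
aws ec2 create-route --route-table-id rtb-33333333 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-22222222
aws ec2 associate-route-table --route-table-id rtb-33333333 --subnet-id subnet-44444444   # makes that subnet public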

 

In EC2, there are some benefits to using VPC over Classic (EC2-VPC vs EC2-Classic):

-          Assign static private IPs to instances that persist across starts and stops (the instance never loses its IP)

-          Assign multiple IPs to instances

-          Define network interfaces, and attach one or more interfaces to an instance

-          Change security group membership for your instance

-          Control the outbound/inbound traffic from the instance (ingress/egress filtering)

-          Network Access Control List (ACL)

Some capabilities

-          User-defined address space of up to 65k+ addresses

-          Up to 200 user-defined subnets (set up virtual routing, DHCP servers, NAT instances, Internet gateways, ACLs)

-          Private IPs are stable once assigned

-          Elastic Network Interfaces (ENI)

-          A VPC can span multiple AZs, although each subnet must remain in a single AZ


 

Within a VPC you can run dedicated instances (not shared with any other customers). Today, VPC is enabled as the default for EC2 instances. EC2-Classic instances get a private IP from a shared private IP address range (within AWS); each instance also has a public IP address from Amazon's IP pool. With EC2-VPC, each instance gets a private IP from the VPC's private IP range. There is no public IP by default, unless set by the user via an Elastic IP, and traffic always first goes through the VPC's gateway at the network edge.

 

Dynamic Host Configuration Protocol (DHCP)

To set up your own domain names, create a new DHCP option set in the VPC. Up to 4 separate DNS servers can be defined.

 

Amazon DNS

By default, all instances in the default VPC get DNS hostnames (name = AmazonProvidedDNS). If this is disabled, those instances will not be accessible by hostname from the internet.

 

 

Security � Security Groups and Network Access Control List (ACL)

VPC has two features for security:

-          Security Groups = like a firewall for EC2 instances controlling inbound and outbound traffic at instance level. These groups are specific to the VPC only.

-          Network Access Control List (ACLs) = like a firewall for associated subnets, controlling inbound and outbound traffic at subnet level. These lists are specific to the VPC only.

Each EC2 instance requires 1 or more security groups (if not set, it will use the default SG). The Network ACL is an optional (additional) level of security that can be added on top of this. Use AWS IAM to manage which users have control over modifying these policies.

 

Some differences between Security Groups and Access Control Lists (ACL)

-          SG is at instance level, ACL is at subnet level

-          SG has allow rules only, ACL can have allow rules and deny rules

-          SG is stateful (return traffic is automatically allowed), ACL is stateless (return traffic must be explicitly allowed)

-          SG evaluates all rules before allowing traffic, ACLs process rules in number order before allowing

 

Recommendations for ACL - Best Practice

Have a single subnet that can receive from and send to the internet (like a DMZ), then set up the ACL as follows for inbound & outbound (see the CLI sketch after this list):

-          Rule 100 = TCP 80 allow (http)

-          Rule 110 = TCP 443 allow (https)

-          Rule 120 = TCP 22 allow (SSH)

-          Rule 130 = TCP 3389 allow (rdp)

-          Rule * = all / all deny (blocks everything else except above)
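
One of those rules expressed with the CLI, as a hedged sketch (the ACL ID is a placeholder; repeat per rule, and again with --egress for outbound):

aws ec2 create-network-acl-entry --network-acl-id acl-55555555 --ingress --rule-number 100 --protocol tcp --port-range From=80,To=80 --cidr-block 0.0.0.0/0 --rule-action allow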

 

Elastic Network Interfaces (ENI)

An ENI is a virtual network interface that attaches to EC2 instances with the following attributes (only available in a VPC). These attributes follow the network interface, not the instance, so the interface can easily be moved to another instance:

-          Primary private IP address

-          One or more secondary private IP addresses

-          One Elastic IP per private IP address

-          A MAC address

-          One or more security groups

-          Source / destination check flag

-          A network interface can be attached to an instance, detached, and re-attached to another instance.

Attaching multiple ENIs to an instance is useful when you want to:

-          Create a management network

-          Use network and security appliances in your VPC

-          Create dual-homed instances with workloads on distinct subnets

 

Lab Demo

There are 4 VPC configuration types already available as templates in the AWS Console:

-          Single public subnet with internet access

-          Public and private subnets - the private subnet can only access the internet via NAT through the public subnet

-          Public and private subnets with VPN - same as above except there is a VPN connection available to the private subnet

-          Private subnet with VPN only

 

This lab will create 1 VPC with 1 subnet and then add a 2nd subnet to it. One will be public while the other is private.

 

 

AWS Direct Connect

Dedicated Network Connection to AWS

-          Dedicated bandwidth to AWS at 1 Gbps or 10 Gbps

-          Full access to public endpoints, EC2, S3 and VPCs (VLAN tagging maps to the public side or a VPC)

Requirements for Direct Connect are:

-          Colocated in an existing Direct Connect location

-          Service provider is a member of the AWS Partner Network (APN)

-          Single-mode fiber, 1000BASE-LX, for the 1 Gbps / 10 Gbps connections

A request has to be submitted to implement Direct Connect.

 

Amazon Route 53

Scalable Domain Name Web Service

 

Latency-based Routing (LBR) = the application runs in different Amazon EC2 regions, with an LBR record for each location (with geo information). Route 53 routes end users to the endpoint that provides the lowest latency.

 

A DNS domain name cannot exceed 255 bytes, including the dots, and may contain any ASCII characters (though some require escape characters). Route 53 supports any valid domain name.

 

When creating a hosted zone, Route 53 automatically creates 4 Name Server (NS) records and a Start of Authority (SOA) record.

 

A Format

An A record value must be of IPv4 format: 192.0.2.1

AAAA Format

An AAAA record value must be in IPv6 colon-separated hexadecimal format: 2001:0db8:85a3:0:0:8a2e:0370:7334

CNAME Format

Any subdomain that is not the zone apex - example.com is a zone apex but www.example.com can be a CNAME

MX Format (Mail Host)

NS Format (Name Server)

Route53 name servers look like this:

-          ns-2048.awsdns-64.com

-          ns-2049.awsdns-65.net

-          ns-2050.awsdns-66.org


PTR Format

SOA Format (Start of Authority)

SOA record identifies the base DNS info about the domain:

ns-2048.awsdns-64.net. hostmaster.example.com. 1 7200 900 1209600 86400
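
Reading that record left to right (standard SOA field order):

ns-2048.awsdns-64.net.    primary (authoritative) name server
hostmaster.example.com.   zone contact (i.e. hostmaster@example.com)
1                         serial number
7200                      refresh interval, seconds
900                       retry interval, seconds
1209600                   expire time, seconds
86400                     minimum / negative-caching TTL, seconds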

SPF Format

SRV Format (Service Locator)

TXT Format

 

Route CloudFront URLs via Route53

Route Elastic Load Balancing (ELB) via Route 53

Route to EC2 instance through an Elastic IP (EIP) via Route 53

Route to S3 object or RDS

 

Name Servers = Links your Registrar with Hosting Service Provider (like a main DNS)

Start of Authority = base DNS information about the zone (see SOA Format above)

 

Hosted Zone

A collection of resource record sets for a specified domain (i.e. example.com) - tells DNS how to route traffic for that domain

 

Weighted Resource Record

This helps Route 53 determine which record set to select when there are several for a given domain name. Each resource record set has a weight, and the probability of it being selected is its weight divided by the sum of all the weights.
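
For example, three weighted record sets with weights 1, 2 and 5 are returned roughly 1/8, 2/8 and 5/8 of the time (12.5%, 25% and 62.5%), since each weight is divided by the total of 8.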

 

Alias Resource Record

An Alias resource record set contains a pointer to a CloudFront distribution, an Elastic Load Balancer, an S3 bucket hosting static web content, or another Route 53 resource record set in the same hosted zone.


 

Storage and Content Delivery

 

AWS Storage Options:

 

S3 (Simple Storage Service)

Scalable Storage in the Cloud

 

S3 is storage for the internet, accessible from EC2 or anywhere on the web. Supports encryption and virtually unlimited amounts of data (unlimited objects per bucket, with each object limited to 5TB). Each object has a unique developer-assigned key. Accessed through the REST API (Java, .NET, PHP, Ruby SDKs), the AWS CLI (command line) and the web console.

 

Set up auto-archiving to Glacier by using the Lifecycle option. Typically used for static content and as an origin for a CDN (CloudFront). Example usage: photos, videos, large-scale analytics (financial transactions), critical data, disaster recovery. S3 is redundant and capable of version control. Often used alongside a database (DynamoDB) where the DB holds metadata (object name) that references S3.

 

S3 provides 99.999999999% (11 nines) durability per object and 99.99% availability over a year. Reduced Redundancy Storage (RRS) is an S3 option with lower durability for lower cost (99.99% durability). This is a cost-effective yet highly available option, great for anything that can be easily reproduced (like thumbnails or transcoded media).

 

Multi-Factor Authentication (MFA) Delete requires two forms of authentication to delete: AWS account credentials plus a six-digit token code. Accounts can be created for roles and authorization; these can be scoped geographically, allowing users of a particular region to access only that region's S3 buckets. These accounts fall under bucket policies, which apply to all objects. Object-level authorization can be set up through permissions. Policies can be controlled with IAM.

 

Some Anti-Patterns:

-          S3 is not a file system and is not POSIX-compliant

-          S3 is not queryable; you must use the bucket name and key to retrieve an object

-          S3 has relatively high read/write latencies, so it is not intended for dynamic or rapidly changing data

-          For long-term storage (archive) that is not accessed often, Amazon Glacier is more cost-effective

-          No dynamic website hosting (but good for static website hosting)

 

Developer Guide

Every object is contained in a bucket, and a single AWS account can have up to 100 buckets. For example, the object photos/puppy.jpg in the example bucket could have the URL:

http://example.s3.amazonaws.com/photos/puppy.jpg. Buckets also serve to organize, identify and control access to objects, as well as to aggregate usage for reporting. Every object has a key - a unique identifier within its bucket. In the URL above, example is the bucket and "photos/puppy.jpg" is the key.

The stored data are called objects in S3. Objects consist of the actual data plus metadata (date, version, etc). There is unlimited storage for objects. You cannot create a bucket inside another bucket. Bucket ownership cannot be transferred.

 

Regions:

US Standard

US West (Oregon)

US West (Northern California)

EU (Ireland)

Asia Pacific (Singapore)

Asia Pacific (Sydney)

Asia Pacific (Tokyo)

South America (Sao Paulo)

 

Objects stored in a region never leave that region unless you explicitly transfer them. Within the region, objects are replicated across multiple servers, which may take some time: when an object is created, modified or deleted, it might not appear, might not exist, or might return prior or deleted data until the change is fully propagated. S3 does not support object locking; if two writes to the same object are received simultaneously, the one with the latest timestamp wins.

 

Bucket Policies (Access control)

Bucket policies provide centralized access control to buckets and objects based on a variety of conditions, including S3 operations, requesters, resources and aspects of the request (e.g. IP address). For example, a policy could control access to a particular S3 bucket by origin (such as the corporate network), business hours, or a custom application (by user agent string). Only the bucket owner can set policies for that bucket.

You can also use Identity and Access Management (IAM) to create users under a single AWS account with different levels of access, each controlled by their own set of access keys (every IAM user can have different keys).
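
A hedged sketch of a bucket policy that only allows reads from a corporate IP range (the bucket name and CIDR are placeholders, not from the lab):

cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowGetFromCorpNetwork",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-study-bucket/*",
    "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}}
  }]
}
EOF
aws s3api put-bucket-policy --bucket example-study-bucket --policy file://policy.json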

 

Operations

-          Create a bucket

-          Write an Object

-          Read object

-          Delete object

-          Listing keys - list the bucket contents (see the CLI sketch below)
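
The same operations from the CLI, as a hedged sketch (bucket and key names are placeholders):

aws s3 mb s3://example-study-bucket                                        # create a bucket
aws s3 cp ./puppy.jpg s3://example-study-bucket/photos/puppy.jpg           # write an object
aws s3 cp s3://example-study-bucket/photos/puppy.jpg ./puppy-copy.jpg      # read an object
aws s3 ls s3://example-study-bucket --recursive                            # list keys
aws s3 rm s3://example-study-bucket/photos/puppy.jpg                       # delete an object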

 

Bucket Names

There is a direct mapping between S3 buckets and subdomains. Objects are accessed via the REST API under bucketname.s3.amazonaws.com.

 

Versioning

Versioning allows you to preserve, retrieve, and restore every version of every object stored in this bucket. This provides an additional level of protection by providing a means of recovery for accidental overwrites or deletions. Once enabled, Versioning cannot be disabled and you will not be able to add Lifecycle Rules for this bucket.
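
Enabling and inspecting versioning from the CLI, as a hedged sketch (the bucket name is a placeholder):

aws s3api put-bucket-versioning --bucket example-study-bucket --versioning-configuration Status=Enabled
aws s3api list-object-versions --bucket example-study-bucket --prefix photos/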

 

Alerts and Error Handling

S3 buckets can be set up with Simple Notification Service (SNS) notifications for events on a key. Due to the distributed nature of S3, requests can be temporarily routed to the wrong facility - especially immediately after buckets are created or deleted.

 

Lab Example

Following buckets created with URL

sfishs3bucket01 (private, RRS, US Standard region)

https://s3.amazonaws.com/sfishs3bucket01/images/20130609_091627.jpg

sfishs3bucket02 (public, logging, versioning*, Northern California)

https://s3-us-west-1.amazonaws.com/sfishs3bucket02/images/20130609_091949.jpg

-          File modified - view versions

sfishs3bucket03 (static site, US Standard region, lifecycles, Glacier Archive*)

https://s3.amazonaws.com/sfishs3bucket03/index.html

 

*Buckets that are under version control cannot be set with Life Cycle Rule (archiving)

 

Glacier

Archive Storage in the Cloud

Some quick facts

-          Extremely cheap archive cloud storage ($0.01 / GB / month).

-          Retrieve data in 3 to 5 hours

-          Use data lifecycle policies to move data from S3 to Glacier. Can also import/export directly.

-          Annual durability is 99.999999999% (11 nines).

-          Performs regular systematic data integrity checks and auto self-healing.

-          A single archive can be up to 4TB, but there is no limit on the total amount for the overall service. Scales up and down.

-          2 ways to interface

o    REST web services (Java or .NET SDKs available too). Can set up jobs and send notifications through SNS.

o    Object lifecycle management via S3 (automatic, policy-driven). Check the S3 guide for details.

-          Since Glacier is an archive service, there are obviously 2 anti-patterns:

o    Rapid data changes

o    Real time access

 

Vaults

-          Controls various archives

-          Controls access

-          Send notifications to SNS

 

Lab Example

Store objects to Glacier from S3

Life Cycle Rules

-          Set up in S3 - create the start date and expiration time (see the CLI sketch below)

-          To restore - done in S3 as well
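
A hedged sketch of such a rule using the current s3api commands (the bucket name, prefix and 30-day window are assumptions):

cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "archive-old-logs",
    "Filter": {"Prefix": "logs/"},
    "Status": "Enabled",
    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
  }]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket example-study-bucket --lifecycle-configuration file://lifecycle.json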

 

 

Amazon EBS (Elastic Block Store)

Provides durable block-level storage for EC2 instances (VMs); volumes are network-attached and persisted independently of the instance - like a physical hard drive, they can be formatted with a file system of your choice and used through the instance's OS.

 

Some quick facts:

-          EBS becomes a boot drive for EC2

-          EBS volume sizes range from 1GB to 1TB.

-          Ideally used as primary storage for database or file system, or for any application requiring block-level storage.

-          EBS has high and consistent rate of disk reads and writes.

-          Two types of volume types:

o    Standard volumes - for boot drives; about 100 IOPS with burst capability

o    Provisioned IOPS volumes - for high-performance, I/O-intensive workloads such as databases. Supports 2,000 IOPS per volume but can be striped to deliver thousands per EC2 instance

-          Since EBS is network storage, performance can be impacted by network I/O. EC2 offers EBS-optimized instance types with a dedicated connection of 500 Mbps or 1 Gbps. EBS-optimized instances deliver within 10% of the Provisioned IOPS performance 99.9% of the time.

-          Multiple EBS volumes can be attached to an EC2 instance. You can also use RAID 0 or logical volume manager software to aggregate available IOPS, total volume throughput and total volume size.

-          Backups (snapshots) are persisted to S3 and are incremental - only containing the most recent changes since the last snapshot. EBS volumes with 20GB or less of modified data since the last snapshot can expect an annual failure rate (AFR) of 0.1% to 0.5%; larger volumes should expect proportionally higher AFR values.

-          If an EBS volume fails, the volume can be recreated from a prior snapshot. An EBS volume is tied to an Availability Zone, so if the zone itself goes down, that volume does too; snapshots, however, are available across all zones in the region.

-          Cost of EBS is in three components

o    Provision storage (the volume)

o    I/O requests (frequency)

o    Snapshot storage (backup)

-          EBS volumes cannot be resized. If resizing is needed, two options:

o    Attach a new volume and use together with existing one

o    Snapshot original volume, remove that volume, create new desired volume, restore from snapshot

-          Interfaces are available as SOAP and REST services. These are used to create, delete, describe, attach and detach EBS volumes from EC2 instances, or to create, delete and describe snapshots from EBS to S3.

-          There is no interface for the data itself; the volume just appears as a drive under EC2.

-          Anti-Patterns:

o    Temporary storage - consider using SQS or ElastiCache

o    Highly durable storage - consider using S3 or Glacier

o    Static data or web content - consider using S3

 

 

 

AWS Import/Export

Move large amounts of data into/out of AWS (EBS snapshots, S3 buckets and Glacier vaults) using portable storage devices for transport (bypasses the internet by using AWS's internal network). Typically used for data that would take a week or more to transfer over the internet. Examples:

-          Initial data upload to AWS

-          Content distribution or data interchange to/from customers

-          Offsite backup / archive

-          Disaster recovery

Typically rates at about 100 MB/s but is bounded by the read/write speed of the portable storage device.

 

 

 

Storage Gateway

Integrates on-premises IT environments with cloud storage. Ideal for backups to S3, disaster recovery and data mirroring to cloud-based compute resources.

The AWS Storage Gateway software appliance is downloaded as a VM image into the local datacenter. From there, it connects to local iSCSI devices. Some portion of the data is retained locally (like cached data), up to 32TB per volume. This is connected back to S3 on the backend, asynchronously backing up to the cloud.

 

 

 

 

 

Amazon CloudFront

CloudFront is a CDN web service that integrates with AWS. Requests for content are automatically routed to the nearest edge location (high performance); user requests are invisibly redirected to a copy of the file at that edge location. Content is organized into distributions, each with a unique cloudfront.net domain name (abc123.cloudfront.net). Distributions can serve your content for download (HTTP/HTTPS) or stream it (RTMP).

http://aws.amazon.com/cloudfront/

 

 

CloudFront serves static content, not dynamic; dynamic content comes from EC2. It is difficult to cache dynamic content, so that is usually not in CloudFront - but cache as much as you can. Query strings can be cached (/api/GetBooks?cat=math).

https://dsys5zajh0hn7.cloudfront.net/Cloudfront-Diagram_Website_Updated.jpeg
Each second of download/load time on a webpage is costly. A typical page load is a waterfall of requests (multiple lines representing various pieces of page content, which can come from various origins).

How can dynamic content be optimized?

Reduce DNS time - Route 53

Reduce TCP connection time - Keep-Alive Connections & SSL Termination

Reduce first-byte time - Keep-Alive Connections

Reduce content download time - TCP/IP optimizations (also Route 53)

 

Keep-Alive Connections

HTTP runs on TCP/IP, which requires a handshake (SYN, SYN-ACK, ACK). Example without CloudFront:

Keep-Alive connections reuse the initial SYN/SYN-ACK/ACK handshake for all subsequent connections and requests, which just come through on the open connection.

Keep-Alive connections work best when there are multiple users / many requests. For SSL there are even more TCP handshakes, so Keep-Alive reduces time even more with SSL. This is CloudFront SSL Termination - it can be Half Bridge or Full Bridge.

 

CloudFront Slow-Start Optimization

This is used to reduce content download time. TCP slow-start builds up to the full packet transfer rate - the sender doesn't send the next set of packets until an ACK is received. CloudFront's slow-start optimization works because slow-start is only done for the initial connection/user; thereafter it just sends the full 4 packets (then more after each ACK).

 

Cf_dg_CloudFront_Developer.pdf

CloudFront is a web service focused on improving performance (speed) via distribution. It retrieves content from the edge location (data center) or custom origin (your server) with the lowest latency. If the content is already in an edge location, it is retrieved instantly.

 

Example: A regular jpg image from some site may take 10 hops (tracert) to get to end user via several states (possibly even further from source). CloudFront eases this by reducing the number of hops.

 

The Origin Server can be of two types:

-          HTTP/HTTPS = an Amazon S3 bucket, Elastic Compute Cloud (EC2) or your own web server

-          RTMP = streaming, which must always use an Amazon S3 bucket (uses Adobe Flash Media Server over port 1935 and port 80)

 

Content/data served are called objects, and access can be controlled via CloudFront URLs. CloudFront sends config info (not content) to edge locations, which cache copies of your objects. You can customize this via headers to set expiration times (when to remove the cache from edge locations). Defaults to 24 hours but can be set from 0 to infinite.

 

CloudFront URLs in form of:

http://abc123.cloudfront.net/mydatacontent.jpg

 

DNS handles user requests and routes them to the lowest-latency CloudFront edge location. There, CloudFront checks its cache; if the object is not found, it goes to the origin to grab it, returning the content immediately while caching it. After 24 hours (or the value set in the header), CloudFront compares the cached version against the origin to determine the latest version and updates the cache as necessary. Request parameters are part of the cache key - for example, the following are all cached separately, although they may all return the same content:

http://abc.example.com/images/a.jpg?p=1

http://abc.example.com/images/a.jpg?p=2

http://abc.example.com/images/a.jpg?p=3

 

CloudFront doesn't retrieve an object until a user requests it. When objects are updated, they should get a new name (*_v2); otherwise you have to wait for the original object to expire before the update is served (default 24h). If an object is not requested often, CloudFront may evict it to make space for more-demanded objects.

 

Users can also modify content in CloudFront via HTTP DELETE, OPTIONS, PATCH, POST and PUT. Delivering content from CloudFront is cheaper than from S3 directly (fewer hops too, so faster).

 

To route using CNAME or domain name (instead of the abc.cloudfront.net), setup in DNS. This can also be done in Route 53 if applicable.

 

Serving Private Content and Accessibility

Can control private content access by:

-          Restrict access to objects in CloudFront edge caches by using signed URLs (public/private keys)

-          Restrict access to S3 such that all requests must go through CloudFront. Then create a special user called an Origin Access Identity, which is the only identity allowed to read the bucket. Remove all other access permissions in S3.

-          For HTTP origin servers, this is not possible since the content has to be public for CloudFront to access it, so the content is not as well controlled as with S3.

 

If loading content from your own domain over SSL (HTTPS), you must load that SSL certificate into the AWS Identity and Access Management (IAM) certificate store. Then add the domain to the CloudFront distribution and let it pick up the content. (Certs must be in X.509 PEM format.)

 

 

Running the Lab (starts on page 11)

Create S3 bucket - sfishs3bucket01

URL to content:

S3 = https://s3.amazonaws.com/sfishs3bucket01/images/20130609_091627.jpg (US National)

https://s3-us-west-1.amazonaws.com/sfishs3bucket02/images/20130609_091627.jpg (Norcal)

CloudFront = https://d1gh6jsmdgiqiz.cloudfront.net/images/20130609_091627.jpg

 

A CNAME can be used in CloudFront (an alias - a domain name that points to another domain name, but not to an IP. Example: public.example.com pointing to www.example.com)

 

Follow the CloudFront wizard to create the new distribution for an S3 bucket. Wait for the CloudFront status to be "Deployed" and then test it by accessing it via the CloudFront URL.

S3 = https://s3.amazonaws.com/sfishs3bucket01/web/index.html

CloudFront = https://d1gh6jsmdgiqiz.cloudfront.net/web/index.html

 

Create an Origin Access Identity - in the CloudFront dashboard (create/edit), select "Yes" for "Restrict Bucket Access". This opens up options for "Origin Access Identity" (create / use existing). You can use a single identity for multiple distributions to various buckets. The limit is 100 total identities, but really you should only need one for your application.

 

Using signed URLs - create a trusted signer (via User - Security Credentials). Go to CloudFront Key Pairs and create a new key pair. Download the private and public keys.

 

Troubleshooting

 

Log Diagram

 

Some basic CloudFront troubleshooting (on top of logs) when a user cannot view files on a distribution:

-          Make sure the account is signed up for both CloudFront and S3

-          Check object permissions on S3

-          Check the CNAME and make sure it's pointing to the correct location

-          Check URL

-          Are you using a custom origin for content? If so, check there

 

ERROR: Certificate xxx is being used by CloudFront

You get this error when trying to delete an SSL cert from the IAM certificate store.

-          A CloudFront distribution must be associated with either the default certificate or a custom SSL one. Make sure to rotate the distribution to another certificate OR revert from the custom one back to the default before deleting.

 


 

Database

 

 

Amazon RDS (Relational Database Service)

The managed Relational Database Service provides the capabilities of MySQL, Oracle, or MS SQL Server as a managed, cloud-based service. It eliminates the administrative overhead associated with launching, managing and scaling your own relational DB on EC2 or another computing environment. Backups are automatically done nightly, but DB snapshots can also be initiated by the user. You interface with the database directly (like any other DB) via the database server's address - for example, an original connection string of dbserver.example.com is replaced with dbserver.c0caffpest.us-east1.rds.amazonaws.com. RDS can use IAM and VPC for security.
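
Connecting is the same as to any other MySQL host, just against the RDS endpoint (a sketch; the user name is a placeholder and the endpoint is the example above):

mysql -h dbserver.c0caffpest.us-east1.rds.amazonaws.com -P 3306 -u masteruser -p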

 

RDS configurations range from 64-bit, 1.7 GB RAM with a single EC2 compute unit (ECU) up to 64-bit, 68 GB RAM with 26 ECUs. Provisioned IOPS range from 1,000 to 30,000.

 

To increase I/O capacity, do any of the following:

-          Migrate the DB instance to a higher I/O capacity instance class

-          Convert from standard storage to Provisioned IOPS storage

-          If already using Provisioned IOPS, provision additional throughput capacity

Use IAM to control DB administration - e.g. who can create, modify or delete RDS resources, security groups, option groups or parameter groups; also remember to rotate credentials often.

 

RDS Multi-AZ

Synchronously replicates data from the primary RDS DB instance to a standby instance in another Availability Zone. Automatically fails over to the standby instance on failure (takes about 3 minutes).
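
A minimal boto3 sketch of launching a Multi-AZ MySQL instance and later adding I/O capacity (the identifier, class, credentials and sizes are placeholders, not the lab's values):

import boto3

rds = boto3.client('rds', region_name='us-west-2')

# Launch a MySQL instance with a synchronous standby in another AZ
rds.create_db_instance(
    DBInstanceIdentifier='labmysql-multiaz',   # hypothetical name
    Engine='mysql',
    DBInstanceClass='db.m1.small',
    AllocatedStorage=20,                       # GB
    MasterUsername='dba',
    MasterUserPassword='change-me',
    MultiAZ=True)                              # standby replica in a second AZ

# Later, I/O capacity can be increased in place, e.g. by converting to Provisioned IOPS storage
rds.modify_db_instance(
    DBInstanceIdentifier='labmysql-multiaz',
    AllocatedStorage=100,   # illustrative - PIOPS requires a storage-to-IOPS ratio
    Iops=1000,
    ApplyImmediately=True)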

 

RDS vs Database on EC2

Databases on EC2 are ideal for applications that require more control than RDS supports, i.e. maximum admin control and configurability (such as managing the database at the OS level). But on EC2, backups/snapshots must be done manually or via a scheduled job, whereas RDS handles them automatically. If using MySQL, Oracle, PostgreSQL or MSSQL, RDS is usually the better way to go (patching, automatic backups, Provisioned IOPS, replication, easier scaling).

 

RDS Components

-          DB Instances (MySQL, PostgreSQL, Oracle and MS SQL) size ranges from 5GB to 3TB

-          Regions and Availability Zones (AZ)

o    Create in one region but can use multiple AZ for performance and redundancy / failover support.

-          Security Groups (firewall filter)

o    DB security group controls access to DB that is not in a VPC

o    VPC security group controls access inside VPC

o    EC2 security group controls access to EC2

-          DB Parameter Groups

o    Contains engine config values that can be applied to one or more DB instances

o    This is specific to database engine and version

-          DB Option Groups

o    For various tools based on the DB engine (currently only available for Oracle, MySQL and MSSQL)

o    The tools vary based on the DB being used (Oracle has most tools)

-          Reserved Instances

o    A one-time up-front cost reserves an instance for a one- or three-year term at much lower rates

o    Available in 3 variants: Heavy, Medium and Light Utilization

-           

 

Terminology and Concepts

CIDR = Classless Inter-Domain Routing, a method for allocating IP addresses and routing IP packets

DB Instance = the database server in the cloud (RDS does not allow direct host access; connect remotely via the DB endpoint); an account can have up to 40 RDS DB instances, of which 10 can be Oracle / MS SQL

DB Instance Class = the compute and memory capacity of the DB instance (db.t1.micro, db.m1.small, db.m1.medium, ...)

RDS Storage = some factors that can affect performance are

-          DB snapshot creation

-          Nightly backups

-          Multi-AZ creations

-          Read replica

-          Scaling Storage

Regions and AZ = an RDS instance is launched into, and stays in, a single region. Unless using Multi-AZ deployment, there is no standby instance.

Maintenance = controlled by the user, use multi-AZ to minimize disruptions

VPC = all RDS instances should be in a VPC (at minimum the default VPC, which is now mandatory), and the DB subnet group should span at least two AZs in the region

Instance Replication = done across AZs, with automatic failover to the standby on failure. For MySQL, read replicas use the replication feature built into MySQL. The standby instance is not in the same AZ as the running instance.

Events = track all activity in RDS - retrieved via the "DescribeEvents" action

SOAP = only available through HTTPS

 

MySQL

MyISAM does not support crash recovery

Federated Storage Engine is not supported

Oracle

Two types of licensing - BYOL (bring your own license, contact Oracle for support) or License Included (where AWS owns the license and no additional license is needed, contact AWS for support)

The Database Diagnostic Pack and Database Tuning Pack are only available on Enterprise Edition

MSSQL

No support for increasing storage (due to Windows Server's lack of striped storage extensibility), so the initial allocation should anticipate future growth

Max number of databases is 30; SQL Server editions allow up to 1,024 GB of storage, SQL Server Express Edition up to 10 GB

PostgreSQL

Minor version upgrades will be automatically performed by AWS RDS, based on the user's maintenance window.

 

Making Change to DB Instance

When renaming an instance - the old DNS name is automatically removed; read replicas stay attached and their names are unchanged; metrics follow the instance name, so they start over unless another instance later reuses the old name.

 

Lab

Created a MySQL DB instance using the Database Security Group (allows 3306 inbound/outbound); to connect, run the following:

$ mysql -h labmysql.c9buqst6tnbq.us-west-2.rds.amazonaws.com -P 3306 -u dba -p -b -v
Enter password: **********
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 25
Server version: 5.6.13-log MySQL Community Server (GPL)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
--------------
show databases
--------------

+--------------------+
| Database           |
+--------------------+
| information_schema |
| DBLab1             |
| innodb             |
| mysql              |
| performance_schema |
+--------------------+
5 rows in set (0.05 sec)

mysql> use DBLab1
Database changed

Now can create and modify tables.

mysql> select * from LabTable1;
--------------
select * from LabTable1
--------------

+----+------------+
| id | data       |
+----+------------+
|  1 | L1 RECORD1 |
|  2 | L1 RECORD2 |
|  3 | L1 RECORD3 |
|  4 | L1 RECORD4 |
|  5 | L1 RECORD5 |
+----+------------+
5 rows in set (0.04 sec)

mysql> select * from LabTable2;
--------------
select * from LabTable2
--------------

+----+------------+
| id | data       |
+----+------------+
|  1 | L2 RECORD1 |
|  2 | L2 RECORD2 |
|  3 | L2 RECORD3 |
+----+------------+
3 rows in set (0.04 sec)

mysql> select * from LabTable3;
--------------
select * from LabTable3
--------------

+----+------------+
| id | data       |
+----+------------+
|  1 | L3 RECORD1 |
|  2 | L3 RECORD2 |
+----+------------+
2 rows in set (0.04 sec)

 

 

Amazon DynamoDB

Predictable and scalable NoSQL data store that stores structured data in tables, indexed by primary key, and allows low-latency read and write access to items ranging from 1 byte to 64 KB (the total size of an item's attributes). A minimal boto3 sketch follows the Data Model list below.

Some facts:

-          No fixed schema, so each data item can have different attributes

-          Primary key can be either a single-attribute hash key or a composite hash-and-range key

-          Supports three data types: number, string and binary (scalar and multi-valued sets).

-          Automatically replicated to 3 different availability zones in region.

-          Fast, consistent performance through the use of SSDs and limited indexing on attributes.

Data Model

-          Tables, Items and Attributes

o    A database is a collection of tables, a table is a collection of items, and each item is a collection of attributes

o    Tables are schema-less - just collections of varied items

o    An item's attributes are name-value pairs, and the total item size must be less than 64 KB

o    NULL or empty-string attribute values are not allowed

-          Operations

o    Table - create, update, delete

o    Item - add, update, delete

o    Query and Scan

o    Data Read
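
A minimal boto3 sketch of the table/item/attribute model above (the table name, key name and throughput values are placeholders):

import boto3

ddb = boto3.client('dynamodb', region_name='us-west-2')

# Table with a single-attribute hash key; no other schema is declared
ddb.create_table(
    TableName='LabItems',    # hypothetical table
    AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'N'}],
    KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5})

ddb.get_waiter('table_exists').wait(TableName='LabItems')

# Items in the same table can carry different attributes (schema-less)
ddb.put_item(TableName='LabItems',
             Item={'id': {'N': '1'}, 'data': {'S': 'RECORD1'}})
ddb.put_item(TableName='LabItems',
             Item={'id': {'N': '2'}, 'data': {'S': 'RECORD2'}, 'tag': {'S': 'extra'}})

# Read a single item by primary key
item = ddb.get_item(TableName='LabItems', Key={'id': {'N': '1'}})['Item']
print(item)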

 

 

 

Amazon ElastiCache

Managed In-Memory Cache service using either Memcached or Redis (runs in VPC):

-          Memcached - in-memory object caching system for code / applications

-          Redis - in-memory key-value store that supports data structures such as sorted sets and lists. Often used as an in-memory NoSQL database

Data Model

-          Cache Nodes = smallest block of memory deployment

-          Cache Cluster = group of cache nodes

-          Cache Parameter Groups = parameters to manage runtime settings (used during startup)

-          Replication Groups = duplicates / replicas on more clusters to avoid data loss

-          Security = controlled through the subnet group in VPC

By default, cache clusters are standalone and not redundant. With Redis, you can create a replication group to enhance scalability and avoid data loss (replication groups are available only for Redis, not Memcached).
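
A minimal boto3 sketch of launching a single-node Redis cache cluster (the cluster ID and node type are placeholders):

import boto3

ec = boto3.client('elasticache', region_name='us-west-2')

# One Redis node; a Memcached cluster would use Engine='memcached' and can have more nodes
ec.create_cache_cluster(
    CacheClusterId='lab-redis',        # hypothetical name
    Engine='redis',
    CacheNodeType='cache.m1.small',    # illustrative node type
    NumCacheNodes=1)

# Once the cluster is available, read its endpoint to point clients at it
ec.get_waiter('cache_cluster_available').wait(CacheClusterId='lab-redis')
resp = ec.describe_cache_clusters(CacheClusterId='lab-redis', ShowCacheNodeInfo=True)
print(resp['CacheClusters'][0]['CacheNodes'][0]['Endpoint'])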

 

 

Amazon Redshift

Managed petabyte-scale data warehouse service optimized for datasets ranging from gigabytes up to petabytes or more. Some common use cases are:

-          Analyze global sales data

-          Store historical stock trade data

-          Analyze ad impressions

-          Aggregate gaming data

-          Analyze social trends

-          Measure clinical quality, operation efficiency and financial performance in health care

Capacity can range from a few GB up through petabytes, controlled through the AWS Console or APIs. Costs less than $1,000 per TB per year.

 

 

Amazon SimpleDB

tbd

 


 

Deployment, Administration, Management

 

AWS IAM (Identity and Access Management)

When the AWS account is first created, that root account has unlimited access to everything. After signing in, you should set up IAM. Permissions are defined in policies, and AWS provides policy templates that you can use. Signing up for EC2 automatically signs you up for S3 and VPC. Watch the video on the IAM help page (best practices).

 

https://573575957043.signin.aws.amazon.com/console

Created an account alias:

https://solidfish.signin.aws.amazon.com/console 

 

There is an IAM policy simulator.

 

Groups

Groups are used to manage permissions. Create groups using permissions policy templates. Some example templates are:

-          Administrator - full access to AWS services and resources

-          Power user

-          Read only

-          CloudFormation Read Only Access

-          CloudFront Full

-          CloudFront Read only

-         

 

Users

Best practice is to create unique users for everyone - it keeps access granular and gives better control. Passwords can be manually created or auto-generated. Passwords are required only if the user will access the AWS Management Console (otherwise the user account is used with the API via access keys).

 

Access Keys

Used to make secure REST or Query protocol requests to any AWS service API
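
A minimal boto3 sketch of creating a user, a console password and an access key (the user name, group name and password are placeholders):

import boto3

iam = boto3.client('iam')

# Create the user and add it to an existing group that carries the permissions policy
iam.create_user(UserName='alice')                                 # hypothetical user
iam.add_user_to_group(GroupName='PowerUsers', UserName='alice')   # hypothetical group

# Console access needs a password (login profile)...
iam.create_login_profile(UserName='alice', Password='Initial-Passw0rd!')

# ...while API/CLI access uses an access key pair
key = iam.create_access_key(UserName='alice')['AccessKey']
print(key['AccessKeyId'], key['SecretAccessKey'])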

 

MFA � Multi-Factor Authentication

Requires an additional factor for user authentication (on top of the password). This can be virtual (a smartphone app set up by scanning a QR code) or hardware (a token device). This is best for privileged users.

 

Signing Certificates

Use of X.509 certificates for secure access - used with SOAP; use a third-party tool such as OpenSSL to create the certificate. RSA keys of 1024 or 2048 bit length can be used. Once the certificate is created (and self-signed), upload it to the user account in IAM.

 

Roles

Roles provide easy management of access keys on EC2 instances with automatic key rotation, least-privilege access for the application, and full SDK integration. A role is like delegating access (see the STS sketch after the benefits list below). Some benefits are:

-          No need to share security credentials

-          Easy to break sharing relationship

-          Great for cross-account access, intra-account delegation and federation
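
A minimal sketch of delegated access via a role, assuming a role that already exists (the role ARN and session name are placeholders):

import boto3

sts = boto3.client('sts')

# Swap long-term credentials for short-lived ones scoped to the role
creds = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/ReadOnlyDelegate',  # hypothetical role
    RoleSessionName='lab-session')['Credentials']

# Use the temporary credentials for subsequent calls
session = boto3.Session(
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'])

print(session.client('s3').list_buckets()['Buckets'])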

 

There are 3 role types and access:

Service Roles

-          EC2

-          CloudHSM

-          Data Pipeline

-          EC2 role for data pipeline

-          Elastic Transcoder

-          OpsWorks

Cross Account Access

-          Access between AWS accounts you own - allowing IAM users from another of your AWS accounts to access this account

-          IAM user access for third-party AWS account holders - allowing access from third-party AWS users

Identity Provider Access

-          Access for web identity providers - Facebook, Google, etc.

-          WebSSO (Web Single Sign-On) - allows SAML provider access

-          API access for a SAML provider - the SAML provider accesses AWS via the CLI or API

 

 

Identity Providers

SAML 2.0 providers (3rd party) for IdP (Identity Provider)

For example - using your company's authentication system

SAML = Security Assertion Markup Language

 

Password Policy

Set the minimum password length and the following policies:

-          Require at least one uppercase

-          Require at least one lowercase

-          Require at least one number

-          Require at least one non-alphanumeric

-          Allow user to change password

 

 

AWS CloudTrail

User Activity and Change Tracking

 

Amazon CloudWatch

Collects and reports metrics on your AWS resources. Alarms can be set on these metrics to trigger actions (a boto3 alarm sketch follows the metrics list below).

There are two parts to CloudWatch:

-          Alarms

-          Metrics (on any AWS resource)

o    DynamoDB

o    EBS

o    EC2

o    ELB

  HealthyHostCount

o    ElastiCache

o    ElasticMapReduce

o    RDS

o    SNS

o    SQS

o    StorageGateway
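
A minimal boto3 sketch of an alarm on the ELB HealthyHostCount metric mentioned above (the alarm name, load balancer name, SNS topic ARN and thresholds are placeholders):

import boto3

cw = boto3.client('cloudwatch', region_name='us-west-2')

# Alarm if the ELB has no healthy hosts for two consecutive 1-minute periods
cw.put_metric_alarm(
    AlarmName='lab-elb-no-healthy-hosts',          # hypothetical name
    Namespace='AWS/ELB',
    MetricName='HealthyHostCount',
    Dimensions=[{'Name': 'LoadBalancerName', 'Value': 'lab-elb'}],
    Statistic='Minimum',
    Period=60,
    EvaluationPeriods=2,
    Threshold=1,
    ComparisonOperator='LessThanThreshold',
    AlarmActions=['arn:aws:sns:us-west-2:123456789012:ops-alerts'])  # hypothetical SNS topic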

 

 

AWS Elastic Beanstalk

Automatically creates, deploys and manages the IT infrastructure needed to run a custom application.

Upload an existing application and it will be automatically deployed into AWS, including load balancing and the rest of the supporting infrastructure.

Supports Java, Node.js, PHP, Python, Ruby, .NET

Stacks supported are:

-          Tomcat Java

-          Apache PHP

-          Apache Python

-          Apache Node.js

-          Passenger Ruby

-          IIS7.5 .Net

Supports multiple versions and environments of the same application (e.g. PROD, DEV, TEST); supports up to 25 apps and 500 app versions

Supports application deployment via Git

 

 

 

 

AWS CloudFormation

Automatically deploys IT infrastructure on AWS using templates (stacks) - built from a Template, Parameters, Mappings, Conditions, Pseudo Parameters, Resources, Resource Properties, References, Intrinsic Functions and Outputs.

 

Look at template example here: https://s3.amazonaws.com/cloudformation-templates-us-east-1/WordPress_Single_Instance_With_RDS.template

 

Some notes:

-          When creating a stack, the AWS Console wizard prompts for the stack parameter values

-          When creating, the status goes from CREATE_IN_PROGRESS to CREATE_COMPLETE

-          When creating a stack, it deploys all the subcomponents - the AWS resources, i.e. EC2, RDS, etc.

-          When deleting a stack, it removes all subcomponents

-          To update an existing stack, use the AWS CLI and run: aws cloudformation update-stack (UpdateStack API); this can also be done through the Console (upload a new template file)

-          There are some Windows AMI CloudFormation templates available on page 206

-          For all stacks, the following views are available on AWS Console

o    Overview

o    Outputs (from template)

o    Resources (AWS resources used)

o    Events (logging)

o    Template (the template)

o    Parameters

o    Tags

o    Policy

The 6 top-level Template Objects (the template is a JSON declaration of the AWS resources that make up the stack; a boto3 create-stack sketch follows this list):

-          Format Version

-          Description

-          Parameters

-          Mappings

-          Resources (required - at least one resource must be defined; all other objects are optional)

-          Output
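
A minimal boto3 sketch of creating a stack from the WordPress sample template above and reading its Outputs (the stack name and parameter key/values are placeholders - check the template's Parameters section for the real names and required values):

import boto3

cfn = boto3.client('cloudformation', region_name='us-west-2')

template_url = ('https://s3.amazonaws.com/cloudformation-templates-us-east-1/'
                'WordPress_Single_Instance_With_RDS.template')

# Create the stack; parameters not supplied here fall back to template defaults (if any)
cfn.create_stack(
    StackName='wordpress-lab',                                    # hypothetical name
    TemplateURL=template_url,
    Parameters=[{'ParameterKey': 'KeyName', 'ParameterValue': 'my-ec2-key'}])  # illustrative

cfn.get_waiter('stack_create_complete').wait(StackName='wordpress-lab')

# The template's Outputs (e.g. the site URL) are available on the stack description
stack = cfn.describe_stacks(StackName='wordpress-lab')['Stacks'][0]
print(stack.get('Outputs'))

# Updating re-uses the same shape: cfn.update_stack(StackName=..., TemplateURL=..., Parameters=[...])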

 

Lab

Create a WordPress stack based on the given S3 template

The resulting website URL was given in the Outputs section:

http://ec2-54-201-76-172.us-west-2.compute.amazonaws.com/wordpress

Wordpress setup with: solidfish / sfish

 

AWS OpsWorks

Manages an application on AWS, including lifecycle, provisioning, configuration, deployment, updates, monitoring and access control.

 

AWS CloudHSM (Hardware Security Modules)

abc

 

 


 

Analytics

 

 

Amazon EMR (Elastic MapReduce)

Managed Hadoop Framework

For data mining - researchers, data analysts and others running Hadoop on top of EC2 and S3

 

 

Amazon Kinesis

abc

 

AWS Data Pipeline

Orchestration for Data-Driven Workflows

 

 


 

Application Services

 

 

Amazon AppStream

abc

 

Amazon CloudSearch

Managed Search Service

 

Amazon SWF (Simple Workflow Service)

Workflow Service for Coordinating Application Components

Coordinates work across distributed application components - works similarly to SQS

http://aws.amazon.com/swf/faqs/

Differences between SWF and SQS

-          SWF is task-oriented, not message-oriented

-          Tasks are never duplicated

-          Ease of use

-          More details regarding the tasks

-           

 

 

Amazon SQS (Simple Queue Service)

Message queue service that acts as a buffer between producers and consumers. SQS messages (text-based, up to 64 KB) can be sent and received by servers or application components within EC2 or from anywhere on the internet. You can have an unlimited number of queues; delivery is unordered and at-least-once.

An example is using SQS for image encoding (where the image is in S3); a boto3 worker sketch follows below:

-          Asynchronously pull task message from queue

-          Retrieve filename from message

-          Process conversion

-          Write image back to S3

-          Create a "task complete" message on another queue

-          Delete the original task message in original queue

-          Check for more messages (loop)

A client can send or receive SQS messages at a rate of 5 to 50 messages per second. For higher-performance scenarios, the requester can retrieve multiple messages (up to 10) in a single call. Queues can also have a message retention period set - anything from 1 hour to 14 days. Messages are retained until explicitly deleted or automatically deleted upon expiration.
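
A minimal boto3 sketch of the worker loop described above (the queue names are placeholders, and the conversion step is just a stub):

import boto3

sqs = boto3.client('sqs', region_name='us-west-2')

task_q = sqs.get_queue_url(QueueName='image-tasks')['QueueUrl']       # hypothetical queues
done_q = sqs.get_queue_url(QueueName='image-tasks-done')['QueueUrl']

while True:
    # Long-poll for up to 10 messages at a time
    resp = sqs.receive_message(QueueUrl=task_q, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        filename = msg['Body']   # the task message carries the S3 key of the image
        # ... fetch the object from S3, convert it, write the result back to S3 ...
        sqs.send_message(QueueUrl=done_q, MessageBody='task complete: ' + filename)
        # Delete only after successful processing (delivery is at-least-once)
        sqs.delete_message(QueueUrl=task_q, ReceiptHandle=msg['ReceiptHandle'])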

 

 

 

Amazon SES

Email Sending Service

Monitors and manages emails

 

Metrics that Define Success:

-          Bounce Rate (successful delivery or not)

-          Complaint Rate (marked as spam)

-          Content issues (content filters)

Best Practices

-          Domain and from address reputation

-           

 

 

Amazon SNS

Push Notification Service

There are two users:

-          Publisher (producer)

-          Subscriber (consumer)

Common SNS Scenarios:

-          Fanout - can be used with SQS to send a message to multiple queues simultaneously for parallel processing (see the sketch after this list)

-          Application / System Alerts

Email alerts

-          Push Email / Text Messaging

Messages sent via SMS for updates, headlines, etc; could have links for users to respond

-          Push Mobile

Messages sent to mobile apps for updates, alerts, etc; could have links for user response
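
A minimal boto3 sketch of the fanout scenario - one topic with two SQS subscribers (the topic and queue names are placeholders; note the queues also need an SQS policy that allows the topic to send to them, which is omitted here):

import boto3

sns = boto3.client('sns', region_name='us-west-2')
sqs = boto3.client('sqs', region_name='us-west-2')

topic_arn = sns.create_topic(Name='image-events')['TopicArn']   # hypothetical topic

for name in ('thumbnail-worker', 'archive-worker'):             # hypothetical queues
    q_url = sqs.create_queue(QueueName=name)['QueueUrl']
    q_arn = sqs.get_queue_attributes(
        QueueUrl=q_url, AttributeNames=['QueueArn'])['Attributes']['QueueArn']
    # Each subscription receives its own copy of every published message
    sns.subscribe(TopicArn=topic_arn, Protocol='sqs', Endpoint=q_arn)

# One publish fans out to both queues simultaneously
sns.publish(TopicArn=topic_arn, Message='images/20130609_091627.jpg uploaded')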

 

Amazon Elastic Transcoder

Easy-to-use Scalable Media Transcoding

 

 

 

 

 


 

Additional Software and Services

 

 

Alexa Top Sites

Abc

 

Alexa Web Information Service

Abc

 

Amazon DevPay

Abc

 

Amazon FPS

Abc

 

Amazon Mechanical Turk

Abc

 

Amazon Silk

Abc

 

AWS GovCloud (US)

GovCloud is an isolated region designed for US government agencies with sensitive data that must adhere to US International Traffic in Arms Regulations (ITAR). Data can include all categories of Controlled Unclassified Information (CUI). Physical and logical administration is done by U.S. persons only. Authentication is also separate from other AWS resources, with different accounts and a vetting process (to verify users are U.S. persons). Multi-Factor Authentication (MFA) is recommended for GovCloud users.

 

The following are available in GovCloud:

Application Services

-          Simple Notification Service (SNS)

-          Simple Queue Service (SQS)

-          Simple Workflow Service (SWF)

Compute

-          Elastic Compute Cloud (EC2)

o    Provisioned IOPS EBS volumes are not supported

o    EBS-optimized instances are not supported

o     

-          Elastic MapReduce (EMR)

-          Auto Scaling

-          Elastic Load Balancing

Database

-          DynamoDB

-          Relational Database Service (RDS)

Deployment and Management

-          Identity and Access Management (IAM)

o    Users in GovCloud are specific to this region only and do not exist in the normal AWS regions

-          CloudWatch

-          CloudFormation

-          Management Console for GovCloud

Networking

-          Virtual Private Cloud (VPC)

-          Direct Connect

Storage

-          Elastic Block Store (EBS)

-          Simple Storage Service (S3)

o    Cannot directly copy contents from GovCloud to another AWS region

Support

-          Customer support

 

GovCloud Best Practices

Use Direct Connect into GovCloud - it comes in two flavors - with VPN or without VPN. ITAR data requires VPN connections (so everything gets encrypted).

 

CloudFront can be used with GovCloud - this is normal for CloudFront, since its origin can be a non-AWS resource, and since GovCloud is outside normal AWS, CloudFront can access it like any custom origin.

 

Amazon Resource Names (ARNS)

In GovCloud, the ARNs are slightly different:

arn:aws   vs   arn:aws-us-gov (with the region field set to a GovCloud region such as us-gov-west-1), e.g.:

arn:aws-us-gov:dynamodb:us-gov-west-1:1234566:table/books_table