Tips and Tricks to Speed AWS
Deployment
Rich Morrow, Global Knowledge Instructor, Cloud and Big Data Analyst
Introduction
Like many organizations, yours probably has investigated some of the services offered by Amazon Web Services
(AWS), and you’ve probably identified several that you would like to use. Although the benefits of increased
agility, developer productivity, pay-as-you-go pricing, and overall cost savings are extremely attractive, you might
wonder where to start. What pitfalls exist, and how can you avoid them? How can you best save time and
money?
If you’re a developer, a development manager, or a CIO who’d like to launch an AWS-hosted service like a pro,
keep reading.
Greenfield projects (brand new ones that don’t have any legacy) are by far the easiest to launch on any platform.
When launching a greenfield project on AWS, you’ll want to leverage existing architectures, Amazon Machine
Images (AMIs, which are pre-built virtual machine images), and/or CloudFormation templates (pre-built stacks of
services that deploy from a template). The latter two are discussed in detail below.
Amazon publishes dozens of whitepapers and architecture diagrams, both of which you’ll definitely want to
review. If you’re trying to build a standard system like a CMS, e-commerce website, ad server, or batch-processing
engine, then you’re likely to find a CloudFormation template that will get you 90% or more of the way, and you
could be up and running in as little as a day. With a greenfield project, you have the luxury of architecting “The
AWS way,” which is loosely coupled, highly-available, and scaled on demand.
Note: In the following paragraphs, we may introduce some AWS terms or services that you’re not yet familiar
with. These services are covered later in the document, so feel free to skip ahead and come back for context as
needed.
If you’re looking to migrate all or part of an existing app, you’ll want to focus on something that will let you get
AWS experience with low risk. Content and media serving, internal CMS or collaboration engines, and proof-of-
concept projects are all excellent choices for a first effort. The most tried and true path to get one of these apps
moved from your systems to AWS is by doing it in three distinct phases: Forklift --> Integrate --> Optimize.
Forklift refers to just picking up your existing stack (maybe a combination of load balancers, servers, and
databases) and moving that entire stack into AWS. In the forklift phase, your single goal is to take what you
have and deploy it on AWS. You’re simply trying to match your existing hardware and software to what AWS
offers and migrate over. In this phase, you’re focusing on AMIs and matching your systems (like custom-built
load balancers and databases) to AWS resources (like Elastic Load Balancers and RDS, the managed database product).
In the Integrate phase, you’re now focused on truly integrating your app into AWS. You’ll be looking for
opportunities to do high availability or “HA” (ensuring that no single points of failure exist in your stack), save
costs, and extend your security and disaster recovery capabilities. During this phase, you’ll begin utilizing AWS
features like Auto Scaling groups for your web and app servers; S3 and CloudFront for low-latency, cost-effective
serving of static content; and RDS Multi-AZ hot failover for the database layer. You’ll also begin fine-tuning your
security, and may even leverage technologies like VPC, VPNs, and/or Identity and Access Management (IAM)
(which is AWS’ fine-grained access controls for users, groups, and servers/services).
After you’ve completed the integration phase, you may want to hyper-optimize your application in the
Optimization phase. In Optimization, you’re trying to squeeze out every last efficiency and cost savings that AWS
offers. Some common activities in this phase are:
• Code refactors (which probably push some functionality, like queues, email, and notifications, out to AWS
services like SQS, SES, and SNS; a queueing sketch follows below)
• Advanced EC2 strategies like parallel processing and leveraging reserved and spot instances
• Integration with more esoteric AWS services like Simple Workflow (SWF) and Elastic MapReduce (EMR),
AWS’s hosted Hadoop offering
By going through the Optimization phase with a low-risk product, you’ll not only have fully integrated one full
product into AWS, but you’ll have acquired valuable learning that you can then apply to more complicated or
risky efforts.
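To make the queueing refactor concrete, here is a minimal sketch using the boto3 AWS SDK for Python (the paper itself doesn’t prescribe an SDK, and the queue name and message body below are hypothetical). The web tier enqueues work; a worker instance pulls it off the queue:

    import boto3

    # Hypothetical queue for offloaded background work (e.g., thumbnail generation).
    sqs = boto3.client("sqs", region_name="us-east-1")
    queue_url = sqs.create_queue(QueueName="thumbnail-jobs")["QueueUrl"]

    # Producer side: the web tier enqueues the job instead of doing the work inline.
    sqs.send_message(QueueUrl=queue_url, MessageBody='{"image_id": "12345"}')

    # Consumer side: a worker long-polls for jobs and deletes each one it finishes.
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        print("processing", msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])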
EC2 instances have two aspects: the instance type or “hardware footprint” covered previously, and the
AMI or “software footprint,” which is the particular OS, applications (like Apache, JBoss, or custom PHP or Python
code), and all configuration. Instance types and AMIs are independent of each other—you can launch most AMIs
into most instance types. When deciding on an AMI, you definitely want to consult both the community AMI
library as well as the AMI marketplace, because there’s a good chance that the exact software you want to
deploy exists in one or the other. The only caveat to using either is to carefully vet the provider of the AMI—make
sure it comes from a reputable company or individual you trust.
AMIs from the marketplace are an excellent way to test drive both open source and commercial software—you
can spin the servers up in just a few minutes, use them for as little as an hour, and pay as little as a few pennies.
As with nearly all AWS services, there is no up-front cost and you only pay for the hours that the software is kept
running. You can also load your own applications and custom code into a base AMI (OS only), and then burn
your own AMI, which now contains all your modifications and can be used to spin up exact copies of that server
in minutes.
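As a small sketch of that burn-and-relaunch workflow, assuming the boto3 AWS SDK for Python (the instance ID and names are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Burn a new AMI from an instance you have already customized (placeholder ID).
    image = ec2.create_image(
        InstanceId="i-0123456789abcdef0",
        Name="webserver-baseline-v1",
        Description="Base OS plus our Apache/PHP stack and configuration",
    )

    # Wait until the image is available, then launch an exact copy of that server.
    ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])
    ec2.run_instances(
        ImageId=image["ImageId"],
        InstanceType="t2.micro",  # the instance type (hardware) is chosen independently of the AMI
        MinCount=1,
        MaxCount=1,
    )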
All AWS services that utilize resizable VMs (like EC2, RDS, and ElastiCache) can be paid for using “reservations.”
A reservation functions a lot like a stock option—you pay an up-front fee for the option (but not obligation) to
purchase a given VM for a lower nominal hourly cost than the “no-obligation” on-demand price. If your server
demand is well known (reservations are for specific instance types in specific data centers), purchasing a one- or
three-year reservation for your “always on” needs can save 30% or more over the entire term of the reservation.
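As a rough illustration of the break-even math (the prices below are invented for the example and are not actual AWS rates), the comparison for an always-on server looks like this:

    # Hypothetical prices for illustration only -- not actual AWS rates.
    on_demand_hourly = 0.10        # $/hour with no commitment
    reserved_upfront = 300.00      # one-time fee for a three-year reservation
    reserved_hourly = 0.04         # discounted hourly rate under the reservation

    hours = 3 * 365 * 24           # an "always on" server over the three-year term
    on_demand_total = on_demand_hourly * hours
    reserved_total = reserved_upfront + reserved_hourly * hours

    # With these example numbers, the reservation saves roughly half the on-demand cost.
    savings = 1 - reserved_total / on_demand_total
    print("on-demand: $%.0f, reserved: $%.0f, savings: %.0f%%"
          % (on_demand_total, reserved_total, savings * 100))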
A much more advanced use case is spot instances. With spot instances, you’re bidding on unused EC2 capacity—
you put in a bid for a specific instance and if the spot price goes below your bid, you get the instance. The flip
side of that is that if the spot price goes above your bid, the instance is taken away from you without warning
and given to other customers. Although that constraint makes spot appear unusable, there are many cases where
it makes perfect sense—dev and test environments, batch processing architectures, and offline data analysis to
name just a few.
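Placing a spot bid is a single API call. Here is a minimal sketch, assuming the boto3 AWS SDK for Python (the bid price, AMI ID, key name, and security group ID are all placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Bid $0.05/hour for a batch worker; it runs only while the spot price stays below the bid.
    resp = ec2.request_spot_instances(
        SpotPrice="0.05",
        InstanceCount=1,
        LaunchSpecification={
            "ImageId": "ami-0123456789abcdef0",
            "InstanceType": "m3.medium",
            "KeyName": "batch-worker-key",
            "SecurityGroupIds": ["sg-0123456789abcdef0"],
        },
    )
    print(resp["SpotInstanceRequests"][0]["SpotInstanceRequestId"])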
It’s worth noting that whether you’re using on-demand, reserved, or spot instances, the underlying server is the
exact same—the three billing models only dictate how you pay for that server (and in the case of spot, that it can
go away at any time).
A very important aspect of EC2 instances is security. To help you manage security, AWS provides two important
tools that you’ll want to leverage ASAP—Security Groups and public/private EC2 key pairs.
AWS Security Groups (SGs) can be applied to the same resources that support reserved billing models (EC2,
RDS, and ElastiCache). SGs are point-and-click stateful firewalls that can be defined once and applied to many
instances. A single instance can also have one or more SGs applied to it, and they can be modified at runtime to
add and remove access to that server. Like everything AWS builds, SGs default to “no access,” so you’ll
need to create a new SG (do NOT modify the “default” SG) for at least each tier of your application—perhaps
opening port 80 and 443 to the world for the web servers and only port 8080 (from the web server’s SG) on the
app tier servers.
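Here is a minimal sketch of that two-tier layout, assuming the boto3 AWS SDK for Python (the VPC ID is a placeholder and the group names are hypothetical):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    vpc_id = "vpc-0123456789abcdef0"  # placeholder for your own VPC

    web_sg = ec2.create_security_group(
        GroupName="web-tier", Description="Public web servers", VpcId=vpc_id)["GroupId"]
    app_sg = ec2.create_security_group(
        GroupName="app-tier", Description="Internal app servers", VpcId=vpc_id)["GroupId"]

    # Web tier: HTTP and HTTPS open to the world.
    ec2.authorize_security_group_ingress(
        GroupId=web_sg,
        IpPermissions=[
            {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        ],
    )

    # App tier: port 8080 reachable only from instances in the web tier's SG.
    ec2.authorize_security_group_ingress(
        GroupId=app_sg,
        IpPermissions=[
            {"IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8080,
             "UserIdGroupPairs": [{"GroupId": web_sg}]},
        ],
    )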
In addition to SGs, AWS requires public-private key pairs for at least the first Secure Shell (SSH) access to your EC2
server. It’s highly recommended both to leave this key pair authentication on and to rotate key pairs at least every
two to three months, or whenever an employee leaves the organization.
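Creating and rotating key pairs can be scripted. A small sketch, assuming the boto3 AWS SDK for Python (the key names are hypothetical):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create a new key pair; AWS keeps only the public half, so save the private key now.
    key = ec2.create_key_pair(KeyName="prod-web-v2")
    with open("prod-web-v2.pem", "w") as f:
        f.write(key["KeyMaterial"])

    # Rotation: launch (or re-key) instances with the new pair, then retire the old one.
    ec2.delete_key_pair(KeyName="prod-web-v1")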
Security
If you’re used to managing only a handful of servers, you probably allow outside SSH access to each. In AWS,
where you will likely have dozens of servers running at any given time, you’ll probably want to leverage a bastion
host or “jump station” server. This bastion host (or set of bastion hosts) becomes the only machine that allows
outside SSH access, and to get to other machines in your farm, you first have to SSH into the bastion and then
SSH to the other machines. Your bastion host machine(s) then become the one(s) on which you lock down
security, possibly installing OS-level firewalls, anti-virus, and intrusion detection systems (IDSes) such as Snort or
Tripwire. Again, SGs provide an easy way to enforce this pattern: outside SSH access only to the bastion host(s),
and SSH access to internal servers only from the bastion host(s).
Savvy AWS users will actually take their root account (the main account that has complete access to all actions on
all resources), burn its credentials onto a USB drive, and then lock it in a company safe (after, of course, using it to
create all of the IAM accounts they need). Whatever you do, make sure that individual users and other
EC2 instances never use that root account for interaction with AWS—if that account becomes compromised,
everything you’ve built with AWS is compromised.
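A minimal sketch of that practice, assuming the boto3 AWS SDK for Python (the user name and managed policy are illustrative; pick policies that match each person’s actual job):

    import boto3

    iam = boto3.client("iam")

    # Create an individual IAM user instead of handing out the root credentials.
    iam.create_user(UserName="alice")
    iam.create_login_profile(UserName="alice", Password="ChangeMe!123",
                             PasswordResetRequired=True)

    # Grant only the access this user needs (an AWS-managed read-only policy here).
    iam.attach_user_policy(
        UserName="alice",
        PolicyArn="arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess",
    )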
A final important security tool available to you (which is 100% free) is the Virtual Private Cloud (VPC). With VPC,
AWS users can cordon off a private, virtual cloud in AWS. This private cloud has “edge level” control over who
can get into and out of the whole set of resources you deploy into it. It also gives you:
• Detailed control over public and private IP addresses
• Egress as well as ingress rules for SGs
• The ability to connect a Virtual Private Gateway (VPG), which allows for secured VPN connections to your
on-premises data center
• The ability to set up routing rules which direct traffic based on source, destination, and traffic type
VPC provides a ton of security for only a small additional amount of complexity, and I try to use it in every client
project I launch in AWS.
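Carving out a basic VPC takes only a few calls. Here is a sketch, assuming the boto3 AWS SDK for Python (the CIDR ranges are illustrative):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # A /16 private address space for the whole environment.
    vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

    # A public subnet (load balancers, bastion hosts) and a private one (app and database tiers).
    public_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")["Subnet"]["SubnetId"]
    private_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24")["Subnet"]["SubnetId"]

    # An Internet gateway gives the public subnet (via its route table) a path to the outside world.
    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)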
With CloudFormation, you define all the resources you need in a JSON template, and then you can deploy that
template in any Region. Because the template is a single JSON document, you can even check the template into
version control, allowing it to be the single source of documentation for what you deploy. Those who have used
third-party configuration management tools like Puppet, Chef, or Ansible will be right at home in
CloudFormation’s syntax. The only possible downside to CloudFormation is that it is an AWS-only tool, whereas
other general-purpose tools will work both in and out of AWS.
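Here is a deliberately tiny sketch, assuming the boto3 AWS SDK for Python: a one-resource template deployed as a stack (real templates typically describe a whole environment of load balancers, instances, and databases, and the AMI ID below is a placeholder):

    import json
    import boto3

    # A minimal template: a single EC2 instance.
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": "ami-0123456789abcdef0",
                    "InstanceType": "t2.micro",
                },
            }
        },
    }

    cfn = boto3.client("cloudformation", region_name="us-east-1")
    cfn.create_stack(StackName="demo-stack", TemplateBody=json.dumps(template))

    # The same JSON can live in version control and be deployed, unchanged, in any Region.
    cfn.get_waiter("stack_create_complete").wait(StackName="demo-stack")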
Storage Options
At the storage layer, AWS shines exceptionally bright—giving you several options to choose from, each with
various use cases.
To speed up your serving of static assets even further, you could use CloudFront, AWS’s Content Delivery
Network (CDN). Although it’s not as feature rich as competitor solutions like Akamai, CloudFront provides the
same basic functionality—pushing static content closer to users, greatly speeding their interaction with your web
product. Like almost all services AWS offers, CloudFront is zero cost up-front and zero commitment. You can try it
to decide whether it’s suitable for you, and turn it back off, all in a day—something Akamai will likely never let
you do.
No discussion of storage would be complete without talking about a Relational Database Management System
(RDBMS). The heart of your real-time web or mobile product is likely to be an RDBMS like MySQL, Oracle,
Postgres, or SQL Server.
For this layer, AWS offers the RDS product, a fully managed RDBMS, complete with client-level control
(you get no SSH access to the server) to create, select, insert, and perform all the other operations you’re used to
from a command-line client like the MySQL client.
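Provisioning a Multi-AZ MySQL instance is a single call. A minimal sketch, assuming the boto3 AWS SDK for Python (identifiers, instance class, and credentials below are placeholders):

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # A Multi-AZ MySQL instance: AWS handles patching, automated backups, and hot failover.
    rds.create_db_instance(
        DBInstanceIdentifier="prod-db",
        Engine="mysql",
        DBInstanceClass="db.m3.medium",
        AllocatedStorage=100,          # GB
        MasterUsername="admin",
        MasterUserPassword="ChangeMe!123",
        MultiAZ=True,
        BackupRetentionPeriod=7,       # keep daily automated backups for a week
    )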
RDS should be your go-to tool when trying to implement a database in AWS, but it definitely comes with some
limitations that may make it unsuitable for you. It makes patching, backups, and Multi-AZ HA as
simple as a few mouse clicks, but in exchange for that convenience you give up considerable control. You cannot, for
instance, adjust my.cnf-level properties or tune anything at the OS level. There is no ability to SSH into the server,
and although RDS comes in resizable footprints (as EC2 does), your database must fit in 3 TB or less.
For this reason, some AWS users must go the route of building their own database on EC2, probably using a set
of striped Elastic Block Store or EBS volumes, and possibly even using provisioned Input/Output Operations Per
Second (IOPS) to make the database extremely responsive (albeit at a higher cost). Whenever possible, though,
you should use the managed AWS service, which will likely be more cost-effective and feature-rich for the common
operations you’d like to perform.
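If you do go the self-managed route, the volume setup looks roughly like this sketch, assuming the boto3 AWS SDK for Python (sizes, IOPS, and the instance ID are placeholders; the striping itself is done with software RAID inside the instance):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Four provisioned-IOPS volumes to be striped (e.g., RAID 0) by the database server.
    volumes = [
        ec2.create_volume(AvailabilityZone="us-east-1a", Size=200,
                          VolumeType="io1", Iops=2000)
        for _ in range(4)
    ]
    ec2.get_waiter("volume_available").wait(VolumeIds=[v["VolumeId"] for v in volumes])

    # Attach each volume to the database instance as /dev/sdf, /dev/sdg, ...
    for i, vol in enumerate(volumes):
        ec2.attach_volume(VolumeId=vol["VolumeId"],
                          InstanceId="i-0123456789abcdef0",
                          Device="/dev/sd" + "fghi"[i])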
A final storage product that’s worth mentioning is DynamoDB, which many users have found extremely useful (as
evidenced by it being the fastest growing service ever in AWS history). DynamoDB is a managed, SSD-backed,
distributed Key-Value NoSQL engine—essentially providing the same base functionality as Memcached, but with
much higher performance and a richer feature set. Aside from a 64 KB row-size limit, DynamoDB really has no limits: it
can serve any read or write volume, and there are no limits on the number of tables, the number of rows per table, or
raw storage volume.
DynamoDB is an excellent fit for data like user web sessions, for “hot tables” that your RDBMS is struggling to
serve, or for systems where extremely low latency is needed.
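A minimal sketch of a session store, assuming the boto3 AWS SDK for Python (the table name, key, and throughput values are illustrative):

    import boto3

    ddb = boto3.resource("dynamodb", region_name="us-east-1")

    # A session table keyed by session_id, with modest provisioned throughput.
    table = ddb.create_table(
        TableName="web-sessions",
        KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "session_id", "AttributeType": "S"}],
        ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
    )
    table.wait_until_exists()

    # Write and read a session record.
    table.put_item(Item={"session_id": "abc123", "user_id": "42", "cart": ["sku-1", "sku-2"]})
    print(table.get_item(Key={"session_id": "abc123"})["Item"])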
Although I love Memcached and have used it in many production environments, when building systems in AWS,
one should really try to avoid the ElastiCache product (which provides a hosted Memcached cluster) and
instead lean towards DynamoDB. ElastiCache still only runs in a single AZ, making it a potential single point of
failure for the critical areas where you want to deploy it.
Additional Reading
I hope the information provided in this white paper helps make your AWS deployments less risky and more
rewarding. If you’re inspired to learn more about AWS, a few links to some helpful resources are provided below.