Notes AWS Solutions Architects
EC2 Instances
Amazon EC2 is free to try. There are multiple ways to pay for Amazon EC2 instances:
On-Demand, Savings Plans, Reserved Instances, and Spot Instances. You can also pay
for Dedicated Hosts, which provide EC2 instance capacity on physical servers dedicated
for your use. For more information on how to optimize your Amazon EC2 spend, visit
the Amazon EC2 Cost and Capacity page.
On-Demand
With On-Demand instances, you pay for compute capacity by the hour or the second,
depending on which instances you run. No longer-term commitments or upfront
payments are needed. You can increase or decrease your compute capacity depending
on the demands of your application and pay only the specified rates for the instances
you use.
On-Demand instances are recommended for:
Users that prefer the low cost and flexibility of Amazon EC2 without any upfront
payment or long-term commitment
Applications with short-term, spiky, or unpredictable workloads that cannot be
interrupted
Applications being developed or tested on Amazon EC2 for the first time
Spot instances
Amazon EC2 Spot instances allow you to request spare Amazon EC2 computing
capacity for up to 90% off the On-Demand price.
Spot instances are recommended for:
Applications that have flexible start and end times
Applications that are feasible only at very low compute prices
Users with urgent computing needs for large amounts of additional capacity.
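As a minimal boto3 sketch, a Spot instance can be requested through the regular RunInstances API by adding market options; the AMI ID and instance type below are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Request a one-time Spot instance that terminates on interruption.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
```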
Savings Plans
Savings Plans are a flexible pricing model that offers low prices on EC2 and Fargate
usage, in exchange for a commitment to a consistent amount of usage (measured in
$/hour) for a one- or three-year term.
Dedicated Hosts
A Dedicated Host is a physical EC2 server dedicated for your use. Dedicated Hosts can
help you reduce costs by allowing you to use your existing server-bound software
licenses, including Windows Server, SQL Server, and SUSE Linux Enterprise Server
(subject to your license terms), and can also help you meet compliance requirements.
Can be purchased On-Demand (hourly).
Can be purchased as a Reservation for up to 70% off the On-Demand price.
Per-Second Billing
With per-second billing, you pay only for what you use. EC2 per-second billing removes
the cost of unused minutes and seconds from your bill. Focus on improving your
applications instead of maximizing hourly usage, especially for instances running over
irregular time periods such as dev/testing, data processing, analytics, batch processing,
and gaming applications.
EC2 usage is billed in one-second increments, with a minimum of 60 seconds. Similarly,
provisioned storage for Amazon Elastic Block Store (EBS) volumes is billed in per-second
increments, with a 60-second minimum. Per-second billing is available for
instances launched in:
On-Demand, Savings Plans, Reserved, and Spot instances
All regions and Availability Zones
Amazon Linux, Windows, and Ubuntu
For details on related costs like data transfer, Elastic IP addresses, and EBS Optimized
Instances, visit the On-Demand pricing page.
Load Balancer Routing
Host-based Routing: Host-based routing allows you to route traffic to different target
groups based on the hostname of the incoming request. For example, you could
configure your load balancer to route requests with the hostname "api.example.com"
to a different target group than requests with the hostname "www.example.com".
HTTP Header-based Routing: HTTP header-based routing allows you to route traffic to
different target groups based on the value of an HTTP header in the incoming request.
For example, you could configure your load balancer to route requests with the "User-
Agent" header set to "iOS" to a different target group than requests with the "User-
Agent" header set to "Android".
Query String Parameter-based Routing: Query string parameter-based routing allows
you to route traffic to different target groups based on the value of a query string
parameter in the incoming request. For example, you could configure your load
balancer to route requests with the "category" parameter set to "books" to a different
target group than requests with the "category" parameter set to "electronics".
Source IP Address CIDR-based Routing: Source IP address CIDR-based routing allows
you to route traffic to different target groups based on the source IP address of the
incoming request. For example, you could configure your load balancer to route
requests originating from a specific IP address range to a different target group than
requests originating from a different IP address range.
By using these routing methods, you can customize the behavior of your load balancer
and direct traffic to the appropriate target groups based on a variety of factors,
improving the performance and reliability of your applications.
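As a hedged boto3 sketch of how such rules look in practice, the following adds a host-based rule and a query-string rule to an existing Application Load Balancer listener; the ARNs are placeholders:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Route requests for api.example.com to a dedicated target group.
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/abc/def",  # placeholder
    Priority=10,
    Conditions=[{"Field": "host-header",
                 "HostHeaderConfig": {"Values": ["api.example.com"]}}],
    Actions=[{"Type": "forward",
              "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api/abc"}],  # placeholder
)

# Route requests with ?category=books to a different target group.
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/abc/def",  # placeholder
    Priority=20,
    Conditions=[{"Field": "query-string",
                 "QueryStringConfig": {"Values": [{"Key": "category", "Value": "books"}]}}],
    Actions=[{"Type": "forward",
              "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/books/abc"}],  # placeholder
)
```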
Auto Scaling Group Attributes
Launch Template
is similar to a launch configuration, in that it specifies instance configuration
information such as the ID of the Amazon Machine Image (AMI), the instance type, a
key pair, security groups, and the other parameters that you use to launch EC2
instances. Also, defining a launch template instead of a launch configuration allows
you to have multiple versions of a template.
With launch templates, you can provision capacity across multiple instance types using
both On-Demand Instances and Spot Instances to achieve the desired scale,
performance, and cost.
A launch configuration
is an instance configuration template that an Auto Scaling group uses to launch EC2
instances. When you create a launch configuration, you specify information for the
instances, including the ID of the Amazon Machine Image (AMI), the instance type, a
key pair, one or more security groups, and a block device mapping.
It is not possible to modify a launch configuration once it is created. To change the
instance type, create a new launch configuration that uses the correct type, then
modify the Auto Scaling group to use the new launch configuration. Finally, to clean
up, delete the old launch configuration, as it is no longer needed.
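A minimal boto3 sketch of a launch template used by an Auto Scaling group with mixed On-Demand and Spot capacity; the AMI, security group, and subnet IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")
asg = boto3.client("autoscaling")

ec2.create_launch_template(
    LaunchTemplateName="web-template",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",          # placeholder
        "InstanceType": "t3.micro",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder
    },
)

asg.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "web-template", "Version": "$Latest"},
            "Overrides": [{"InstanceType": "t3.micro"},
                          {"InstanceType": "t3a.micro"}],
        },
        # Keep one On-Demand instance as a base; split the rest 50/50 with Spot.
        "InstancesDistribution": {"OnDemandBaseCapacity": 1,
                                  "OnDemandPercentageAboveBaseCapacity": 50},
    },
    MinSize=1, MaxSize=4,
    VPCZoneIdentifier="subnet-aaa1111,subnet-bbb2222",  # placeholder subnets
)
```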
Amazon S3 notification
The Amazon S3 notification feature enables you to receive notifications when certain
events happen in your bucket. To enable notifications, you must first add a notification
configuration that identifies the events you want Amazon S3 to publish and the
destinations where you want Amazon S3 to send the notifications. You store this
configuration in the notification subresource that is associated with a bucket.
Amazon S3 supports the following destinations where it can publish events:
- Amazon Simple Notification Service (Amazon SNS) topic
- Amazon Simple Queue Service (Amazon SQS) queue
- AWS Lambda
Take note that Amazon S3 event notifications are designed to be delivered at least
once, and a given event can be published to only one destination. You cannot attach
two or more SNS topics or SQS queues to the same S3 event notification. If multiple
consumers need to react to an event, send the notification to a single Amazon SNS
topic and fan out from there.
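A minimal boto3 sketch of such a notification configuration; the bucket name and topic ARN are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Publish all object-created events in this bucket to one SNS topic;
# subscribers (SQS queues, Lambda functions, etc.) can then fan out from the topic.
s3.put_bucket_notification_configuration(
    Bucket="my-bucket",  # placeholder
    NotificationConfiguration={
        "TopicConfigurations": [{
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:s3-events",  # placeholder
            "Events": ["s3:ObjectCreated:*"],
        }],
    },
)
```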
S3 Bucket policies
Bucket policies in Amazon S3 can be used to grant or deny permissions across some or
all of the objects within a single bucket. A bucket policy is attached to the bucket
itself (whereas IAM policies attach to users, groups, or roles), enabling centralized
management of permissions. With bucket policies, you can grant users within your
AWS account or other AWS accounts access to your Amazon S3 resources.
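A minimal boto3 sketch of a cross-account read policy; the bucket name and account ID are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

# Grant a second AWS account read access to every object in the bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
    }],
}
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))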
Bucket settings for Block Public Access
Block Public Access provides account- and bucket-level settings that override bucket
policies and ACLs which would otherwise allow public access, acting as a safety net
against accidental exposure.
S3 multipart upload
allows you to upload a single object as a set of parts. Each part is a contiguous portion
of the object's data. You can upload these object parts independently and in any order.
If transmission of any part fails, you can retransmit that part without affecting other
parts. After all parts of your object are uploaded, Amazon S3 assembles these parts
and creates the object. In general, when your object size reaches 100 MB, you should
consider using multipart uploads instead of uploading the object in a single operation.
Multipart upload provides improved throughput; therefore, it facilitates faster file
uploads.
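A minimal boto3 sketch using the transfer manager, which switches to multipart automatically above a threshold; the bucket and file names are placeholders:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multipart once the object exceeds 100 MB, in 16 MB parts that
# boto3 uploads in parallel and retries individually on failure.
config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                        multipart_chunksize=16 * 1024 * 1024)
s3.upload_file("backup.tar", "my-bucket", "backups/backup.tar", Config=config)
```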
Partition Key
portion of a table's primary key determines the logical partitions in which a table's data
is stored. This in turn affects the underlying physical partitions. Provisioned I/O
capacity for the table is divided evenly among these physical partitions. Therefore, a
partition key design that doesn't distribute I/O requests evenly can create "hot"
partitions that result in throttling and use your provisioned I/O capacity inefficiently.
The optimal usage of a table's provisioned throughput depends not only on the
workload patterns of individual items, but also on the partition-key design. This doesn't
mean that you must access all partition key values to achieve an efficient throughput
level, or even that the percentage of accessed partition key values must be high. It
does mean that the more distinct partition key values that your workload accesses, the
more those requests will be spread across the partitioned space. In general, you will
use your provisioned throughput more efficiently as the ratio of partition key values
accessed to the total number of partition key values increases.
One example for this is the use of partition keys with high-cardinality attributes, which
have a large number of distinct values for each item.
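A minimal boto3 sketch of a table keyed on a high-cardinality attribute; the table and attribute names are placeholders:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# CustomerId is a high-cardinality attribute, so I/O spreads evenly across
# partitions; the sort key keeps each customer's orders together.
dynamodb.create_table(
    TableName="Orders",  # placeholder table
    AttributeDefinitions=[
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "OrderTimestamp", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CustomerId", "KeyType": "HASH"},
        {"AttributeName": "OrderTimestamp", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
```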
VPC (Virtual Private Cloud)
Virtual Private Cloud (VPC): A virtual private cloud is a logically isolated virtual
network within the AWS cloud. It allows you to define your own IP address space,
create subnets, and configure network gateways, all within a virtual network that you
control.
Subnets: Subnets are logical partitions within a VPC that allow you to segregate
resources based on their security or operational requirements.
Route Tables: Route tables are used to control the flow of traffic within and outside
the VPC. They contain a set of rules that define how traffic is routed between subnets
and to the internet.
Internet Gateway: An internet gateway is a horizontally scaled, redundant, and highly
available VPC component that allows communication between instances in your VPC
and the internet.
Network Access Control Lists (NACLs): NACLs are used to control traffic at the subnet
level. They act as a firewall and can be used to allow or deny traffic based on IP
addresses, ports, and protocols.
Security Groups: Security groups act as a virtual firewall for your instances to control
inbound and outbound traffic. They are stateful, which means that they automatically
allow return traffic.
Elastic IP addresses (EIPs): EIPs are static IP addresses that can be assigned to your
instances, allowing them to be reachable from the internet even if their IP address
changes.
NAT Gateway: A NAT gateway is a highly available, managed network address
translation (NAT) service that allows instances in a private subnet to connect to the
internet or other AWS services, while remaining private.
VPN Connections: VPN connections provide secure connectivity between your on-
premises data center or office and your VPC.
Direct Connect: Direct Connect is a dedicated network connection between your on-
premises infrastructure and AWS. It provides a high-speed, low-latency, and reliable
connection, which can be used to access services in your VPC.
The Amazon VPC console wizard provides the following four configurations:
1. VPC with a single public subnet - The configuration for this scenario
includes a virtual private cloud (VPC) with a single public subnet, and an
internet gateway to enable communication over the internet. We recommend
this configuration if you need to run a single-tier, public-facing web application,
such as a blog or a simple website.
2. VPC with public and private subnets (NAT) - The configuration for this
scenario includes a virtual private cloud (VPC) with a public subnet and a
private subnet. We recommend this scenario if you want to run a public-facing
web application while maintaining back-end servers that aren't publicly
accessible. A common example is a multi-tier website, with the web servers in a
public subnet and the database servers in a private subnet. You can set up
security and routing so that the web servers can communicate with the
database servers.
3. VPC with public and private subnets and AWS Site-to-Site VPN access -
The configuration for this scenario includes a virtual private cloud (VPC) with a
public subnet and a private subnet, and a virtual private gateway to enable
communication with your network over an IPsec VPN tunnel. We recommend
this scenario if you want to extend your network into the cloud and also
directly access the Internet from your VPC. This scenario enables you to run a
multi-tiered application with a scalable web front end in a public subnet and to
house your data in a private subnet that is connected to your network by an
IPsec AWS Site-to-Site VPN connection.
4. VPC with a private subnet only and AWS Site-to-Site VPN access - The
configuration for this scenario includes a virtual private cloud (VPC) with a
single private subnet, and a virtual private gateway to enable communication
with your network over an IPsec VPN tunnel. There is no Internet gateway to
enable communication over the Internet. We recommend this scenario if you
want to extend your network into the cloud using Amazon's infrastructure
without exposing your network to the Internet.
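As a sketch of scenario 1 built by hand rather than through the wizard, the following boto3 calls create a VPC, a public subnet, an internet gateway, and the default route; CIDR blocks are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
subnet_id = ec2.create_subnet(VpcId=vpc_id,
                              CidrBlock="10.0.0.0/24")["Subnet"]["SubnetId"]

# Attach an internet gateway and route 0.0.0.0/0 through it, making the subnet public.
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

rt_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock="0.0.0.0/0",
                 GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=subnet_id)
```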
VPC sharing
allows multiple AWS accounts to create their application resources such as EC2
instances, RDS databases, Redshift clusters, and Lambda functions, into shared and
centrally-managed Amazon Virtual Private Clouds (VPCs). To set this up, the account
that owns the VPC (owner) shares one or more subnets with other accounts
(participants) that belong to the same organization from AWS Organizations. After a
subnet is shared, the participants can view, create, modify, and delete their application
resources in the subnets shared with them. Participants cannot view, modify, or delete
resources that belong to other participants or the VPC owner.
You can share Amazon VPCs to leverage the implicit routing within a VPC for
applications that require a high degree of interconnectivity. This reduces the number
of VPCs that you create and manage while using separate accounts for billing and
access control.
VPC Endpoint
is a virtual device that enables private connectivity between a VPC and AWS services
without using public IP addresses, NAT (Network Address Translation) devices, VPN
(Virtual Private Network) connections, or internet gateways.
VPC endpoints allow you to connect to AWS services, such as Amazon S3, Amazon
DynamoDB, and Amazon Kinesis, from your VPC without exposing your traffic to the
public internet. This improves security by keeping traffic within the AWS network and
reducing exposure to threats from the public internet.
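A minimal boto3 sketch of a gateway endpoint for S3; the VPC and route table IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Gateway endpoint for S3: S3 traffic from the VPC stays on the AWS network,
# routed via an entry that AWS adds to the listed route tables.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",             # placeholder
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],   # placeholder
)
```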
Transit Gateway
is a service that enables you to connect multiple VPCs and on-premises networks
together using a single gateway. Site-to-Site VPN (Virtual Private Network) Enhanced
ECMP (Equal Cost Multipath) is a feature of Transit Gateway that allows you to
distribute VPN traffic across multiple VPN connections between Transit Gateway and
on-premises networks.
ECMP is a routing technique that allows traffic to be distributed across multiple equal-
cost paths, enabling efficient use of available bandwidth and providing redundancy in
case of path failure. Site-to-Site VPN Enhanced ECMP extends this capability to VPN
connections between Transit Gateway and on-premises networks, allowing traffic to
be distributed across multiple VPN connections.
By using Transit Gateway and Site-to-Site VPN Enhanced ECMP, you can create a
scalable and highly available solution for connecting multiple networks. You can
connect multiple VPCs and on-premises networks to Transit Gateway, and use Site-to-
Site VPN Enhanced ECMP to distribute traffic across multiple VPN connections,
providing redundancy and efficient use of available bandwidth.
To use Site-to-Site VPN Enhanced ECMP, you must have multiple VPN connections
between Transit Gateway and on-premises networks with the same routing prefix.
When traffic is sent to the routing prefix, Transit Gateway distributes the traffic across
the multiple VPN connections using ECMP, providing an efficient and redundant
solution for connecting multiple networks.
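A minimal boto3 sketch of creating a Transit Gateway with ECMP support enabled, under the assumption that the VPN attachments and routes are configured separately:

```python
import boto3

ec2 = boto3.client("ec2")

# ECMP must be enabled on the Transit Gateway itself; traffic to a prefix
# advertised over several VPN tunnels is then load-balanced across them.
ec2.create_transit_gateway(
    Description="Hub for VPCs and Site-to-Site VPNs",
    Options={
        "VpnEcmpSupport": "enable",
        "DefaultRouteTableAssociation": "enable",
    },
)
```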
DynamoDB Global Tables and KMS Multi-Region Keys
Using DynamoDB Global Tables together with KMS Multi-Region Keys and client-side
encryption provides a highly secure and resilient way to store and access data in the
cloud across regions.
AWS Config
AWS Config provides a detailed view of the configuration of AWS resources in your
AWS account. This includes how the resources are related to one another and how
they were configured in the past so that you can see how the configurations and
relationships change over time.
AWS Config also provides AWS managed rules, which are predefined, customizable
rules that AWS Config uses to evaluate whether your AWS resources comply with
common best practices. You can leverage an AWS Config managed rule to check
whether any ACM certificates in your account are marked for expiration within a
specified number of days. Certificates provided by ACM are automatically renewed,
but ACM does not automatically renew certificates that you import. The rule is
NON_COMPLIANT if your certificates are about to expire.
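A minimal boto3 sketch of deploying that managed rule; the 45-day threshold is a placeholder value:

```python
import json
import boto3

config = boto3.client("config")

# Flag any ACM certificate that expires within 45 days as NON_COMPLIANT.
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "acm-certificate-expiration-check",
        "Source": {"Owner": "AWS",
                   "SourceIdentifier": "ACM_CERTIFICATE_EXPIRATION_CHECK"},
        "InputParameters": json.dumps({"daysToExpiration": "45"}),
    }
)
```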
Amazon GuardDuty
is a threat detection service that continuously monitors for malicious activity and
unauthorized behavior to protect your AWS accounts, workloads, and data stored in
Amazon S3. With the cloud, the collection and aggregation of account and network
activities is simplified, but it can be time-consuming for security teams to continuously
analyze event log data for potential threats. With GuardDuty, you now have an
intelligent and cost-effective option for continuous threat detection in AWS. The
service uses machine learning, anomaly detection, and integrated threat intelligence to
identify and prioritize potential threats.
GuardDuty analyzes tens of billions of events across multiple AWS data sources, such
as AWS CloudTrail events, Amazon VPC Flow Logs, and DNS logs.
With a few clicks in the AWS Management Console, GuardDuty can be enabled with no
software or hardware to deploy or maintain. By integrating with Amazon EventBridge
Events, GuardDuty alerts are actionable, easy to aggregate across multiple accounts,
and straightforward to push into existing event management and workflow systems.
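A minimal boto3 sketch of enabling GuardDuty and routing its findings through EventBridge; the SNS topic ARN is a placeholder:

```python
import json
import boto3

guardduty = boto3.client("guardduty")
events = boto3.client("events")

detector_id = guardduty.create_detector(Enable=True)["DetectorId"]

# Forward every GuardDuty finding to an SNS topic via EventBridge.
events.put_rule(
    Name="guardduty-findings",
    EventPattern=json.dumps({"source": ["aws.guardduty"],
                             "detail-type": ["GuardDuty Finding"]}),
)
events.put_targets(
    Rule="guardduty-findings",
    Targets=[{"Id": "alerts",
              "Arn": "arn:aws:sns:us-east-1:123456789012:security-alerts"}],  # placeholder
)
```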
AWS DataSync
is an online data transfer service that simplifies, automates, and accelerates copying
large amounts of data between on-premises storage systems and AWS Storage
services, as well as between AWS Storage services.
You can use AWS DataSync to migrate data located on-premises, at the edge, or in
other clouds to Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon
FSx for Lustre, Amazon FSx for OpenZFS, and Amazon FSx for NetApp ONTAP.
AWS AppSync
is a serverless GraphQL and Pub/Sub API service that simplifies building modern web
and mobile applications. It provides a robust, scalable GraphQL interface for
application developers to combine data from multiple sources, including Amazon
DynamoDB, AWS Lambda, and HTTP APIs.
GraphQL
is a data language to enable client apps to fetch, change and subscribe to data from
servers. In a GraphQL query, the client specifies how the data is to be structured when
it is returned by the server. This makes it possible for the client to query only for the
data it needs, in the format that it needs it in.
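To illustrate, here is a hedged sketch of a GraphQL query sent to an AppSync endpoint from Python with the requests library; the endpoint URL, the API key, and the getPost schema are hypothetical:

```python
import requests

# The client names exactly the fields it wants back -- nothing more.
query = """
query GetPost($id: ID!) {
  getPost(id: $id) { id title author }
}
"""

resp = requests.post(
    "https://example123.appsync-api.us-east-1.amazonaws.com/graphql",  # hypothetical
    json={"query": query, "variables": {"id": "42"}},
    headers={"x-api-key": "da2-examplekey"},  # API-key auth mode, placeholder
)
print(resp.json())
```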
With AWS AppSync, you can use custom domain names to configure a single,
memorable domain that works for both your GraphQL and real-time APIs.
In other words, you can utilize simple and memorable endpoint URLs with domain
names of your choice by creating custom domain names that you associate with the
AWS AppSync APIs in your account.
file gateway
supports a file interface into Amazon Simple Storage Service (Amazon S3) and
combines a service and a virtual software appliance. By using this combination, you
can store and retrieve objects in Amazon S3 using industry-standard file protocols such
as Network File System (NFS) and Server Message Block (SMB). The software
appliance, or gateway, is deployed into your on-premises environment as a virtual
machine (VM) running on VMware ESXi, Microsoft Hyper-V, or Linux Kernel-based
Virtual Machine (KVM) hypervisor.
The gateway provides access to objects in S3 as files or file share mount points. With a
file gateway, you can do the following:
- You can store and retrieve files directly using the NFS version 3 or 4.1 protocol.
- You can store and retrieve files directly using the SMB file system, versions 2 and 3.
- You can access your data directly in Amazon S3 from any AWS Cloud application or
service.
- You can manage your Amazon S3 data using lifecycle policies, cross-region
replication, and versioning. You can think of a file gateway as a file system mount on
S3.
File Gateway:
File Gateway is a service that allows users to store and retrieve files from Amazon S3
using traditional file protocols like NFS or SMB. It presents a file interface to
applications and users, and the data is stored in Amazon S3 as objects.
Volume Gateway:
Volume Gateway provides a way for users to store data in Amazon S3, using their
existing applications and infrastructure. It presents a block storage interface to
applications, allowing them to write data to a virtual machine running on-premises or
in the cloud. Volume Gateway supports two types of volumes:
Stored Volumes: It stores data locally and asynchronously backs up point-in-time
snapshots of the data to Amazon S3.
Cached Volumes: It stores the primary data in Amazon S3 while retaining a cache of
frequently accessed data on-premises.
Tape Gateway:
Tape Gateway provides a way to backup data to Amazon S3 using virtual tapes, similar
to how traditional tape backups are used. It presents an iSCSI interface to applications,
allowing them to use the virtual tapes like physical tapes. Data written to the virtual
tapes is stored in Amazon S3 as objects, and can be retrieved as needed.
In summary, File Gateway is used for file-level access to data in Amazon S3, Volume
Gateway provides block-level storage volumes for applications, and Tape Gateway is
used for backup and archive storage using virtual tapes.
Amazon CloudFront
Many companies that distribute content over the internet want to restrict access to
documents, business data, media streams, or content that is intended for selected
users, for example, users who have paid a fee. To securely serve this private content by
using CloudFront, you can do the following:
- Require that your users access your private content by using special CloudFront
signed URLs or signed cookies.
- Require that your users access your content by using CloudFront URLs, not URLs that
access content directly on the origin server (for example, Amazon S3 or a private HTTP
server). Requiring CloudFront URLs isn't necessary, but we recommend it to prevent
users from bypassing the restrictions that you specify in signed URLs or signed cookies.
CloudFront signed URLs and signed cookies provide the same basic functionality: they
allow you to control who can access your content.
Amazon CloudFront is a fast content delivery network (CDN) service that securely
delivers data, videos, applications, and APIs to customers globally with low latency,
high transfer speeds, all within a developer-friendly environment.
CloudFront points of presence (POPs) (edge locations) make sure that popular content
can be served quickly to your viewers. CloudFront also has regional edge caches that
bring more of your content closer to your viewers, even when the content is not
popular enough to stay at a POP, to help improve performance for that content.
Dynamic content, as determined at request time (a cache behavior configured to
forward all headers), does not flow through the regional edge caches; it goes directly
to the origin.
If you want to serve private content through CloudFront and you're trying to decide
whether to use signed URLs or signed cookies, consider the following.
Use signed URLs for the following cases:
- You want to use an RTMP distribution. Signed cookies aren't supported for RTMP
distributions.
- You want to restrict access to individual files, for example, an installation download
for your application.
- Your users are using a client (for example, a custom HTTP client) that doesn't support
cookies.
Use signed cookies for the following cases:
- You want to provide access to multiple restricted files, for example, all of the files for
a video in HLS format or all of the files in the subscribers' area of a website.
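A hedged sketch of generating a CloudFront signed URL with botocore's CloudFrontSigner; the key-pair ID, key file, and distribution domain are placeholders:

```python
import datetime
from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def rsa_signer(message):
    # Sign with the private key matching the public key registered in CloudFront.
    with open("private_key.pem", "rb") as f:  # placeholder key file
        key = serialization.load_pem_private_key(f.read(), password=None)
    return key.sign(message, padding.PKCS1v15(), hashes.SHA1())  # CloudFront uses SHA-1/RSA

signer = CloudFrontSigner("K2JCJMDEHXQW5F", rsa_signer)  # placeholder key-pair ID
url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/private/report.pdf",  # placeholder file
    date_less_than=datetime.datetime(2026, 1, 1),  # URL expires at this time
)
print(url)
```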
AWS Global Accelerator
Global Accelerator is a good fit for non-HTTP use cases, such as gaming (UDP), IoT
(MQTT), or Voice over IP, as well as for HTTP use cases that specifically require static IP
addresses or deterministic, fast regional failover. Both services integrate with AWS
Shield for DDoS protection.
Provisioned Capacity (S3 Glacier)
ensures that your retrieval capacity for expedited retrievals is available when you need
it. Each unit of capacity ensures that at least three expedited retrievals can be
performed every five minutes and provides up to 150 MB/s of retrieval throughput.
You should purchase provisioned retrieval capacity if your workload requires highly
reliable and predictable access to a subset of your data in minutes. Without
provisioned capacity, expedited retrievals are accepted except in rare situations of
unusually high demand. However, if you require access to expedited retrievals under
all circumstances, you must purchase provisioned retrieval capacity.
Amazon Kinesis
provides several features that enable users to collect, process, and analyze real-time
streaming data.
Here are some key features of Amazon Kinesis:
Data Streams: Users can create and manage data streams to collect and store large
amounts of real-time data.
Automatic Scaling: Kinesis can automatically scale the number of stream shards (data
partitions) based on the volume of incoming data, allowing users to process data at
any scale.
Real-Time Processing: Kinesis enables users to process data in real-time using custom
applications or AWS services such as AWS Lambda or Amazon Kinesis Analytics.
Data Retention: Kinesis allows users to store data for up to 365 days, enabling them to
analyze historical data and detect patterns over time.
Data Encryption: Kinesis provides encryption at rest and in transit to ensure that data
is secure.
Multiple Language Support: Kinesis supports multiple programming languages and
frameworks, including Java, Python, and Node.js.
Integration with AWS Services: Kinesis can integrate with other AWS services such as
Amazon S3, Amazon Redshift, and Amazon Elasticsearch for advanced data processing
and analytics.
Data Analytics: Kinesis provides tools for real-time data analytics, including Amazon
Kinesis Analytics, which allows users to analyze and query data streams in real-time
using SQL.
Overall, Amazon Kinesis provides a scalable, secure, and easy-to-use platform for
collecting, processing, and analyzing real-time streaming data from sources such as
website clickstreams, IoT devices, social media, and log data. Ingested data can be
processed in real time using custom applications or AWS services such as AWS
Lambda, Amazon Kinesis Analytics, or Amazon Kinesis Data Firehose, enabling you to
gain insights and take immediate action as the data arrives.
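A minimal boto3 sketch of writing a record to a data stream; the stream name and record fields are placeholders:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Records with the same partition key always map to the same shard,
# preserving per-user ordering.
kinesis.put_record(
    StreamName="clickstream",  # placeholder stream
    Data=json.dumps({"user": "u-123", "page": "/home"}).encode(),
    PartitionKey="u-123",
)
```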
DATABASE
Here are the unique features and benefits of each of the databases available on AWS:
Amazon RDS:
Easy setup and management of popular relational databases like MySQL, PostgreSQL,
Oracle, SQL Server, MariaDB, and Amazon Aurora.
Automated backups and software patching.
Multi-AZ deployments for high availability and fault tolerance.
Scalability and flexibility to adjust instance sizes and storage on the fly.
Amazon Aurora:
High performance, availability, and scalability with up to 5 times better performance
than standard MySQL and up to 3 times better performance than standard PostgreSQL.
Compatibility with MySQL and PostgreSQL, allowing for easy migration and integration
with existing applications.
Built-in fault-tolerance and self-healing capabilities with Aurora Multi-Master and
Global Database.
Amazon DynamoDB:
Fully managed NoSQL database service with automatic scaling and high availability.
Predictable, single-digit millisecond latency for reads and writes.
Flexible data model with support for key-value and document data structures.
Encryption at rest and in transit, and integration with AWS Identity and Access
Management (IAM) for security and access control.
Amazon DocumentDB:
Compatibility with MongoDB, allowing for easy migration and integration with existing
MongoDB applications.
Fully managed service with automatic scaling and high availability.
Consistent, single-digit millisecond latency for reads and writes.
Encryption at rest and in transit, and integration with AWS IAM for security and access
control.
Amazon Neptune:
Fully managed graph database service that provides highly connected data storage and
querying.
Optimized for storing and querying highly connected data sets such as social
networking, recommendation engines, and fraud detection.
Compatibility with Apache TinkerPop and SPARQL, allowing for flexible querying and
integration with existing graph applications.
Amazon ElastiCache:
In-memory caching service that provides high-performance caching for popular open-
source in-memory data stores like Redis and Memcached.
Fully managed service with automatic scaling and high availability.
Enables faster response times and reduces database load by caching frequently
accessed data in-memory.
Amazon Keyspaces:
Fully managed, scalable, and highly available Apache Cassandra-compatible database
service.
Flexible and scalable data model to store and retrieve data from multiple data centers
and regions.
Encryption at rest and in transit, and integration with AWS IAM for security and access
control.
Amazon Timestream:
Fully managed time series database service that makes it easy to store and analyze
trillions of time series data points.
Serverless and automatically scales up and down to meet the needs of your
application.
Supports fast and flexible querying of time series data for real-time analytics.
Amazon Managed Apache Cassandra Service:
Fully managed, scalable, and highly available Apache Cassandra-compatible database
service.
Provides the power of Apache Cassandra with the ease of a managed service,
removing the need for self-managed infrastructure.
Encryption at rest and in transit, and integration with AWS IAM for security and access
control.
Amazon Quantum Ledger Database (QLDB):
Fully managed ledger database that provides a transparent, immutable, and
cryptographically verifiable transaction log.
Designed for systems that require an immutable and tamper-proof ledger, such as
supply chain management and financial services.
Fully managed service with automatic scaling and high availability.
Amazon Redshift:
Fully managed, petabyte-scale data warehouse service that provides fast query
performance using a columnar storage format.
Enables fast analysis and business intelligence using SQL-based queries and tools.
Supports integrations with other AWS services like AWS Glue and Amazon EMR.
Database Types
• RDBMS (= SQL / OLTP): RDS, Aurora – great for joins
• NoSQL database – no joins, no SQL: DynamoDB (~JSON), ElastiCache (key /
value pairs), Neptune (graphs), DocumentDB (for MongoDB), Keyspaces (for
Apache Cassandra)
• Object Store: S3 (for big objects) / Glacier (for backups / archives)
• Data Warehouse (= SQL Analytics / BI): Redshift (OLAP), Athena, EMR
• Search: OpenSearch (JSON) – free text, unstructured searches
• Graphs: Amazon Neptune – displays relationships between data
• Ledger: Amazon Quantum Ledger Database
• Time series: Amazon Timestream
• Note: some databases are being discussed in the Data & Analytics section
Amazon Aurora
is a cloud-based relational database service developed and managed by Amazon Web
Services (AWS). It is designed to provide a highly available, scalable, and performant
database solution that is compatible with MySQL and PostgreSQL.
Some key features of Amazon Aurora include:
High availability: Aurora is designed to automatically detect and recover from failures,
providing high availability for your databases.
Scalability: Aurora can scale up or down based on your needs, automatically adding or
removing resources as necessary.
Performance: Aurora is optimized for performance, with low latency and high
throughput.
Compatibility: Aurora is compatible with MySQL and PostgreSQL, allowing you to use
familiar tools and drivers.
Security: Aurora supports encryption at rest and in transit, and provides fine-grained
access control through AWS Identity and Access Management (IAM).
Overall, Amazon Aurora is a robust and flexible database service that can meet the
needs of a wide range of applications and workloads.
High Performance: Amazon Aurora is designed to provide high performance and low
latency. It is optimized for running on AWS infrastructure, and is built using a
distributed, fault-tolerant architecture that provides high availability and automatic
failover.
MySQL and PostgreSQL Compatibility: Amazon Aurora is compatible with both MySQL
and PostgreSQL, allowing you to use your existing skills and tools.
Scalability: Amazon Aurora allows you to scale your database instance up or down
depending on your needs, without any downtime. Additionally, it supports up to 15
read replicas for read scaling.
Automated Backups: Amazon Aurora automatically backs up your database and
transaction logs, and allows you to restore your data to any point in time within the
retention period.
Multi-AZ Deployment: Amazon Aurora provides Multi-AZ deployment for high
availability, which automatically replicates your data to a standby instance in a
different Availability Zone (AZ) to ensure that your database is highly available in the
event of a failure.
Security: Amazon Aurora provides several security features such as network isolation,
encryption at rest and in transit, and support for VPC (Virtual Private Cloud).
Global Database: Amazon Aurora Global Database allows you to create a single
database that spans multiple AWS regions, providing low-latency access to your data
from anywhere in the world.
Performance Insights: Amazon Aurora Performance Insights provides a dashboard that
allows you to monitor the performance of your database instance, and identify
performance issues quickly.
Cost-effective: Amazon Aurora offers a pay-as-you-go model, allowing you to only pay
for what you use. Additionally, you can leverage Reserved Instances to reduce your
costs further.
Compatible API for PostgreSQL / MySQL, separation of storage and compute
• Storage: data is stored in 6 replicas, across 3 AZ – highly available, self-healing, auto-
scaling
• Compute: Cluster of DB Instance across multiple AZ, auto-scaling of Read Replicas
• Cluster: Custom endpoints for writer and reader DB instances
• Same security / monitoring / maintenance features as RDS
• Know the backup & restore options for Aurora
• Aurora Serverless – for unpredictable / intermittent workloads, no capacity planning
• Aurora Multi-Master – for continuous writes failover (high write availability)
• Aurora Global: up to 16 DB Read Instances in each region, < 1 second storage
replication
• Aurora Machine Learning: perform ML using SageMaker & Comprehend on Aurora
• Aurora Database Cloning: new cluster from existing one, faster than restoring a
snapshot
• Use case: same as RDS, but with less maintenance / more flexibility / more
performance / more features
Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that
auto-scales up to 128TB per database instance. It delivers high performance and
availability with up to 15 low-latency read replicas, point-in-time recovery, continuous
backup to Amazon S3, and replication across three Availability Zones (AZs).
For Amazon Aurora, each Read Replica is associated with a priority tier (0-15). In the
event of a failover, Amazon Aurora will promote the Read Replica that has the highest
priority (the lowest numbered tier). If two or more Aurora Replicas share the same
priority, then Amazon RDS promotes the replica that is largest in size. If two or more
Aurora Replicas share the same priority and size, then Amazon Aurora promotes an
arbitrary replica in the same promotion tier.
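A minimal boto3 sketch of an Aurora MySQL cluster with a reader placed in a specific promotion tier; identifiers and the password are placeholders:

```python
import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="app-cluster",        # placeholder
    Engine="aurora-mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me-immediately",  # placeholder
)

# A reader in tier 1 is promoted ahead of higher-numbered tiers on failover.
rds.create_db_instance(
    DBInstanceIdentifier="app-reader-1",
    DBClusterIdentifier="app-cluster",
    Engine="aurora-mysql",
    DBInstanceClass="db.r6g.large",
    PromotionTier=1,
)
```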
Amazon Redshift
Redshift is based on PostgreSQL, but it’s not used for OLTP
• It’s OLAP – online analytical processing (analytics and data warehousing)
• 10x better performance than other data warehouses, scale to PBs of data
• Columnar storage of data (instead of row based) & parallel query engine
• Pay as you go based on the instances provisioned
• Has a SQL interface for performing the queries
• BI tools such as Amazon Quicksight or Tableau integrate with it
• vs Athena: faster queries / joins / aggregations thanks to indexes
High Performance: Amazon Redshift is designed for fast querying and analysis of large
datasets. It is optimized for running complex queries across multiple nodes in a cluster.
Fully Managed: Amazon Redshift is a fully managed service that handles all the heavy
lifting of provisioning, configuring, monitoring, and scaling your data warehouse.
Petabyte-scale Data Warehousing: Amazon Redshift can store and query petabyte-
scale data in a single cluster. It uses columnar storage to improve query performance
and reduce I/O.
Massively Parallel Processing (MPP): Amazon Redshift uses MPP to parallelize queries
across multiple nodes in a cluster, which allows for fast query processing and
scalability.
Advanced Compression: Amazon Redshift uses advanced compression techniques to
reduce the amount of storage required for your data, which can help lower your
storage costs.
Automated Backups: Amazon Redshift automatically backs up your data warehouse
and allows you to restore your data to any point in time within the retention period.
Encryption: Amazon Redshift provides encryption of data in transit and at rest using
industry-standard AES-256 encryption.
Integration with BI Tools: Amazon Redshift integrates with popular business
intelligence (BI) tools such as Tableau, Looker, and Power BI, making it easy to visualize
and analyze your data.
Pricing: Amazon Redshift offers a pay-as-you-go model, allowing you to only pay for
what you use. Additionally, you can leverage Reserved Instances to reduce your costs
further.
Overall, Amazon Redshift provides a scalable, high-performance, and cost-effective
data warehousing solution that is easy to use and integrates with popular BI tools.
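A minimal sketch of running SQL against a cluster through the Redshift Data API via boto3; the cluster, database, and query are placeholders:

```python
import boto3

rsd = boto3.client("redshift-data")

# Submit the SQL asynchronously; fetch results once the statement has finished
# (e.g., by polling describe_statement with the returned Id).
resp = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # placeholder
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT category, SUM(amount) FROM sales GROUP BY category;",
)
print(resp["Id"])
```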
ENCRYPTION
Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
Key Management: Amazon S3 manages the encryption keys used to encrypt and
decrypt your data; SSE-S3 handles the keys internally and automatically.
Key Control: you have limited control over the encryption keys because they are
managed by Amazon S3.
Encryption Options: SSE-S3 uses AES-256 encryption for data at rest.
Key Features: With SSE-KMS, you can take advantage of additional features provided
by AWS KMS such as key rotation, audit logs, and key policies; SSE-S3 does not offer
these additional features.
Pricing: SSE-S3 is included in the standard Amazon S3 pricing and is a simpler, more
cost-effective option for encrypting your data at rest in Amazon S3.
Compliance: If you have specific compliance requirements for encryption key
management, such as HIPAA or PCI DSS, then you may need to use SSE-KMS to meet
those requirements.
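A minimal boto3 sketch contrasting the two modes on upload; the bucket, keys, and KMS alias are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: S3 manages the AES-256 keys entirely on your behalf.
s3.put_object(Bucket="my-bucket", Key="doc-sse-s3.txt", Body=b"hello",
              ServerSideEncryption="AES256")

# SSE-KMS: encrypt under a KMS key you control (rotation, policies, audit logs).
s3.put_object(Bucket="my-bucket", Key="doc-sse-kms.txt", Body=b"hello",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-app-key")  # placeholder key alias
```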
Network Interfaces: EFA, ENA, ENI, and VIF
In summary, EFA is optimized for HPC workloads that require low-latency, high-
bandwidth communication, ENA is designed for a wide range of workloads that require
high throughput and low latency, ENI provides additional network interfaces for EC2
instances, and VIF is used to establish a private, secure connection between an on-
premises data center and an AWS VPC.
AWS CloudFormation
provides a common language for you to describe and provision all the infrastructure
resources in your cloud environment. CloudFormation allows you to use a simple text
file to model and provision, in an automated and secure manner, all the resources
needed for your applications across all regions and accounts. This file serves as the
single source of truth for your cloud environment. AWS CloudFormation is available at
no additional charge, and you pay only for the AWS resources needed to run your
applications.
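A minimal boto3 sketch of creating a stack; the one-resource template is kept as a Python dict for brevity, though real templates are usually authored as YAML or JSON files:

```python
import json
import boto3

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "NotesBucket": {  # placeholder logical ID
            "Type": "AWS::S3::Bucket",
            "Properties": {"VersioningConfiguration": {"Status": "Enabled"}},
        }
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(StackName="notes-stack", TemplateBody=json.dumps(template))
```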
Amazon SNS
Amazon SNS is a fully managed messaging service that enables you to send messages
to a large number of subscribers, or endpoints, simultaneously. You can use SNS to
send push notifications, SMS text messages, and email messages to your subscribers.
SNS works on a publish-subscribe model. A publisher sends a message to a topic, and
SNS delivers the message to all subscribers of that topic. Topics can be created
dynamically, and subscribers can be added or removed dynamically as well. SNS is a
highly scalable service, and it can handle millions of messages per second.
Here are some key features of Amazon SNS:
Multi-protocol support: SNS supports multiple messaging protocols, including HTTP,
HTTPS, email, SMS, and mobile push notifications.
Fanout: SNS allows you to send a message to multiple subscribers simultaneously. This
feature is particularly useful when you need to broadcast a message to a large number
of subscribers.
Filtering: SNS allows you to filter messages based on their attributes. This can help you
reduce costs and increase efficiency by sending only relevant messages to your
subscribers.
Mobile push notifications: SNS provides an easy way to send push notifications to
mobile devices, including iOS, Android, and Amazon Fire OS devices.
Message attributes: SNS allows you to add custom attributes to messages, which can
be used for filtering and routing.
Message encryption: SNS supports encryption of messages in transit and at rest,
ensuring that your data is always secure.
Cross-region replication: SNS allows you to replicate your messages across regions for
increased durability and availability.
Dead-letter queues: SNS provides a dead-letter queue for messages that can't be
delivered to subscribers. This feature helps you debug issues and ensures that no
messages are lost.
CloudTrail integration: SNS integrates with AWS CloudTrail to provide audit logs of all
SNS API calls.
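A minimal boto3 sketch of the publish-subscribe model with attribute-based filtering; the topic name and queue ARN are placeholders:

```python
import json
import boto3

sns = boto3.client("sns")

topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

# Filtering: this subscriber only receives messages whose "category"
# attribute equals "books".
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:book-orders",  # placeholder queue
    Attributes={"FilterPolicy": json.dumps({"category": ["books"]})},
)

sns.publish(
    TopicArn=topic_arn,
    Message=json.dumps({"orderId": "o-1", "category": "books"}),
    MessageAttributes={"category": {"DataType": "String", "StringValue": "books"}},
)
```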