AWS

Last updated: May 17th, 2020

Elastic Compute Cloud (EC2)

Elastic Compute Cloud (EC2) is a web service that provides compute capacity in the AWS cloud. At its center is the EC2 virtual server, known as an instance.

An Amazon Machine Image (AMI) is a template for launching EC2 instances. An AMI is available in only a single region, although you can copy it to other regions.

When your EC2 instance boots, you can optionally execute scripts (user data) to bring your instance to any desired state. This is important and necessary if you use Auto Scaling.
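
As a hedged sketch, you can pass such a script as user data when launching an instance with the AWS CLI (the AMI ID, instance type, and script file name are placeholders):

$ aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro \
    --user-data file://bootstrap.sh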

The data you store on Elastic Block Store (EBS) volumes has 99.999 percent availability.

If your applications will require intense rates of I/O operations, then you should consider provisioned IOPS, which provides a maximum IOPS/volume of 32,000 and a maximum throughput/volume of 500 MB/s.

For regular server workloads that require low-latency performance, general-purpose SSDs will work well. You’ll get a maximum of 10,000 IOPS/volume and a maximum throughput/volume of 250 MB/s.

Throughput-optimized HDD volumes can provide reduced costs with acceptable performance for throughput-intensive workloads such as log processing and big data operations. These volumes deliver only 500 IOPS/volume but with a 500 MB/s maximum throughput/volume.

All EBS volumes can be copied by creating a snapshot. You can also generate an AMI image directly from a running instance. To be sure no data is lost, it’s best to shut down the instance first.
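
A sketch of both operations with the AWS CLI (the volume and instance IDs are placeholders):

$ aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description "backup before upgrade"
$ aws ec2 create-image --instance-id i-xxxxxxxx --name "my-server-image"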

How do you get the parameters of a running EC2 instance?

Run the following curl command from the command line while logged in to the instance:

$ curl http://169.254.169.254/latest/meta-data/
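
The response is a list of metadata categories; appending a category to the URL returns its value, for example:

$ curl http://169.254.169.254/latest/meta-data/instance-id
$ curl http://169.254.169.254/latest/meta-data/public-ipv4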

AWS provides four tools to help you protect your EC2 instances from unauthorized use: security groups, Identity and Access Management (IAM) roles, network address translation (NAT) instances, and key pairs.

Simple Storage Service (S3)

Amazon Simple Storage Service (S3) is a repository for internet data. It’s an excellent platform for the following:

  • Maintaining backup archives, log files, and disaster recovery images
  • Running analytics on big data at rest
  • Hosting static websites

You can access a file called filename that’s in a bucket called bucketname over HTTPS:

https://s3.amazonaws.com/bucketname/filename

or using the AWS CLI:

s3://bucketname/filename
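
For example, the following AWS CLI commands (assuming the bucket and file exist and you have access) download the file and list the bucket:

$ aws s3 cp s3://bucketname/filename .
$ aws s3 ls s3://bucketname/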

There is no theoretical limit to the total amount of data you can store within a bucket. A single object may be no larger than 5 TB. Individual uploads can be no larger than 5 GB. AWS recommends that you use a feature called Multipart Upload for any object larger than 100 MB.
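
The high-level aws s3 commands use Multipart Upload automatically for large files; the low-level s3api interface exposes it directly if you need manual control (bucket and key names here are placeholders):

$ aws s3 cp ./bigfile s3://bucketname/bigfile
$ aws s3api create-multipart-upload --bucket bucketname --key bigfile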

S3 measures durability as a percentage. The high durability rates delivered by S3 are largely because S3 automatically replicates your data across at least three availability zones.

Object availability is measured as a percentage: it’s the percentage of a full year during which you can expect a given object to be instantly available on request. The Amazon S3 Standard class guarantees that your data will be ready whenever you need it for 99.99% of the year. That means less than an hour of downtime each year.

Durability and Availability
                         S3 Standard    S3 Standard-IA   S3 One Zone-IA   Reduced Redundancy
Durability guarantee     99.999999999%  99.999999999%    99.999999999%    99.99%
Availability guarantee   99.99%         99.9%            99.5%            99.99%
Availability zones       >= 3           >= 3             1                >= 3

You can open up access at the bucket and object levels using access control list (ACL) rules, S3 bucket policies, or Identity and Access Management (IAM) policies.

The following code is an example of an S3 bucket policy:

{
    "Version": "2012-10-17", 
    "Statement": [{
        "Effect": "Allow", 
        "Principal": {
            "AWS":["arn:aws:iam::xxxxxxxxxxxx:root",
                "arn:aws:iam::xxxxxxxxxxxx:user/Joe"
            ]
        },
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::MyBucket",
            "arn:aws:s3:::MyBucket/*"]
    }]
}
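
To apply the policy, save it to a file and attach it to the bucket with the AWS CLI (a hedged sketch; the bucket and file names are placeholders):

$ aws s3api put-bucket-policy --bucket MyBucket --policy file://policy.json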

When it’s attached to an IAM entity (a user, group, or role), the following IAM policy will accomplish the same thing as the previous S3 bucket policy:


{
    "Version": "2012-10-17", 
    "Statement": [{
        "Effect": "Allow", 
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::MyBucket",
            "arn:aws:s3:::MyBucket/*"]
    }]
}

Glacier supports archives as large as 40 TB and its archives are encrypted by default. Glacier archives are given machine-generated IDs.

The Elastic File System (EFS) provides automatically scalable and shareable file storage. The objective is to make it easy to enable secure, low-latency, and durable file sharing among multiple instances.

To move terabyte- or even petabyte-scale data for backup or active use within AWS, ordering a Snowball device might be the best option.

Virtual Private Cloud (VPC)

A VPC is a virtual network that can contain EC2 instances and other network resources. Every VPC is isolated from all other networks, and you can connect your VPC to the Internet and/or other VPCs.

A VPC consists of at least one range of contiguous IP addresses. This address range is represented as a Classless Inter-Domain Routing (CIDR) block.

The prefix length of a VPC CIDR can range from /16 to /28.

It is best to use one in the RFC 1918 range to avoid conflicts with public Internet addresses.

  • 10.0.0.0-10.255.255.255 (10.0.0.0/8)
  • 172.16.0.0-172.31.255.255 (172.16.0.0/12)
  • 192.168.0.0-192.168.255.255 (192.168.0.0/16)

You can’t choose your own IPv6 CIDR. Instead, AWS assigns one to your VPC at your request. The IPv6 CIDR will be a publicly routable prefix from the global unicast IPv6 address space. The prefix length of an IPv6 VPC CIDR is always /56.
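
A hedged sketch of creating such a VPC with the AWS CLI, including an AWS-assigned IPv6 CIDR:

$ aws ec2 create-vpc --cidr-block 10.0.0.0/16 --amazon-provided-ipv6-cidr-block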

Once you create an instance in a subnet, you can’t move it; you can only terminate it and create a different instance in another subnet.

AWS reserves the first four and last IP addresses in every subnet.

A subnet can exist within only one availability zone (AZ), and the subnets that make up a VPC must all be in the same region.

The prefix length for an IPv6 subnet is fixed at /64.
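
For example, creating a subnet in a specific AZ (the VPC ID and zone name are placeholders):

$ aws ec2 create-subnet --vpc-id vpc-xxxxxxxx --cidr-block 10.0.1.0/24 \
    --availability-zone us-east-1a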

An Elastic Network Interface (ENI) allows an instance to communicate with other resources. An ENI performs the same basic function as a network interface on a physical server.

An Internet Gateway (IG) gives instances the ability to receive a public IP address, connect to the Internet, and receive requests from the Internet. To use an Internet gateway, you must create a default route in a route table that points to the Internet gateway as a target.
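
A sketch of the three steps with the AWS CLI (all IDs are placeholders):

$ aws ec2 create-internet-gateway
$ aws ec2 attach-internet-gateway --internet-gateway-id igw-xxxxxxxx --vpc-id vpc-xxxxxxxx
$ aws ec2 create-route --route-table-id rtb-xxxxxxxx --destination-cidr-block 0.0.0.0/0 \
    --gateway-id igw-xxxxxxxx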

Virtual routers do not exist as AWS resources; there is only an implied router, and you manage only the route table that the implied router uses.

There are public and private subnets. A public subnet has a route pointing to an Internet gateway. A private subnet does not have such a route.

To protect your network, you use security groups or network access control lists (NACLs). A security group acts as a stateful firewall. A NACL functions as a stateless firewall in that it contains inbound and outbound rules to allow traffic based on a source or destination CIDR, protocol, and port.

A security group is attached to an ENI and a NACL is attached to a subnet. This means that NACLs can’t be used to control traffic between instances in the same subnet. If you want to do that, you have to use security groups.
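
For example, allowing inbound SSH from a single administrative network to a security group (a hedged sketch; the group ID and CIDR are placeholders):

$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
    --protocol tcp --port 22 --cidr 203.0.113.0/24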

An elastic IP address (EIP) is a type of public IP address that AWS allocates to your account when you request it.

Network Address Translation (NAT) occurs at the Internet gateway. There are two other resources that can also perform NAT:

  • NAT gateway
  • NAT instance

You can connect EC2 instances in a private subnet to the same NAT device (e.g. a NAT gateway) in the public subnet, thus sharing the same public IP address for outbound connections.
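
A sketch of creating a NAT gateway in a public subnet; it needs an elastic IP allocation (the IDs are placeholders):

$ aws ec2 create-nat-gateway --subnet-id subnet-xxxxxxxx --allocation-id eipalloc-xxxxxxxx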

A NAT instance can be used as a bastion host, sometimes called a jump host, to connect to instances that don’t have a public IP. You can’t do this with a NAT gateway.

You can configure VPC peering to allow instances in one VPC to communicate with instances in another VPC over the private AWS network.
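
A hedged sketch of requesting and accepting a peering connection (the VPC and peering IDs are placeholders); you also need routes in each VPC’s route table pointing at the peering connection:

$ aws ec2 create-vpc-peering-connection --vpc-id vpc-aaaaaaaa --peer-vpc-id vpc-bbbbbbbb
$ aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-xxxxxxxx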

AWS CLI

The AWS CLI is an open source tool built on top of the AWS SDK for Python that provides commands for interacting with AWS services. With minimal configuration, you can start using all of the functionality provided by the AWS Management Console from your favorite terminal program.

How to install the AWS Command Line Interface on macOS?

You can install the latest version of Python and pip and then use them to install the AWS CLI


$ brew install python
$ python3 --version
$ curl -O https://bootstrap.pypa.io/get-pip.py
$ python3 get-pip.py --user
$ pip3 --version
$ pip3 install awscli --upgrade --user
$ aws --version

To upgrade to the latest version, run the installation command again

$ pip3 install awscli --upgrade --user

If python, pip, or aws is not on the $PATH, modify this variable in a profile file, e.g. ~/.bash_profile:


$ which python
$ export PATH=<python or pip or aws path>:$PATH
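
Once installed, configure your credentials and default region before first use (the values shown are AWS’s documented example credentials):

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json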

How to install the AWS Command Line Interface on Linux?

Use the same steps as for macOS; just install Python using the following command:


$ sudo yum install python
$ # repeat steps from macOS

RDS

Depending on its configuration, a relational database can fall into one of two categories: online transaction processing (OLTP) or online analytic processing (OLAP).

OLTP databases are suited to applications that read and write data frequently, on the order of multiple times per second.

OLAP databases are optimized for complex queries against large data sets.

To deploy a database using RDS, you start by configuring a database instance, which is an isolated database environment. A database instance exists in a virtual private cloud (VPC) that you specify, and AWS fully manages it.
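
A minimal sketch of configuring a database instance with the AWS CLI (the identifier, instance class, credentials, and storage size are placeholders):

$ aws rds create-db-instance --db-instance-identifier mydb \
    --db-instance-class db.t3.micro --engine mysql \
    --master-username admin --master-user-password 'secret123' \
    --allocated-storage 20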

Database engines

RDS offers the following six database engines to choose from:

  • MySQL
  • MariaDB (MySQL compatible)
  • PostgreSQL (Oracle alternative)
  • Aurora (MySQL and PostgreSQL compatible)
  • Oracle
  • Microsoft SQL Server

Licensing considerations

RDS provides two models for licensing the database engine software you run. The license included model covers the cost of the license in the pricing for an RDS instance. The bring your own license (BYOL) model requires you to obtain a license for the database engine you run.

  • License included: MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, Oracle Standard Edition One (SE1) and Two (SE2)
  • Bring your own license: Oracle Enterprise Edition (EE), SE, SE1, SE2

RDS divides database instance classes into the following three types.

  • Standard
  • Memory optimized
  • Burstable

AWS measures storage performance in input/output operations per second (IOPS). An input/output (I/O) operation is either a read from or write to storage.

MySQL and MariaDB have a page size of 16 KB. Oracle, PostgreSQL, and Microsoft SQL Server use a page size of 8 KB. Writing 16 KB of data with one of the 8 KB page-size engines would therefore consume two I/O operations.

For most databases, general-purpose SSD (gp2) storage is sufficient. You can allocate a volume of up to 16 TB. For each gigabyte of data that you allocate to a volume, RDS allocates that volume a baseline performance of three IOPS, up to a total of 10,000 IOPS per volume.

If you think you might occasionally need up to 3000 IOPS, but don’t need a lot of storage, you don’t have to over-allocate storage just to get your desired number of IOPS. Volumes smaller than 1 TB can temporarily burst to 3,000 IOPS. The duration of the burst is determined by the following formula:

Burst duration in seconds = (Credit balance)/[3,000 – 3 * (storage size in GB)]

When you initially boot a database instance, you get a balance of 5,400,000 I/O credits. The credit balance is replenished at the baseline IOPS rate during periods when there is no database activity.
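
A worked example for a fresh 100 GB gp2 volume (baseline 3 * 100 = 300 IOPS) bursting at 3,000 IOPS:

Burst duration = 5,400,000 / [3,000 – 3 * 100] = 5,400,000 / 2,700 = 2,000 seconds, or about 33 minutes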

Provisioned IOPS SSD lets you simply allocate the number of IOPS you need when you create your instance.

The maximum number of IOPS you can achieve and how much storage you can allocate are constrained by the database engine you select. Oracle, PostgreSQL, MariaDB, MySQL, and Aurora let you choose 100 GB to 16 TB of storage and allocate 1,000 to 40,000 provisioned IOPS. Microsoft SQL Server gives you up to 16 TB of storage and lets you choose between 1,000 and 32,000 provisioned IOPS. The ratio of IOPS to storage in gigabytes (IOPS:GB) must be at least 50:1. For example, if you want 32,000 IOPS, you must provision at least 640 GB of storage.

RDS offers magnetic storage for backward compatibility with older instances. It’s limited to a maximum size of 4 TB and 1,000 IOPS.

Scaling vertically, also called scaling up, is a straightforward approach. You simply throw more resources at your database instance and don’t have to make any changes to your application or databases.

Scaling horizontally, also known as scaling out, entails creating additional database instances called read replicas. All database engines except for Oracle and Microsoft SQL Server support read replicas. Aurora exclusively supports a specific type of read replica called an Aurora replica.

To keep your database continuously available in the event of a database instance outage, you can use a so-called multi-AZ deployment. If the primary instance experiences an outage, RDS will fail over to the standby instance, usually within two minutes.

RDS can automatically create snapshots of your instances daily during a 30-minute backup window.

Redshift

Redshift is a managed data warehouse solution designed for OLAP databases. Although it’s based on PostgreSQL, it’s not part of RDS. Redshift uses columnar storage, meaning that it stores the values for a column close together. This improves storage speed and efficiency and makes it faster to query data from individual columns. Redshift supports ODBC and JDBC database connectors.

DynamoDB

DynamoDB is a fully managed NoSQL database.

When you create a table, you must specify a primary key and a data type. Because the primary key uniquely identifies an item in the table, its value must be unique within the table. There are two types of primary keys you can create.

A partition key, also known as a hash key, is a primary key that contains a single value. When you use only a partition key as a primary key, it’s called a simple primary key. A partition key can store no more than 2,048 bytes.

A primary key can also be a combination of two values: a partition key and a sort key. This is called a composite primary key. The partition key doesn’t have to be unique, but the combination of the partition key and sort key must be unique. A sort key can store no more than 1,024 bytes.

When a lot of read or write activity occurs against an item, the partition the item exists in is said to be a hot partition. This can negatively affect performance. To avoid hot partitions, try to make your partition keys as unique as possible.

Each key-value pair composes an attribute, and one or more attributes make up an item. DynamoDB can store an item size of up to 400 KB.

When creating a table, you must specify the number of reads and writes per second your application will require. This is called provisioned throughput. DynamoDB reserves partitions based on the number of read capacity units (RCUs) and write capacity units (WCUs) you specify when creating a table.

For an item up to 4 KB in size, one RCU buys you one strongly consistent read per second. If you use an eventually consistent read, one RCU buys you two eventually consistent reads per second. When it comes to writing data, one WCU gives you one write per second for an item up to 1 KB in size.
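
For example, reading an 8 KB item with strong consistency consumes two RCUs, and writing it consumes eight WCUs. Below is a hedged sketch of creating a table with a composite primary key and provisioned throughput via the AWS CLI (the table and attribute names are illustrative):

$ aws dynamodb create-table --table-name Music \
    --attribute-definitions AttributeName=Artist,AttributeType=S AttributeName=SongTitle,AttributeType=S \
    --key-schema AttributeName=Artist,KeyType=HASH AttributeName=SongTitle,KeyType=RANGE \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5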

You can configure Auto Scaling to automatically increase your provisioned throughput when it gets close to hitting a defined threshold.

DynamoDB provides two different operations to let you read data from a table. A scan lists all items in a table; it’s a read-intensive operation and can potentially consume all of your provisioned capacity units. A query, by contrast, returns only the items matching a given partition key value and is far less costly.

Secondary indexes solve issues with querying data from DynamoDB. There are two types of secondary indexes.

You can create a global secondary index (GSI) any time after creating a table. In a global secondary index, the partition and sort keys can be different than those of the base table.

A local secondary index (LSI) must be created at the same time as the base table. You also cannot delete a local secondary index after you’ve created it. The partition key must always be the same as the base table, but the sort key can be different.

Identity and Access Management

Identity and Access Management (IAM) is a web service for securely controlling access to AWS resources.

An identity represents a user or a role. Roles can be assigned to an application, service, user, or group.

Identities can also be federated. They are controlled by attaching policies that precisely define the way they’ll be able to interact with all the resources in your AWS account. You can attach policies to either principals (identity-based policies) or resources (resource-based policies).

An IAM policy is a document that identifies one or more actions as they relate to one or more AWS resources. Finally, the policy document determines the effect permitted by the action on the resource. The value of an effect will be either Allow or Deny.

A single IAM policy can be associated with any number of identities, and a single identity can have as many as 10 managed policies attached to it. A policy can be no greater than 6,144 characters.
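
For example, attaching an AWS managed policy to a user with the AWS CLI (a sketch; the user name is a placeholder):

$ aws iam attach-user-policy --user-name Joe \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess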

The Amazon Cognito, AWS Managed Microsoft AD, and AWS Single Sign-On services are for handling user authentication, while AWS Key Management Service (KMS), AWS Secrets Manager, and AWS CloudHSM simplify the administration of encryption keys and authentication secrets.

CloudTrail

CloudTrail keeps detailed logs of every read or write action that occurs against your AWS resources.

An event is a record of an action that a principal performs against an AWS resource. CloudTrail logs both API and non-API actions and classifies events into management events and data events.

Management events are grouped into write-only and read-only events.

Data events track two types of operations that tend to be high volume: S3 object-level activity and Lambda function executions. For S3 object-level operations, CloudTrail distinguishes read-only and write-only events.

CloudTrail logs 90 days of management events and stores them in a viewable, searchable, and downloadable database called the event history. The event history does not include data events. If you want to store more than 90 days of event history or add data events to the event history, you can create a trail. A trail is a configuration that records specified events and delivers them as CloudTrail log files to an S3 bucket. A log file contains one or more log entries in JavaScript Object Notation (JSON) format.

You can create up to five trails for a single region.

CloudTrail provides a means to ensure that no log files were modified or deleted after creation. This is useful in forensic investigations where someone with access to the S3 bucket may have tampered with the log file. Every hour, CloudTrail creates a separate file called a digest file that contains the cryptographic hashes of all log files delivered within the last hour. CloudTrail places this file in the same bucket as the log files but in a separate folder. You can validate the integrity of CloudTrail log and digest files by using the AWS CLI.
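
A hedged sketch of the validation command (the trail ARN and start time are placeholders):

$ aws cloudtrail validate-logs --trail-arn arn:aws:cloudtrail:us-east-1:xxxxxxxxxxxx:trail/mytrail \
    --start-time 2020-05-01T00:00:00Z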

CloudWatch

CloudWatch collects numeric performance metrics from AWS and non-AWS resources. It lets you search them, and it can send you a notification or take an action when a metric crosses a threshold.

All AWS resources send their metrics to CloudWatch. For example:

  • EC2 instance CPU utilization
  • EBS volume read and write IOPS
  • S3 bucket sizes
  • DynamoDB consumed read and write capacity units

You can also send custom metrics to CloudWatch from your applications and on-premises servers.
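
A sketch of publishing a custom metric (the namespace, metric name, and value are placeholders):

$ aws cloudwatch put-metric-data --namespace "Custom/App" \
    --metric-name PageLoadTime --value 83 --unit Milliseconds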

The metrics are organized into namespaces and use the format AWS/service. For example, AWS/EC2 is the namespace for metrics from EC2, and AWS/S3 is the namespace for metrics from S3.

Metrics exist only in the region in which they were created. A metric functions as a variable and contains a time-ordered set of data points. Each data point contains a timestamp, a value, and optionally a unit of measure. Each metric is uniquely defined by a namespace, a name, and optionally a dimension. A dimension is a name-value pair that distinguishes metrics with the same name and namespace from one another.

Most services support basic monitoring, and some support both basic and detailed monitoring. Basic monitoring sends metrics to CloudWatch every five minutes, and detailed monitoring sends metrics every minute. More than 70 services support detailed monitoring, including EC2, EBS, RDS, DynamoDB, ECS, and Lambda.

CloudWatch can store custom metrics with up to one second resolution. Metrics with a resolution of less than one minute are high-resolution metrics. When publishing a custom metric, you can specify the timestamp to be up to two weeks in the past or up to two hours into the future. If you don’t specify a timestamp, CloudWatch creates one based on the time it received the metric in UTC.

Metric Store Duration Table
Metric resolution   Store duration   Metric type
1 second            3 hours          high-resolution
1 minute            15 days          detailed monitoring
5 minutes           63 days          basic monitoring
1 hour              15 months        basic monitoring

CloudWatch lets you visualize your metrics by graphing data points over time. The time range in a graph can be between 1 minute and 15 months.

You can perform various mathematical functions against metrics and graph them as a new time series. These include addition, subtraction, multiplication, division, and exponentiation.

CloudWatch also provides statistical functions that you can use in math expressions:

  • AVG - Average
  • MAX - Maximum
  • MIN - Minimum
  • STDDEV - Standard deviation
  • SUM - Sum

A CloudWatch alarm watches over a single metric and performs an action based on its value over a period of time. The action it takes can include things such as sending an email notification, rebooting an instance, or executing an Auto Scaling action.

An alarm can be in one of the three following states at any given time:

  • ALARM
  • OK
  • INSUFFICIENT_DATA

You can configure an alarm to take an action when it transitions to a given state.
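
For example, a hedged sketch of an alarm that notifies an SNS topic (covered below) when average CPU utilization exceeds 80 percent (the instance ID and topic ARN are placeholders):

$ aws cloudwatch put-metric-alarm --alarm-name cpu-high \
    --namespace AWS/EC2 --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-xxxxxxxx \
    --statistic Average --period 300 --evaluation-periods 2 \
    --threshold 80 --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-1:xxxxxxxxxxxx:my-topic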

The Simple Notification Service (SNS) uses communication channels called topics. A topic allows a sender or publisher to send a notification to one or more recipients called subscribers. A subscriber consists of a protocol and an endpoint. The protocol can be HTTP, HTTPS, Simple Queue Service (SQS), Lambda, a mobile push notification, email, email-JSON, or short message service (SMS). The endpoint depends on the protocol. In the case of email or email-JSON, the endpoint would be an email address. In the case of SQS, it would be a queue. The endpoint for HTTP or HTTPS would be a URL.
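
A sketch of creating a topic and subscribing an email endpoint (the topic name and address are placeholders):

$ aws sns create-topic --name my-topic
$ aws sns subscribe --topic-arn arn:aws:sns:us-east-1:xxxxxxxxxxxx:my-topic \
    --protocol email --notification-endpoint admin@example.com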

Config

Config tracks how your AWS resources are configured and how they change over time.

The configuration recorder discovers your existing resources, records how they’re configured, monitors for changes, and tracks those changes over time. You can have only one configuration recorder per region.
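
You can check the recorder and its status with the AWS CLI, for example:

$ aws configservice describe-configuration-recorders
$ aws configservice describe-configuration-recorder-status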

The configuration recorder generates a configuration item for each resource it monitors. The item contains the settings for that resource as well as the resource type, its ARN, and when it was created. Configuration items are used to build a configuration history for each resource.

Every six hours in which a change occurs to a resource, a configuration history file is delivered to an S3 bucket that you specify. The S3 bucket is part of the delivery channel.

A snapshot of all configuration items from a given point in time is called a configuration snapshot.

AWS Config can record software inventory changes on EC2 instances and on-premises servers.

You have to configure CloudWatch and Config before they can begin monitoring your resources, whereas CloudTrail automatically logs only the last 90 days of management events. So consider configuring these services early in your AWS deployment.

Route 53

Route 53 provides more than just basic DNS services. In fact, it focuses on four distinct areas: domain registration, DNS management, availability monitoring, and routing policies. The 53 in Route 53 comes from the fact that DNS traffic uses network port 53 to do its job.

Associating a domain name like example.com with its actual IP address is the job of a name server, and every computer has its own simple name server database.

There are public DNS servers like Google’s at 8.8.8.8 or OpenDNS at 208.67.222.222. Their job is to provide an IP address matching the domain name you specified, so your application can complete your request.

The term domain refers to one or more servers, data repositories, or other digital resources identified by a single domain name.

Propagating domain name data among name servers is the job of a domain name registrar. Registrars work with registry operators so that domain registrations are globally authoritative. Amazon Route 53 is also a domain name registrar.

The rightmost text of every domain address (like .com, .org, or .at) indicates the top-level domain (TLD). The name to the left of the TLD (the example part of example.com) is called the second-level domain (SLD).

A subdomain identifies a subset of a domain’s resources.

A fully qualified domain name (FQDN) contains the absolute location of the domain, including a subdomain and the TLD. By convention, it will often include a trailing dot after the TLD.

A zone (or hosted zone) is a subset of a DNS domain. A zone file is a text file that describes the way resources within the zone should be mapped to DNS addresses within the domain. The file consists of resource records.

The record type you enter in a zone file’s resource record will determine how the record’s data is formatted and how it should be used. There are currently around 40 types in active use.

Some DNS record types
Type    Function
A       IPv4 address record
CNAME   Canonical name; defines one hostname as an alias for another
AAAA    IPv6 address record
NS      Identifies a name server to be used by a zone
SOA     Start of authority; defines a zone's authoritative meta information

When creating a new record set, you’re given the option of choosing a routing policy.

Routing policies
Policy name                Description
Simple routing             Routes all requests to the IP address or domain name you assign it. Simple is the default routing policy.
Weighted routing           Routes traffic among multiple resources according to the ratios you set.
Latency routing            Leverages resources running in multiple AWS regions to serve clients from the instances that will deliver the best experience.
Failover routing           Directs traffic to the resource you identify as primary as long as health checks confirm that the resource is running properly.
Geolocation routing        Uses the continent or country where the request originated to decide which resource to send it to.
Multivalue answer routing  Combines a health check configuration with multivalue routing to make a deployment more highly available.
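
A hedged sketch of creating a simple A record through the AWS CLI (the hosted zone ID, record name, and address are placeholders); the change batch is passed as JSON:

$ aws route53 change-resource-record-sets --hosted-zone-id Zxxxxxxxxxxxxx \
    --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"www.example.com","Type":"A","TTL":300,"ResourceRecords":[{"Value":"203.0.113.10"}]}}]}'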

CloudFront

Amazon’s global content delivery network (CDN) CloudFront can also help solve one of the primary problems addressed by Route 53: getting your content into the hands of your users as quickly as possible.

A CDN maintains a network of physical edge locations placed geographically close to the end users who are likely to request content. When you configure the way you want your content delivered—as part of what AWS calls a CloudFront distribution—you define how you want your content distributed through that network and how it should then be delivered to your users.

Reliability

Reliability, or resiliency, is the ability of an application to avoid failure and, when failure occurs, to recover from it quickly. Reliability is quantified in terms of availability: the percentage of time an application is performing as expected.

Availability
Percentage   Downtime per year
99%          3 days, 15:39 hours
99.9%        8:45 hours
99.95%       4:22 hours
99.99%       0:52 hours
99.999%      0:05 hours

The availability of an application depends on the availability of its components. If the application cannot work without a component, that component is called a hard dependency. To calculate the availability of such an application, multiply the availabilities of its hard dependencies. For example, an application with two hard dependencies that are each 99.9% available has an availability of 99.9% × 99.9% ≈ 99.8%.

To increase the availability of an application, you can use redundant components. To calculate the availability of redundant components, take 100 percent minus the product of the component failure rates. For example, two redundant components that are each 99% available (a 1% failure rate each) yield 100% − (1% × 1%) = 99.99%.

Components availability
Service                  Availability
EC2                      99.99%
RDS                      99.95%
DynamoDB                 99.99%
DynamoDB global tables   99.999%
Route 53                 100%
Lambda                   100%

The EC2 Auto Scaling service uses either a launch configuration or a launch template. Launch templates are newer and can also be used to manually launch an EC2 instance; launch configurations can be used only with EC2 Auto Scaling.

If an instance becomes unhealthy, Auto Scaling will terminate it and create a new instance to replace it. The health of an instance can be determined in different ways: CloudWatch metrics, health checks that request a file from the web server, etc.

For scaling, you can use dynamic scaling policies or scheduled actions. Dynamic scaling policies can be simple, step, or target tracking policies. Scheduled actions are useful if you have a predictable load pattern and want to adjust your capacity before demand hits.
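
For example, a hedged sketch of a scheduled action that raises capacity on weekday mornings (the group name and sizes are placeholders):

$ aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name my-asg --scheduled-action-name scale-up-mornings \
    --recurrence "0 8 * * MON-FRI" --min-size 2 --max-size 8 --desired-capacity 4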

An Amazon Elastic File System (EFS) can be shared among EC2 instances or on-premises servers. EFS file systems are stored across multiple availability zones in a region; for backup, store the files to an S3 bucket.

EBS automatically replicates volumes across multiple servers within a single availability zone. For backup, take a snapshot of the volume.

Separating an application's components into multiple subnets is called multi-tier architecture.

Direct Connect offers access speeds of 1 to 10 Gbps with consistent latency. Use it if your Internet connection is too slow or inconsistent for your workload.

Use CloudFormation to quickly rebuild a new solution or to practice application updates.

Performance

Scaling out or scaling horizontally involves adding new resources that will run identical workloads in parallel.

Use S3 Transfer Acceleration if you often transfer large files between local PCs and S3 buckets.

Use Application Load Balancer for HTTP and HTTPS traffic and Network Load Balancer for TCP traffic.

You can describe your AWS project using CloudFormation template files, written in either JSON or YAML format.
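
A minimal sketch of a JSON template that defines a single S3 bucket (the logical name MyBucket is arbitrary):

{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal example template",
    "Resources": {
        "MyBucket": {
            "Type": "AWS::S3::Bucket"
        }
    }
}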

Security

A principal is an entity that can take an action on an AWS resource. It can be any of the following:

  • root user
  • IAM user
  • IAM role

Use policies to define the permissions a principal has. AWS provides hundreds of prepackaged policies called managed policies.

Using a permissions boundary, you can limit the maximum permissions that can be granted to a principal.

Roles are useful for allowing applications running on EC2 instances to access AWS resources.

Use CloudWatch Logs to aggregate logs from different sources: CloudTrail, VPC flow logs, RDS, Route 53 DNS queries, Lambda code, etc. Then, using Athena, you can search these logs with SQL queries.

Operational

Using a single CloudFormation template, you can define and deploy two different stacks for an application, one for development and another for production.
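
A hedged sketch of deploying the two stacks from one template, assuming the template defines an EnvType parameter (the stack names, file name, and parameter are placeholders):

$ aws cloudformation create-stack --stack-name app-dev \
    --template-body file://template.json \
    --parameters ParameterKey=EnvType,ParameterValue=dev
$ aws cloudformation create-stack --stack-name app-prod \
    --template-body file://template.json \
    --parameters ParameterKey=EnvType,ParameterValue=prod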