6 Common Availability Issues

February 29th, 2016

As Werner Vogels has famously said: “Everything fails all the time.”

Unfortunately, many users ignore his admonition and put off addressing minor misconfiguration errors until availability demands force an immediate, and often frantic, fix. Users should institute a regimen to regularly monitor and fix their configuration issues to ensure optimum AWS performance, but of course that takes time, so many don’t.

Well, we here at CloudCheckr are in the business of monitoring and have the benefit of seeing thousands of AWS users and their misconfigurations. With that visibility, we wanted to take the opportunity to highlight a few of the most common, in the hope that you will avoid them!

So, please check out 6 of the most common AWS availability issues, explanations of why they are important, and ways that they can be addressed.


1. RDS DB Instances with Less Than 10% of Free Storage

The Relational Database Service (RDS) provides access to the capabilities of MySQL, Oracle, PostgreSQL, and Microsoft SQL Server database engines while taking care of many of the administrative tasks associated with each. Patches to the databases, as well as backups, are all handled automatically by AWS.

Data storage in Amazon RDS is specified by providing a storage size (GB) and optionally selecting Provisioned IOPS when you create a new DB instance. Specifying just the storage size allocates standard storage. Standard storage is not reserved for the DB instance, so performance can vary greatly depending on the demands placed on shared resources by other customers. Provisioned IOPS differs from standard storage in that the specified IO capacity is reserved for the DB instance. By reserving IO capacity for your instance, Amazon RDS ensures that disk resources are available when you need them independent of other customer activity.

If your DB instance is running out of available storage, you will want to either increase the storage capacity of the current DB instance or launch a new one.
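As a rough illustration of the 10% check, here is a minimal Python sketch. It assumes you already have two numbers for an instance: the allocated size in GiB (the unit RDS uses for `AllocatedStorage`) and the free space in bytes (the unit of the CloudWatch `FreeStorageSpace` metric). The function names and the default threshold parameter are ours, chosen for illustration only.

```python
GIB = 1024 ** 3  # bytes in one GiB


def free_storage_percent(allocated_gib, free_bytes):
    """Return free storage as a percentage of allocated storage."""
    if allocated_gib <= 0:
        raise ValueError("allocated_gib must be positive")
    return 100.0 * free_bytes / (allocated_gib * GIB)


def is_low_on_storage(allocated_gib, free_bytes, threshold_percent=10.0):
    """True when free storage has dropped below the threshold (10% here)."""
    return free_storage_percent(allocated_gib, free_bytes) < threshold_percent
```

For example, an instance with 100 GiB allocated and 5 GiB free would be flagged, while one with 50 GiB free would not.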

2. Elastic Load Balancers with Fewer Than Two Healthy Instances

Elastic Load Balancing is designed to automatically distribute incoming traffic across multiple EC2 instances. Load balancing can be set up across multiple Availability Zones, so if one datacenter fails, traffic is re-routed to one that is operational. It ensures that no instances become overloaded with too many requests. It also directs traffic away from any unhealthy instances, and spreads the load to those that are healthy. Load balancers will consider an instance unhealthy if the instance is closing the connection to the load balancer, responses are timing out, or public key authentication is failing.

There should always be a minimum of two healthy instances associated with a load balancer. If there is only one, the load balancer cannot fail over, because it has no other instance to reroute traffic to. In that case, users should immediately associate a second instance.
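The two-healthy-instances rule can be sketched in a few lines of Python. This assumes the input is a list of dictionaries shaped like the `InstanceStates` entries returned by the Classic Load Balancer `DescribeInstanceHealth` call, where a healthy instance reports a `State` of `InService`. The helper names are hypothetical.

```python
def healthy_instance_ids(instance_states):
    """Return the IDs of instances the load balancer reports as InService."""
    return [
        state["InstanceId"]
        for state in instance_states
        if state.get("State") == "InService"
    ]


def has_failover_capacity(instance_states, minimum=2):
    """True when at least `minimum` instances are healthy."""
    return len(healthy_instance_ids(instance_states)) >= minimum
```

A load balancer with one `InService` and one `OutOfService` instance would fail this check, which is the situation described above.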

3. SES Email Addresses with Failure Status

Simple Email Service (SES) is an outbound-only email-sending service, which provides sending statistics and built-in notifications for bounces, complaints, and deliveries.

SES requires that users verify their email address or domain to confirm that they own it and to prevent others from using it. When an entire domain is verified, all email addresses from that domain are also verified, so users don’t need to verify email addresses from that domain individually.

If the DNS settings are not correctly updated, SES will be unable to verify the email address(es). Unverified addresses will display a status of “failed” in the Email Addresses tab. If this happens, users can update their DNS settings and force SES to retry verification.
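To show how failed identities might be picked out programmatically, here is a small sketch. It assumes a dictionary shaped like the `VerificationAttributes` map returned by the SES `GetIdentityVerificationAttributes` call, in which each identity carries a `VerificationStatus` such as `Success`, `Pending`, or `Failed`. The function name and the sample identities are made up for illustration.

```python
def failed_identities(verification_attributes):
    """Return identities whose verification status is 'Failed', sorted by name."""
    return sorted(
        identity
        for identity, attributes in verification_attributes.items()
        if attributes.get("VerificationStatus") == "Failed"
    )


# Hypothetical sample data, not real identities.
sample = {
    "example.com": {"VerificationStatus": "Success"},
    "alerts@example.org": {"VerificationStatus": "Failed"},
    "ops@example.net": {"VerificationStatus": "Pending"},
}
failed = failed_identities(sample)
```

Anything returned by this check is a candidate for a DNS fix followed by a verification retry.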

4. Automatic RDS Database Backups are Disabled

Automated RDS backups, which are enabled by default, back up the databases and store the backups for a user-defined period of time. The backup process, which runs daily, can be configured to run during a user-defined window. AWS stores the backups within S3, across multiple Availability Zones, to provide high levels of data durability.

All RDS database instances should be backed up daily as a precaution against database failures or other issues. Any databases that are not being backed up automatically should be reviewed.
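One way to spot instances with automated backups turned off is to look at the retention period: in RDS, a `BackupRetentionPeriod` of 0 means automated backups are disabled. The sketch below assumes a list of dictionaries shaped like the `DBInstances` entries from `DescribeDBInstances`. The function name is ours, and treating a missing field as "disabled" is a cautious assumption rather than documented behavior.

```python
def instances_without_automated_backups(db_instances):
    """Return identifiers of DB instances whose automated backups are disabled."""
    return [
        db["DBInstanceIdentifier"]
        for db in db_instances
        # Assumption: a missing retention period is treated as disabled.
        if db.get("BackupRetentionPeriod", 0) == 0
    ]
```

Each identifier returned is an instance worth reviewing.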

5. EC2 Instances with Failed Status Checks

EC2 provides a virtual computing environment where instances can be launched using a wide variety of operating systems and configuration options. AWS users can run their custom applications on these instances, while maintaining full control over their security access.

AWS will perform instance status checks, which are designed to detect any problems with your EC2 instances by monitoring their software and network configuration. These issues can include: exhausted memory, corrupted file system, incompatible kernel, or a network misconfiguration.

If this problem occurs, users can generally resolve it by rebooting the instance or by making modifications to the instance’s operating system.
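For a sense of what checking for failed status checks looks like, here is a hedged sketch. It assumes a list of dictionaries shaped like the `InstanceStatuses` entries from the EC2 `DescribeInstanceStatus` call, where the instance check and the separate system check each report a `Status` such as `ok` or `impaired`. The function name is hypothetical.

```python
def instances_with_failed_checks(instance_statuses):
    """Return IDs of instances where either status check reports 'impaired'."""
    failed = []
    for status in instance_statuses:
        system_check = status.get("SystemStatus", {}).get("Status")
        instance_check = status.get("InstanceStatus", {}).get("Status")
        if "impaired" in (system_check, instance_check):
            failed.append(status["InstanceId"])
    return failed
```

Instances returned here are the ones to consider rebooting or investigating at the operating system level.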

6. RDS Instances that are Configured to Retain Backups for Fewer Than 30 Days

With the Relational Database Service (RDS), patches to the databases, as well as backups, are all handled automatically by AWS. RDS can automatically back up all of the data and transaction logs within each database instance during a pre-defined backup window. These backups, which are stored within S3, can be used to initiate a point-in-time recovery.

By default, the backups are stored for only one day. However, a user-specified retention period can be established for as many as 35 days. The length of the retention period will depend on the customer and the importance of the data being backed up. In general, however, users are best served by setting a longer backup retention period to protect against accidental loss of data.
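The 30-day guideline can be expressed as a simple filter. This sketch again assumes dictionaries shaped like `DescribeDBInstances` entries and reports instances whose backups are enabled but kept for fewer than 30 days. The function name and the `minimum_days` parameter are illustrative choices; instances with backups disabled entirely (a retention of 0) are deliberately left out, since they are a separate issue.

```python
def short_retention_instances(db_instances, minimum_days=30):
    """Return (identifier, days) pairs for instances retaining backups
    for at least 1 day but fewer than `minimum_days`."""
    flagged = []
    for db in db_instances:
        days = db.get("BackupRetentionPeriod", 0)
        if 0 < days < minimum_days:
            flagged.append((db["DBInstanceIdentifier"], days))
    return flagged
```

An instance left at the one-day default would be flagged, while one configured for 30 or 35 days would pass.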

Conclusions:

Yes, monitoring takes time. But we all went to AWS to capitalize on its agility and elasticity. Why wait for a failure before acting? Instead, start a regimen today. Create a process and regularly check your environment!

You can either do it manually or use CloudCheckr to do it automatically. We continuously monitor your environment so that you can take a proactive approach. Not only will you get alerts when instances are becoming unhealthy or Availability Zones are becoming unreachable, but you will also receive daily reports to ensure that your architecture is properly configured before things go awry.

See what CloudCheckr can do for you with a free trial.