Both multi-AZ Auto Scaling and ELB are designed by AWS to ensure high availability. As AWS says: “we provide a way to achieve a balanced group of EC2 instances that are spread across multiple Availability Zones for high availability, and provide a single entity for you to manage. In addition, we mitigate the problem of zones becoming unavailable or congested by temporarily allocating capacity in other zones and rebalancing the group back over time.”
As will become apparent, this automatic reallocation of resources is important to keep in mind when using either Reserved Instances (RI) or Spot instances.
Before explaining the critical nature of this issue, let’s review how RI and Spot instance pricing function. When purchasing an RI, users must specify platform (e.g. Windows SQL Server Standard), size (e.g. c1.xlarge), AZ (e.g. us-east-1b), VPC (in a VPC or not), and Tenancy (dedicated or not). All of these attributes must match for a newly launched instance to use the reservation. If they do not, you will be left paying for an On-Demand instance while your reservation sits idle.
When bidding for a Spot instance, the user is guaranteed availability as long as the Spot price does not exceed the bid price. Once the Spot price exceeds the bid price, the instance is terminated.
These are the operating constraints: an exact match to use an RI reservation and a price maximum for a Spot instance.
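The two constraints can be written down as a minimal Python sketch. The `Placement` class and both function names are hypothetical, introduced here only for illustration; they are not part of any AWS API, and the rules are a simplification of the pricing behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """The five attributes that the text says must match (hypothetical model)."""
    platform: str    # e.g. "Windows SQL Server Standard"
    size: str        # e.g. "c1.xlarge"
    az: str          # e.g. "us-east-1b"
    in_vpc: bool     # launched in a VPC or not
    dedicated: bool  # dedicated tenancy or not


def reservation_applies(reservation: Placement, instance: Placement) -> bool:
    """An RI is used only when every attribute matches exactly."""
    return reservation == instance


def spot_instance_survives(spot_price: float, bid_price: float) -> bool:
    """A Spot instance keeps running while the Spot price does not exceed the bid."""
    return spot_price <= bid_price
```

Under this model, an instance that differs from the reservation only in its AZ (us-east-1c instead of us-east-1b) does not use the reservation and is billed as On-Demand.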
The issue that arises with multi-AZ Auto Scaling and ELB is that, in return for the benefits of automating resource provisioning in response to user-defined demand, users cede manual control of where their instance launches and where the resource is assigned. Further, AWS states that “availability zone balance triumphs everything else.” Consequently, in the absence of a custom termination policy when scaling down, AWS will automatically balance resources.
Impacts for RI
This means that users who have not remained vigilant in matching their RI distribution to their Auto Scaling launch configurations, or who have not anticipated an asymmetrical distribution of resources within a Region, may be subject to a costly shift during launch or termination: one AZ is left with unused RI reservations while another AZ runs instances without available reservations.
As a practical matter, we have seen this happen to CloudCheckr customers. Many users have purchased Heavy RI within a single AZ with the plan of using the resources within that AZ. However, for one reason or another, whether their own custom launch and termination policies or AWS’ default balancing policy, they have ended up with the resource running in another AZ. These users are then left paying double: for the unused Heavy RI and for an On-Demand instance in another AZ.
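A rough way to spot this kind of mismatch is to tally reservations against running instances per AZ. The sketch below is illustrative only: the function name `ri_mismatch` and the input dictionaries are assumptions, and it presumes that every instance already matches the reservations on platform, size, VPC, and tenancy, so that AZ is the only attribute that can differ.

```python
from typing import Dict


def ri_mismatch(reserved: Dict[str, int],
                running: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Compare reservation counts with running instance counts, per AZ.

    Assumes all other RI attributes already match (hypothetical simplification).
    """
    report = {}
    for az in sorted(set(reserved) | set(running)):
        r = reserved.get(az, 0)
        n = running.get(az, 0)
        report[az] = {
            # Reservations paid for but not used by any instance in this AZ.
            "unused_reservations": max(r - n, 0),
            # Instances with no reservation to cover them, billed On-Demand.
            "on_demand_instances": max(n - r, 0),
        }
    return report


# Four reservations bought in one AZ; balancing has spread the instances.
report = ri_mismatch(
    reserved={"us-east-1b": 4},
    running={"us-east-1b": 2, "us-east-1c": 2},
)
```

In this made-up example, us-east-1b ends up with two unused reservations while us-east-1c runs two instances at On-Demand rates, which is the "paying double" situation described above.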
In summary, the risk of using RI with multi-AZ Auto Scaling or ELB without proper planning is one of increased and inefficient spend.
Impacts for Spot
The prioritization of balance is even more critical when thinking about using Spot instances. Users who employ Spot instances accept the risk of termination in exchange for lower cost. Users can mitigate the termination risk by bidding above market.
However, most users are not aware that when Spot prices differ across AZs, Auto Scaling (or ELB) will not look for instances in the cheapest AZ or even instances in an AZ with prices below your bid price. Users could be bidding above the Spot price in one AZ but still lose availability because Auto Scaling sends the resource request to another AZ with a Spot price above the bid. This is because resource dispersion and balance are the priority in the AWS hierarchy. AWS is aware of this issue but, as of today, has not offered a solution to address it.
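The effect is easier to see with a toy model. The sketch below is a deliberately simplified illustration of the balance-first behavior described above, not AWS's actual placement algorithm; the function names, AZ counts, and prices are all hypothetical.

```python
from typing import Dict


def balance_first_az(instance_counts: Dict[str, int]) -> str:
    """Pick the AZ with the fewest instances, ignoring price entirely.

    Ties go to the alphabetically first AZ (an arbitrary choice for this sketch).
    """
    return min(sorted(instance_counts), key=lambda az: instance_counts[az])


def launch_succeeds(az: str, spot_prices: Dict[str, float], bid: float) -> bool:
    """A Spot request is fulfilled only if the AZ's Spot price is within the bid."""
    return spot_prices[az] <= bid


counts = {"us-east-1a": 3, "us-east-1b": 2}        # 1b has fewer instances
prices = {"us-east-1a": 0.08, "us-east-1b": 0.20}  # 1b is the expensive AZ
bid = 0.10

target = balance_first_az(counts)                   # chosen for balance, not price
fulfilled = launch_succeeds(target, prices, bid)
would_fit_elsewhere = launch_succeeds("us-east-1a", prices, bid)
```

Here the request goes to us-east-1b to restore balance and fails because the price there is above the bid, even though the same bid would have been accepted in us-east-1a.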
Consequently, the risk for Spot users is that Auto Scaling or ELB may actually decrease resource availability.
How to Mitigate These Risks
The solutions are imperfect.
RI users should recognize that multi-AZ usage requires balancing reservations across AZs. They need to understand their typical resource utilization patterns to assess and optimize the correct balance of RI reservations and non-RI instances. Users should also construct custom policies for termination that account for RI placement.
Spot users need to be aware of pricing patterns and not rely exclusively on Spot instances. Price spikes and availability issues arise with unfortunate frequency and Spot users need to consider their usage and availability requirements carefully.
CloudCheckr can help automate these assessments. The utilization and trending reports provide the insight necessary to maintain availability. The cost recommendation reports will ensure that users are not “overpaying” for that availability. The RI assessments will identify and alert users to mismatched and underutilized RI.
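One simple heuristic for sizing reservations per AZ is to reserve only the instance count that is running most of the time and leave the rest to On-Demand. The sketch below illustrates that idea; the function name `baseline_reservations`, the sample counts, and the 75% threshold are assumptions chosen for illustration, not a recommendation from AWS or a description of how CloudCheckr computes its reports.

```python
import math
from typing import List


def baseline_reservations(hourly_counts: List[int], min_fraction: float) -> int:
    """Largest instance count that was running in at least `min_fraction` of hours.

    `hourly_counts` holds the number of running instances in one AZ for each
    sampled hour (hypothetical input).
    """
    if not hourly_counts:
        return 0
    if not 0 < min_fraction <= 1:
        raise ValueError("min_fraction must be in (0, 1]")
    ordered = sorted(hourly_counts, reverse=True)
    # Number of hours in which the baseline must have been met or exceeded.
    hours_needed = math.ceil(min_fraction * len(ordered))
    return ordered[hours_needed - 1]


# Eight sampled hours for one AZ; reserve what runs at least 75% of the time.
usage_1b = [4, 4, 2, 6, 4, 3, 4, 5]
suggested = baseline_reservations(usage_1b, 0.75)
```

For these sample counts the function suggests four reservations, since at least four instances were running in six of the eight sampled hours. Running the same calculation for each AZ gives a starting point for distributing reservations, which should then be checked against the termination policy in use.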