Blog   |   Automation   |   July 6, 2012

Summer Storms, Cloud Service Disruptions, and the Lessons Learned

Recent storms in the Virginia-D.C. area resulted in a temporary outage of AWS service. Evidently, a lightning storm and a failed backup generator caused the disruption. This, in turn, brought coverage that questioned the use of the cloud (see also: AWS Outage).
The prevailing narrative is that the outage illustrated the risks of using the cloud: the problems with outsourcing, with relying on a vendor, and with relinquishing complete control. That narrative focuses on the service disruption itself and is completely wrong.
The simple reality is that natural disasters occur. There is no avoiding that. And when they occur, there may be service disruptions. Thus, the correct question is not whether a natural disaster caused a service disruption. Instead, the correct question is: would your enterprise have fared better with its own data center or private cloud?


This incident shows plainly that using a large public cloud offers a distinct advantage over using your own data center or a private cloud.
While over 4 million people in the area were without power for multiple days, AWS responded promptly and effectively. It redistributed service and was functional for nearly all of its customers in less than 24 hours. In fact, many D.C.-area residents could use Instagram to share pictures on the Web even before they had lights or safe drinking water. Plainly, the episode illustrated the durability of public cloud service.
So the question remains: would you have performed equally well? Or would you have been beholden to local power? Would you have been beholden to local transport? Could you have redistributed your needs across the country to unaffected areas? Or would you have been trapped? Would you have been fully functional in less than 24 hours? And, importantly, who would have borne the cost of the repairs?


The answer to those questions is obvious. The overwhelming majority of enterprises could not have responded equally well, nor would they have recovered as quickly. Even with geographically appropriate disaster recovery plans, the disruption to individual data center and private cloud users would have been significantly greater. And the repair-related costs borne by these individual users would have been enormous.
Even more importantly, the AWS experience illustrated the power of spreading your resources across regions before a catastrophe strikes. By using multiple availability zones, AWS customers endured far less disruption than non-users. Data center and private cloud users simply do not enjoy this advantage.
Thus, we draw a counter-narrative conclusion from the incident: using a large public cloud provider is actually safer from a service perspective. The availability and use of distributed resources across multiple zones reduces the risk of outages caused by freak accidents. Service availability is better, not worse, for public cloud users. We also believe the incident illustrates how using a large cloud provider reduces the significant repair costs that come with outages.
In short, the prevailing narrative completely misinterpreted the outage. If anything, the storms and resultant outage illustrated the power of using large public clouds.

 
