AWS S3 Outage Shows We’re All In This Together

March 2nd, 2017

There’s no upside to downtime. Many cloud experts are responding to Amazon’s recent outage by touting the reliability of Amazon’s S3 storage solutions and the AWS Cloud in general, and they’re correct when they do so. Amazon has a deservedly solid reputation for availability (they claim 99.99% uptime).

But the truth is, the news will always cover airplane crashes and not the number of planes that safely take off and land every minute of every day. Make no mistake: we are so dependent on the cloud that one day, such an outage could become a matter of life or death. Suppose a first responder needs to get directions, or a doctor needs to communicate with team members, but the internet is down? The cloud is increasingly becoming indispensable in our lives, but is that necessarily a good thing?

The “All Your Eggs” Argument

One argument cloud skeptics are making is that too much of the internet is in one provider’s hands. In other words, all of our eggs are in one basket. Technically that is not the case, thanks to Microsoft Azure, Google, VMware, and many other public cloud providers. Certainly many, if not most, of our eggs are in Amazon’s basket: Forrester Research has reported that Amazon earns 42% of cloud market revenue. Despite that, the majority of sites, including AWS-based ones, worked fine during the now-infamous #AWSoutage because they weren’t entirely reliant on Amazon’s storage service. That allowed those sites to stay in communication with end users, explaining via email, Twitter, or other means that specific services were offline.

By contrast, businesses that don’t use the cloud are likely to have more downtime than Amazon. If the internet connection fails at a self-hosted firm, all of its websites, apps, email, and even VoIP phone service will fail with it, and end users will have no idea why there was a failure and no way to reach the business.

Ultimately, placing all, or most, of our eggs in the public cloud basket seems to be our safest bet for continuous and consistent performance.

The “Shared Experience” Argument

The other side of the coin, arguing in favor of critical mass operations in the cloud, is that because so many sites were affected, the outage became common knowledge. When people noticed that Slack, Trello, and many of their favorite apps stopped working all at once, they jumped to the conclusion that something was “wrong” with the internet. Much like the days of three-channel television (with all of America tuning in to the same episode of M*A*S*H at once, for example), the AWS outage fostered a shared experience. Clearly, not all shared experiences are good. But there is safety in numbers, or at least moral support, as illustrated by the quantity of clever tweet memes. In the immortal words of Troy in High School Musical, “we’re all in this together.”

The Bottom Line

Whether you run your business in one or many cloud environments, the real danger lies in a lack of visibility. To reduce the risk of downtime, you need to know the state of your infrastructure. Customers expect consistency, and organizations need to maintain it at any cost.

CloudCheckr can proactively alert you to potential downtime, cyberattacks, performance bottlenecks, and other threats. Over 380 best practice recommendations help you avoid these issues and mitigate risks. Try CloudCheckr free for 14 days to see how we can optimize visibility and control for your cloud.