Cloud-based data storage is critical for applications to behave as expected. It’s also a common source of security vulnerabilities: data is the target of many attacks, and unsecured data is exposed to both internal and external threats.
One of the advantages of the cloud is the ability to choose the type of storage that is best suited to your application’s specific needs, rather than making your application work with whatever storage option a storage admin had available. However, making efficient use of the storage options requires understanding the advantages and disadvantages of each type of storage. It’s also essential to consider storage provisioning an integral part of the application development process, rather than as an afterthought.
Storage is also often the least-portable part of an application, so if you follow a multi-cloud or hybrid cloud strategy—or simply want to avoid vendor lock-in—storage is a critical part of that strategy.
There are also storage-related operational challenges. Storage, unlike compute, degrades over time. Data ultimately lives on physical media, and that media wears out as it is written and rewritten. Physical contaminants can also cause data degradation. If you’re running a dynamic, containerized application in which clusters detach from and reattach to storage resources, every attachment and detachment point is an opportunity for something to go wrong. A data storage strategy isn’t something that can be handled once and then ignored. Storage resources need to be continuously monitored for cost-effectiveness, performance, and security vulnerabilities.
AWS vs Azure vs Google Cloud: Know Your Storage Options
AWS offers Amazon Simple Storage Service (Amazon S3) for object storage, Amazon Elastic Block Store (Amazon EBS) for block storage, and Amazon Elastic File System (Amazon EFS) for file storage, as well as disaster recovery, archive, and backup storage services. On Microsoft Azure, object storage is called Blob Storage, while block storage is called Azure Disk Storage. Azure Files provides file storage, and Azure likewise has separate storage options for archives, backups, and disaster recovery that assume the storage won’t be accessed frequently. On Google Cloud, the equivalents are Cloud Storage for object storage, Persistent Disk for block storage, and Filestore for file storage.
Whether you’re using AWS, Azure, or Google Cloud, the advantages and drawbacks of each type of storage are similar. File storage operates like the digital version of a file cabinet: data is organized in a logical hierarchy, the same way you store documents on your personal computer. File storage can handle just about any type of data and is easy to navigate, but it is difficult to scale.
Block storage chops data into fixed-size blocks and spreads those blocks across the underlying storage environment. Each block has a unique identifier, and when the data is needed the blocks are looked up by identifier and reassembled. Block storage can be expensive, and a volume can typically be attached to only one instance at a time. Cloud vendors often offer block storage on HDD (hard disk drives) and SSD (solid-state drives), depending on throughput requirements.
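To make the split-and-reassemble idea concrete, here is a minimal sketch in Python. It is a toy model and not how any provider implements block storage: the function names, the tiny 4-byte block size, and the use of random UUIDs as block identifiers are all assumptions chosen for illustration.

```python
import uuid

BLOCK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks, each stored under a unique ID.

    Returns (order, blocks): the ordered list of block IDs needed to
    rebuild the data, and a dict mapping each ID to its bytes.
    """
    order = []
    blocks = {}
    for offset in range(0, len(data), block_size):
        block_id = uuid.uuid4().hex  # hypothetical identifier scheme
        blocks[block_id] = data[offset:offset + block_size]
        order.append(block_id)
    return order, blocks


def reassemble(order, blocks) -> bytes:
    """Look up every block by ID, in order, and join them back together."""
    return b"".join(blocks[block_id] for block_id in order)


if __name__ == "__main__":
    order, blocks = split_into_blocks(b"hello block storage")
    print(len(order), "blocks")
    print(reassemble(order, blocks))
```

The point of the sketch is that the blocks themselves carry no structure. Only the ordered list of identifiers makes the data readable again, which is why block volumes are usually managed by a single attached instance.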
With object storage, the data is stored as ‘objects’, and multiple servers and clients can connect to the same object storage container using its web address. Object storage can handle detailed metadata and it scales easily, especially in the cloud, making it cost-effective. However, objects cannot be modified in place, which means object storage is not a good fit in situations where data will need to be adjusted or rewritten frequently. It also doesn’t work well with traditional databases. You could use object storage to host a static website, but any time you need to change the contents, you upload a replacement version, similar to using an FTP (File Transfer Protocol) server.
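The replace-rather-than-edit behavior can be sketched with a toy in-memory model. This is only an illustration of the semantics described above; `ToyObjectStore`, `StoredObject`, and the metadata keys are hypothetical names, not part of any cloud SDK.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    """An immutable object: payload bytes plus free-form metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Toy model of an object storage container addressed by key."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata) -> None:
        # There is no partial update: writing to an existing key
        # replaces the whole object, metadata included.
        self._objects[key] = StoredObject(bytes(data), dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("site/index.html", b"<h1>v1</h1>", content_type="text/html")
    # Changing the page means uploading a full replacement version.
    store.put("site/index.html", b"<h1>v2</h1>", content_type="text/html")
    print(store.get("site/index.html"))
```

Because every change is a full upload, workloads that rewrite small parts of large files frequently tend to be a poor match for this model.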
There are also many vendors who offer software-defined storage options that add a layer of abstraction between the cloud provider’s native storage and compute. Using a software-defined storage layer is often necessary to make data storage portable between cloud providers and attain operational control over data storage.
Securing Your Cloud Storage
1. Be Ready with Redundancy and Availability
Computing instances often have “ephemeral” local storage that disappears when the instance goes away. For permanent storage, data needs to be written to block storage, or better yet, object storage. Additionally, cloud vendors offer long-term, lower-cost versions of object storage for older backups and other data that doesn’t need to be accessed frequently. Data that can be easily reproduced, such as image thumbnails, can be stored with lower redundancy for an even lower cost.
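As one hedged illustration of moving older backups to a lower-cost tier, the sketch below builds a lifecycle rule in the shape that Amazon S3 lifecycle configurations use. The helper name, the `backups/` prefix, and the 90-day threshold are assumptions for the example; the dictionary is only constructed and printed here, not sent to any cloud API.

```python
import json


def archive_rule(rule_id: str, prefix: str, archive_after_days: int,
                 storage_class: str = "GLACIER") -> dict:
    """Build one lifecycle rule that moves objects under `prefix`
    to a colder storage class after `archive_after_days` days."""
    if archive_after_days < 1:
        raise ValueError("archive_after_days must be at least 1")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": storage_class}
        ],
    }


if __name__ == "__main__":
    # Hypothetical policy: backups older than 90 days go to archive storage.
    config = {"Rules": [archive_rule("archive-old-backups", "backups/", 90)]}
    print(json.dumps(config, indent=2))
```

Azure and Google Cloud offer comparable lifecycle mechanisms with their own configuration formats, so treat the field names above as S3-specific.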
2. Ensure Data Security
Data security concerns are often cited as a reason not to move business applications to the cloud. In practice, data in a cloud environment can be more secure than data in an on-premises environment. Regardless of where data is stored, however, there are established best practices to make sure it is as secure as possible. It’s also important to remember that securing data requires managing not only data at rest but also data in transit and data that is in use by applications.
3. Get Permissions
Just as it’s important to limit access to compute resources, access to data should be controlled on the principle of least privilege: each role gets only the access it needs. This means using Identity and Access Management tools to control what types of roles can see and manipulate your data. As with compute, the best practice is to control permissions through groups rather than individual users. This makes it easier to avoid errors in the data access permissions.
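A least-privilege policy might look like the following sketch, which builds an AWS IAM policy document granting read-only access to a single prefix in a single bucket. The bucket name, prefix, and helper name are hypothetical; in line with the advice above, a policy like this would typically be attached to a group rather than to individual users.

```python
import json


def read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document that allows listing and reading
    objects under `prefix` in `bucket`, and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix for a reporting group.
    print(json.dumps(read_only_policy("example-reports", "monthly/"), indent=2))
```

Note what is absent: no write, delete, or permission-editing actions, and no wildcard action such as `s3:*`.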
While it is sometimes reasonable and even necessary to make a storage bucket public, companies should use caution with this approach and only do so when there are legitimate business or technical reasons. Keep in mind that once a bucket is made public, its contents can be shared via the internet. Publicly accessible buckets are perhaps the most common cause of security breaches in the cloud.
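A quick way to review a bucket for accidental public exposure is to scan its policy for statements that allow access to everyone. The sketch below does this for an S3-style bucket policy that has already been loaded into a dictionary. It is deliberately simplified: it only flags unconditional `Allow` statements with a wildcard principal, and it ignores ACLs and account-level settings, so a clean result is not proof that a bucket is private.

```python
def _as_list(value):
    """Policies allow a single item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def public_statements(policy: dict) -> list:
    """Return Allow statements that grant access to any principal
    without a Condition restricting them."""
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if principal == "*":
            is_public = True
        elif isinstance(principal, dict):
            is_public = any("*" in _as_list(v) for v in principal.values())
        else:
            is_public = False
        if is_public and not statement.get("Condition"):
            flagged.append(statement)
    return flagged


if __name__ == "__main__":
    # Hypothetical policy that exposes every object to the internet.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        }],
    }
    for statement in public_statements(policy):
        print("Public access:", statement["Action"], statement["Resource"])
```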
4. Unique Bucket Naming
Bucket names share a publicly visible namespace, even when the contents are protected, so you shouldn’t put any sensitive information in a storage bucket name. A common pitfall is giving the storage bucket a name like “client-cc-numbers.” Overly descriptive names give a potential attacker a roadmap to where sensitive information might be located. At the same time, do not rely on security through obscurity alone: bucket URLs can be guessed with brute-force techniques, so an obscure name is never a substitute for proper access controls. A safer bucket name might be “z9-tmp-3a8add-diag-eol”. It’s relatively long, which makes it harder to guess, and the name does not look valuable to an attacker.
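One way to apply this advice is to generate bucket names from random data and to lint existing names for revealing words. The sketch below uses only the Python standard library. The list of revealing words, the helper names, and the `b` prefix are assumptions for illustration, and the pattern checks only the common naming rules (3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit).

```python
import re
import secrets

# Hypothetical word list; extend it to match your own data categories.
REVEALING_WORDS = {"client", "customer", "cc", "ssn", "password", "payroll", "secret"}

# Conservative pattern: dots are also permitted by some providers but are left out here.
VALID_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def looks_descriptive(name: str) -> bool:
    """True if any hyphen- or dot-separated part of the name is a revealing word."""
    return any(part in REVEALING_WORDS for part in re.split(r"[-.]", name.lower()))


def random_bucket_name(prefix: str = "b", random_bytes: int = 16) -> str:
    """Return a hard-to-guess name such as 'b-<32 hex characters>'."""
    name = f"{prefix}-{secrets.token_hex(random_bytes)}"
    if not VALID_NAME.match(name):
        raise ValueError(f"generated name is not a valid bucket name: {name!r}")
    return name


if __name__ == "__main__":
    print(looks_descriptive("client-cc-numbers"))
    print(random_bucket_name())
```

A random name only removes the hint about what the bucket contains. Access still has to be restricted through permissions.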
5. Enable Encryption
All of the major cloud providers offer server-side encryption. It can be applied by default to everything stored in the cloud, configured to cover only certain data, or handled manually. Google Cloud and Azure encrypt all data on the server side and do not make it possible to disable encryption. AWS has historically left more of this to the customer: Amazon S3 has applied server-side encryption to all new objects by default since January 2023, but encryption for some other services, such as EBS volumes, is still a setting you need to turn on.
When encryption is enabled, the cloud provider encrypts the data before storing it in the relevant object, block, or file storage option (or backup/disaster recovery system).
This doesn’t mean your data is automatically safe. These default encryption services only work on data at rest, not data as it moves around. When you are moving data, it should be encrypted in transit as well as once it reaches the destination.
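On AWS, one common way to enforce encryption in transit for a bucket is a policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. The sketch below only builds that policy document; the bucket name and helper name are hypothetical, and other providers have their own settings for requiring secure transfer.

```python
import json


def deny_insecure_transport(bucket: str) -> dict:
    """Build a bucket policy that rejects every request sent without TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_insecure_transport("example-bucket"), indent=2))
```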
In addition, when the cloud provider encrypts your data, the cloud provider holds the encryption keys as well. If you want granular control over the encryption keys, such as using different encryption keys for different objects stored in the same bucket, you need to use additional encryption techniques. The default encryption services will also not be enough if you need to maintain control over the keys yourself, whether to comply with privacy regulations or because you don’t trust the cloud provider to withhold the keys when served with a subpoena.
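As a sketch of what per-object keys under your own control could look like, the example below derives a distinct 256-bit key for each object name from a single master key that you hold, using HKDF with SHA-256 (RFC 5869) built from the standard library. This covers key derivation only: the function names are hypothetical, the actual encryption step would need a vetted cryptography library, and in a real system the master key would live in a key management system or hardware security module rather than in process memory.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key,
    then expand it into `length` bytes bound to `info`."""
    if length > 255 * 32:
        raise ValueError("requested key material is too long")
    if not salt:
        salt = b"\x00" * 32
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def object_key(master_key: bytes, bucket: str, object_name: str) -> bytes:
    """Derive a key that is unique to one object in one bucket."""
    info = f"{bucket}/{object_name}".encode("utf-8")
    return hkdf_sha256(master_key, info)


if __name__ == "__main__":
    master = secrets.token_bytes(32)  # stays with you, never with the provider
    key_a = object_key(master, "example-bucket", "reports/q1.csv")
    key_b = object_key(master, "example-bucket", "reports/q2.csv")
    print(key_a != key_b)
```

Deriving keys this way means a single master key can protect many objects without any two objects sharing an encryption key.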
6. Monitor Data Security
Like all aspects of cloud security, data security is not a one-and-done affair. Data degrades, it moves, it attaches to and detaches from clusters, and it is accessed as the application runs. Ensuring data security, along with other operational concerns like continued performance and storage optimization, requires monitoring your data, your permissions, your encryption, your transfer protocols, and more. Many data breaches have happened because of ‘leaky’ S3 buckets that were inappropriately secured or unencrypted. Monitoring for unusual usage, permission changes, or encryption changes can help reduce the risk of this kind of breach.
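Monitoring for permission and encryption changes can start as simply as filtering an audit log for the relevant events. The sketch below assumes audit records that have already been exported as dictionaries in a simplified CloudTrail-like shape, with `eventName` and `requestParameters.bucketName` fields. The watched event names correspond to S3 bucket-level API operations, while the `flag_events` helper and the sample records are hypothetical.

```python
# Bucket-level operations worth a second look, grouped by what they change.
WATCHED_EVENTS = {
    "PutBucketAcl": "permissions",
    "PutBucketPolicy": "permissions",
    "DeleteBucketPolicy": "permissions",
    "PutBucketEncryption": "encryption",
    "DeleteBucketEncryption": "encryption",
}


def flag_events(events):
    """Return (kind, bucket, event_name) for every watched event."""
    alerts = []
    for event in events:
        name = event.get("eventName")
        kind = WATCHED_EVENTS.get(name)
        if kind is None:
            continue  # routine activity such as reads and writes
        params = event.get("requestParameters") or {}
        alerts.append((kind, params.get("bucketName", "<unknown>"), name))
    return alerts


if __name__ == "__main__":
    # Hypothetical sample of exported audit records.
    sample = [
        {"eventName": "GetObject", "requestParameters": {"bucketName": "logs"}},
        {"eventName": "PutBucketAcl", "requestParameters": {"bucketName": "logs"}},
        {"eventName": "DeleteBucketEncryption",
         "requestParameters": {"bucketName": "backups"}},
    ]
    for kind, bucket, name in flag_events(sample):
        print(f"{kind} change on {bucket}: {name}")
```

In practice these alerts would feed a notification or ticketing system, and the provider’s own monitoring services can raise similar signals without custom code.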
Data storage and security in the cloud is not a one-size-fits-all approach. In fact, that is one of the attractions of the cloud. The ability to choose between file, block, and object storage and easily procure the best type of storage for the application can help reduce costs and increase performance. Securing data also requires understanding how your data is stored, how it’s used by the application, and how it ‘travels.’ The good news, though, is that when best practices are followed, your data can be more secure in the cloud than in a data center, because AWS, Azure, and Google devote far more resources to cloud security than the overwhelming majority of their customers.
Next Step: Check Your Buckets
Some cloud management software providers have only a single check for the most written-about vulnerability of the past two years: S3 bucket permissions. CloudCheckr has more than 20 distinct checks for S3 security. It’s not just a question of whether buckets are public or private (a check we provide to the public for free with S3Checkr.com and BlobCheckr.com). Do you have permissions properly set for Read, List, Upload/Delete, View Permissions, and Edit Permissions? Do those rules apply to Everyone or just AWS authenticated users? Are the buckets encrypted? Do they contain sensitive data? These variations and others result in a need for dozens of different checks. Learn how to properly secure your data with a live 30-minute demo, or try cloud management by CloudCheckr free for 14 days.