Article Cost Management April 18, 2019

Cloud vs Data Center: What is Scalability in Cloud Computing?

The Power of Scalability

 

There are many reasons to move to the cloud, but one of the most common is scalability. What is scalability in cloud computing? Scalability is the ability to easily add or remove compute or storage resources. In ‘the old days’ of on-premises data centers, scaling was costly, slow, and difficult to manage. Back then, scaling up meant buying new server hardware and disk arrays. Even after the purchase was approved in the budget and the order placed, it could take months before the equipment arrived. Then some of the company’s highest-paid engineers would spend hours unpacking cardboard boxes of servers and storage, plugging them in, and connecting them to the system.

 

That’s how resources were added to a traditional IT infrastructure. But what if you needed fewer resources? Scalability is sometimes mistakenly used as a synonym for growth, but in a real-world IT environment, demand isn’t steady. Even a thriving business sees periods of higher and lower demand, and that demand changes seasonally, weekly, and hourly. In a data center world, reducing capacity was almost never practical, so companies had to provision enough resources to cover their expected peak demand, plus a safety margin. In other words, an eCommerce site would need enough computing resources to handle Black Friday traffic every single day of the year. As a result, utilization rates were very low.

 

Simplifying Scaling Problems

The alternative is to provision just enough resources for daily use and not for peak traffic. But the consequences of not having enough compute or storage resources are dire: first come performance issues, then users start getting error messages and getting locked out of the application. In a business setting, that equals lost revenue. Conversely, resources are not free, and over-provisioning can lead to ballooning IT costs.
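
The trade-off between provisioning for peak and provisioning for daily use can be made concrete with a little arithmetic. The sketch below is only an illustration: the demand figures, the "server unit" measure, and the function name `provisioning_report` are assumptions made for this example, not data from a real workload.

```python
from typing import Sequence


def provisioning_report(demand: Sequence[float], capacity: float) -> dict:
    """Summarize how a fixed capacity fares against an hourly demand curve."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Demand that fits within capacity is served; the rest is unmet.
    served = [min(d, capacity) for d in demand]
    unmet = [max(d - capacity, 0.0) for d in demand]
    return {
        "utilization": sum(served) / (capacity * len(demand)),
        "unmet_demand": sum(unmet),
        "hours_overloaded": sum(1 for u in unmet if u > 0),
    }


# Hypothetical demand per hour, measured in "server units".
hourly_demand = [2, 2, 2, 2, 4, 4, 6, 10]

peak_provisioned = provisioning_report(hourly_demand, capacity=10)
daily_provisioned = provisioning_report(hourly_demand, capacity=4)
```

With these made-up numbers, provisioning for the peak gives 40% utilization and no unmet demand, while provisioning for a typical hour raises utilization to 75% but leaves 8 server-hours of demand unserved across two overloaded hours.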

The cloud has dramatically simplified scaling problems by making it easier to scale up and out while also making it possible to scale down and in. However, scaling continues to be a challenge, even in cloud environments. It’s also important to remember that all parts of your application need to scale, from the compute resources to database and storage resources. Neglecting any pieces of the scaling puzzle can lead to unplanned downtime or worse.

 

Cloud Scaling Strategies

There are two ways to scale: vertically or horizontally. When you scale vertically, it’s often called scaling up or down. When you scale horizontally, you are scaling out or in.

 

  • Cloud Vertical Scaling refers to adding more CPU, memory, or I/O resources to an existing server, or replacing one server with a more powerful server. In Amazon Web Services (AWS) and Microsoft Azure, vertical scaling is accomplished by changing instance sizes; in a data center, it means purchasing a new, more powerful appliance and discarding the old one. AWS and Azure cloud services have many different instance sizes, so scaling vertically is possible for everything from EC2 instances to RDS databases.
  • Cloud Horizontal Scaling refers to provisioning additional servers to meet your needs, often splitting workloads between servers to limit the number of requests any individual server is getting. In a cloud-based environment, this would mean adding additional instances instead of moving to a larger instance size.

 

In practice, scaling horizontally (out and in) is usually the better choice. It is much easier to accomplish without downtime: even in a cloud environment, scaling vertically usually requires making the application unavailable for some amount of time. Horizontal scaling is also easier to manage automatically, and limiting the number of requests any one instance handles at a time is good for performance, no matter how large the instance.
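
The difference between the two strategies can be sketched in a few lines. This is a simplified model, not a sizing tool: the size names and the requests-per-second figures in `INSTANCE_SIZES` are hypothetical and do not correspond to real AWS or Azure instance types.

```python
import math

# Hypothetical size ladder: name -> requests/second one instance can handle.
INSTANCE_SIZES = {"small": 100, "medium": 200, "large": 400, "xlarge": 800}


def scale_vertically(required_rps: int) -> str:
    """Pick the smallest single instance that can carry the whole load."""
    for name, capacity in sorted(INSTANCE_SIZES.items(), key=lambda kv: kv[1]):
        if capacity >= required_rps:
            return name
    raise ValueError("load exceeds the largest available instance size")


def scale_horizontally(required_rps: int, size: str = "small") -> int:
    """Count how many instances of one size are needed to share the load."""
    return max(1, math.ceil(required_rps / INSTANCE_SIZES[size]))
```

In this model, a load of 350 requests per second is met either by one "large" instance or by four "small" ones. At 900 requests per second the vertical path runs out of sizes, while the horizontal path simply asks for nine "small" instances.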

 

Manual vs Scheduled vs Automatic

There are essentially three ways to scale in a cloud environment: manually, on a schedule, or automatically.

 

  • Manual scaling is just as it sounds. It requires an engineer to manage scaling up and out or down and in. In the cloud, both vertical and horizontal scaling can be accomplished with the push of a button, so the actual scaling isn’t terribly difficult. However, because it requires a team member’s attention, manual scaling cannot keep up with the minute-by-minute fluctuations in demand seen by a normal application. It also invites human error: an individual might forget to scale back down, leading to extra charges.
  • Scheduled scaling solves some of the problems with manual scaling. Based on your usual demand curve, you can scale out to, for example, 10 instances from 5 pm to 10 pm, scale back in to two instances from 10 pm to 7 am, and then scale back out to five instances from 7 am to 5 pm. This makes it easier to tailor your provisioning to your actual usage without requiring a team member to make the changes manually every day.
  • Automatic scaling (also known as autoscaling) is when your compute, database, and storage resources scale automatically based on predefined rules. For example, when metrics like vCPU, memory, and network utilization go above or below a certain threshold, you can scale up, down, out, or in. Autoscaling helps keep your application available, with enough resources provisioned to prevent performance problems or outages, without paying for far more resources than you are actually using.
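
A scheduled scaling plan like the one described in the list above is essentially a lookup from time of day to instance count. The sketch below assumes the example schedule means 10 instances from 5 pm to 10 pm, two instances from 10 pm to 7 am, and five instances for the rest of the day; the function name `scheduled_instances` is made up for this illustration.

```python
def scheduled_instances(hour: int) -> int:
    """Return the planned instance count for an hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if 17 <= hour < 22:        # 5 pm to 10 pm: evening peak
        return 10
    if hour >= 22 or hour < 7:  # 10 pm to 7 am: overnight lull
        return 2
    return 5                   # 7 am to 5 pm: daytime baseline
```

For example, `scheduled_instances(18)` returns 10 and `scheduled_instances(3)` returns 2. A real scheduler would also need to account for time zones, weekends, and seasonal peaks, which this sketch leaves out.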

 

Scaling and Cost Management

Scaling is one of the most important components of cloud cost management. Rightsizing instances, or choosing the correct instance sizes based on your actual application utilization, is one of the easiest ways to reduce cloud costs without hurting performance. Some cost management strategies, like Reserved Instance (RI) purchases, take away some of the ability to scale in or down, because you’re committing to a certain amount and type of resources for one to three years. When you’re looking for ways to reduce costs, you need to understand your current usage patterns and utilization rates so you can strike a balance between full scaling flexibility and commitment-based savings like Reserved Instance purchases.
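
The tension between a reservation and scaling flexibility comes down to a break-even calculation. The sketch below is a rough model under stated assumptions: the hourly rates are invented for illustration (they are not real AWS prices), and a reservation is treated as a flat effective hourly rate paid for every hour of the year whether or not the instance is running.

```python
HOURS_PER_YEAR = 8760


def yearly_costs(on_demand_rate: float, reserved_rate: float,
                 utilization: float) -> tuple:
    """Return (on_demand_cost, reserved_cost) for one instance over a year.

    utilization is the fraction of the year the instance is actually needed.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    on_demand = on_demand_rate * HOURS_PER_YEAR * utilization
    reserved = reserved_rate * HOURS_PER_YEAR  # paid whether used or not
    return on_demand, reserved


def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the year above which the reservation becomes cheaper."""
    return reserved_rate / on_demand_rate
```

With a hypothetical on-demand rate of 0.10 per hour and an effective reserved rate of 0.06 per hour, the break-even point is roughly 60% utilization. An instance that scaling would switch off three quarters of the time is cheaper on demand, while one that runs nearly all year is cheaper reserved.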

 

Here’s How Netflix Does It

Netflix, like the vast majority of companies, experiences dramatic variations in traffic. The media services provider has more than 139 million subscribers worldwide and streams to customers at different rates and times. For example, as people on the East Coast of the United States start coming home from work around 6 pm, traffic spikes. Handling these variations in demand is nearly impossible to do manually, and even scheduled scaling doesn’t allow for the granular scaling up and down that auto-scaling does.

Titus, a container orchestration platform developed by Netflix, uses AWS Custom Resource Scaling to auto-scale the entire application, including both compute and data resources. The process works by setting up an Amazon CloudWatch policy configuration that establishes specific scaling actions based on CloudWatch threshold alarms. When one of the pre-configured thresholds is breached, the number of AWS instances is either increased or decreased automatically.
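
The threshold-and-action pattern described above can be modeled without any cloud API. The following is a minimal local simulation, not Titus code and not the CloudWatch API: the `ScalingPolicy` class, its threshold values, and the `desired_instances` function are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy: thresholds are percent utilization."""
    scale_out_above: float = 70.0
    scale_in_below: float = 30.0
    min_instances: int = 2
    max_instances: int = 10
    step: int = 1


def desired_instances(current: int, utilization: float,
                      policy: ScalingPolicy) -> int:
    """Apply one scaling decision for a single utilization reading."""
    if utilization > policy.scale_out_above:
        target = current + policy.step   # high threshold breached: scale out
    elif utilization < policy.scale_in_below:
        target = current - policy.step   # low threshold breached: scale in
    else:
        target = current                 # within the band: do nothing
    # Never leave the configured fleet-size bounds.
    return max(policy.min_instances, min(policy.max_instances, target))
```

With the default policy, four instances at 85% utilization become five, and four instances at 20% become three. The minimum and maximum bounds keep a noisy metric from shrinking the fleet to zero or growing it without limit.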

AWS Custom Resource Scaling actually started as a private feature for Netflix based on the company’s need for more robust auto-scaling capabilities, but it has been publicly available since July of 2018.

 

Conclusion

Managing scaling correctly is the key to ensuring you always have enough resources without over-provisioning and wasting your cloud budget. The ability to auto-scale is one of the most attractive parts of moving to a cloud environment, and when used correctly can ensure you’re only paying for the resources you actually use. As you figure out the best strategy for managing scaling, it’s important to understand your historical usage patterns, how RI purchases affect scaling and whether manual, scheduled or automatic scaling is best for your use case.

 

What’s Next

Implementing a multi-cloud scaling strategy doesn’t have to be complicated. With comprehensive cloud management by CloudCheckr, you can take the guesswork out of managing your cloud infrastructure and free up resources with dynamic automation. Try CloudCheckr free for 14 days or sign up for a live 30-minute demo today.
