Somewhere in the IT multiverse, a perfect balance has been achieved between demand for IT services and installed system capacity.
Unfortunately, that isn’t our world.
IT systems swing between periods of idle capacity and overload, as the ebb and flow of demand is shaped by various internal and external factors. For example, peak periods such as Black Friday and Cyber Monday can put significant strain on the computing resources required to support global e-commerce shoppers looking for the best deals.
Cloudflare statistics for 2023 showed a 27% increase in traffic through its network on these days compared with the previous year. As enterprises implement their digital transformation strategies and develop new products, the growth in transactions and data requires IT resources that can handle the increase without impacting performance or user experience.
(Figure: Daily HTTP requests for Cloudflare, 2023)
Gartner defines scalability as:
“The measure of a system’s ability to increase or decrease in performance and cost in response to changes in application and system processing demands.”
In the technology space, scalability is one of the main selling points of migrating to the cloud versus maintaining on-premises data centers. An organization that acquires cloud services is given a promise of accessible resources that:
Can be ordered and provisioned over a short time period to address growing information processing needs.
Can also be released when the organization does not require them.
This flexibility means that enterprises do not have to worry about tying up hard-earned capital in IT infrastructure and systems that may not match fluctuating demand.
The terms scalability and elasticity are often used interchangeably. But are they really the same thing?
One of the five essential characteristics of the cloud computing model defined by NIST is rapid elasticity: capabilities can be elastically provisioned and released to scale rapidly outward and inward, commensurate with demand. The general agreement is this:
Scalability is viewed from a load handling perspective.
Elasticity describes the speed of the system's response to changes in demand.
Indeed, the AWS glossary defines scaling as the outward or inward change in the size, configuration, or makeup of a logical group of compute instances.
There are two main approaches to scaling in a cloud computing environment: vertical scaling and horizontal scaling.
Vertical scaling (scaling up) involves upgrading the resources of existing virtual machines to cater for increased demand. Components that can be upgraded include:
CPU
Memory
Storage
Network throughput
For example, virtual machines and other compute resources can be resized to accommodate performance requirements, as the sketch below illustrates.
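Here is a minimal Python sketch of a vertical-scaling decision. The machine-type catalog, the 80% threshold, and the function names are illustrative assumptions rather than any provider's actual API; the point is simply that scaling up means moving the same workload onto a larger instance size.

```python
# Minimal sketch of a vertical-scaling decision: suggest the next-larger
# machine type when sustained CPU utilization crosses a threshold.
# The catalog, threshold, and names below are hypothetical, not a real
# cloud provider's API.
from typing import Optional

MACHINE_TYPES = [
    {"name": "small",  "vcpus": 2,  "memory_gb": 8},
    {"name": "medium", "vcpus": 4,  "memory_gb": 16},
    {"name": "large",  "vcpus": 8,  "memory_gb": 32},
    {"name": "xlarge", "vcpus": 16, "memory_gb": 64},
]

def next_machine_type(current_name: str) -> Optional[dict]:
    """Return the next-larger machine type, or None if already at the top."""
    names = [m["name"] for m in MACHINE_TYPES]
    idx = names.index(current_name)
    return MACHINE_TYPES[idx + 1] if idx + 1 < len(MACHINE_TYPES) else None

def decide_vertical_scale(current_name: str, avg_cpu: float) -> Optional[dict]:
    """Suggest an upgrade when average CPU utilization exceeds 80%."""
    return next_machine_type(current_name) if avg_cpu > 0.80 else None

target = decide_vertical_scale("medium", avg_cpu=0.92)
if target:
    # In practice, resizing usually means stopping the VM, changing its
    # machine type, and restarting it, which is why vertical scaling can
    # involve brief downtime.
    print(f"Resize to {target['name']} "
          f"({target['vcpus']} vCPUs, {target['memory_gb']} GB RAM)")
```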
Horizontal scaling (scaling out) involves increasing the number of computing instances in a logical pool (i.e., replication) in response to increased demand. Examples of horizontal scaling include the following (a toy sketch of the idea follows the list):
Load balancers, which distribute traffic across multiple instances.
Kubernetes, which orchestrates containers.
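The toy Python sketch below illustrates the same idea in miniature: a pool of identical replicas behind a simple round-robin distributor, where scaling out just means adding another replica. The InstancePool class and instance names are hypothetical; in practice this distribution is handled by a load balancer or an orchestrator such as Kubernetes.

```python
# Toy model of horizontal scaling: identical replicas behind a round-robin
# distributor. Scaling out adds a replica; nothing existing goes offline.
from itertools import cycle

class InstancePool:
    def __init__(self, instances):
        self.instances = list(instances)
        self._rr = cycle(self.instances)

    def scale_out(self, new_instance: str) -> None:
        """Add a replica and rebuild the round-robin rotation."""
        self.instances.append(new_instance)
        self._rr = cycle(self.instances)

    def route(self) -> str:
        """Send the next request to the next instance in the rotation."""
        return next(self._rr)

pool = InstancePool(["web-1", "web-2"])
for i in range(4):
    print(f"request {i} -> {pool.route()}")

pool.scale_out("web-3")   # demand grows: replicate rather than resize
for i in range(4, 10):
    print(f"request {i} -> {pool.route()}")
```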
The decision on which approach to take is driven mainly by the application architecture: applications that can be easily distributed across multiple servers (such as stateless microservices) are better suited to horizontal scaling. Other parameters include:
Traffic demand
Cost considerations
Resource efficiency
Performance requirements
From an uptime perspective, we can say this:
Horizontal scaling is more suitable as it does not require taking an existing server offline for upgrades.
In contrast, where resource intensity is key, vertical scaling becomes the preferable approach.
(Learn all about load balancing for microservices.)
Combining the two approaches results in a third, hybrid model: diagonal scaling. This starts as vertical scaling; once the machine's resources are capped, horizontal scaling kicks in.
This approach is deemed good for organizations that face unpredictable demand and therefore need to respond in an agile, flexible way without restriction. However, it is costlier and operationally more complex than the previously mentioned approaches.
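As a rough illustration, the Python sketch below applies the diagonal rule: keep scaling a node up until it reaches the largest available size, then start adding nodes. The sizes and the 80% utilization threshold are assumptions made for the example, not a prescribed policy.

```python
# Minimal sketch of diagonal scaling: vertical first, horizontal once capped.
SIZES = ["small", "medium", "large"]   # "large" is the vertical cap

def diagonal_scale(nodes: list, avg_utilization: float) -> list:
    """Return the new node layout after one scaling decision."""
    if avg_utilization <= 0.80:                 # comfortable: do nothing
        return nodes
    current = nodes[0]
    if current != SIZES[-1]:                    # headroom left: scale up
        bigger = SIZES[SIZES.index(current) + 1]
        return [bigger] * len(nodes)
    return nodes + [SIZES[-1]]                  # capped: scale out

layout = ["small"]
for load in [0.9, 0.9, 0.9, 0.9]:               # sustained high demand
    layout = diagonal_scale(layout, load)
    print(layout)
# ['medium'] -> ['large'] -> ['large', 'large'] -> ['large', 'large', 'large']
```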
Automated scaling is usually the preferred approach for horizontal scaling, because adding instances does not disrupt services running on existing ones.
Autoscaling adds virtual machines to a group of instances, and removes them, based on traffic and other configured parameters. For example, on Google Cloud, the autoscaling parameters that come into play include the following (a simplified decision rule is sketched after the list):
CPU utilization: The percentage load that the CPU is handling over a time period.
Throughput: The limit of requests per second that can be handled effectively.
Latency: How long a request waits in a queue before being processed.
Instance count: The minimum and maximum number of instances in a logical group.
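A common way to reason about these parameters is a proportional rule: scale the instance count by the ratio of observed to target utilization, then clamp the result to the configured minimum and maximum. The Python sketch below shows such a rule; it is a simplified approximation for illustration, not the exact algorithm used by Google Cloud or any other provider.

```python
# Simplified, illustrative autoscaling rule: desired instances grow in
# proportion to how far observed CPU utilization sits above the target,
# clamped to the configured instance-count bounds.
import math

def desired_instance_count(current: int,
                           observed_cpu: float,   # e.g. 0.85 = 85%
                           target_cpu: float,     # e.g. 0.60 = 60%
                           min_instances: int,
                           max_instances: int) -> int:
    desired = math.ceil(current * (observed_cpu / target_cpu))
    return max(min_instances, min(desired, max_instances))

# 4 instances running at 85% CPU against a 60% target -> scale out to 6,
# subject to the 2..10 instance bounds.
print(desired_instance_count(4, 0.85, 0.60, min_instances=2, max_instances=10))
```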
For databases, there are two main approaches to scaling:
Replication involves creating copies (replicas) of the original (primary) database, with data synchronized from the primary to all replicas.
Partitioning/sharding involves two parts: dividing the database into multiple parts, and distributing data across them based on an agreed strategy. This approach introduces more complexity and overhead in managing data that is spread across a cluster.
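For the sharding half, the minimal Python sketch below routes each record to a shard by hashing its key. The shard names and the simple modulo routing are illustrative assumptions; production systems often use consistent hashing or range-based partitioning so that adding a shard does not force most keys to move.

```python
# Minimal sketch of hash-based sharding: a stable hash of each key picks
# one of N shards. Shard names and routing rule are illustrative only.
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(key: str) -> str:
    """Route a key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for customer_id in ["cust-1001", "cust-1002", "cust-1003"]:
    print(customer_id, "->", shard_for(customer_id))
```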
(Related reading: distributed systems and distributed tracing.)
The main benefit of scalability is assurance: you can assure your business of the reliability of the IT services you deliver, to internal stakeholders as well as end users, customers, and prospects.
By planning the right capacity for demand and performance requirements, and responding smoothly to changes in traffic, the quality of IT services offered to the organization remains in line with expectations, leading to improved customer satisfaction.
Whenever incidents occur, scalability supports high availability as instances are spun up quickly with similar configurations to handle the service requirements. This is a form of self-healing: new instances are created that are not affected by any disruption affecting existing instances.
Other benefits include:
Cost effectiveness. The organization does not need to tie up capital in infrastructure that sits unused. Scaling ensures that demand is met with just the right amount of capacity, which can be quickly reduced when demand dissipates.
Disaster recovery. Where horizontal scaling is spread across geographically distributed zones, the probability of downtime totally crippling an IT service is reduced.
Even where scaling is automated, do not assume that configuring scaling is a one-time, set-and-forget activity. IT and system administrators must constantly monitor and analyze traffic trends and end-to-end application performance metrics in order to select the optimal scaling metrics for their systems.
The right metrics depend on the situation, which is precisely why you need to continually review and optimize your scaling configurations.
Investing in observability tools is a wise option. By aggregating metrics and logs, alongside additional data, these tools can predict potential bottlenecks or failures that can impact application performance and therefore require optimization of scaling parameters.
Some organizations have chosen to outsource the scaling headache to cloud service providers by adopting serverless computing. Applications built on serverless infrastructure have the benefit of automatic scaling, since the backend is fully managed to handle whatever traffic is generated from user transactions.
But beware: serverless alone is not a magic bullet for addressing scaling challenges, as the wrong application design could lead to certain functionality not scaling in tandem, causing bottlenecks. So admins must regularly monitor application performance against the set limits and initiate optimization when required.