Cloud-based services have changed how individuals and businesses get things done. That doesn’t mean it’s all positive: cloud services and the internet come with real tradeoffs and compromises.
One major tradeoff is speed. For instance, if your website fails to load within three seconds, 40% of your visitors will abandon your site. That’s a serious dent for anyone doing business online.
The culprit here is latency. Even something as small as 20 milliseconds of latency can add up to 15% to page load times, risking a loss of visitors to your website.
So, in this article, we’ll take a close look at latency and drill into why it’s so difficult to measure. Fortunately, as you’ll see, there are still guidelines you can follow to minimize it. Let’s get started.
The word “latency” means a delay in a system or a process. In computer networking terms, latency refers to the time it takes a data packet to travel from a source to a destination. This includes several distinct delays: propagation delay (the time the signal spends traveling across the physical medium), transmission delay (the time needed to push all of the packet’s bits onto the link), processing delay (packet handling at routers and endpoints) and queuing delay (the time the packet waits in device buffers).
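To make those components concrete, here’s a back-of-the-envelope estimate in Python. Every number below (distance, link speed, per-hop delays) is an illustrative assumption, not a measurement:

```python
# Rough one-way latency estimate for a single data packet.
# All figures are illustrative assumptions, not measurements.

DISTANCE_KM = 2_000        # assumed source-to-destination fiber distance
SPEED_KM_PER_MS = 200      # signal speed in fiber, roughly 2/3 the speed of light
PACKET_BITS = 1_500 * 8    # one 1,500-byte packet
LINK_BPS = 100_000_000     # assumed 100 Mbps link

propagation_ms = DISTANCE_KM / SPEED_KM_PER_MS       # travel time on the wire
transmission_ms = PACKET_BITS / LINK_BPS * 1_000     # pushing bits onto the link
processing_ms = 0.05                                 # assumed router/host handling
queuing_ms = 0.5                                     # assumed time spent in buffers

total_ms = propagation_ms + transmission_ms + processing_ms + queuing_ms
print(f"Estimated one-way latency: {total_ms:.2f} ms")   # ~10.67 ms
```

Notice that propagation dominates in this example: that’s why the physical distance to a data center matters so much.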
Latency is a simple concept to understand, but not always easy to control. Apps and workloads that run in the cloud are the perfect example.
Cloud-based services run in distinct geographic locations: the app you’re using today is likely pinging a data center hundreds or even thousands of miles away from your physical location. To process large volumes of data workloads, which require energy-intensive computing resources, the data is transmitted in real time over the internet to cloud data centers.
The communication delays, however, can hurt: as requests travel between client devices and backend servers, pages may load slowly or not at all. Every TCP/IP client-server communication can add a delay of up to 600 milliseconds. That’s a subpar user experience.
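You can get a feel for this delay yourself by timing the TCP three-way handshake to a server. This is a minimal sketch, and the hostname is just a placeholder:

```python
import socket
import time

def tcp_handshake_ms(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Time a single TCP three-way handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it right away
    return (time.perf_counter() - start) * 1_000

# example.com is a placeholder; try your own app's endpoint.
print(f"Handshake took {tcp_handshake_ms('example.com'):.1f} ms")
```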
(Related reading: network monitoring & NOCs: network operations centers.)
To quantify the benefits of cloud computing against its inherent limitations, we first need to understand how latency metrics are measured and disclosed to a business customer.
Cloud vendors generally don’t disclose detailed latency performance data. This is partly due to how cloud services are structured and allocated.
Consider a standard service level agreement (SLA) for a cloud service: the SLA outlines latency performance ranges for cloud data centers in particular geographic zones. The cloud service itself may be distributed across several computing resources within the data center. Furthermore, those resources are dynamically allocated: application components and data workloads are not tightly coupled to the underlying hardware.
Cloud vendors may dynamically allocate workloads to different network zones — not necessarily to maximize performance, but to optimize a range of business metrics including cost and risk management.
From a user’s perspective, however, any suboptimal bandwidth and data rate allocation can degrade performance, and the impact can vary significantly from one workload to another.
This brings us to the second problem: measuring latency is challenging, particularly because of a lack of standardized metrics that provide sufficient granularity into latency-related performance. For instance, latency can be measured in terms of several different quantities, such as round-trip time (RTT), time to first byte (TTFB) and jitter (the variation in delay between packets). These measurements capture delays incurred at different stages: during propagation, processing and queuing.
Users may choose to measure latency using statistical models. To make this work, customers need deep visibility into network operations, a concept known as observability. A SaaS or PaaS service may not be accompanied by the exhaustive network logs required to accurately learn network parameters using a statistical model.
Furthermore, important information may be missing altogether because it’s proprietary, such as the network path connecting cloud resources to a client endpoint. In that case, users may resort to active probing techniques to continuously monitor network latency performance: for example, ICMP echo requests (ping) to sample round-trip times, traceroute to map the hops along a network path, and synthetic transactions that periodically exercise an endpoint and record its response times.
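As a sketch of what active probing might look like in practice, the following samples the end-to-end response time of a URL on a fixed interval and summarizes the distribution. The URL, sample count and interval are assumptions for illustration:

```python
import statistics
import time
import urllib.request

URL = "https://example.com/health"   # hypothetical endpoint to probe
SAMPLES = 20                         # number of probes to collect
INTERVAL_S = 3                       # seconds between probes

latencies_ms = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=5) as resp:
        resp.read(1)                 # roughly time-to-first-byte
    latencies_ms.append((time.perf_counter() - start) * 1_000)
    time.sleep(INTERVAL_S)

latencies_ms.sort()
p50 = statistics.median(latencies_ms)
p95 = latencies_ms[int(0.95 * (len(latencies_ms) - 1))]
print(f"p50={p50:.1f} ms  p95={p95:.1f} ms")
```

Percentiles matter more than averages here: a healthy median can hide a painful tail.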
An inherent problem with statistical evaluation techniques remains: given the distributed nature of the underlying infrastructure, even an active, real-time monitoring system is unlikely to produce a fully accurate assessment of latency performance. That means users may not be able to accurately forecast the performance degradation they can expect, or determine how different cloud service components are affected.
See how Splunk Real User Monitoring shows the performance and health of your application’s user experience. This video demonstrates how to find latency issues.
It may be difficult to capture an accurate picture of how latency affects the performance of your cloud-based services and, consequently, your user experience. You can adopt the following guidelines to reduce latency and/or its impact on end users:
Host your apps in data centers in close physical proximity to your end users. Use content delivery networks (CDNs) to cache content at edge locations near your users and cut the round trips that requests must make to your origin servers.
Train your statistical models on real-time data streams. The data should contain timestamped latency measurements, along with context such as the region, network path and workload they belong to, so the model can attribute delays correctly.
Because latency performance and its impact on the user experience evolve continuously, you’ll want to monitor them continuously.
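As a minimal illustration of such a model, this sketch fits a linear trend to a handful of assumed latency samples to estimate drift over time (statistics.linear_regression requires Python 3.10+). A production model would be far richer:

```python
import statistics

# Fit a linear trend to timestamped latency samples to spot drift.
# Both series below are assumed values for illustration only.
timestamps_s = [0, 60, 120, 180, 240]        # seconds since monitoring began
latency_ms = [38.0, 41.5, 40.2, 45.1, 47.3]  # sampled p95 latency

fit = statistics.linear_regression(timestamps_s, latency_ms)
print(f"Latency drifting by ~{fit.slope * 60:.2f} ms per minute")
```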
Design your network to route user requests through an optimal path, one that minimizes network hops and maximizes Quality of Service (QoS) metrics. Use techniques such as load balancing to dynamically route data workloads.
(Learn about load balancing in microservices.)
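Here’s a deliberately simple sketch of latency-aware routing: probe each candidate backend and pick the one with the lowest measured handshake time. The hostnames are placeholders, and real load balancers also weigh health, capacity and cost:

```python
import socket
import time

# Hypothetical backend pool; substitute your real endpoints.
BACKENDS = ["eu-west.example.com", "us-east.example.com", "ap-south.example.com"]

def handshake_ms(host: str, port: int = 443) -> float:
    """Measure one TCP handshake; return infinity if the backend is unreachable."""
    try:
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2.0):
            pass
        return (time.perf_counter() - start) * 1_000
    except OSError:
        return float("inf")

best = min(BACKENDS, key=handshake_ms)
print(f"Routing new requests to {best}")
```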
Design your data platform to ingest real-time data streams at scale, such as by using a data lake for efficient and scalable storage of raw data. This lets users preprocess and analyze just the portions of the data they need, when they need them.
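One common pattern, sketched below with assumed paths and event shapes, is to land each raw event in a date-partitioned layout so later analysis can read only the partitions it needs:

```python
import json
import pathlib
from datetime import datetime, timezone

# Hypothetical raw zone of a data lake; the layout is an illustrative choice.
LAKE_ROOT = pathlib.Path("lake/raw/latency_events")

def ingest(event: dict) -> None:
    """Append one event, as a JSON line, under a date/hour partition."""
    now = datetime.now(timezone.utc)
    partition = LAKE_ROOT / f"dt={now:%Y-%m-%d}" / f"hour={now:%H}"
    partition.mkdir(parents=True, exist_ok=True)
    with open(partition / "events.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps({**event, "ingested_at": now.isoformat()}) + "\n")

ingest({"source": "edge-probe", "rtt_ms": 42.3})  # assumed event shape
```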
Use predictive scaling capabilities to provision computing resources proactively. By accurately forecasting usage spikes and demand curves, you can plan for cost-effective cloud services that meet QoS expectations such as response time, availability and throughput.
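As a toy illustration of the idea, this sketch forecasts the next interval’s request rate with a simple moving average and provisions capacity with headroom. The per-instance capacity and traffic history are assumptions:

```python
import math

REQS_PER_INSTANCE = 500   # assumed capacity of a single instance
HEADROOM = 1.3            # 30% buffer for forecast error

history = [1200, 1350, 1500, 1700, 1900]    # assumed recent requests/sec

forecast = sum(history[-3:]) / 3            # moving average of the last 3 samples
instances = math.ceil(forecast * HEADROOM / REQS_PER_INSTANCE)
print(f"Forecast {forecast:.0f} req/s -> provision {instances} instances")
```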
Latency is one thing that Splunk Observability can absolutely help with, whether it’s your website, your cloud services or your own network. Learn more about Splunk Observability.
See an error or have a suggestion? Please let us know by emailing ssg-blogs@splunk.com.
This posting does not necessarily represent Splunk's position, strategies or opinion.