Picture the lanes on a highway. The number of lanes determines the maximum traffic capacity of the highway at any given instance. However, a variety of factors determine how fast traffic can actually go from point A to point B across the highway — despite the maximum number of lanes.
This is exactly how network traffic behaves, too.
Let’s take a look at network traffic and congestion, including the many contributing factors that determine how your network can handle traffic — especially during high-traffic periods.
Network traffic is simply how much data is moving across a computer network at a given moment. It’s a point-in-time number: the traffic right now may be more, less, or the same as it was 30 minutes ago.
Traffic data is broken down into smaller segments, known as data packets. These data packets are sent over a network, and the receiving device reassembles them. When these moving data packets get slowed down, your network traffic slows.
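To make the idea concrete, here's a minimal Python sketch (the packet size and payload are hypothetical, and real protocols like TCP/IP do far more) of splitting data into packets and reassembling them on the receiving end:

```python
# Sketch: split a payload into fixed-size packets, then reassemble them.
# Real networks use protocols like TCP/IP; this only illustrates the idea.

MTU = 4  # hypothetical packet payload size in bytes (real MTUs are ~1500)

def segment(payload: bytes, size: int = MTU) -> list[tuple[int, bytes]]:
    """Break the payload into (sequence_number, chunk) packets."""
    return [(i, payload[i:i + size]) for i in range(0, len(payload), size)]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Reorder packets by sequence number and rejoin the payload."""
    return b"".join(chunk for _, chunk in sorted(packets))

packets = segment(b"hello, network!")
# Packets may arrive out of order; reassembly still recovers the data.
packets.reverse()
print(reassemble(packets))  # b'hello, network!'
```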
Network uptime and network speed are the backbone of nearly every business today. No matter your industry, network downtime is a problem you want to avoid.
(See how Splunk helps you deliver great customer experiences, especially when traffic spikes.)
So, let’s talk about one term commonly associated with network traffic: bandwidth. (Bandwidth is only one part of your network traffic and congestion problems, and we’ll talk about others shortly.)
Network bandwidth is the maximum capacity of a network to transmit data across the network’s path — logical or physical — at a given time. It is measured in bits per second (bps).
A theoretical and fixed parameter, network bandwidth corresponds to the maximum capacity of a network. This measure may include the packet overhead of the communication protocols necessary for secure, robust, and reliable data transmission, such as:
Error correction codes
IP identifiers
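As a quick back-of-the-envelope sketch (all figures here are hypothetical), bandwidth lets you estimate the best-case time to move data across a link:

```python
# Best-case transfer time for a file over a link, ignoring protocol overhead.
# Bandwidth is quoted in bits per second; file sizes are usually in bytes.

bandwidth_bps = 100_000_000   # hypothetical 100 Mbps link
file_bytes = 250_000_000      # hypothetical 250 MB file

file_bits = file_bytes * 8
seconds = file_bits / bandwidth_bps
print(f"Theoretical minimum transfer time: {seconds:.0f} s")  # 20 s
```

Real transfers take longer once overhead, congestion, and retransmissions enter the picture.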
For cloud-based services, network bandwidth is allocated as part of a service level agreement. A cloud-based service may measure network bandwidth based on either:
Egress, the outbound traffic flowing out of a cloud server
Ingress, the inbound traffic flowing into the cloud server
Routing within and outside of the cloud network may depend on a few factors, including your service level agreement (SLA), configurations, and the resource allocation in your network architecture.
(Related reading: the OSI model for networks.)
You can, and should, measure how your traffic demands and usage patterns align with the allocated network bandwidth.
As the information flow in the network increases beyond the available network bandwidth, packets begin to drop. This is known as data packet loss. Packet loss occurs due to network congestion, which can set in even at traffic levels below the allocated network bandwidth.
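A toy simulation (the buffer size, service rate, and arrival pattern are all hypothetical) shows how congestion turns into packet loss once arrivals outpace what a device can forward:

```python
from collections import deque

# Toy congestion model: a router buffer drains at a fixed service rate;
# packets that arrive while the buffer is full are dropped.

BUFFER_SIZE = 3        # hypothetical queue capacity, in packets
SERVICE_RATE = 1       # packets forwarded per time step

buffer = deque()
dropped = forwarded = 0
arrivals_per_step = [1, 1, 4, 4, 1, 0, 0]  # a traffic burst in the middle

for arriving in arrivals_per_step:
    for _ in range(arriving):
        if len(buffer) < BUFFER_SIZE:
            buffer.append("pkt")
        else:
            dropped += 1            # congestion: packet loss
    for _ in range(min(SERVICE_RATE, len(buffer))):
        buffer.popleft()
        forwarded += 1

print(f"forwarded={forwarded}, dropped={dropped}")  # forwarded=7, dropped=4
```

Notice that packets are lost only during the burst, even though the average arrival rate over the whole window is below what the link can serve.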
By definition, network bandwidth is a fixed parameter and cannot be increased without upgrading the underlying resources. These resources include:
Hardware devices and communication infrastructure
Network architecture and configurations
Network bandwidth may also be limited by factors beyond your control.
For example, an outside adversary launching a DDoS attack can flood your network with traffic, consuming all of the available network bandwidth. As a result, any new traffic requests to your servers are denied, queued, or rerouted.
Assuming a constant network bandwidth that does not scale dynamically with traffic demands, incoming data packets may also be lost. (This is why your network congestion management strategy should include DDoS detection and mitigation capabilities.)
In our highway traffic example from above, network bandwidth equates to the number of lanes available. The lanes are an important, but fixed, factor — and those lanes alone cannot tell you how well traffic is moving at any given point.
Let’s look at additional factors that contribute to network congestion, too.
Network capacity is described in terms of parameters such as:
Network bandwidth
Data rate
Throughput
These terms may be used interchangeably, but can have vastly different implications for your actual SLA performance.
Data rate is the volume of data transmitted per unit of time — think of this as the network speed. Like bandwidth, data rate is measured in bits per second.
Unlike network bandwidth, data rate does not refer to the maximum data volume that can be transmitted per unit of time. Instead, data rate measures the volume of information flow across the network, within the maximum available network capacity.
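For example (the measurements here are hypothetical), data rate is simply the volume moved divided by elapsed time — and it can sit well below the fixed bandwidth:

```python
# Observed data rate vs. the link's fixed bandwidth (hypothetical figures).
bandwidth_bps = 100_000_000     # 100 Mbps maximum capacity

bytes_transmitted = 30_000_000  # 30 MB moved across the network...
elapsed_seconds = 4             # ...over a 4-second measurement window

data_rate_bps = bytes_transmitted * 8 / elapsed_seconds
print(f"Data rate: {data_rate_bps / 1e6:.0f} Mbps "
      f"of {bandwidth_bps / 1e6:.0f} Mbps available")  # 60 Mbps of 100 Mbps
```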
Throughput is the volume of data successfully transmitted between the nodes of the network per unit of time, measured in bits per second. Throughput accounts for the information loss and delays that ultimately show up as:
Packet loss
Network congestion
Jitter
Latency
Throughput is often used together with network bandwidth to describe network capacity, though beware the differences:
Network bandwidth is a theoretical measure of network capacity.
Throughput tells you how much data can actually be transferred.
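A short sketch (all figures hypothetical) shows how packet loss separates throughput from the raw data rate:

```python
# Throughput counts only data successfully delivered (hypothetical figures).
packets_sent = 10_000
packets_lost = 300              # dropped due to congestion
packet_size_bits = 12_000       # ~1500-byte packets
elapsed_seconds = 2

delivered_bits = (packets_sent - packets_lost) * packet_size_bits
throughput_bps = delivered_bits / elapsed_seconds
data_rate_bps = packets_sent * packet_size_bits / elapsed_seconds

print(f"Data rate:  {data_rate_bps / 1e6:.1f} Mbps")   # 60.0 Mbps
print(f"Throughput: {throughput_bps / 1e6:.1f} Mbps")  # 58.2 Mbps
```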
Network latency refers to the time it takes for information to travel between the source and destination in a communication network. Delays are caused by:
The distance between network source and endpoints
Network congestion
Packet processing time
Protocol overheads
Propagation and routing delays
The transmission medium
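Even on an otherwise idle network, physics sets a floor on latency. Here's a quick sketch (the distance is hypothetical) of propagation delay alone, using the fact that light travels at roughly two-thirds the speed of light in a vacuum through optical fiber:

```python
# Propagation delay alone sets a lower bound on latency (hypothetical route).
SPEED_IN_FIBER_KM_S = 200_000   # light travels ~2/3 c in optical fiber

distance_km = 4_000             # e.g., a cross-continental link
one_way_ms = distance_km / SPEED_IN_FIBER_KM_S * 1000
round_trip_ms = 2 * one_way_ms

print(f"One-way propagation: {one_way_ms:.0f} ms")    # 20 ms
print(f"Minimum round trip:  {round_trip_ms:.0f} ms") # 40 ms
```

Congestion, processing, and protocol overhead all add on top of this floor.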
Quality of Service (QoS) is the network’s ability to optimize traffic routing for:
End-user experience
Network performance
QoS planning involves policies and algorithms that determine how specific packet data and traffic are processed and delivered in the context of the available networking resources such as network bandwidth, capacity, switching performance, network topology, and service level agreements.
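As a simplified illustration (the traffic classes and priority values here are hypothetical; real QoS relies on mechanisms like DSCP markings and queuing disciplines), a priority queue captures the core idea of serving latency-sensitive traffic first:

```python
import heapq

# Sketch of priority-based QoS scheduling: lower number = higher priority.

queue = []
counter = 0  # tie-breaker so equal-priority packets stay first-in, first-out

def enqueue(priority: int, packet: str) -> None:
    global counter
    heapq.heappush(queue, (priority, counter, packet))
    counter += 1

enqueue(2, "bulk backup chunk")
enqueue(0, "VoIP frame")        # latency-sensitive: highest priority
enqueue(1, "web request")
enqueue(0, "video frame")

order = []
while queue:
    _, _, packet = heapq.heappop(queue)
    order.append(packet)
print(order)
# ['VoIP frame', 'video frame', 'web request', 'bulk backup chunk']
```

Voice and video jump ahead of bulk transfers, which tolerate delay far better than a phone call does.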
Network utilization is the percentage of available network bandwidth utilized per unit of time. While the network capacity may be high, limitations — like network congestion, bottlenecks, and capacity issues such as packet loss — may prevent total network utilization.
Network utilization often informs the design of network architecture, switching topologies, routing policies, and QoS algorithms, so that utilization is maximized at all times.
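The calculation itself is a simple ratio (figures hypothetical):

```python
# Network utilization: observed traffic as a share of fixed bandwidth.
bandwidth_bps = 1_000_000_000    # hypothetical 1 Gbps link
avg_traffic_bps = 350_000_000    # average observed over a measurement interval

utilization_pct = avg_traffic_bps / bandwidth_bps * 100
print(f"Network utilization: {utilization_pct:.0f}%")  # 35%
```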
It is also important to understand that network utilization comes as a trade-off against other parameters, such as:
Power consumption
Cooling supply
Device maintenance cycles
For this reason, network utilization and capacity planning requires strong stakeholder buy-in and executive support.
As discussed earlier, bandwidth is a fixed parameter that alone will not improve your network congestion. However, there are plenty of network optimization techniques to explore:
Creating network subnets with strategically installed routers, switches, and modems
Scheduling software updates and storage backups during off-peak hours
Using traffic shaping, traffic policing, and load balancing
All of these techniques can assist in streamlining data flows and decreasing traffic/network congestion.
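Traffic shaping, for instance, is commonly implemented with a token bucket: packets are sent only when enough tokens have accumulated, which smooths out bursts. Here's a minimal sketch (the rate and capacity are hypothetical):

```python
# Sketch of traffic shaping with a token bucket (hypothetical parameters).

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # burst allowance
        self.tokens = capacity

    def tick(self, seconds: float) -> None:
        """Refill tokens for elapsed time, capped at the burst capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate * seconds)

    def try_send(self, packet_cost: float = 1.0) -> bool:
        if self.tokens >= packet_cost:
            self.tokens -= packet_cost
            return True
        return False                # shaped: packet must wait (or is policed)

bucket = TokenBucket(rate=2, capacity=3)       # 2 packets/sec, bursts of 3
sent = [bucket.try_send() for _ in range(5)]   # a 5-packet burst arrives
print(sent)               # [True, True, True, False, False]
bucket.tick(1)            # one second later, 2 tokens have refilled
print(bucket.try_send())  # True
```

Shaping delays excess packets until tokens refill; policing, by contrast, simply drops them.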
(Related reading: network performance monitoring.)
Splunk is a leader in monitoring and observability. Whether you need to monitor your network from the NOC or you want complete visibility across your entire tech stack, Splunk can help. Explore the Splunk Observability solutions portfolio.
See an error or have a suggestion? Please let us know by emailing ssg-blogs@splunk.com.
This posting does not necessarily represent Splunk's position, strategies or opinion.
The Splunk platform removes the barriers between data and action, empowering observability, IT and security teams to ensure their organizations are secure, resilient and innovative.
Founded in 2003, Splunk is a global company — with over 7,500 employees, more than 1,020 patents to date, and availability in 21 regions around the world — and offers an open, extensible data platform that supports shared data across any environment so that all teams in an organization can get end-to-end visibility, with context, for every interaction and business process. Build a strong data foundation with Splunk.