Network performance monitoring (NPM) and application performance monitoring (APM) are both key pillars of an overall performance and reliability management strategy, especially when dealing with complex, distributed infrastructure across cloud-native environments. NPM and APM also complement each other, in the sense that NPM can serve as an additional source of truth and observability for application performance.
Although there is some overlap in the tools and methodologies behind NPM and APM, they’re distinct processes that focus on different data sources and metrics.
Let’s take a look at how NPM and APM compare and where they both fit within a performance management strategy tailored to cloud-native environments.
Network monitoring means monitoring networks for trends or signs of problems. Using techniques like packet capturing and streaming telemetry, NPM can measure information like:
Traditional on-premises network performance monitoring was relatively straightforward in that most environments included just one network to monitor. In cloud-native environments, however, there are often multiple internal networks to monitor, as well as at least one public-facing network interface that is accessible to any host on the network.
In addition, because network configurations change constantly as container IP addresses are updated, load balancers redirect packets, and traffic flows change, cloud-native network data lacks the contextual information that comes with traditional, packet-focused network monitoring. This makes the ability to observe data flows between source and destination services especially critical in distributed environments.
In other words, teams need fine-grained, low-level visibility into all network traffic — internal as well as public-facing — that flows within or between parts of an application, containers, microservices, processes, and users. Capturing information from every connection and every process is the only way to understand how the complex traffic flows within a distributed environment add up to overall network performance.
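As a minimal sketch of what per-connection visibility can add up to, the example below rolls individual connection records into a summary per source and destination service pair. The `FlowRecord` fields and the service names are hypothetical; a real collector would define its own schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One observed connection (hypothetical schema for illustration)."""
    src_service: str
    dst_service: str
    bytes_sent: int
    retransmits: int
    rtt_ms: float


def summarize_flows(records):
    """Aggregate per-connection records by (source, destination) service pair."""
    totals = defaultdict(
        lambda: {"connections": 0, "bytes": 0, "retransmits": 0, "rtt_sum": 0.0}
    )
    for r in records:
        t = totals[(r.src_service, r.dst_service)]
        t["connections"] += 1
        t["bytes"] += r.bytes_sent
        t["retransmits"] += r.retransmits
        t["rtt_sum"] += r.rtt_ms
    return {
        pair: {
            "connections": t["connections"],
            "bytes": t["bytes"],
            "retransmits": t["retransmits"],
            "avg_rtt_ms": t["rtt_sum"] / t["connections"],
        }
        for pair, t in totals.items()
    }


# Example data: service names are made up.
records = [
    FlowRecord("checkout", "payments", 1000, 0, 2.0),
    FlowRecord("checkout", "payments", 3000, 2, 4.0),
    FlowRecord("frontend", "checkout", 500, 0, 1.0),
]
summary = summarize_flows(records)
```

Keying the summary on service names rather than IP addresses reflects the point above: in environments where addresses change constantly, the service pair is the more stable unit of analysis.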
In all these ways, NPM for distributed environments differs substantially from NPM for monoliths.
Application performance monitoring refers to the monitoring of applications for signs of performance issues. APM typically focuses on data such as:
APM may also include monitoring of how much CPU, memory, and other resources an application consumes, and how those metrics change over time.
Like NPM, APM was simpler in the days of monolithic applications. In the cloud-native, microservices-oriented world, APM is more challenging not only because there are more services to monitor, but also because effective management requires the ability to correlate data from each individual service in ways that deliver visibility into the overall state of the system.
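One way to picture this kind of cross-service correlation is to group the timing records that each service emits by a shared trace ID, then total the time spent per service within a single transaction. This is only a sketch: the dictionary keys (`trace_id`, `service`, `duration_ms`) are assumptions, and each duration is assumed to be the service's own processing time rather than a nested total.

```python
from collections import defaultdict


def group_spans_by_trace(spans):
    """Collect spans from all services under the trace they belong to."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    return dict(traces)


def time_per_service(trace_spans):
    """Total the time each service contributed to one transaction."""
    totals = defaultdict(float)
    for span in trace_spans:
        totals[span["service"]] += span["duration_ms"]
    return dict(totals)


# Hypothetical spans reported independently by three services.
spans = [
    {"trace_id": "t1", "service": "frontend", "duration_ms": 12.0},
    {"trace_id": "t1", "service": "checkout", "duration_ms": 30.0},
    {"trace_id": "t2", "service": "frontend", "duration_ms": 8.0},
    {"trace_id": "t1", "service": "payments", "duration_ms": 95.0},
    {"trace_id": "t1", "service": "checkout", "duration_ms": 5.0},
]
traces = group_spans_by_trace(spans)
breakdown = time_per_service(traces["t1"])
```

Viewed per service, none of these records says much on its own; grouped by trace, it becomes clear which service dominated a given transaction.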
Additionally, in a microservices environment, it’s critical to be able to analyze all transactions, rather than just sampling some and extrapolating from there. Complete trace and span data enables high-cardinality, AI-directed troubleshooting. And in order to understand how performance trends impact the end user, modern APM must be able to use no-sample, full-fidelity ingestion of all frontend traces to track how backend application components interact with frontend services for every transaction.
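To illustrate why sampling can hide problems, the sketch below compares a full set of transaction durations with a fixed-interval sample of the same data. Keeping every tenth trace is a deliberate simplification chosen so the result is deterministic; real samplers are usually probabilistic or tail-based, and the numbers here are made up.

```python
def slow_traces(durations_ms, threshold_ms):
    """Return the indexes of transactions slower than the threshold."""
    return [i for i, d in enumerate(durations_ms) if d > threshold_ms]


def sample_every_nth(durations_ms, n):
    """Keep only every n-th transaction, starting with the first."""
    return durations_ms[::n]


# 100 transactions at 20 ms each, with a single 5-second outlier.
durations = [20.0] * 100
durations[37] = 5000.0

found_in_full = slow_traces(durations, threshold_ms=1000.0)
found_in_sample = slow_traces(sample_every_nth(durations, 10), threshold_ms=1000.0)
```

With all transactions retained, the outlier at index 37 is found. The sample keeps indexes 0, 10, 20 and so on, so the one slow transaction never appears in it.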
The similarities between NPM and APM are relatively obvious: both types of monitoring provide insights that can help teams anticipate problems that may have negative consequences for the user experience. An application that becomes unavailable due to a load-balancing problem on the network is just as bad from the user’s perspective as one that fails because it runs out of memory or becomes overloaded with requests.
There is also some overlap in the type of metrics that NPM and APM focus on. For example, the golden signals of monitoring (latency, errors, traffic, and saturation) are important data points for understanding the health of both your network and your application.
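As a rough sketch of how the four golden signals can be derived from raw observations, the function below takes the latencies seen in a time window plus a few counters. The parameter names are hypothetical, and the 95th percentile uses a simple nearest-rank method; monitoring products may compute these differently.

```python
def golden_signals(latencies_ms, error_count, window_seconds,
                   used_capacity, total_capacity):
    """Compute latency, traffic, errors and saturation for one time window."""
    n = len(latencies_ms)
    if n == 0:
        raise ValueError("at least one observation is required")
    ordered = sorted(latencies_ms)
    # Nearest-rank 95th percentile: ceil(0.95 * n), in integer arithmetic.
    rank = (95 * n + 99) // 100
    return {
        "latency_p95_ms": ordered[rank - 1],
        "traffic_per_s": n / window_seconds,
        "error_rate": error_count / n,
        "saturation": used_capacity / total_capacity,
    }


# Made-up window: 100 requests in 50 seconds, 5 errors, 30 of 40 workers busy.
signals = golden_signals(
    latencies_ms=list(range(1, 101)),
    error_count=5,
    window_seconds=50,
    used_capacity=30,
    total_capacity=40,
)
```

The same function works whether the observations are application requests or network connections, which is the sense in which these signals apply to both kinds of monitoring.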
Beyond this, however, there is little common ground between NPM and APM. Despite the partial overlap in metrics described above, most metrics are unique to one type of monitoring or the other.
The way you collect NPM and APM data is also different. Modern approaches to NPM rely on techniques like using eBPF at the operating system level in order to collect network data that would otherwise be impossible to “see,” especially in cloud environments. In this way, site reliability engineers (SREs) can trace traffic flow between microservices within a distributed system in order to determine whether the network is the cause of a degradation in application performance.
In contrast, APM focuses on traces and spans, especially transaction traces and other data collected directly from a running environment using tools that peer inside the application as it processes requests.
Because NPM and APM deliver visibility into different parts of your environment, they are not an either/or proposition. You need both in order to gain a full understanding of what is happening in your environment.
As noted above, critical disruptions or degradations to the end-user experience can be caused by faults in either the network or the application. Both types of monitoring are necessary to safeguard against these risks.
What’s more, the ability to correlate network data and application performance data is often crucial for understanding the root cause of an issue. For example:
NPM and APM serve as crucial complements to each other, with network monitoring offering an additional source of observability that can help to contextualize application performance trends and pinpoint the source of problems, even if those problems don’t stem from the network itself.
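A very rough sketch of this kind of correlation: given the duration of a service-to-service call as reported by APM and the network round-trip time measured for the same flow by NPM, estimate how much of the call was spent on the network. The function names and the 0.5 threshold are illustrative assumptions, not an established rule.

```python
def network_share(call_duration_ms, network_rtt_ms):
    """Fraction of a call's duration attributable to network round-trip time."""
    if call_duration_ms <= 0:
        raise ValueError("call duration must be positive")
    return min(network_rtt_ms / call_duration_ms, 1.0)


def likely_source(call_duration_ms, network_rtt_ms, threshold=0.5):
    """Label a slow call as network-bound or application-bound (heuristic)."""
    share = network_share(call_duration_ms, network_rtt_ms)
    return "network" if share >= threshold else "application"


# A 200 ms call with 150 ms of round-trip time points at the network;
# the same call with 4 ms of round-trip time points at the application.
slow_over_bad_link = likely_source(200.0, 150.0)
slow_in_service = likely_source(200.0, 4.0)
```

Neither data source answers the question alone: the call duration comes from application tracing and the round-trip time from network monitoring.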
So, while NPM and APM may be somewhat useful individually, they deliver the greatest value when they are used in tandem as part of an end-to-end observability strategy that takes data from all layers and resources within your environment and allows you to compare and correlate relevant trends within it. That’s how you achieve true visibility, especially into complex cloud-native environments where the root cause of surface-level problems is rarely obvious from one type of monitoring alone.
This posting does not necessarily represent Splunk's position, strategies or opinion.