Love it or hate it, many organizations have Microsoft Windows as part of their infrastructure. They usually operate a series of Windows services like:
Although surveys report that the market share of businesses using Windows is smaller than that of businesses using Linux, many organizations still use private Windows servers that are not accessible over the internet.
Therefore, organizations that choose to use Windows infrastructure components will have to set up proper observability and alerting agents to monitor their performance. Since the leading observability and monitoring tools are targeted primarily at Linux, it can be difficult to identify an effective and reliable way to monitor Windows infrastructure.
In this article, we will explain the best ways to monitor Windows infrastructure, including:
Monitoring, of course, is only one part of a successful infrastructure resiliency strategy.
The key to monitoring any OS infrastructure (whether Linux or Windows) is to use an instrumentation agent that works alongside the kernel without hindering it under load. You don’t want to insert a clunky piece of software that degrades performance or leaks memory.
Before installing an open source tool to capture system information, you should explore the Windows server toolkit that came bundled with your purchase. For example, you can use the following tools to capture system metrics:
In addition, there are several other noteworthy tools that monitor and expose useful OS metrics and performance counters which can then be exported into an agent or a remote service:
Sysinternals Suite is a collection of advanced system utilities and technical information. It was originally written by Mark Russinovich, and because of its high quality and comprehensiveness, it is now offered by Microsoft as a separate download. These tools can dissect your Windows performance metrics and give you a detailed view into each one. Some of the most notable tools in this suite include:
Psutil is like a Swiss Army Knife for retrieving system information and utilization counters. It’s written in Python, so it can be used for both Linux and Windows machines, and it takes full advantage of the language’s flexibility.
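As a rough illustration of that cross-platform claim, the sketch below reads a few counters with psutil. The function name `system_snapshot` and the dictionary keys are our own choices, not part of psutil. The code assumes psutil has been installed separately and falls back to a single standard-library value when it is missing.

```python
import os

try:
    import psutil  # third-party package: pip install psutil
except ImportError:
    psutil = None  # the sketch degrades to standard-library values


def system_snapshot(cpu_interval=0.2):
    """Return a dict of basic host counters on Windows or Linux."""
    snapshot = {"logical_cpus": os.cpu_count()}
    if psutil is None:
        return snapshot

    memory = psutil.virtual_memory()
    # cpu_percent blocks for cpu_interval seconds to measure utilization.
    snapshot["cpu_percent"] = psutil.cpu_percent(interval=cpu_interval)
    snapshot["memory_total_bytes"] = memory.total
    snapshot["memory_used_percent"] = memory.percent

    # net_io_counters() returns None on hosts with no network interfaces.
    net = psutil.net_io_counters()
    if net is not None:
        snapshot["net_bytes_sent"] = net.bytes_sent
        snapshot["net_bytes_recv"] = net.bytes_recv
    return snapshot


if __name__ == "__main__":
    for name, value in system_snapshot().items():
        print(f"{name}: {value}")
```

The same script should run unchanged on a Windows server and a Linux host, which is the main appeal when you monitor a mixed fleet.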
Once you have identified the best ways to collect and monitor your Windows infrastructure metrics, you want to create monitors that display them — ideally in a single pane of glass.
(Understand the four golden signals of monitoring.)
The most basic counters are the ones that map to actual hardware or available resources. At a minimum, you want to have a list of all available CPU cores, memory statistics and network bandwidth counters.
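Network counters are usually exposed as cumulative byte totals, so a bandwidth figure has to be derived from two readings. The following is a minimal sketch of that arithmetic. The `NetSample` shape and the function name are hypothetical, and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetSample:
    """Cumulative byte counters read at one point in time (hypothetical shape)."""
    timestamp: float  # seconds, e.g. from time.monotonic()
    bytes_sent: int
    bytes_recv: int


def throughput_bps(earlier: NetSample, later: NetSample) -> dict:
    """Convert two cumulative samples into bits-per-second rates."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing timestamps")
    # Counters can reset (reboot, adapter restart); clamp negatives to zero.
    sent = max(later.bytes_sent - earlier.bytes_sent, 0)
    recv = max(later.bytes_recv - earlier.bytes_recv, 0)
    return {"sent_bps": sent * 8 / elapsed, "recv_bps": recv * 8 / elapsed}


# Illustrative values only: two readings taken ten seconds apart.
first = NetSample(timestamp=0.0, bytes_sent=1_000, bytes_recv=5_000)
second = NetSample(timestamp=10.0, bytes_sent=126_000, bytes_recv=1_255_000)
rates = throughput_bps(first, second)
print(rates)  # {'sent_bps': 100000.0, 'recv_bps': 1000000.0}
```

In practice the two samples would come from whichever collector you use, taken at a fixed polling interval.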
Windows events are detailed records about the system, security and application notifications that are stored by the Windows OS. These are useful for tracing reliability issues within infrastructure environments. When monitoring these events, you want to be able to filter them based on their severity and schema.
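Filtering by severity can be sketched as follows, assuming events have already been exported into plain dictionaries. The field names (`id`, `channel`, `level`, `time`) and the sample records are hypothetical. The numeric levels follow the standard Windows convention, where a lower number means a more severe event.

```python
# Standard Windows event levels: a lower number means a more severe event.
LEVELS = {1: "Critical", 2: "Error", 3: "Warning", 4: "Information", 5: "Verbose"}


def filter_events(events, max_level=3, channels=None):
    """Keep events with level 1..max_level, optionally limited to some logs."""
    selected = []
    for event in events:
        if not 1 <= event["level"] <= max_level:
            continue  # too chatty, or a level this sketch does not handle
        if channels is not None and event["channel"] not in channels:
            continue
        selected.append({**event, "severity": LEVELS[event["level"]]})
    # Most severe first, then oldest first within the same severity.
    return sorted(selected, key=lambda e: (e["level"], e["time"]))


# Hypothetical exported records; IDs and timestamps are illustrative.
events = [
    {"id": 1001, "channel": "System", "level": 4, "time": "2024-01-01T10:00:00"},
    {"id": 1002, "channel": "Application", "level": 2, "time": "2024-01-01T10:05:00"},
    {"id": 1003, "channel": "System", "level": 1, "time": "2024-01-01T10:07:00"},
    {"id": 1004, "channel": "Security", "level": 3, "time": "2024-01-01T10:09:00"},
]

urgent = filter_events(events, channels={"System", "Application"})
print([e["id"] for e in urgent])  # [1003, 1002]
```

Sorting by level first keeps critical events at the top of an alert queue regardless of when they arrived.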
You should collect information about:
This information is useful for troubleshooting concurrency issues with your apps.
In certain cases, it’s critical that you store data on disks and perform disk IO operations. You want to make sure that your storage is unobstructed and that you detect storage failures or disk fragmentation issues before they become problems. Useful metrics to monitor in this category include:
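As one example of a storage check, the sketch below reads volume capacity with Python's standard library and maps it to a status label. The function names are our own, and the 80% and 90% thresholds are assumptions that you would tune for your environment. It covers free space only, not IO latency or fragmentation.

```python
import os
import shutil


def classify_usage(used_percent, warn_at=80.0, critical_at=90.0):
    """Map a used-space percentage to a status label (thresholds are assumptions)."""
    if used_percent >= critical_at:
        return "critical"
    if used_percent >= warn_at:
        return "warning"
    return "ok"


def check_volume(path):
    """Report free space and a status label for the volume containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "free_gib": round(usage.free / 2**30, 2),
        "used_percent": round(used_percent, 1),
        "status": classify_usage(used_percent),
    }


if __name__ == "__main__":
    # Root of the current drive: "C:\\" on Windows, "/" on Linux.
    print(check_volume(os.path.abspath(os.sep)))
```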
Windows services run as background processes with no direct user interface (similar to daemons on Linux). They are critical because, if they fail, most other external services will also fail. Be sure to monitor and check the status of these services as well as the corresponding event logs in case there is a failure.
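A status check along these lines might look like the sketch below. It assumes psutil is installed and relies on its Windows-only `win_service_get` helper. The function names and the example states are hypothetical, and the comparison step is kept separate so it can be exercised on any platform.

```python
import sys

try:
    import psutil  # third-party; the win_service_* helpers exist only on Windows
except ImportError:
    psutil = None


def read_service_states(names):
    """Map each service name to its status string, or "missing" if absent."""
    if psutil is None or not sys.platform.startswith("win"):
        raise RuntimeError("requires psutil on a Windows host")
    states = {}
    for name in names:
        try:
            states[name] = psutil.win_service_get(name).status()
        except psutil.NoSuchProcess:  # raised when no such service exists
            states[name] = "missing"
    return states


def failed_services(states, expected="running"):
    """Return the sorted names of services that are not in the expected state."""
    return sorted(name for name, status in states.items() if status != expected)


# Illustrative states; on a real host use read_service_states([...]) instead.
example = {"Spooler": "running", "W32Time": "stopped", "Dnscache": "running"}
print(failed_services(example))  # ['W32Time']
```

Any name returned by `failed_services` is a candidate for an alert and for a closer look at the related event log entries.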
If you are operating Windows servers for cloud-native workloads, it’s critical that you set up observability agents as well, so that you can measure and examine the internals of the system proactively. Splunk Enterprise will help you take this to the next level by providing complete visibility into what’s happening in your business and utilizing advanced AI and machine learning models to provide intuitive visualizations.
This posting does not necessarily represent Splunk's position, strategies or opinion.