If I asked you to describe Splunk, you’d likely reply with something about it being really good (the best!) at gathering and searching logs. You’re right! But while that’s true, you may not know Splunk is also tops at gathering and analyzing metrics. Putting the two together is very powerful; logs (events, more generically) and metrics go together like cookies and milk! The Splunk App for Infrastructure (SAI) is a great place to get started with metrics for infrastructure monitoring. SAI joins those metrics with log events in a single interface — which means no more monitoring with one tool and troubleshooting with another. SAI is fast, cheap, and easy, and super useful for consolidated infrastructure monitoring.
This series will build your comfort with SAI use cases and capabilities, and will walk you through getting started on various platforms. If you’re already using Splunk in your organization for security or log-monitoring use cases, it’s really easy to add performance metrics and create new value with the data you already have. Even if you’re starting out fresh with infrastructure monitoring use cases, SAI is a great way to get started monitoring in Splunk.
But what if you already have a metric-collection tool for your infrastructure? Why look at SAI as well? SAI is all about providing the easiest route to what you need, without going overboard on features you likely won’t use and which might slow you down. Most other tools don’t show metrics and events side by side, and most take longer to get going. Most importantly, SAI taps you into the power of the Splunk platform, so the work you do today stays useful for the new questions you need to ask of your data tomorrow. Not having to redo prior work is a powerful value multiplier.
If you’re looking for even more power in your infrastructure monitoring, take a look at SignalFx, the industry’s best metrics-streaming platform, built to accelerate your cloud-native journey with real-time problem detection and directed troubleshooting. SignalFx can integrate back to Splunk to combine both worlds together.
Splunk has long been able to ingest metrics (as strings). However, several years ago Splunk also built a dedicated time-series database (TSDB), referred to as the metrics index, to process structured metric data. Metrics are first-class participants in Splunk. The Splunk metrics engine crunches this data far faster than it could if the same data were sent as unstructured events. In practice, metrics stream into Splunk and show up on dashboards or alerts within seconds of being dispatched.
SAI is powered by this metrics index and thus enjoys excellent performance with large volumes of metrics.
Metrics aren’t only faster to process, they’re also “cheap” because they take up significantly less disk space and use much less license than the same data treated as events. Metric data ingest measured against Splunk license entitlement is capped at only 150 bytes per event. Plus, beginning in Splunk version 8, a single metric event may contain multiple values, further streamlining performance and efficiency. On the storage side, metrics need 50% less space (or better) than the same data stored as events, and customers routinely report even better results. So basically, for time-series data, metrics are the way to go!
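To make the license math concrete, here is a rough back-of-the-envelope sketch in Python. It takes the 150-byte cap from the paragraph above as a worst case and assumes each measurement arrives as its own metric event. The fleet size, metric count, and collection interval are made-up numbers for illustration, not Splunk guidance.

```python
# Rough upper-bound estimate of daily license usage for metric data.
# The 150-byte figure comes from the text above; everything else is hypothetical.
METERED_BYTES_PER_METRIC_EVENT = 150
SECONDS_PER_DAY = 86_400


def daily_metric_license_bytes(hosts: int, metrics_per_host: int, interval_seconds: int) -> int:
    """Return the worst-case bytes counted against license per day."""
    samples_per_day = SECONDS_PER_DAY // interval_seconds
    events_per_day = hosts * metrics_per_host * samples_per_day
    return events_per_day * METERED_BYTES_PER_METRIC_EVENT


# Hypothetical fleet: 100 hosts, 50 metrics each, collected once a minute.
estimate = daily_metric_license_bytes(hosts=100, metrics_per_host=50, interval_seconds=60)
print(f"{estimate / 1e9:.2f} GB/day")  # prints "1.08 GB/day"
```

Swap in your own host count and collection interval to get a ballpark figure before you onboard anything.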
The Splunk App for Infrastructure is available to all Splunk users AT NO COST. Coupled with low ingest and retention costs, SAI is highly efficient.
SAI was built for infrastructure admins whose goal isn’t to learn how to be Splunk ninjas. (No disrespect to all you Splunk ninjas! But not all of us have the time to learn SPL.) IT admins and sysadmins just want an easy button to monitor metrics and logs so they can get back to the other tasks that fill their day. SAI is a prescriptive app that guides you through the process of onboarding and using data without requiring lots of doc reading or education courses. It’s meant to provide value in the first few minutes you use it!
SAI is ready to use to quickly monitor the following infrastructure platforms:
You can also ingest metrics from other sources into SAI, though you’ll need to do some of the connection work yourself.
How does SAI make life easier for infrastructure admins? Here’s a quick 1-2-3 drive-by to familiarize you with the capabilities. Additional posts in this series will take you deeper.
If you’re more of a listen-not-read learner, you can also see SAI in action first hand here:
1: Onboarding Infrastructure Data
First, as soon as you install and open SAI, you’ll find the app is structured with an easy-to-follow “add data” workflow that walks users through the simple steps of onboarding data for supported platforms. No reading endless documentation. You can get started in mere minutes feeding data from your systems.
Metrics are primarily fed to SAI via the Splunk HTTP Event Collector (HEC), a high-throughput collection engine that makes sending many types of data across networks easy and safe. SAI works with the open-source agent collectd for Linux/Unix/macOS, and perfmon for Windows. So long as the data is formatted properly, other sources can be used as well. Would you rather keep sending metrics with the Splunk Universal Forwarder and existing Technology Add-ons (TAs) such as *nix or Windows, so you don’t have to redeploy all your collection agents? Stay tuned for a post in this series to learn how to bring those sources to SAI as well.
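If you’re curious what “formatted properly” looks like on the wire, here is a minimal Python sketch of a single-metric HEC payload. It follows the general HEC metrics JSON shape (an `event` of `"metric"`, with `metric_name`, `_value`, and any dimensions under `fields`). It is not the exact payload collectd sends, and it does not claim to include every dimension SAI may want for entity discovery. The URL, token, host, and dimension names are placeholders, and the helper functions are hypothetical names used only for this illustration.

```python
import json
import time
import urllib.request


def build_metric_event(host, metric_name, value, dimensions, timestamp=None):
    """Build one HEC metric event; dimensions ride along as extra fields."""
    fields = dict(dimensions)
    fields["metric_name"] = metric_name
    fields["_value"] = value
    return {
        "time": timestamp if timestamp is not None else time.time(),
        "event": "metric",
        "host": host,
        "fields": fields,
    }


def build_request(hec_url, token, event):
    """Wrap the event in an HTTP POST carrying the HEC token header."""
    return urllib.request.Request(
        hec_url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute your own HEC endpoint and token.
event = build_metric_event(
    "web-01", "cpu.user", 42.5, {"env": "prod"}, timestamp=1_700_000_000
)
request = build_request(
    "https://splunk.example.com:8088/services/collector",
    "00000000-0000-0000-0000-000000000000",
    event,
)
print(json.dumps(event, indent=2))
# urllib.request.urlopen(request)  # uncomment to send to a real HEC endpoint
```

The send call is left commented out so the sketch runs without a Splunk instance. The token must belong to an HEC input that writes to a metrics index.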
2: Investigate Your Infrastructure
SAI comes with prebuilt dashboards that work out of the box. That’s part of the fast time-to-value you can expect.
Once you’ve onboarded some data, you’ll start with a handy overview page called “Investigate.” Tile colors indicate status, which can be toggled red/green by various metric thresholds.
The SAI homepage allows you to group entities/hosts by tags (referred to as dimensions). In the below example, over 50 entities are sorted into 4 dimension-driven groups. As new entities appear with the same dimensions, they will automatically be included in the groupings.
As you drill down from the overview page, individual hosts/entities can also be monitored easily via ready-to-use dashboards that provide the usual charts and graphs that show standard metrics such as CPU, storage, memory, and network. No configuration required.
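The dimension-driven grouping described in this section can be pictured with a few lines of Python. This is only a conceptual sketch of the idea, not how SAI implements it. The entity names, the `env` and `os` dimensions, and the `group_by_dimension` helper are all hypothetical.

```python
from collections import defaultdict


def group_by_dimension(entities, dimension):
    """Bucket entity names by the value of one dimension (tag)."""
    groups = defaultdict(list)
    for entity in entities:
        key = entity["dimensions"].get(dimension, "(none)")
        groups[key].append(entity["name"])
    return dict(groups)


# Hypothetical entities with made-up dimensions.
entities = [
    {"name": "web-01", "dimensions": {"env": "prod", "os": "linux"}},
    {"name": "web-02", "dimensions": {"env": "prod", "os": "linux"}},
    {"name": "db-01", "dimensions": {"env": "staging", "os": "windows"}},
]
groups = group_by_dimension(entities, "env")
print(groups)

# A new entity carrying the same dimension value lands in the existing group.
entities.append({"name": "web-03", "dimensions": {"env": "prod", "os": "linux"}})
regrouped = group_by_dimension(entities, "env")
print(regrouped)
```

Because the group is defined by the dimension value rather than by a list of hosts, nobody has to edit the group when `web-03` shows up.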
3: Deeper Analysis
When you’re ready to dig deeper, the “Analysis” tab provides a simple point-and-click interface for comparing data, with no special search syntax. Simply grab metrics, logs, or alerts from the list and add them to your canvas to create the perfect dashboard. You can build dashboards with only a few clicks, including advanced options such as time shifting, filtering, and statistical aggregations. Plus, once you’ve built the infrastructure view you like, you can easily export these views to traditional Splunk dashboards for further refinement or integration with other data sources.
One of SAI’s powers is to provide instant access to performance metrics and log events side by side with no special configuration. In the example above, if you click on the “mysqld” chart, you can search for correlated events that accompany the performance metrics, looking for any strings that might be important (for example, to verify that application errors were linked to the disk becoming full).
In the Analysis chart above, there’s an alert threshold in the first box. SAI has an easy alert-management interface to quickly build alerts for threshold violations you specify. Then use Splunk’s triggering capabilities to open a ticket, send an email, or notify a team on Slack.
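Conceptually, a threshold alert boils down to comparing each sample against a limit you pick. The Python sketch below illustrates that idea only. It is not SAI’s alerting engine, and the `ThresholdAlert` class, metric name, and sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThresholdAlert:
    """A static upper-bound threshold on one metric."""
    metric_name: str
    threshold: float

    def violations(self, samples):
        """Return the (timestamp, value) samples that exceed the threshold."""
        return [(ts, value) for ts, value in samples if value > self.threshold]


# Hypothetical CPU samples, one per minute.
alert = ThresholdAlert(metric_name="cpu.user", threshold=90.0)
samples = [(1_700_000_000, 72.0), (1_700_000_060, 93.5), (1_700_000_120, 95.1)]
hits = alert.violations(samples)
for ts, value in hits:
    print(f"{alert.metric_name} breached {alert.threshold} at {ts}: {value}")
```

In SAI you set the threshold through the interface instead of code, and the trigger action (ticket, email, Slack) takes it from there.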
If you’re still spending too much time writing scripts, configuring agents, or flying blind with your core infrastructure, give yourself a break and try SAI! Everything has been built to get you started quickly.
----------------------------------------------------
Thanks!
Chris Kline