While the data you send into Splunk Infrastructure Monitoring is visible in dashboards and charts, you can also extract past or streaming time series data that has been sent to Splunk. You can extract “raw” data (metrics and their values), as well as data that has been processed by Splunk analytics. You can also choose between extracting past data and extracting data as it is being streamed to Splunk Infrastructure Monitoring.
You might want to extract data from Splunk Infrastructure Monitoring for a variety of reasons. Some of the most common use cases include:
There are three common ways to extract data from Splunk Infrastructure Monitoring: by using SignalFlow, Splunk's streaming analytics API; by using the /timeserieswindow endpoint in the Splunk API; or from the Splunk UI. Each method is summarized below. To use SignalFlow or /timeserieswindow, you must have an access token for the Splunk API. For more information, see Authentication Overview.
Splunk Infrastructure Monitoring provides a language called SignalFlow that is primarily used to describe computations for Splunk's real-time analytics engine. SignalFlow also provides client libraries. In this document, however, we use the SignalFlow command-line interface (CLI) for the Splunk v2 API to show how the `publish()` method might be used. The CLI outputs historical or streaming data as a live text feed, as a simple graphical display, or as CSV-formatted text.
Advantages
Disadvantages
The /timeserieswindow endpoint in the Splunk API outputs raw metric data in JSON format.
Advantages
Disadvantages
While viewing the Data Table tab in a chart or detector in Splunk Infrastructure Monitoring, you can download a CSV file containing the data displayed in the table. Also, when viewing a chart, you can export the most recent 100 datapoints to a CSV file.
Advantages
Disadvantages
Note: Because downloading data from the UI is self-explanatory, we won’t be discussing it in detail later in this blog.
The nature of your data requirements can help you decide whether to export data by using SignalFlow, by using the /timeserieswindow API, or by downloading it from the Splunk Infrastructure Monitoring UI. Use the following table to determine which technique to use. In almost every case, SignalFlow is the preferred option.
| If you want to... | Then... |
|---|---|
| Export streaming data | Use SignalFlow |
| Export a portion of the data behind the chart you are viewing onscreen | Export the chart to CSV (exports the most recent 100 datapoints) or, from the Data Table tab, export to CSV (exports the data listed in the data table) |
| Export data with a relative time range (for example, the last 15 minutes) | Use SignalFlow |
| Export raw data (no analytics applied) for a specific past time range, using a default rollup and resolution | Use SignalFlow or /timeserieswindow |
| Export raw data (no analytics applied) for a specific past time range, at a rollup or resolution different from the Splunk defaults | Use SignalFlow |
| Export data with analytics applied in a way that isn't reflected in a chart (see note below) | Use SignalFlow |
Note on exporting data with analytics applied: For example, you might want to export the 5-minute moving average of a metric for the past hour. You don’t need to build a chart that displays a rolling average and then export the data; you can apply those analytics as part of your SignalFlow command.
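As a concrete sketch of that moving-average export using the SignalFlow CLI (the metric name `cpu.utilization`, the token placeholder, and the output file name are assumed for illustration; substitute your own):

```
# Compute the 5-minute moving average of a metric over the past hour
# and write the results out as CSV.
$ echo "data('cpu.utilization').mean(over='5m').publish()" | \
    signalflow --token YOUR_ACCESS_TOKEN --start=-1h --stop=-1m --output=csv > moving_average.csv
```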
This section summarizes the syntax for using the SignalFlow CLI (command-line interface) to extract data from Splunk Infrastructure Monitoring, based on more detailed information here. The advantages and limitations of using SignalFlow are described above.
Note: The SignalFlow CLI is not an officially supported tool. It is intended to be an example of how to use the SignalFlow analytics language with the signalfx-python client library.
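If you want to follow along, the CLI is distributed as part of the signalfx-python library. Assuming the package on PyPI is named `signalfx` and that it places the `signalflow` script on your PATH, setup might look like this:

```
$ pip install signalfx            # installs the signalfx-python library (package name assumed)
$ signalflow --token YOUR_ACCESS_TOKEN
```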
The following syntax summary is taken from the SignalFlow CLI interactive help (`signalflow --help`).
```
usage: SignalFlow [-h] [-t TOKEN] [-x] [--api-endpoint URL]
                  [--stream-endpoint URL] [-a START] [-o STOP]
                  [-r RESOLUTION] [-d MAX-DELAY]
                  [--output {live,csv,graph}] [--timezone TZ]
                  [program]

SignalFlow Analytics interactive command-line client

positional arguments:
  program               file to read program from (default: stdin)

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        session token
  -x, --execute         force non-interactive mode
  --api-endpoint URL    override API endpoint URL
  --stream-endpoint URL
                        override stream endpoint URL
  -a START, --start START
                        start timestamp or delta (default: -1m)
  -o STOP, --stop STOP  stop timestamp or delta (default: infinity)
  -r RESOLUTION, --resolution RESOLUTION
                        compute resolution (default: auto)
  -d MAX-DELAY, --max-delay MAX-DELAY
                        maximum data wait (default: auto)
  --output {live,csv,graph}
                        default output format
  --timezone TZ         set display timezone (default: US/Pacific)
```
When you invoke SignalFlow, you will see the `->` prompt. You can then enter a SignalFlow program (even across multiple lines) and press Esc followed by Enter to execute the program and visualize the results. Press ^C at any time to interrupt the stream, and again to exit the client. To actually extract data, you use the `publish()` method.
In this example, we are streaming live data directly to the screen.
```
$ signalflow
-> data('jvm.cpu.load').mean(by='aws_availability_zone').publish()
```
To see the current parameter settings, enter the `.` command and press Enter.
```
-> .
{'max_delay': None, 'output': 'live', 'resolution': None, 'start': '-1m', 'stop': None}
->
```
To set a parameter, use `.<parameter> <value>`:
```
-> .start -15m
-> .stop -1m
-> .
{'max_delay': None, 'output': 'live', 'resolution': None, 'start': '-15m', 'stop': '-1m'}
```
In this example, we use a program stored in a file named program.txt to extract non-streaming data from 15 minutes ago to 1 minute ago, outputting it in CSV format and piping it to a program named csv-to-plot.
```
$ signalflow --start=-15m --stop=-1m --output=csv < program.txt | csv-to-plot
```
When you use SignalFlow, the data is processed using the full capabilities of the Splunk analytics engine, which includes special handling of jitter and lag in data arrival times. There are two reasons the analytics engine might wait before processing a computation.
The first is `max_delay`, the amount of time the engine waits for delayed data before processing analytics. If not specified (or set to None), the value of `max_delay` is determined automatically, based on Splunk's analysis of incoming data. To avoid delays in getting data from SignalFlow, set the `max_delay` parameter to 1s. This means that even if data is delayed, Splunk Infrastructure Monitoring will process the analytics after 1 second, without the missing data.
```
$ signalflow
-> .max_delay 1s
```
If you want to set `max_delay` to a longer period of time, make sure that your `stop` value is an amount of time before now greater than `max_delay`. For example, if you want a `max_delay` of 30s, use a `stop` value of -31s or earlier.
```
-> .max_delay 30s
-> .stop -31s
```
For more information on max delay, see Delayed datapoints.
The second reason computations might be delayed is related to job resolution. SignalFlow must wait until the end of the current resolution window before making its computation. For example, if the job resolution is 300000 (5m) and the `stop` value is None (or not specified), SignalFlow will wait until it has all the datapoints from the current 5-minute window before performing any computations.
To avoid delays, make sure your `stop` value is an amount of time before now greater than the job resolution. For example, if you are looking at data from a few months back, the resolution may be 3600000 (1h). In this case, use a `stop` value of -1h or earlier.
```
-> .stop -1h
```
For more information on resolution and data retention policies, see How Splunk Chooses Data Resolution.
This issue can also be related to max delay. Instead of using a `stop` value of None (or not specifying a value), set the `stop` value to -1m.
```
-> .stop -1m
```
Our support team will be glad to help you. Send us a message here.
See here and here to learn more about working with SignalFlow.
This section summarizes the syntax for using the /timeserieswindow endpoint in the Splunk API, documented in full here. The advantages and limitations of using /timeserieswindow are described above.
| Parameter | Type | Description |
|---|---|---|
| query | string | Elasticsearch string query that specifies the metric time series to retrieve. |
| startMs | int | Starting point of the time window within which to find datapoints, in milliseconds since the Unix epoch. |
| endMs | int | Ending point of the time window within which to find datapoints, in milliseconds since the Unix epoch. |
| resolution (optional; default is 1000) | int | The data resolution, in milliseconds, at which to return the datapoints. Acceptable values are 1000 (1s), 60000 (1m), 300000 (5m), and 3600000 (1h). |
In the following example, curl is used to extract data for the metric `jvm.cpu.load` from 3/13/17 13:15 to 3/13/17 13:20 (UTC), at the default resolution (1000 ms).
```
curl \
  --header "X-SF-TOKEN: YOUR_ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  --request GET \
  'https://api.signalfx.com/v1/timeserieswindow?query=sf_metric:"jvm.cpu.load"&startMs=1489410900000&endMs=1489411205000'
```
In the following example, the same data is extracted, but at 5-minute resolution. If the metric data was sent to Splunk Infrastructure Monitoring more frequently than once every 5 minutes, the returned data is rolled up using the default rollup for the metric type (gauge, counter, or cumulative counter). In this case, the average of the values received during each 5-minute period is returned.
```
curl \
  --header "X-SF-TOKEN: YOUR_ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  --request GET \
  'https://api.signalfx.com/v1/timeserieswindow?query=sf_metric:"jvm.cpu.load"&startMs=1489410900000&endMs=1489411205000&resolution=300000'
```
When extracting data using /timeserieswindow, there are situations where expected data isn’t being returned, even though the request is syntactically correct. Causes and workarounds for these issues are discussed in the following sections.
There are two cases in which your request might return no data; that is, the returned data looks like this:
{"data":{},"errors" : [ ]}
If you see this response, check to make sure there is actually data in Splunk Infrastructure Monitoring in the specified timeframe.
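For comparison, a response that does contain data maps each matching time series ID to an array of [timestamp, value] pairs, roughly like the following (the ID and values here are invented for illustration):

```
{
  "data": {
    "AAAAAXbzGVU": [
      [1489410900000, 0.42],
      [1489411200000, 0.45]
    ]
  },
  "errors": []
}
```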
If you are asking for a given metric and set of dimensions, /timeserieswindow *first* looks for all of the time series that match that query across all time. If more than 5,000 time series match, it returns a subset of the total, with no regard to whether there is data in the timeframe you're asking for.
One way to work around this issue is to add `sf_isActive:true` as another filter in your query. This returns only the time series that are currently active (that is, have reported at least one datapoint within the last 36 hours). This may or may not be appropriate, depending on the nature of your data and how you are sending it in.
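For example, here is a sketch of the earlier curl request with the extra filter added (note that the spaces around AND must be URL-encoded as %20):

```
curl \
  --header "X-SF-TOKEN: YOUR_ACCESS_TOKEN" \
  --request GET \
  'https://api.signalfx.com/v1/timeserieswindow?query=sf_metric:"jvm.cpu.load"%20AND%20sf_isActive:true&startMs=1489410900000&endMs=1489411205000'
```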
If using this filter won't work for your situation, you need to break up the query to ensure that the response doesn't contain more than 5,000 results. For example, suppose you were asking for `sf_metric:"content.global" AND type:"billable"` and seeing only a subset of results. You could break that query into two or more narrower queries.
Break up your original query as granularly as necessary to ensure that the results will match no more than 5,000 time series.
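As a sketch, assuming a hypothetical `datacenter` dimension whose values split the matching time series roughly in half, the two queries might look like this:

```
# Query 1: only the "east" time series (the dimension and its values are hypothetical)
curl --header "X-SF-TOKEN: YOUR_ACCESS_TOKEN" --request GET \
  'https://api.signalfx.com/v1/timeserieswindow?query=sf_metric:"content.global"%20AND%20type:"billable"%20AND%20datacenter:"east"&startMs=1489410900000&endMs=1489411205000'

# Query 2: only the "west" time series
curl --header "X-SF-TOKEN: YOUR_ACCESS_TOKEN" --request GET \
  'https://api.signalfx.com/v1/timeserieswindow?query=sf_metric:"content.global"%20AND%20type:"billable"%20AND%20datacenter:"west"&startMs=1489410900000&endMs=1489411205000'
```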
Get visibility into your entire stack today with a free 14-day trial of Splunk Infrastructure Monitoring.
----------------------------------------------------
Thanks!
Barbara Snyder