Astronomy and Summary Indexing

By Nimish Doshi

I had the pleasure last week of viewing Saturn’s rings at Rutgers University’s observatory. It was my first time actually seeing the rings through a professional telescope, and the planet does look like what we often see in textbook pictures. After the viewing, I started thinking that astronomy records a lot of data that needs to be indexed for search and aggregated for reports. I asked the professor conducting the tour if he had any logs of astrometry data, and he took out his paper notebook to show it to me. Obviously, in Splunk terms, that was not what I was asking to see.

In all seriousness, the professor told me that optical telescopes, radio telescopes, and spectrometers can generate over 1 TB of computer data per day. I think much of this data may be image data used for trend analysis of observed readings, but the rest is the usual time series data that requires searching, analytical investigation, and reporting. Since this is unstructured time series data generated by software, Splunk could easily be used to do what it does best for this use case: index, analyze through search, and present aggregated reports.

For instance, suppose we have the following data:

Fri May 21 22:34:40 EDT 2010 star=n14532 1.01
Fri May 21 22:35:40 EDT 2010 star=n14532 1.00
Fri May 21 22:35:40 EDT 2010 star=n32344 1.62
Fri May 21 22:36:40 EDT 2010 star=n14532 0.99
Fri May 21 22:37:40 EDT 2010 star=n32344 1.60
...

The last number in each event represents the observed magnitude (an object’s brightness) of different stars in this computer-generated log file. I could index this data into Splunk, extract that last number as a field named observed_magnitude, and plot the average observed magnitude by star over time with a simple search command:

sourcetype=starlog|timechart avg(observed_magnitude) by star

This would end up looking something like this:

[Figure: Average Observed Magnitude]

With two stars and very few events, this isn’t terribly exciting. In real observations, however, with billions of galaxies and trillions of stars, the volume of data becomes challenging to manage, and the same simple timechart search remains a handy way to analyze and plot it.
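To make the calculation behind that chart concrete, here is a minimal Python sketch (not Splunk code) that parses the five sample events shown earlier and averages the magnitude per star. The function names are my own, and I am assuming the magnitude is always the last token on the line. Unlike timechart, this sketch does not bucket the results by time; it only shows the averaging.

```python
from collections import defaultdict

# The five sample events from the log excerpt in this post.
SAMPLE_EVENTS = """\
Fri May 21 22:34:40 EDT 2010 star=n14532 1.01
Fri May 21 22:35:40 EDT 2010 star=n14532 1.00
Fri May 21 22:35:40 EDT 2010 star=n32344 1.62
Fri May 21 22:36:40 EDT 2010 star=n14532 0.99
Fri May 21 22:37:40 EDT 2010 star=n32344 1.60
"""


def parse_event(line):
    """Return (star, observed_magnitude) for one log line.

    Assumption: the star is given as star=<id> and the magnitude
    is the last whitespace-separated token.
    """
    tokens = line.split()
    star = next(t.split("=", 1)[1] for t in tokens if t.startswith("star="))
    return star, float(tokens[-1])


def average_by_star(text):
    """Average the observed magnitude per star across all events."""
    totals = defaultdict(lambda: [0.0, 0])  # star -> [sum, count]
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        star, magnitude = parse_event(line)
        totals[star][0] += magnitude
        totals[star][1] += 1
    return {star: total / count for star, (total, count) in totals.items()}


averages = average_by_star(SAMPLE_EVENTS)
for star, avg in sorted(averages.items()):
    print(f"{star}: {avg:.2f}")
```

For the sample data this works out to roughly 1.00 for n14532 and 1.61 for n32344.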

The next question is: what if you wanted to perform the same calculations over a 30-day period in which 8 billion events have been recorded? Computing the average observed magnitude of thousands of stars from billions of raw events is not going to be an instantaneous search, no matter what technology you use. Fortunately, Splunk ships with a feature called summary indexing that solves this problem quite easily.

Summary Indexing

A summary index is an index built from the results of searches over another index. It contains time series aggregate summaries of calculations that have already been run against that data. In our example, if we were to schedule a search to run every hour that takes the average observed magnitude of all the events that have been indexed in the last hour, these aggregate hourly readings can be placed in the summary index. For a 30-day period, we can take an average of the existing averages that have been recorded in the summary index, and the search results will be orders of magnitude (pardon the pun) faster than going through all 8 billion events at once. Allow me to walk through our example to explain this through practice.
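Before the walkthrough, the arithmetic behind this idea can be sketched in a few lines of Python. This is an illustration only, not how Splunk stores summary data internally, and the readings and function names are hypothetical. The sketch keeps a sum and a count for each hour rather than just the hourly average, because a plain mean of hourly means matches the true overall mean only when every hour holds the same number of events.

```python
def summarize(readings):
    """Reduce one hour of raw magnitudes to a small (sum, count) summary."""
    return (sum(readings), len(readings))


def combine(summaries):
    """Recover the overall average from the hourly (sum, count) summaries."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


# Hypothetical readings for one star: a busy hour and a quiet hour.
hourly_readings = [
    [1.0, 1.0, 1.0],  # hour 1: three events
    [1.4],            # hour 2: one event
]

# What a scheduled hourly search would save.
summaries = [summarize(hour) for hour in hourly_readings]

# Long-range report computed from the small summaries only.
from_summaries = combine(summaries)

# The same report computed the slow way, from every raw event.
all_raw = [m for hour in hourly_readings for m in hour]
from_raw = sum(all_raw) / len(all_raw)

# A plain mean of the hourly means, which ignores how many events each hour had.
naive = sum(s / n for s, n in summaries) / len(summaries)

print(f"from summaries: {from_summaries:.3f}")
print(f"from raw:       {from_raw:.3f}")
print(f"naive mean:     {naive:.3f}")
```

Here the summaries reproduce the raw result (about 1.1), while the naive mean of means gives about 1.2. The Splunk documentation describes what the summary indexing commands actually store to keep long-range statistics accurate.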

First, create a summary index with Splunk Manager. Splunk ships with an index called summary that can be used out of the box for this. Then, in our example, save the search

sourcetype=starlog|sitimechart avg(observed_magnitude) by star

to use the last hour as its time range, and schedule it to run every hour and save its results in the summary index. Let’s call this search “Summary Timechart for Stars”. Notice that timechart has been replaced by sitimechart in the example. This tells Splunk that only the aggregate results of this search will be returned and saved to the summary index. Reporting commands such as top, timechart, chart, rare, and stats each have an si-prefixed version (sitop, sitimechart, sichart, sirare, and sistats) to be used for this purpose.

Now, if we want to find the average observed magnitude of our events for a 30-day period, we would simply run the following search over the last 30 days:

index="summary" search_name="Summary Timechart for Stars"|timechart avg(observed_magnitude) by star

Your results and corresponding report will come back more quickly because, in layman’s terms, we are now taking an average of averages. The concept of summary indexing is much more comprehensive than this, and I encourage you to read the documentation for further details. Because of the sheer volume of data produced by astronomy, summary indexing is a great way to increase search performance. The same approach could be applied to any large collection of data that is indexed and aggregated.

Nimish Doshi

Nimish is Director, Technical Advisory for Industry Solutions providing strategic, prescriptive, and technical perspectives to Splunk's largest customers, particularly in the Financial Services Industry. He has been an active author of Splunk blog entries and Splunkbase apps for a number of years.
