It was not an easy task to represent the logic of our charts in SignalFlow. You might imagine starting from program text and then building a simple UI to represent it. It’s equally straightforward to imagine starting from a clickable UI and then designing a programming language that does exactly what that UI does. Instead, we started from both ends and developed the Splunk Infrastructure Monitoring UI and SignalFlow so that they converged on each other. By the end, the two could be used interchangeably to represent the same chart. This posed some unique challenges, but it yielded significant rewards.
A monitoring system that lets users create their own charts and alerts must balance a usable interface against the need to store user-created content in an efficient, manageable format. Most monitoring systems solve this problem by choosing early on either a clickable user interface (UI) or a domain-specific language (DSL) for tasks like creating charts. Each type of interface is optimized for a different kind of use case and offers distinct benefits, both for usability within the application and for interactions with the API.
You might choose a clickable UI so that anyone can create charts intuitively, without reading documentation or learning a new language. This is the approach we began with at Splunk, and the one taken by others (for instance, Datadog). Interfaces like these often merge the concept of what to draw (such as which metrics, grouped by which functions) with how to draw it (such as what color the lines should be). As long as a chart involves little more than unprocessed metrics (limited analytics, limited transformations), it is not especially challenging to represent it in a simple JSON format. But because the approach favors simplicity, the need for more sophisticated content can push such a UI beyond sustainable limits.
Others in our space, such as Wavefront, began by creating a domain-specific language for specifying charts. The DSL approach allows creative expression of data queries and analysis in ways that are difficult to represent in a clickable UI. It is also a straightforward way to enable common programmatic use cases like automation, monitoring as code, integration, and custom metrics.
Starting with a domain-specific language drastically simplifies the design of API endpoints, even for complex operations, because most of the complexity of defining what to draw is captured in a construct that looks much like a programming language and is stored simply as a string. However, such a language is often very time-consuming for a user to learn and use successfully.
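To make the "simply a string" point concrete, here is a minimal Python sketch of a chart-creation request body in which the entire analytic, however complex, travels as one string field. The helper `chart_request_body` is hypothetical, and the field names (`name`, `programText`, `options.type`) are our assumption about the v2 chart API rather than a verified schema.

```python
import json


def chart_request_body(name: str, program_text: str) -> str:
    """Serialize a chart definition whose analytics live in one SignalFlow string.

    ASSUMPTION: the field names (name, programText, options.type) reflect our
    understanding of the v2 chart API and should be checked against its
    reference documentation before use. This helper is hypothetical.
    """
    body = {
        "name": name,
        "programText": program_text,              # what to draw: a single string
        "options": {"type": "TimeSeriesChart"},   # how to draw it: kept separate
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(chart_request_body(
        "CPU by AZ",
        "A = data('cpu.utilization').mean(by=['aws_availability_zone']).publish(label='A')",
    ))
```

However elaborate the SignalFlow program grows, the request shape stays the same; only the string changes.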
Tools that rely mainly or exclusively on domain-specific languages for charts can employ any number of strategies to help the customer learn how to use the language, sometimes providing a “query builder”-type interface. But this doesn’t solve the critical problem for the customer: products driven by domain-specific languages require a large up-front investment of time before the user can get any value.
Splunk effectively had two parallel paths for customers to visualize metrics: a point-and-click UI for ease of use, and a domain-specific language for efficiency. Customers could point and click through the UI (which still used the same private, internal representation it always had), or use the Chart v2 API and define the metrics and analytics in a chart using SignalFlow. This mirrored the two approaches taken by monitoring products generally.
However, customers who liked the point-and-click UI could create and edit their charts only through individual UI actions; they couldn’t take bulk actions or publish their content widely across their organizations. If they wanted to use the new Chart v2 API to manage their charts at scale, they were limited to v2 charts, which could be edited only as SignalFlow, not through the point-and-click UI they were used to.
To bridge this divide, we changed the UI for v2 charts so that it translates point-and-click actions into SignalFlow and stores all point-and-click configuration as SignalFlow. As a result, customers can choose whichever interface suits them and move seamlessly between the two in Splunk.
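As a rough illustration of the UI-to-SignalFlow direction, the Python sketch below renders a hypothetical plot model as a single SignalFlow statement. `Plot`, `Aggregation`, and `to_signalflow` are illustrative names, not Splunk's code, and the sketch covers only a metric, one optional aggregation with group-by dimensions, and a publish call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Aggregation:
    """Hypothetical UI aggregation: an analytics method plus group-by dimensions."""
    function: str                                   # e.g. "mean" or "sum"
    by: List[str] = field(default_factory=list)


@dataclass
class Plot:
    """Hypothetical point-and-click plot as a chart editor might hold it."""
    label: str
    metric: str
    aggregation: Optional[Aggregation] = None


def to_signalflow(plot: Plot) -> str:
    """Render one plot as a single SignalFlow assignment statement.

    ASSUMPTION: illustrative only; covers data(), one optional aggregation,
    and publish(). repr() supplies the quoting for string and list literals.
    """
    expr = f"data({plot.metric!r})"
    if plot.aggregation is not None:
        agg = plot.aggregation
        args = f"by={agg.by!r}" if agg.by else ""
        expr += f".{agg.function}({args})"
    return f"{plot.label} = {expr}.publish(label={plot.label!r})"


if __name__ == "__main__":
    plot = Plot("A", "cpu.utilization", Aggregation("mean", ["aws_availability_zone"]))
    print(to_signalflow(plot))
    # A = data('cpu.utilization').mean(by=['aws_availability_zone']).publish(label='A')
```

One possible design choice shown here: the plot model holds no SignalFlow syntax at all, and text is generated only when the chart is saved, so the stored SignalFlow remains the single source of truth.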
The basic premise behind chart conversion is to parse SignalFlow into its constituent blocks and then determine whether each of those blocks can be represented in the UI.
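The parse-and-check step can be sketched in Python. This sketch assumes, for illustration only, that the program is also valid Python syntax (true of simple SignalFlow statements such as `data('cpu.utilization').mean(...)`), so it can lean on Python's standard `ast` module instead of a real SignalFlow parser. `REPRESENTABLE_CALLS`, `call_chain`, and `is_representable` are hypothetical names, and the allowed-call set is deliberately tiny.

```python
import ast
from typing import List, Optional

# ASSUMPTION: a hypothetical, deliberately small set of calls the chart builder
# can display. The real UI supports many more analytics.
REPRESENTABLE_CALLS = {"data", "mean", "sum", "min", "max", "count", "publish"}


def call_chain(node: ast.expr) -> Optional[List[str]]:
    """Flatten data(...).mean(...).publish(...) into ["data", "mean", "publish"].

    Returns None if the expression is not a plain chain of calls that starts
    with a bare function name.
    """
    names: List[str] = []
    while isinstance(node, ast.Call):
        func = node.func
        if isinstance(func, ast.Attribute):   # a method call such as .mean(...)
            names.append(func.attr)
            node = func.value
        elif isinstance(func, ast.Name):      # the leading call, e.g. data(...)
            names.append(func.id)
            return list(reversed(names))
        else:
            return None
    return None


def is_representable(program: str) -> bool:
    """True if every statement is `NAME = <chain of representable calls>`.

    ASSUMPTION: the program is also valid Python syntax, so Python's ast
    module stands in for a real SignalFlow parser.
    """
    try:
        tree = ast.parse(program)
    except (SyntaxError, ValueError):
        return False
    for stmt in tree.body:                    # treat each statement as one block
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            return False
        chain = call_chain(stmt.value)
        if chain is None or not set(chain) <= REPRESENTABLE_CALLS:
            return False
    return True
```

For example, `is_representable("A = data('cpu.utilization').mean(by=['aws_availability_zone']).publish(label='A')")` returns `True`, while `is_representable("B = data('x') * 2")` returns `False`, because the arithmetic is not a plain call chain.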
Our current version of SignalFlow was built on the principle that it should resemble the Splunk UI. Take for example the following SignalFlow code:
A = data('cpu.utilization').mean(by=['aws_availability_zone']).publish(label='A')
In this example, variable A is assigned a stream built from a data block with the metric argument 'cpu.utilization', followed by a mean aggregation grouped by aws_availability_zone, and published with the label 'A'.
The Splunk UI reads this as the assignment of a stream composed of a data block and a mean method, which is something it can display.
The UI then converts the SignalFlow into its local state, which it uses to maintain interactivity and to add UI-only information such as styling. The end result of this operation is a Chart Builder plot. From here, the web application uses our existing editor code as needed until it is time to save. If the SignalFlow contains any syntax the UI cannot represent, the chart falls back to the text mode that users are familiar with today.
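A minimal sketch of this conversion and fallback, assuming for illustration only that statements parse as Python syntax so the standard `ast` module can stand in for a SignalFlow parser. The hypothetical `to_plot_state` maps a statement of the form `X = data(metric).agg(by=[...]).publish(label=...)` to a dictionary of builder state, adds a UI-only style entry that never appears in SignalFlow, and returns `None` for anything outside that shape, which a caller would treat as the signal to stay in text mode. `AGGREGATIONS` and the state keys are likewise illustrative.

```python
import ast
from typing import Any, Dict, Optional

# ASSUMPTION: hypothetical set of aggregations the builder shows as one step.
AGGREGATIONS = {"mean", "sum", "min", "max", "count"}


def to_plot_state(statement: str) -> Optional[Dict[str, Any]]:
    """Map `X = data(m)[.agg(by=...)].publish(label=...)` to builder state.

    Returns None for anything outside that shape; the caller would then keep
    the chart in text mode. ASSUMPTION: the statement is also valid Python
    syntax, so Python's ast module stands in for a SignalFlow parser.
    """
    try:
        module = ast.parse(statement)
    except (SyntaxError, ValueError):
        return None
    if len(module.body) != 1 or not isinstance(module.body[0], ast.Assign):
        return None
    assign = module.body[0]
    if len(assign.targets) != 1 or not isinstance(assign.targets[0], ast.Name):
        return None

    # Collect method calls from the outermost (publish) inward to data(...).
    methods = []
    node = assign.value
    while isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        methods.append(node)
        node = node.func.value
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "data" and len(node.args) == 1 and not node.keywords):
        return None
    methods.reverse()                         # now in source order
    if not methods or any(m.args for m in methods):
        return None                           # only keyword arguments are supported

    try:                                      # arguments must be plain literals
        metric = ast.literal_eval(node.args[0])
        steps = [(m.func.attr, {kw.arg: ast.literal_eval(kw.value) for kw in m.keywords})
                 for m in methods]
    except (ValueError, TypeError):
        return None
    if not isinstance(metric, str) or steps[-1][0] != "publish" or len(steps) > 2:
        return None

    state: Dict[str, Any] = {
        "variable": assign.targets[0].id,
        "metric": metric,
        "label": steps[-1][1].get("label"),
        "aggregation": None,
        "by": [],
        "style": {"color": None},             # UI-only; never written to SignalFlow
    }
    if len(steps) == 2:                       # one aggregation before publish()
        name, kwargs = steps[0]
        if name not in AGGREGATIONS or set(kwargs) - {"by"}:
            return None
        by = kwargs.get("by", [])
        if isinstance(by, str):
            by = [by]
        if not (isinstance(by, list) and all(isinstance(d, str) for d in by)):
            return None
        state["aggregation"] = name
        state["by"] = by
    return state
```

Returning `None` rather than raising keeps the fallback decision in one place: the editor either gets a complete plot state or keeps the original SignalFlow text untouched.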
As we mentioned in "Convert Splunk Charts to and From SignalFlow for Monitoring as Code," the web interface can now serve as a teaching tool that familiarizes users with SignalFlow. If you practice monitoring as code by using SignalFlow with a tool like Terraform, Splunk Infrastructure Monitoring can now also convert your templates automatically, provided they’re formatted according to the SignalFlow style guide.
Having to choose between a stable, powerful API and an accessible UI is no choice at all. In Splunk, you get a product for metrics visualization that’s accessible, user-friendly, and based on exactly the same powerful APIs that we provide for programmatic access.
Get visibility into your entire stack today with a free 14-day trial of Splunk Infrastructure Monitoring.
This post features contributions from Rebecca Tortell, Kevin Cheng, and Aaron Sun.
The Splunk platform removes the barriers between data and action, empowering observability, IT and security teams to ensure their organizations are secure, resilient and innovative.
Founded in 2003, Splunk is a global company with over 7,500 employees and availability in 21 regions around the world; Splunkers have received over 1,020 patents to date. Splunk offers an open, extensible data platform that supports shared data across any environment so that all teams in an organization can get end-to-end visibility, with context, for every interaction and business process. Build a strong data foundation with Splunk.